... is additionally supposed to profit from ongoing trends in hardware development. E.g. [13] suggests that the transfer rates of disk drives continue to improve much faster than the rotational delay time. As a consequence the optimum page size with respect to I/O will even increase. As we will show in ...

Applied Data Mining for Business Intelligence

Learning temporal relations in smart home data.

... problem of learning temporal relations between time intervals in smart home data, which includes physical activities (such as taking pills while at home) and instrumental activities (such as turning on lamps and electronic devices). The purpose of this work is to identify interesting temporal patter ...

On the Necessary and Sufficient Conditions of a Meaningful

... an adaptive proximity function of two data points on individual attributes separately. The proximity of two points is the aggregation of each attributive ... In fact, the important issue is how to design a meaningful distance (or proximity) function for high dimensional data space. The authors in [1] pr ...

data stream mining algorithms – a review of issues and existing

An Integrated Data Mining and Survival Analysis Model for

... (2) Data preparation and Data extraction. Extracting necessary customer behavior attributes from data warehouse. (3) Clustering. Using data mining clustering methods such as K-means to cluster customers based on similar hazard/survival possibility. (4) Appending a new attribute: Cluster Number. Afte ...

Rutcor Research Logical Analysis of Multi-Class

... entry stores the differentiability rate of class Ci from class Cj , that is, the ratio of observations in Ci covered by patterns in M which do not cover (or only cover a small proportion of) observations in Cj . The authors of [47] observed that their second approach produces less accurate classific ...

IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)

... brief statistics Diabetes data set on various classification algorithms like SVM, KNN, ID3, CART and C5.0 to classify the diabetes data. They have compared the classification accuracy of these models. SVM gives the best classification accuracy, 81.77%, compared to the others. 2.3 Classwise K Nearest Neighbo ...

Generalized k-means based clustering for temporal data under

... an honor to be her Ph.D. student. It is not often that one finds an adviser and colleague that always finds the time for listening to the little problems and roadblocks that unavoidably crop up in the course of performing research. Her technical and editorial advice was essential to the completion o ...

Data Mining - PhD in Information Engineering

... (produces exactly the same predictions) If x ≤ 1.2 then class = b If x > 1.2 and y ≤ 2.6 then class = b ...

Finding Frequent Items in Data Streams

Optimization-based Data Mining Techniques with Applications

Parallel H-Tree Based Data Cubing on Graphics Processors

Extending the Data Mining Software Packages SAS Enterprise

A Survey on Pre-processing and Post-processing

... represents missing value for xp attribute. An instance may contain several missing values. Machine Learning (ML) approaches are not designed to deal with missing values and also produce incorrect results if implemented with this drawback. Before applying machine learning approach, it is essential ei ...

KODAMA - Application Note - accepted - Spiral

Mining Frequent Spatio-Temporal Patterns from

... performed including frequency of events for natural periods like weeks and hours. Also the percentage of users according to the number of events generated and the number of days that have events during the period of data collection was analyzed. The figure 1 shows different information of the events ...

Unsupervised Learning: Clustering

... Hard vs Fuzzy ...

ALADIN: Active Learning of Anomalies to Detect Intrusion

MapReduce-Based Pattern Finding Algorithm

... framework to speed up pattern finding and avoid running-out-of memory in a PC-cluster environment. We design a MapReduce-based Pattern Finding algorithm (MRPF) that provides good efficiency and scalability. We also apply it in prescription network and successfully find some commonly used prescriptio ...

Feature Selection with Integrated Relevance and Redundancy

Frequent Itemset Mining Technique in Data Mining

Interactive textual feature selection for consensus

Paper Topics - NDSU Computer Science

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
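The iterative refinement described above can be illustrated with a minimal sketch of one common heuristic (Lloyd-style alternation between assignment and mean update). The function names `kmeans` and `nearest_center`, the toy data, and the choice to pass the starting centers explicitly are all assumptions made for this illustration; they do not come from any of the listed documents.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the closest center (a 1-nearest-neighbor lookup over centers)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving.

    The result is a local optimum that depends on initial_centers
    (an assumption of this sketch: the caller supplies them).
    """
    centers = list(initial_centers)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy data: two well-separated groups in the plane.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers = kmeans(data, initial_centers=[data[0], data[3]])

# Nearest centroid classification of a new observation.
label = nearest_center((7.5, 8.1), centers)
```

Classifying a new point with `nearest_center` is the nearest centroid step mentioned in the text: the new observation is placed in the cluster whose center is closest. Because the heuristic only reaches a local optimum, practical use typically repeats it from several different starting centers and keeps the best result.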

Top subcategories

K-means clustering