Similarity Processing in Multi-Observation Data

... observations. In contrast to single-observation data, where each object is assigned to exactly one occurrence, multi-observation data is based on several occurrences that are subject to two key properties: temporal variability and uncertainty. When defining similarity between data objects, these pro ...

Discovering Co-location Patterns from Spatial Datasets

Mining asynchronous periodic patterns in time series data

Periodic Pattern Mining – Algorithms and

... time series is scanned for the second time. During the second scan of the time series, each period segment is intersected with the max-pattern. The result of the intersection, called a max sub-pattern, is either inserted into the tree if it is not already present or its corresponding node count is increme ...
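The second-scan counting step described in this snippet can be sketched as follows. This is a simplified illustration, not the cited paper's implementation. It assumes each position of the max-pattern holds a single symbol or the don't-care symbol `*`, and it uses a Python `Counter` in place of the max-subpattern tree. That keeps the insert-or-increment counting but not the tree's links between sub-patterns. The names `intersect` and `count_max_subpatterns` are hypothetical.

```python
from collections import Counter

DONT_CARE = "*"


def intersect(segment: str, max_pattern: str) -> str:
    """Keep a symbol only where the segment agrees with the max-pattern."""
    return "".join(m if s == m else DONT_CARE for s, m in zip(segment, max_pattern))


def count_max_subpatterns(series: str, period: int, max_pattern: str) -> Counter:
    """Second scan: intersect every full period segment with the max-pattern
    and count each resulting max sub-pattern."""
    counts: Counter = Counter()
    for start in range(0, len(series) - period + 1, period):
        hit = intersect(series[start:start + period], max_pattern)
        if hit.strip(DONT_CARE):  # skip segments that match no position at all
            # Insert-or-increment: the role played by the tree nodes in the text.
            counts[hit] += 1
    return counts
```

For example, `count_max_subpatterns("abcabdabc", 3, "abc")` returns `Counter({'abc': 2, 'ab*': 1})`: the middle segment `abd` disagrees with the max-pattern only in its last position.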

Malicious URL Detection by Dynamically Mining Patterns without Pre-defined Elements - Da Huang

... 2. In real applications, the training data used to train the detection model may be biased and contain some noise. In such cases, human intervention is very important. Human-interpretable URL patterns can be easily modified and adapted by network security experts, which can highly prompt th ...

29 Trajectory Data Mining: An Overview

... data mining. However, we lack a systematic review that can shape the field and position existing research. Facing a huge volume of publications, the community is still not very clear about the connections, correlations and differences among these existing techniques. To this end, we condu ...

Spatio-Temporal Data Mining with Event Logs from High Volume

... encouragement, but also for the hard question that incentivized me to widen my research from various perspectives. This work was performed as part of the DAIPEX project, grant-funded by Dinalog. ...

Fuzzy Association Rules

Fingerprinting Malicious IP Traffic - Spectrum

... from dynamic malware analysis, together with benign traces obtained from trusted public repositories such as Wisnet [11] and a private corporate network. These features are used by classification ...

Information-theoretic graph mining - PuSH

... brain regions. A graph is a powerful concept to model arbitrary (structural) relationships among objects. In recent years, the prevalence of social networks has made graph mining a major focus of attention in the data mining field. There are many important tasks in graph mining, such as graph ...

K - Department of Computer Science

Case Studies in Data Mining

... 1. K-means method; 2. K-medoids method; 3. The DBSCAN method ...

Zhiyuan Yao - Visual Customer Segmentation and Behavior

... The importance of customer relationship management (CRM), a management principle for transforming organizations from being product-oriented to customer centric, has attracted interest from both academia and industry. Today, customers’ behaviors and activities can be easily recorded and stored throug ...

Oracle Data Mining Concepts

... you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs, including any opera ...

Data Transformation For Privacy

... The sharing of data is often beneficial in data mining applications. It has proven useful both for supporting decision-making processes and for promoting social goals. However, the sharing of data has also raised a number of ethical issues. Some such issues include those of privacy, data security, and ...

Chapter 6 A SURVEY OF TEXT CLASSIFICATION

nipals

For Review Only - Universidad de Granada

... • Interpretability: Clarity and credibility, from the human point of view, of the classification model. • Learning time: Time required by the machine learning algorithm to build the classification model. • Robustness: Minimum number of examples needed to obtain a precise and reliable classification ...

Cache Manager Cache Performance Cache

Temporal Patterns Discovery from Multivariate Time Series via

... resolution, we believe that the mining of interval-based abstractions might be potentially more fruitful and the results more expressive (and explainable to a domain expert) than the analysis of a much larger, noisy, set of raw data. The framework that is at the focus of this paper is presented in t ...

Subgroup Discovery Algorithms: A Survey and Empirical Evaluation

... search is faster than other search strategies, it only explores part of the search space and does not guarantee an optimal solution. A number of beam-search-based algorithms have been developed to date. The most popular ones are described below. SubgroupMiner [25]. This subgroup discove ...
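The speed/optimality trade-off of beam search can be made concrete with a minimal level-wise beam search over subgroup descriptions. This is a sketch under assumptions the snippet does not state: descriptions are conjunctions of attribute=value selectors, quality is weighted relative accuracy (WRAcc = p(cond) * (p(target | cond) - p(target))), and `depth`, `beam_width` and `top_k` are hypothetical parameter names. It is not SubgroupMiner or any other specific algorithm from the survey. Only the `beam_width` best descriptions at each level are refined, so refinements of pruned descriptions are never evaluated. That is why the result is not guaranteed to be optimal.

```python
from typing import Dict, FrozenSet, List, Tuple

Row = Dict[str, str]
Selector = Tuple[str, str]          # (attribute, value)
Description = FrozenSet[Selector]   # conjunction of selectors


def covers(desc: Description, row: Row) -> bool:
    """A row is covered if it satisfies every selector in the conjunction."""
    return all(row[attr] == value for attr, value in desc)


def wracc(desc: Description, rows: List[Row], target: Selector) -> float:
    """Weighted relative accuracy: p(cond) * (p(target | cond) - p(target))."""
    t_attr, t_value = target
    covered = [r for r in rows if covers(desc, r)]
    if not covered:
        return 0.0
    p_cond = len(covered) / len(rows)
    p_target = sum(r[t_attr] == t_value for r in rows) / len(rows)
    p_target_given_cond = sum(r[t_attr] == t_value for r in covered) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


def beam_search(rows: List[Row], target: Selector, depth: int = 2,
                beam_width: int = 2, top_k: int = 3) -> List[Tuple[Description, float]]:
    """Level-wise beam search: only the beam_width best descriptions are refined."""
    attrs = [a for a in rows[0] if a != target[0]]
    selectors = sorted({(a, r[a]) for r in rows for a in attrs})
    beam: List[Description] = [frozenset()]
    seen: Dict[Description, float] = {}

    def rank(item: Tuple[Description, float]):
        # Highest quality first; ties broken by the sorted selectors for determinism.
        return (-item[1], sorted(item[0]))

    for _ in range(depth):
        candidates: Dict[Description, float] = {}
        for desc in beam:
            used = {attr for attr, _ in desc}
            for sel in selectors:
                if sel[0] in used:  # at most one selector per attribute
                    continue
                refined = desc | {sel}
                if refined not in candidates and refined not in seen:
                    candidates[refined] = wracc(refined, rows, target)
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=rank)
        seen.update(ranked)
        beam = [d for d, _ in ranked[:beam_width]]  # everything else is pruned

    return sorted(seen.items(), key=rank)[:top_k]
```

A wider beam evaluates more refinements and makes it less likely that a good subgroup is pruned early, at the cost of more quality evaluations per level.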

Ensembles for Unsupervised Outlier Detection: Challenges

... also benefit from insights in the area of subspace outlier detection. Complementary to Aggarwal [2], we would like to discuss here the specific challenges, the first steps taken so far in the literature, and overall the important questions in research regarding ensembles for outlier detection. Trans ...

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both use an iterative refinement approach. Additionally, both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
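The standard iterative refinement heuristic for k-means is Lloyd's algorithm, which alternates an assignment step and an update step. The sketch below assumes squared Euclidean distance and initializes the centers by sampling k data points (Forgy initialization). The function names and the `seed` and `max_iter` parameters are illustrative choices, not a reference implementation. The assignment step reuses `nearest_centroid`, which is the same 1-nearest-neighbor rule on the centers that the nearest centroid classifier applies to new data. Like any such heuristic, it stops at a local optimum that can depend on the initial centers.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point: Point, centers: Sequence[Point]) -> int:
    """Index of the closest center (1-nearest neighbor on the centers)."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def mean(points: Sequence[Point]) -> Point:
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and update until centers stop moving."""
    rng = random.Random(seed)
    centers: List[Point] = rng.sample(points, k)  # Forgy initialization
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [nearest_centroid(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Keep an empty cluster's center where it was.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:  # assignments are stable: a local optimum
            break
        centers = new_centers
    return centers, labels
```

After clustering, `nearest_centroid(new_point, centers)` assigns an unseen point to one of the existing clusters, which is the nearest centroid classification described above.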
