Infinite Ensemble for Image Clustering

... image retrieval. Conventional image clustering methods use handcraft visual descriptors as basic features via K-means, or build the graph within spectral clustering. Recently, representation learning with deep structure shows appealing performance in unsupervised feature pre-treatment. However, few ...

Chemoinformatics and Drug Discovery Xu and Hagler

abstract - Molecular Diversity Preservation International

Semantically-grounded construction of centroids for datasets with

Comparative Study of Gaussian and Nearest Mean Classifiers for

... error function and the accuracy of the SVM is very high, but the degree of misclassification of legitimate e-mails is high. In order to solve that problem, a method of spam filtering based on weighted support vector machines is proposed. Experimental results show that the algorithm can enhance the filtering per ...

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes

... has the problem of obtaining the suitable model for each particular dataset and application [6]. Distance-based approaches (e.g. [7]) essentially compute distances among data points, thus become quickly impractical for large datasets (e.g., a nearest neighbor method has quadratic complexity with res ...

Web Mining for Personalization: A Survey in the Fuzzy Framework

... There is another work which also supports the usage of fuzzy clustering for web usage mining. Neelam Sain and Sitendra Tamrakar, in their paper "A Survey of Web Usage Mining based on Fuzzy Clustering and HMM" [10], present a survey of over 34 research papers dealing with Web usage Mining tec ...

Progress Report on “Big Data Mining”

... 2.1.2 Clustering Clustering is the process in which data objects are grouped together in classes based on some measure of similarity. It is different from classification because data classes are not known. It is also known as unsupervised learning and is used to discover hidden structure in datasets ...

A Hash Based Frequent Itemset Mining using Rehashing

... Linear probing, in which the interval between probes is fixed (usually 1); Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation; Double hashing, in which the inte ...

Agents and Data Mining Interaction - CS

Periodicity Detection in Time Series Databases

... distance-based algorithm only considers the adjacent interarrival times 4, 1, 2, and 3 as candidate periods, which clearly do not include the value 5. Should it be extended to include all possible interarrivals, the complexity of a distance-based algorithm [24], [19] would increase to O(n^2). Althou ...

Minimizing Spurious Patterns Using Association Rule Mining

... could be some commodities belonging to the same price level. Thus, it could be said that in a shopping mall there is a wide range of commodities belonging to different support levels but few of them may belong to the same support level. In such data sets if we use conventional clustering algorithms ...

Evaluating data mining algorithms using molecular dynamics

... well-known data mining toolkit Weka alone offers 65 different classification algorithms, each equipped with different configuration options (Hall et al., 2009). Facing the challenge of selecting a few algorithms with the potential for yielding good results, we decided to conduct a comprehensive set ...

Improvements on Graph-based Clustering Methods

A Distributed Approach to Extract High Utility Itemsets from XML Data

... Two tree structures, called utility-based WAS tree (UWAS-tree) and incremental UWAS-tree (IUWAS-tree), were proposed for mining WASs in static and incremental databases. III. PROBLEM DEFINITION. This work is best explained by using a weblog database. In World Wide Web (WWW) and online services, if a user wish ...

ppt - Computer Science

... Stability of Feature Selection: the insensitivity of the result of a feature selection algorithm to variations to the training set. ...

Research Proposal - University of South Australia

... This algorithm was defined by Charu C. Aggarwal and Philip S. Yu in 2001 (Aggarwal et al. 2001). Charu Aggarwal has written extensively on the topic of data mining under high dimensionality since the year 2000. This algorithm is the earliest subspace outlier detection algorithm the author of this pr ...

PEBL: Web Page Classification without Negative

Parallel and Distributed Data Mining: An Introduction

Document

... – Attribute of one of the dimensions – Derived from the measures & attributes # of variables is the data set's dimensionality (not to be confused with dimensions of the original fact table) Copyright © Ellis Cohen 2002-2005 ...

IJSRSET Paper Word Template in A4 Page Size

... in D. Note that, at this point, the information we have is based solely on the proportions of tuples of each class. Info(D) is also known as the entropy of D. Now, suppose we were to partition the tuples in D on some attribute A having v distinct values, {a1, a2, …, av}, as observed from the training dat ...

Document

... how to divide the records. We therefore work with the rids. As we partition the list of the splitting attribute (i.e. Age), we insert the rids of each record into a probe structure (hash table), noting to which child the record was moved. Once we have collected all the rids, we scan the lists of the ...

Artificial Intelligence for Engineering Design, Analysis

PROBABILISTIC CLUSTERING ALGORITHMS FOR FUZZY RULES

... output response of each hierarchical fuzzy model. The original image can be described as the aggregation (equation (4)) of these three clusters surfaces. So, the use of the FCAFR algorithm makes the stratification of the early flat fuzzy system into a PCS structure. The membership values of the fuzz ...

Frequency-aware Similarity Measures - Hasso-Plattner

... their work, we partition data according to frequencies and not based on different sources of the data. Moreover, we employ a set of similar matchers, i.e., we learn one similarity function for each of the partitions – but all of them with the same machine learning technique. Another idea is to use a ...


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
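The iterative refinement described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes squared Euclidean distance, picks the initial centers by random sampling from the data, and keeps an empty cluster's old center, none of which is prescribed by the text. The function names (`k_means`, `nearest_center`) and the parameters `max_iter` and `seed` are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the closest center (the 1-nearest-neighbor rule on centers)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Heuristic k-means: alternate assignment and mean-update steps.

    Assumption: initial centers are k distinct data points chosen at random.
    The result is a local optimum and depends on this initialization.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[i])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = k_means(data, k=2)
    print(centers, labels)
    # Classify a new point into an existing cluster (nearest centroid rule).
    print(nearest_center((9.0, 9.0), centers))
```

Calling `nearest_center` on a new point with the fitted centers is the nearest centroid classification mentioned in the last paragraph. Because the heuristic only reaches a local optimum, running it with several seeds and keeping the best result is a common precaution.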
