Finding Non-Redundant, Statistically Significant Regions in

... as a pair (X, Y ), where X is a subset of data points, and Y is a subset of data attributes, so that the points in X are “close” when projected on the attributes in Y , but they are “not close” when projected on the remaining attributes. Projected clustering algorithms have an explicit or implicit m ...

Boosted Classification Trees and Class Probability/Quantile Estimation

... that AdaBoost is successful at estimating classification boundaries even though its scores do not estimate conditional class probabilities. Further, we show that the same is true of LogitBoost, despite the fact that estimation of conditional class probabilities is the motivation behind this algorith ...

estimating hash-tree sizes in concurrent processing of frequent

... To the best of our knowledge, apart from Apriori Common Counting, the only multiple-query processing method for data mining queries is Mine Merge [15], which is less predictable and generally offers worse performance than Apriori Common Counting. As an introduction to multiple data mining query optimi ...

Time-series data mining

12 Time-Series Data Mining

... —Data representation. How can the fundamental shape characteristics of a time-series be represented? What invariance properties should the representation satisfy? A representation technique should derive the notion of shape by reducing the dimensionality of data while retaining its essential charact ...

KODAMA: an R package for knowledge discovery

A VISUALIZATION TOOL FOR FMRI DATA MINING by NICU

... looking at the intensity measured in each of the collected 3D images, they try to group voxels with similar behavior into distinct classes and then label them as task-related or not. Clustering, independent component analysis (ICA), principal component analysis (PCA) and neural networks are some of ...

3. Answering Cube Queries Using Statistics Trees

... Arbor software's Essbase [3], Oracle Express [27] and Pilot LightShip [28] are based on MOLAP technology. The latest trend is to combine ROLAP and MOLAP in order to take advantage of the best of both worlds. For example, in PARSIMONY, some of the operations within sparse chunks are relational while ...

A clustering-based visualization of spatial patterns

... mining techniques may be applied to extract spatial interesting patterns such as association rules [16] or emerging patterns [6] ; see also [3, 17]. On the other hand, [12] identified two approaches for colocation mining: transaction-based approaches and eventbased approaches. Transaction-based appr ...

Text Mining and PROC KDE to Rank Nominal Data

... For example, suppose there are five codes (A, B, C, D, E) used in a logistic regression with the outcome variable of mortality. Then the regression equation can be written P = α0 + α1(if A is present) + α2(if B is present) + α3(if C is present) + α4(if D is present) + α5(if E is present). P is the predicted pr ...

Association Rule Mining

... • First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item, and collecting those items that satisfy minimum support. The resulting set is denoted L1. • Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so o ...

Cycle-Time Key Factor Identification and

... depend on this number or on the values of thresholds. Clearly, the fab can easily adapt them to its own needs to meet its common practices and unique constraints. Numerous approaches [23], [24], [26], [27] have previously been utilized for variable discretization, two of which are selected here for ...

Henock Woubishet Tefera - Addis Ababa University Institutional

... Special thanks go to Ato Ermias Alemu, who has always been there when I needed his help, and especially for his assistance with the data collection and preparation work. Ato Hailemelekot Mamo, from the Customer Loyalty Department, was very cooperative, and his ideas were invaluable. ...

DATA GUIDED DISCOVERY OF DYNAMIC CLIMATE DIPOLES

... Corollary 3.1. The graph GR achieves θ(N/K) reduction in the number of edges over GC. This is easy to see. Since every node in GR has at most 2 ∗ K neighbors, the number of edges in GR is θ(N ∗ K). The number of edges in GC is θ(N ∗ N). Note that building the reciprocal graph is essential to eli ...

Sampling Large Databases for Association Rules

Orange4WS Environment for Service

... Discovery (SoKD) framework, and its implementation that address the challenges discussed earlier. Building such a framework has been recognized as an important aspect of third-generation data mining [1]. A practical implementation of the proposed third-generation knowledge discovery platform, named ...

Review and Comparison of Associative Classification

... frequency of the attribute value and its associated class in the training data set from the size of that data set. Whereas minimum confidence represents the frequency of the attribute value and its related class in the training data set from the frequency of that attribute value in that training da ...

Mining Sequential Patterns with Time Constraints

- IJSRSET

... have been identified. Ensemble learning is one way to improve classification accuracy. Ensemble methods are learning techniques that build a set of classifiers and then classify new data points by a weighted vote of their predictions. The original ensemble method is Bayesian av ...

the Stream Mill Experience

... of tuples, (ii) the number of tuples belonging to each class, and (iii) for each (column, value) pair the number of tuples belonging to each class. Using the statistics so collected in this descriptive phase, we can now perform the predictive task of deciding whether ’Yes’ or ’No’ is more probable f ...

Topics in 0-1 Data

... attributes C serve as probes which are used to measure how similar the sets of rows are. Our algorithm is as follows. Compute distances d(A, B) for all pairs of attributes. (For a data set of n rows and p attributes, this can be done in time O(np²).) Again, find a partition of the set U of all attr ...

December 2010 January 2011 February 2011

... sectors of ICT depend upon the brains of Indian youth. As we enter the second decade of the 21st century, the challenges to the Indian dream are increasingly visible. Much of the global economy depends upon Indian software developers, yet there is always a possibility of t ...

IRM ThesisReport

A Comparative Study of Data Mining Algorithms for Image

Finding “Interesting” Trends in Social Networks Using Frequent


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
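As a rough illustration of the iterative refinement described above, the following is a minimal sketch of the standard heuristic (often called Lloyd's algorithm) in plain Python. The function names `squared_distance`, `nearest_center`, and `kmeans`, the random initialization from the data points, and the toy data are all assumptions made for this example; they do not come from any particular library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to point (ties go to the lower index)."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Heuristic k-means: alternate assignment and mean-update steps.

    Returns (centers, labels). The result is a local optimum that
    depends on the (assumed) random initialization.
    """
    rng = random.Random(seed)
    # Assumption: initialize centers as k distinct data points.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, labels


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
centers, labels = kmeans(points, k=2)

# Nearest centroid classification of a new observation:
# a 1-nearest neighbor lookup against the learned centers.
new_label = nearest_center((2, 2), centers)
```

The last line shows the nearest centroid idea mentioned above: once the centers are fixed, a new observation is assigned to an existing cluster by finding its closest center. In practice a library implementation with a more careful initialization is usually preferable to a hand-written loop like this one.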
