Feature Selection: An Ever Evolving Frontier in Data Mining

A Fuzzy FCA-based Approach to Conceptual Clustering for Automatic Generation of Concept Hierarchy on Uncertainty Data

... However, there are many situations in which uncertainty information also occurs. For example, keywords extracted from scientific documents can be used to infer the corresponding research areas; however, it is inappropriate to treat all keywords equally as some keywords may be more significant than o ...

ATLaS: A Native Extension of SQL for Data Mining

... used in Datalog [2], and a stream-oriented computation will be discussed in the next section. Example 5 illustrates the typical structure of an ATLaS program. The declaration of the table dgraph(start, end) is a ... 4 Table Functions and Recursive UDAs: Recursive queries can be supported in ATLaS without ...

A Study of Pattern Prediction in the Monitoring Data of Earthen

... Abstract: An understanding of the changes of the rammed earth temperature of earthen ruins is important for protection of such ruins. To predict the rammed earth temperature pattern using the air temperature pattern of the monitoring data of earthen ruins, a pattern prediction method based on intere ...

Understanding the Crucial Role of Attribute Interaction in Data Mining

Segmentation, Classification, and Clustering of Temporal Data

... a sequence of data points, measured at successive points in time and spaced at uniform time intervals. This thesis is concerned with time series mining, including segmentation, classification, and clustering of temporal data. Many algorithms for these tasks depend upon pairwise (dis)similarity compa ...

An adaptive modular approach to the mining of sensor

... Variance preserving maximization ...

Aggregating Time Partitions - Reality Commons

... Segmentation of multidimensional categorical data: The segmentation aggregation framework gives a natural way of segmenting multidimensional categorical data. Although the problem of segmenting multidimensional numerical data is rather natural, the segmentation problem of multidimensional categorica ...

City Research Online

Preserving Privacy for Interesting Location Pattern Mining from

... There has been much research work on preserving privacy for sensitive information in statistical databases [2] (and the references therein) and privacy preserving knowledge discovery techniques [4] (and the references therein). The decision tree classifier is the most commonly used data mining algorith ...

clinical datasets Discretization of continuous

Fast and accurate text classification via multiple linear discriminant

... which separates the positive and negative instances well. In an early study by Schütze, Hull and Pedersen [19], even single linear discriminants compared favorably with neural networks for the document routing problem. Lewis and others [11] reported accurate prediction using a variety of regression ...

Aggregating Time Partitions

... partitioning a multidimensional sequence into segments such that each segment demonstrates low diversity along the different dimensions. Different segmentation algorithms have been applied to good effect on this problem. However, these algorithms either assume different generative models for the hap ...

Efficient Frequent Pattern Mining

... it can be efficiently solved in the specific context of itemsets and association rules. The original motivation for searching association rules came from the need to analyze so-called supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. Association ...

[Mamoulis 2004] Mining, indexing, and querying historical

... Our work is related to two research problems. The first is data mining in spatiotemporal and time-series databases. The second is management of spatiotemporal data. Previous work on spatiotemporal data mining focuses on two types of patterns: (i) frequent movements of objects over time and (ii) evol ...

Mining, Indexing, and Querying Historical Spatiotemporal Data

Privacy Preserving Association Rule Mining by Concept of

... Oliveira and Zaïane [3] were the first to present multi rule hiding methodologies. The proposed methods are effective and require two scans of the dataset. In the first scan, an index is made to accelerate the methodology of recognizing the sensitive transactions. In the second scan, the algorithms s ...

Parallel Data Mining for Association Rules on Shared

... can clearly see that this method suffers from a load imbalance problem. Interleaved partitioning: A better way is to do an interleaved partition, which results in the assignment A0 = 0, 3, 6, 9, A1 = 1, 4, 7 and A2 = 2, 5, 8. The workload is now given as W0 = 9 + 6 + 3 = 18, W1 = 8 + 5 + 2 = 15 and ...

Microsoft PowerPoint - AIiFE4-DataMining [tryb

... Naive Bayesian Classification: 1) Let D be a training set of tuples with associated class labels C1, C2, …, Cm, where each tuple is X = (x1, x2, …, xn). 2) Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability conditioned on X. ...

Aug 11, Chicago, IL, USA - Exploratory Data Analysis

... used to learn a model. The learned model is then applied on the test dataset in order to classify unlabeled records into normal and anomalous records. The second learning approach is semi-supervised, where the algorithm models the normal records only. Records that do not comply with this model are l ...

CLUEBOX: A Performance Log Analyzer for Automated Troubleshooting S. Ratna Sandeep

A review of feature selection methods with applications

SimpliFly: A Methodology for Simplification and

6.034 Artificial Intelligence. Copyright © 2004 by Massachusetts

... So far, we have talked a lot about building systems that have knowledge represented in them explicitly. One way to acquire that knowledge is to build it in by hand. But that can be time-consuming and error-prone. And many times, humans just don't have the relevant information available to them. For i ...

Pattern Based Sequence Classification*

... assigning class labels to new sequences based on the knowledge gained in the training stage. There exist a number of studies integrating pattern mining techniques and classification, such as classification based on association rules (CBA) [2], sequential pattern based sequence classifier [3], the Cl ...

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
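The iterative refinement heuristic described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`k_means`, `nearest_center`), the random choice of initial centers, the fixed seed, and the toy data are all assumptions made for the example. The last lines show the nearest centroid idea, where a new point is assigned to the cluster whose center is closest.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest neighbor on centers)."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Iterative refinement: alternate assignment and mean-update steps.

    Initial centers are k distinct input points chosen at random (an
    assumption of this sketch). The result is a local optimum only.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the means will not move
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean(members)
    return centers, labels


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(points, k=2)

# Nearest centroid classification of a new point using the learned centers.
new_label = nearest_center((9.0, 9.0), centers)
```

Because the result depends on the initial centers, a common practice is to run the procedure several times with different seeds and keep the partition with the lowest total within-cluster squared distance.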
