Protecting Individual Information Against Inference Attacks in Data

Exact Primitives for Time Series Data Mining

... 2.10 Comparison of the number of times ptolemaic bound prunes a distance computation to that of linear bound for various values of n and m. 2.11 (top) A segment of ECG with a query. (middle) All the twelve beats are detected. Plotting the z-normalized distance from the query to t ...

One Click Mining - Polo Club of Data Science

... a corresponding mining algorithms, and post-processing of results. The resulting high number of alternative formalizations of a single analysis task poses a substantial burden on creating and iteratively refining these workflows—especially for users that lack deep technical understanding of data min ...

Importance of Data Preprocessing For Improving Classification

... the way for another concept which is data warehouses. These data warehouses contain enormous amounts of data but huge amounts of data do not necessarily mean valuable information by itself. Data Mining is the extraction of valuable information from the patterns of data and turning it into useful kno ...

Research by Mangasarian, Street, Wolberg

arXiv:cs.DB/0112011 v2 5 Feb 2003

... itemset in potential bodies and heads. This can be done in a similar level-wise manner as in phase 1, based on the observation that if a head-set represents a confident rule for that itemset, then all of its subsets also represent confident rules [24]. For example, if the itemset {1, 2, 3, 4} is a f ...

Association Rule Mining

Mining SQL Injection and Cross Site Scripting

... vulnerabilities. Although these results were encouraging, our earlier work suffers from two major drawbacks—(1) though proposed static attributes are useful predictors, they are limited in terms of the prediction accuracy they can yield (the predictive capability of these attributes is dependent on ...

Mining Complex Data Streams - Journal of Advances in Information

Multivariate Approaches to Classification in Extragalactic

Workshop on Ubiquitous Data Mining

... location-based services for drivers. To the urban planner, the work can help to aggregate driver habits and can uncover alternative routes that could help alleviate traffic. Additionally, it also helps prioritize the maintenance of roads. Our work combines data mining techniques that discover global ...

ePub Institutional Repository

... Notwithstanding its importance, most of this prior research neglects the selection of which category to promote for the derived customer segment. The approach presented in the next section aims to support decision makers in this respect. Our analytical framework shares some common notions with the a ...

Complete Proceedings of the UDM-IJCAI 2013 as One File

... location-based services for drivers. To the urban planner, the work can help to aggregate driver habits and can uncover alternative routes that could help alleviate traffic. Additionally, it also helps prioritize the maintenance of roads. Our work combines data mining techniques that discover global ...

Document

... – Partition the data space by a grid → reduce the number of data objects by making a small error – Apply the wavelet-transformation to the reduced feature space – Find the connected components as clusters ...

seasonal probabilistic forecast of tropical cyclone activity in the north

A Novel Approach for Association Rule Mining using Pattern

... with length one i.e. L1 is obtained from transaction database D. Then the pattern table is derived by using items in L1. The frequency for each pattern from pattern table is counted as logical AND of pattern with transactions in the database which gives output as true. The patterns having frequency ...

"Space-Partitioning-Based Bulk-Loading for the NSP-Tree in Non-ordered Discrete Data Spaces," DEXA 2008, September 1-5, Turin, Italy, Gang Quian, Hyun-Jeon Seik, Qiang Zhu, Sakti Pramanik.

... Bulk-loading has been an important research topic for multidimensional index structures. There are a number of bulk-loading algorithms proposed for multidimensional indexes in continuous data spaces (CDS), such as the R-tree [9]. One major category of such bulk-loading algorithms is based on sorting ...

Title A Data Mining and Optimization-based Real

... enhance the economy and energy efficiency of logistics [2, 3]. Reflecting the real-time traffic conditions throughout the city, the system can therefore constantly update the time-optimal routing plan during transportation so the goods can be delivered to the customers as soon as possible. The main ...

Clustering, Dimensionality Reduction, and Side

Scalable Techniques for Mining Causal ...

... useful in preventing erroneous decision making. We conclude that the notion of causal data mining is likely to be a fruitful area of research for the database community at large, and we discuss some possibilities for future work. Let us begin by briefly reviewing the past work involving the market ...

Fast Monte-Carlo Algorithms for Matrix Multiplication

... • They use the relationships between the available data in order to identify components of the underlying physical system generating the data. • Some assumptions on the relationships between the underlying components are necessary. • Very active area of research; some matrix decompositions are more ...

Chapter 2: Association Rules and Sequential Patterns

What is a support vector machine? William S Noble

SD-Map – A Fast Algorithm for Exhaustive Subgroup Discovery

Rutcor Research A New Approach to Select


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics resemble the expectation-maximization algorithm for mixtures of Gaussian distributions in that both proceed by iterative refinement and both use cluster centers to model the data. However, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
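The iterative refinement described above can be sketched in a few lines of Python. This is a minimal illustration of one common heuristic (often called Lloyd's algorithm), not a reference implementation. The function names kmeans and nearest, the random choice of initial centers from the data, the fixed seed, and the iteration cap are all assumptions made for this example.

```python
import math
import random


def nearest(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, k, iters=100, seed=0):
    """Heuristic k-means: alternate assignment and mean-update steps.

    points is a list of equal-length tuples. The initial centers are k
    distinct data points chosen at random (an assumption of this sketch).
    The result is a local optimum and may depend on the seed.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each observation joins the cluster with the nearest mean.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its assigned observations.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centers.append(mean)
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

Calling nearest(new_point, centers) on the returned centers assigns a new observation to an existing cluster, which is the 1-nearest neighbor rule applied to the cluster centers, that is, the nearest centroid classifier mentioned above.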
