Cross-domain Text Classification using Wikipedia

... Cross-domain classification is related to transfer learning, where the knowledge acquired to accomplish a given task is used to tackle another learning task. In [28], the authors built a term covariance matrix using the auxiliary problem, to measure the co-occurrence between terms. The resulting ter ...

Vertical-format Based Frequent Pattern Mining

Life-and-Death Problem Solver in Go

... dead or unsettled. Alive means that the surrounded group does not need to be defended because it cannot be killed, i.e. it is unnecessary (indeed pointless) to play a stone in the surrounded area to secure (or to attack) the surrounded group. Unsettled is a situation where, if the owner of the surro ...

Credit Card Fraud Detection Based on Behavior Mining

... K-means clustering algorithm. Algorithm: k-means. The k-means algorithm for partitioning, where each cluster center is represented by the mean value of the objects in the cluster. ...

Event correlation and data mining for event logs

On Demand Classification of Data Streams

Verifying and Mining Frequent Patterns from Large Windows over

KNN Classification and Regression using SAS

... 1. It is simple to implement. In theory, the kNN algorithm is very simple, and the naive version is easy to implement: for every data point in the test sample, directly compute the distances to all stored vectors and choose the k closest examples among stored ...

Predicting Workers' Compensation Insurance Fraud Using SAS Enterprise Miner 5.1 and SAS Text Miner

... each case in a cluster. The smallest clusters should be examined first. Ideally, your organization will have domain experts who have some experience in fraud cases. Because the task of examining each case in each cluster can be overwhelming, you should have the domain experts describe cases that imp ...

A Breadth-First Algorithm for Mining Frequent Patterns from Event

Inferring taxonomic hierarchies from 0

... Complex structures in nature and in society are frequently modeled and managed with hierarchies. For engineers and scientists hierarchies are a tool used for abstraction and classification. For example, a software engineer uses hierarchical abstraction to build and manage complex computer programs. ...

OutRank: A GRAPH-BASED OUTLIER DETECTION FRAMEWORK

... According to Hawkins [6], outliers can be defined as follows: Definition 1. (Outlier) An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Most outlier detection schemes adopt Hawkins's definition of outliers an ...

A Survey on Frequent Pattern Mining Methods

... a) Sharding: The database is divided into successive parts and stored on different computers. This distribution and division of data is called sharding. b) Parallel Counting: This step counts the support values of all the items that appear in the database. The input is one shard of the database. In thi ...

Document

Massimo Poesio: Text Categorization and

... representation has aimed to describe knowledge independently of the personal and social context in which it is used, with the advantage that we can automate reasoning with such knowledge using mechanisms that also are context independent. This sounds good until you try it on a large scale and find o ...

A Three-Scan Mining Algorithm for High On

... Besides, many studies [1, 3, 5, 9] were proposed to dynamically mine association rules. An example for dynamical mining is to find the patterns for on-shelf products. However, a product may be put on shelf and taken off shelf multiple times in a store. If the entire database is considered for mining ...

isda.softcomputing.net

... when this item appears in the transaction database to the partition when this item no longer exists [2]. That is, the exhibition period is the time duration when the item is available to be purchased. Hence, these works cannot be effectively applied to a temporal transaction database, such as a publ ...

A Rule-Based Classification Algorithm for Uncertain Data

... missing attribute values. However, the problem studied in this paper is different from before. Instead of assuming part of the data has missing or noisy values, we allow the whole dataset to be uncertain. Furthermore, the uncertainty is not shown as missing or erroneous values but represented as unc ...

K-Nearest Neighbor Classification and Regression in SAS®

ADR-Miner - An Ant-Based Data Reduction Algorithm for Classification

A comparative study on principal component analysis and

... From the definition, the support of an item indicates the statistical significance of an association rule. If the support of an item is 0.1%, only 0.1 percent of the transactions contain a purchase of this item. The retailer will not pay much attention to such kind of it ...

IOSR Journal of Mathematics (IOSR-JM)

... The data to be compressed consist of N data vectors with k dimensions. Principal Component Analysis (PCA) searches for c k-dimensional orthogonal vectors that can best be used to represent the data, where c ≤ k. The original data set is projected onto a much smaller space, resulting in data comp ...

w - UTK-EECS

... • Select the set of variables such that the current iteration will make progress towards the minimum of W(α) – Use first order approximation, i.e., steepest direction d of descent which has only q non-zero elements ...

A Survey of Outlier Detection Methodologies.

... 1. Type 1 - Determine the outliers with no prior knowledge of the data. This is essentially a learning approach analogous to unsupervised clustering. The approach processes the data as a static distribution, pinpoints the most remote points, and flags them as potential outliers. Type 1 assumes that e ...

Workload-Aware Anonymization Techniques for Large

... on the quality of the resulting data. While much of the previous literature has measured quality through simple one-size-fits-all measures, we argue that quality is best judged with respect to the workload for which the data will ultimately be used. This article provides a suite of anonymization alg ...


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
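The description above can be illustrated with a minimal sketch of one common heuristic, the iterative refinement usually called Lloyd's algorithm, followed by a 1-nearest-neighbor lookup over the resulting centers. The function names (squared_distance, nearest_center, k_means), the random initialization, the fixed seed, and the toy data are all assumptions made for illustration; this is not a reference implementation, and like any such heuristic it only reaches a local optimum.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the closest center: a 1-nearest-neighbor lookup over the centers."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def k_means(points, k, max_iter=100, seed=0):
    """Iterative refinement (Lloyd's heuristic); returns the k cluster centers."""
    rng = random.Random(seed)
    # Assumption: initialize the centers with k distinct observations chosen at random.
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: each center moves to the mean of its assigned observations.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so the assignment is stable
            break
        centers = new_centers
    return centers


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers = k_means(points, k=2)

# Classify a new observation into an existing cluster (nearest centroid classifier).
label = nearest_center((0.0, 0.0), centers)
```

The assignment step is what induces the Voronoi partition mentioned above: every point in the plane maps to whichever center nearest_center returns for it.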
