Local Outlier Detection with Interpretation

Detecting Blackhole and Volcano Patterns in Directed

MapReduce/Hadoop

... Powerful analytical tools for big-data. Some limited built-in analytics. ...

Problems and Algorithms for Sequence

... a high-level view of the sequence’s structure. Moreover, it provides useful information, directing more detailed studies to focus on homogeneous regions, namely the segments. Finally, there are many sequences that appear to have an inherent segmental structure e.g., haplotypes or other genetic seque ...

Advanced Data Mining Techniques for Compound Objects

Deep web - AllThesisOnline

... Over the years, a critical increase in the size of the web has been observed. A large part of it consists of online subject-specific databases hidden behind query interface forms, known as the deep web. Existing search engines are unable to completely index this highly relevant information due to ...

Anomaly-Based Online Intrusion Detection System as a Sensor for

... data networks and networked computer systems. That complex data ensemble, the cyber domain, provides great opportunities, but at the same time it offers many possible attack vectors that can be abused for cyber vandalism, cyber crime, cyber espionage or cyber terrorism. Those threats produce require ...

Exploiting Data Mining Techniques in the Design of

... 1.1 Knowledge discovery from large datasets; 1.2 Integrated use of data mining and data warehousing; 1.3 Unresolved issues and motivation of the thesis ...

University of Alberta Library Release Form Name of Author Title of Thesis

... In the last three decades, computer-assisted biological technologies have been playing a more and more significant role in our society. For instance, hierarchical clustering of DNA sequences now enables scientists to trace down the origins of contagious diseases that have the potential to evolve int ...

Emerging Topic Detection for Organizations from Microblogs Yan Chen Hadi Amiri

... on the novelty of topics, and they mainly model the novel words based on word co-occurrences within the topics. In this work, we extend the definition of “emerging” to incorporate temporal aspect of timeliness. In other words, we want to detect emerging topics that are not only novel, but also those ...

Oracle Data Mining Concepts

... The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy and other measures to ensure the safe use of such applications if the Pro ...

An overview on subgroup discovery - Soft Computing and Intelligent

... Previous techniques have not been able to achieve this purpose. For example, predictive techniques maximise accuracy in order to correctly classify new objects, and descriptive techniques simply search for relations between unlabelled objects. The need for obtaining simple models with a high level o ...

Object-Based Selective Materialization for Efficient Implementation

T._Ravindra_Ba .V._Subrah(BookZZ.org)

... feature or item-support. The concept of support relates to the conventional association rule framework. We consider patterns as sequences, form subsequences of short length, and identify and eliminate repeating subsequences. We represent the pattern by those unique subsequences leading to significan ...

Early Classification on Time Series

A multi-stage decision algorithm to generate interesting rules

... predictive qualities in identifying student attrition. These studies, however, can yield differing and sometimes conflicting indicators due to the use of different data mining techniques that yield different models, even on the same datasets. Even so, models are still useful for drop out prediction ...

Feature Selection and Classification Methods for Decision Making: A

Discovering Conditional Functional Dependencies

... determines STR. It is an FD that only holds on the subset of tuples with the pattern “CC = 44”, rather than on the entire relation r0 . CFD φ1 assures that for any customer in the US (country code 01) with area code 908, the city of the customer must be MH, as enforced by its pattern tuple (01, 908 ...

Locally linear embedding algorithm. Extensions and applications

... retain or highlight meaningful information while reducing the dimensionality of data. Since the nature of real-world data is often nonlinear, linear dimensionality reduction techniques, such as principal component analysis (PCA), fail to preserve a structure and relationships in a high-dimensional sp ...

Big Data Technology - Hadoop, MapReduce, and Spark

... Very fast for small-medium size data. ...

Density-based Algorithms for Active and Anytime Clustering

Theorem 1

... The equation has a single unknown and a single root x’i,1 ...

Detection of Outliers in Time Series Data - e

... Scatter plot of flow consumption vs. HDD for a typical JOTO; Gas flow for a BARIDI; Scatter plot of flow consumption vs. HDD for a BARIDI; Time series flow outliers as observed by the GasDay project ...

Causal Explorer Software Library - Journal of Machine Learning

... real-world BN-based decision support systems. Unfortunately, the size of the existing known BNs is relatively small in the order of at most a few hundred variables. Thus, typically causal discovery algorithms were so far validated on relatively small networks (e.g., with less than 100 variables), su ...

Statistical Machine Learning for Data Mining and

... (BMAL). In contrast to traditional approaches, the BMAL method searches a batch of informative examples for labeling. To develop an effective algorithm, the BMAL task is formulated into a convex optimization problem and a novel bound optimization algorithm is proposed to efficiently solve it with globa ...

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
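The iterative refinement described above can be illustrated with a minimal sketch in plain Python. It is one common heuristic (Lloyd-style alternation of assignment and mean-update steps), not the only way to approach the problem. The function names `kmeans`, `nearest_center`, and `squared_distance`, the random initialization from data points, and the toy data are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point` (1-nearest neighbor on the centers)."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and mean-update steps until assignments stop changing.

    Returns (centers, labels). The result is a local optimum that depends on
    the initialization; it is not guaranteed to be the global optimum.
    """
    rng = random.Random(seed)
    # Assumption: initialize with k distinct data points chosen at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the centers would not move either
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers, labels


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.3, 7.9), (7.9, 8.2)]
centers, labels = kmeans(points, k=2)

# Nearest centroid classification: assign a new point to the closest learned center.
new_label = nearest_center((7.5, 7.5), centers)
```

Calling `nearest_center` on a new point is the 1-nearest neighbor step on the cluster centers mentioned above. The cluster indices themselves are arbitrary and depend on the initialization, so only the grouping is meaningful, not which group is numbered 0 or 1.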
