04_CAINE-clustering

Pattern-Preserving k-Anonymization of Sequences and its Application to Mobility Data Mining

... In the work presented in [3], the authors study the problem of anonymity preserving data publishing in moving objects databases. They propose the notion of (k, δ)-anonymity for moving objects databases. In particular, this is a novel concept of k-anonymity based on co-localization that exploits th ...

Data Preprocessing

Information Visualisation and Machine Learning

... representations, and information visualisation practitioners generally resort to processing or filtering the original data by hand. Generally speaking, scalability of visualisation techniques has been a long-standing issue in the field. Regarding the Visually enhanced Mining category, Section 3 show ...

Clustering Algorithms in Hybrid Recommender System on

... One of the first cluster-based approaches, where clustering was used to partition users’ preferences in order to increase neighbour searching efficiency, is described in (Sarwar, Karypis, Konstan, & Riedl, 2002). One of the recent examples is (Kim, 2005), where k-means clustering with genetic algorithms ...

A Survey on Optimization of Apriori Algorithm for

... generate (k+1)-itemsets. The support of each k-itemset must be greater than or equal to the minimum support threshold for the itemset to be frequent. Otherwise, it remains only a candidate itemset. First, the algorithm scans the database to find the frequent 1-itemsets, which contain only one item, by counting each item in the database. The f ...
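
The level-wise procedure described in this snippet can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the surveyed paper's implementation: the function name `apriori`, the absolute `min_support` count, and the toy `baskets` data are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption made for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count every single item to find the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in frequent for sub in combinations(union, k)):
                candidates.add(union)

        # Count step: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With a threshold of 3, the candidate {bread, milk, diapers} survives the prune step but is rejected in the count step, which shows why a candidate itemset is not necessarily a frequent one.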

city

...
- Amortize-scans: computing as many cuboids as possible at the same time to amortize disk reads
- Share-sorts: sharing sorting costs across multiple cuboids when a sort-based method is used
- Share-partitions: sharing the partitioning cost across multiple cuboids when hash-based algorithms are used ...

Data Mining for Business Intelligence

... If many records are missing values on a small set of variables, those variables can be dropped (or replaced with proxies). If many records have missing values, omitting the records is not practical ...
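
The trade-off in this snippet can be illustrated with a small sketch. Everything here is assumed for illustration: the helper names `drop_sparse_variables` and `drop_incomplete_records`, the 50% threshold, and the toy `records` are hypothetical and do not come from the cited book.

```python
def drop_sparse_variables(records, max_missing_fraction):
    """Drop variables whose share of missing (None) values exceeds the threshold."""
    n = len(records)
    kept = [
        v for v in records[0]
        if sum(1 for r in records if r[v] is None) / n <= max_missing_fraction
    ]
    return [{v: r[v] for v in kept} for r in records]


def drop_incomplete_records(records):
    """Keep only the records that have no missing values at all."""
    return [r for r in records if all(v is not None for v in r.values())]


# Hypothetical data: 'fax' is missing on most records, 'income' on one.
records = [
    {"age": 34, "income": 52000, "fax": None},
    {"age": 41, "income": 61000, "fax": None},
    {"age": 29, "income": 48000, "fax": "555-0100"},
    {"age": 50, "income": None, "fax": None},
]

# Omitting incomplete records first would discard most of the data.
complete_only = drop_incomplete_records(records)

# Dropping the sparse variable first keeps far more records.
trimmed = drop_sparse_variables(records, max_missing_fraction=0.5)
complete_after_trim = drop_incomplete_records(trimmed)
```

In this toy case, omitting incomplete records leaves one record out of four, while dropping the sparse variable first leaves three.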

Big Data and Specific Analysis Methods for Insurance Fraud Detection

... specific to each country, usually based on gaps or weaknesses of legislation. Fraud models are constantly changing, with malicious individuals seeking ever new ways to circumvent the law. Consequently, methods for identifying and preventing fraud must always be adjusted and ready to rediscover the fraudu ...

Knowledge Management in CRM using Data mining Technique

... the old customers should be maintained because from previous research it was found that, in the industry it is commonly held that maintaining existing customers is more cost-effective than attracting new ones, and that 20% of customers create 80% of the profit for industry. From this the conclusion ...

Computational and Visual Support for Geographical Knowledge Construction: Filling in Exploration

... classification), or providing a view onto the data from a single perspective (e.g. scatterplot, parallel coordinate plot). By doing so, they implicitly assume that problems in science can be isolated to a single conceptual ‘plane’, which, when correctly understood and represented, can be fixed to fo ...

COMPARATIVE STATISTICAL ANALYSES OF AUTOMATED BOOLEANIZATION METHODS FOR DATA MINING PROGRAMS

Integration of Classification and Clustering for the Analysis of Spatial

... away from Ooty, received record rainfall of 820 mm in 24 hours while Ooty recorded 170 mm. Many parts of the Nilgiris continued to remain cut off on Wednesday (11th Nov. 2009) due to landslips. As per another media report, as many as 543 landslips occurred in just two days (10-11) in the Nilgiris, a ...

pdf-file - SFU Computing Science

... top-down (starting from the set of all attributes, remove one attribute at a time), e.g. optimizing the discrimination between the different classes; too many attributes lead to inefficient and ineffective data mining; some transformations can be realized by OLAP systems ...

IT Applications in Business Analytics

...
- Where does the data reside? How is it to be accessed?
- What forms of sampling are needed? Are possible? Are appropriate?
- What are the implications of the database or data warehouse structure and constraints on data movement and data preparation? ...

Domain Adaptation for Machine Translation by Mining Unseen Words (Jagadeesh Jagarlamudi)

... use context and orthographic features. In the second stage, using the dictionary probabilities of seen words, we identify pairs of words whose feature vectors are used to learn the CCA projection directions. In the final stage, we project all the words into the sub-space identified by CCA and mine t ...

5 DATA MINING IN TIME SERIES DATA MINING U VREMENSKIM

... One of the unavoidable functions that accompanies literally every step in the data mining process is data visualization, with which the miner, in a simple and efficient way, acquires the guidelines that are critical for choosing the direction of further analysis. Anomaly detection refers to an ...

Job Shop Scheduling

... “It is needless to do more when less will suffice” – William of Occam, died 1349 of the Black plague ...

A Multi-relational Decision Tree Learning Algorithm

... KDD Cup 2001 [5] showed that the execution of queries encoded by such selection graphs is a major bottleneck in terms of the running time of the algorithm. (b) Inability to handle missing attribute values: In multi-relational databases encountered in many real-world applications of data mining, a si ...

1. Which of the following is the most popularly available and rich

Research Methods for the Learning Sciences

Multi-agent based decision Support System using Data Mining and

... interfaces, and control mechanisms to support a specific decision problem. Various studies have shown the use of DSS to handle complex decision modeling and management processes. We propose a multi-agent architecture in DSS, especially for distributed environments. In this paper, multi-age ...

An Incremental Hierarchical Data Clustering Algorithm Based on

... The development of incremental clustering algorithms can be traced back to the 1980s [4]. In 1989, Fisher proposed CLASSIT [5], which is an alternative version of COBWEB designed for handling numerical data sets [4]. However, CLASSIT assumes that the attribute values of the clusters are normally distrib ...

2016 OLAP Mining Rules: Association of OLAP with Data Mining

... prediction, it means that if the relative humidity today is low (below 36), wind speed is moderate and temperature is warm, then rain tomorrow may be light (< 2.5 millimeters per hour). Rules #4, #5 and #6 provide a better understanding of Gaza City weather. These rules give us an indication that ...

Ontology-driven association rules extraction: a case of study

... SHIF(D) and SHOIN(D) description logics (DL), respectively, whereas the third language was designed to provide full compatibility with RDF(S). We focus mainly on the first two variants of OWL because OWL-Full has a nonstandard semantics that makes the language undecidable and therefore difficult to ...


Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data, that is, distance measurements.
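
To make the manifold assumption and the role of proximity data concrete, the sketch below compares straight-line distance with distance measured along a curved set of points. It approximates along-manifold (geodesic) distance with shortest paths in a k-nearest-neighbour graph, which is the first step of Isomap-style methods. The text above does not name this technique, so it is an assumption chosen for illustration, as are the helper `geodesic_distances`, the neighbourhood size `k=2`, and the semicircle test data.

```python
import math


def geodesic_distances(points, k):
    """Approximate along-manifold distances via a k-nearest-neighbour graph.

    Builds a symmetric k-NN graph weighted by Euclidean distance, then runs
    Floyd-Warshall to obtain all-pairs shortest-path lengths.
    """
    n = len(points)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            w = math.dist(points[i], points[j])
            d[i][j] = min(d[i][j], w)
            d[j][i] = min(d[j][i], w)

    # Floyd-Warshall: relax every pair through every intermediate node.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


# Toy data: 21 points on a unit semicircle, a one-dimensional curved
# manifold embedded in the plane.
n = 21
points = [
    (math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1)))
    for t in range(n)
]

straight = math.dist(points[0], points[-1])          # cuts across the curve
along = geodesic_distances(points, k=2)[0][-1]       # follows the curve
```

The two endpoints are about 2 units apart in a straight line but slightly less than π apart along the curve. A method that works from the along-curve distances can unroll the semicircle into a line, whereas one that works only from straight-line distances would place the endpoints too close together.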
