A Scalable Approach for Statistical Learning in Semantic Graphs

... Gram matrix) for the training instances. In many applications N can be very large, therefore we now follow [52] and use the Nyström approximation to scale up kernel computations to large data sets. The Nyström approximation is based on an approximation to eigen functions and starts with the eigen de ...
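As a rough illustration of the Nyström idea mentioned in this excerpt, the sketch below approximates an N x N Gram matrix from m sampled columns. It uses the common pseudo-inverse form K ≈ C W⁺ Cᵀ rather than an explicit eigendecomposition. The RBF kernel, uniform sampling of landmark points, and the names `rbf_kernel`, `nystrom_approximation`, `gamma`, and `m` are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_approximation(X, m, gamma=1.0, seed=0):
    # Sample m landmark rows uniformly without replacement (an assumption).
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)  # N x m block of the Gram matrix
    W = C[idx]                        # m x m block among the landmarks
    # Low-rank reconstruction of the full N x N Gram matrix.
    return C @ np.linalg.pinv(W) @ C.T
```

Only the N x m block is ever evaluated with the kernel, which is where the savings come from when m is much smaller than N.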

LINKING DIFFERENT GEOSPATIAL DATABASES BY EXPLICIT RELATIONS

... degree of similarity of corresponding instances. For this reason, we introduced a so-called MultirepresentationalRelation object to connect multiple representations. Within such an object, all information on how representations are related can be stored. Its attributes contain the general, geometric ...

classification on multi-label dataset using rule mining

... among multiple variables, it may overcome some constraints introduced by a decision-tree induction method which examines one variable at a time. Extensive performance studies [14, 15, 16] show that association based classification may have better accuracy in general. ...

Postprocessing in Machine Learning and Data Mining

Clust

1.3 Tasks of Data Mining

... errors, coding and recording errors, and, sometimes, are natural, abnormal values. Such nonrepresentative samples can seriously affect the model produced later. There are two strategies for dealing with outliers: ...

Finding Associations and Computing Similarity via Biased Pair

... RAM, or on a modern SSD that is able to deliver data at a rate of more than a gigabyte per second. One remedy that has been used (to reduce space, but also time) is to require high support, i.e., define “occur frequently together” such that most items can be thrown away initially, simply because the ...
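The high-support remedy described in this excerpt can be sketched as a two-pass count: items below the support threshold are discarded first, and only pairs of surviving items are counted. This is a generic illustration under assumed names (`frequent_pairs`, `min_support`), not the biased-sampling method of the paper itself.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    # Pass 1: count in how many transactions each item occurs.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count pairs, ignoring items that were thrown away in pass 1.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: c for pair, c in pair_counts.items() if c >= min_support}
```

A pair can only reach the threshold if both of its items do, so pruning in the first pass loses no frequent pair while shrinking the pair table.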

"Modern Trends in Data Mining"(pdf)

... • The bank has a large database of existing and past customers. Some of these defaulted on loans, others frequently made late payments etc. An outcome variable “Status” is defined, taking value “good” or “default”. Each of the past customers is scored with a value for status. • Background informatio ...

Analyzing student inquiry data using process discovery and

... also other forms of data that are not explicitly time-stamped but are still otherwise ordered, such as text or protein sequences. Temporal data is often divided into two categories: sequences that consist of continuous, real-valued data points taken at regular intervals, which are referred to as tim ...

ida-2002

Data Mining - GMU Computer Science

... extraction of implicit, previously unknown and potentially useful information from data (normally large databases) • Exploration & analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns. • Part of the Knowledge Discovery in ...

Using The Techniques Of Data Mining And Text Mining In

... Today there are many specialized ways of storing and retrieving data. Photographs, movies, articles and other similar data can be stored electronically. Because such storage provides a huge opportunity, it is called the information explosion. Search engines scan the related data based on the ke ...

Most time is spent on data extraction, transformation

... If enough data is available, split the data into two samples: • The training set, used to fit the models • The test set, used to check the model’s performance on observations that have not been used to build it ...
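The split described above might look like the following minimal sketch. The function name `train_test_split`, the 30% test fraction, and the fixed seed are illustrative assumptions; the excerpt does not prescribe any of them.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    # Shuffle a copy so the caller's data is left untouched.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # The first n_test shuffled rows form the test set, the rest the training set.
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(10), test_fraction=0.3)
```

Models are fitted on `train` only; `test` is held back so that the performance estimate comes from observations the model has not seen.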

Identifying High-Number-Cluster Structures in RFID Ski Lift Gates

... – nine stopping criteria k = n×{10, 20, 30, 40, 50, 60, 70, 80, 90 %}, and – two distance measures, i.e. cosine distance measure and absolute normalized difference “And” [14] which produces 18 algorithm settings. In addition, for the remaining two algorithm settings, we also propose a stopping crite ...

main title of the paper – style "main title"

... The starting point for process mining is not just any data, but event data (IEEE Task Force on Process Mining, 2012). Data should refer to discrete events that happened in reality. A collection of related events is referred to as an event log. Each event in such a log refers to an activity (i.e., a ...

Detecting Internet Worms Using Data Mining Techniques

Density-Based Clustering over an Evolving Data Stream with Noise

IOSR Journal of Computer Engineering (IOSR-JCE)

... of the Classifier because it eliminates irrelevant attributes. Feature selection with decision tree classification greatly enhances the quality of the data in medical diagnosis. CART algorithm with various feature selection methods to find out whether the same feature selection method may lead to be ...

Predicting school funding requests that deserve an A+

... The goal of this research is to help DonorChoose.org identify (predict) school funding requests that deserve an A+. In other words, which funding requests are, based on certain criteria, the most exciting and most likely to raise money. This problem was made available at Kaggle.com as one of their m ...

Novel User Interfaces: The Digital Desk as the Interface of the Future

EC2016_v2 - Genii Software

... - Choose how to flatten multiple values ...

What is CLIQUE - ugweb.cs.ualberta.ca

... dense units in K dimensions. Two K-dimensional units u1, u2 are connected if they have a common face, or if there exists another K-dimensional unit ui such that u1 is connected to ui and ui is connected to u2. A region in K dimensions is an axis-parallel rectangular K-dimensional set. ...
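The connectivity rule in this excerpt amounts to finding connected components among the dense units. The sketch below assumes each unit is represented by a tuple of integer grid indices, one per dimension, so that two units share a common face exactly when they differ by one step in a single dimension. That representation and the names `share_face` and `connected_clusters` are assumptions for the example.

```python
from collections import deque

def share_face(u, v):
    # Adjacent units differ by exactly 1 in one dimension and match in all others.
    return sum(abs(a - b) for a, b in zip(u, v)) == 1

def connected_clusters(dense_units):
    remaining = set(dense_units)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, queue = {start}, deque([start])
        # Breadth-first search: follow common faces transitively.
        while queue:
            u = queue.popleft()
            neighbours = {v for v in remaining if share_face(u, v)}
            remaining -= neighbours
            cluster |= neighbours
            queue.extend(neighbours)
        clusters.append(cluster)
    return clusters
```

Each returned set is a maximal group of dense units linked through common faces, which matches the transitive part of the definition.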

Clustering Large Datasets using Data Stream

... 2 Clustering large data sets Clustering groups objects such that objects in a group are more similar to each other than to the objects in a different group (Kaufman and Rousseeuw (1990)). Formally clustering can be defined as: Definition 1 (Clustering). Partition a set of objects O = {o1 , o2 , . . ...

Multi-core Implementations of the Concurrent Collections

... Scientific Datasets Using Bitmap Indices – SciCSM: Novel Contrast Set Mining over Scientific Datasets Using Bitmap Indices • Data Processing Support – StreamingMATE: A Novel MapReduce-Like Framework Over Scientific Data Stream ...


Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
