
Chapter4 - Department of Computer Science
... • Contains an experimental evaluation on 16 datasets (using crossvalidation to estimate classification accuracy on fresh data) • Required minimum number of instances in majority class was set to 6 after some experimentation ...
... • Contains an experimental evaluation on 16 datasets (using crossvalidation to estimate classification accuracy on fresh data) • Required minimum number of instances in majority class was set to 6 after some experimentation ...
Finding Optimal Decision Trees
... Abstract Decision trees are widely used technique to describe the data. They are also used for predictions. We may find that the same distribution can be described by one or more decision trees. Usually, we are interested in the simplest tree (we will call it the optimal tree). This thesis proposes ...
... Abstract Decision trees are widely used technique to describe the data. They are also used for predictions. We may find that the same distribution can be described by one or more decision trees. Usually, we are interested in the simplest tree (we will call it the optimal tree). This thesis proposes ...
Context Cube
... Our idea behind using a warehouse is to represent relationships within the actual structure of the data. The benefits are threefold: easier representation and processing of queries [3], the inclusion of expanded definitions and relationships, and the creation of new context constructed from analysis ...
... Our idea behind using a warehouse is to represent relationships within the actual structure of the data. The benefits are threefold: easier representation and processing of queries [3], the inclusion of expanded definitions and relationships, and the creation of new context constructed from analysis ...
Data Mining for Web Personalization
... Seldom are all these criteria satisfied in a typical data mining application. Personalization on the Web, and more specifically in e-commerce, has been considered the “killer app” for data mining, in part because many of these elements are indeed present. However, to be able to take full advantage o ...
... Seldom are all these criteria satisfied in a typical data mining application. Personalization on the Web, and more specifically in e-commerce, has been considered the “killer app” for data mining, in part because many of these elements are indeed present. However, to be able to take full advantage o ...
Textual Data Mining Applications for Industrial Knowledge
... techniques serve the purpose through handling structured data formats. If however the data is semi-structured or unstructured the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to discovery of knowledge from semi-s ...
... techniques serve the purpose through handling structured data formats. If however the data is semi-structured or unstructured the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to discovery of knowledge from semi-s ...
Data-Driven Simulation Modeling of Construction and
... Figure 2.1: Taxonomy-based state classification of construction fleet......................................... 18 Figure 2.2: Taxonomy of dump truck activities in an earthmoving operation based upon multimodal process data and operational context. .................................................. ...
... Figure 2.1: Taxonomy-based state classification of construction fleet......................................... 18 Figure 2.2: Taxonomy of dump truck activities in an earthmoving operation based upon multimodal process data and operational context. .................................................. ...
Contents - Department of Computer Engineering
... 1. Data cleaning (to remove noise and inconsistent data) 2. Data integration (where multiple data sources may be combined)1 3. Data selection (where data relevant to the analysis task are retrieved from the database) 4. Data transformation (where data are transformed or consolidated into forms appro ...
... 1. Data cleaning (to remove noise and inconsistent data) 2. Data integration (where multiple data sources may be combined)1 3. Data selection (where data relevant to the analysis task are retrieved from the database) 4. Data transformation (where data are transformed or consolidated into forms appro ...
On Propositionalization for Knowledge Discovery in Relational
... preparation for data mining. Such systems have been applied for more than 15 years, often competitive compared to other approaches to relational learning. However, the broad range of approaches to propositionalization suffered from a number of disadvantages. First, the single approaches were not des ...
... preparation for data mining. Such systems have been applied for more than 15 years, often competitive compared to other approaches to relational learning. However, the broad range of approaches to propositionalization suffered from a number of disadvantages. First, the single approaches were not des ...
Mining Social Ties Beyond Homophily - Computing Science
... in a social network, in terms of attributes information on nodes and edges, holds a key to the understanding of how the actors interact and form relationships. We formalize this problem as mining top-k group relationships (GRs), which captures strong social ties between groups of actors. While exist ...
... in a social network, in terms of attributes information on nodes and edges, holds a key to the understanding of how the actors interact and form relationships. We formalize this problem as mining top-k group relationships (GRs), which captures strong social ties between groups of actors. While exist ...
Transaction / Regular Paper Title
... the use of plug-in fitness functions is not very common in traditional clustering; the only exception is the CHAMELEON [6] clustering algorithm. However, fitness functions play a more important role in semi-supervised and supervised clustering [7] and in adaptive clustering [8]. The main contributio ...
... the use of plug-in fitness functions is not very common in traditional clustering; the only exception is the CHAMELEON [6] clustering algorithm. However, fitness functions play a more important role in semi-supervised and supervised clustering [7] and in adaptive clustering [8]. The main contributio ...
Spatial Clustering of Structured Objects
... Clusters obtained with the cluster-based homogeneity evaluation in different configurations: sequential (a), ascending (b) and descending (c) order for seed selection. In the second row, results obtained with sequential (d), ascending (e) and descending (f) order for seed selection and single-neighbor ...
... Clusters obtained with the cluster-based homogeneity evaluation in different configurations: sequential (a), ascending (b) and descending (c) order for seed selection. In the second row, results obtained with sequential (d), ascending (e) and descending (f) order for seed selection and single-neighbor ...
Mapping Text and Data Mining
... which they have legal access to. However, at the moment, researchers do not have recourse to analytical tools that are taken for granted in many other aspects of modern life, which would assist them in doing so. The extraction of facts or individual words is not subject to copyright law – a human be ...
... which they have legal access to. However, at the moment, researchers do not have recourse to analytical tools that are taken for granted in many other aspects of modern life, which would assist them in doing so. The extraction of facts or individual words is not subject to copyright law – a human be ...
Lecture 9 - UNM Computer Science
... ◦ Assumes that the normal data is generated by a parametric distribution with parameter θ ◦ The probability density function of the parametric distribution f(x, θ) gives the probability that object x is generated by the distribution ◦ The smaller this value, the more likely x is an outlier Non-param ...
... ◦ Assumes that the normal data is generated by a parametric distribution with parameter θ ◦ The probability density function of the parametric distribution f(x, θ) gives the probability that object x is generated by the distribution ◦ The smaller this value, the more likely x is an outlier Non-param ...
NAVAL POSTGRADUATE SCHOOL THESIS
... Digital forensics is a growing and important field of research for current intelligence, law enforcement, and military organizations today. As more information is stored in digital form, the need and ability to analyze and process this information for relevant evidence has grown in complexity. Today ...
... Digital forensics is a growing and important field of research for current intelligence, law enforcement, and military organizations today. As more information is stored in digital form, the need and ability to analyze and process this information for relevant evidence has grown in complexity. Today ...
Oracle Data Mining Application Developer`s Guide
... The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engine ...
... The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engine ...
29 Trajectory Data Mining: An Overview
... listing a few key applications that trajectory data can enable in each group. Secondly, before using trajectory data, we need to deal with a number of issues, such as noise filtering, segmentation, and map matching. This stage is called trajectory preprocessing, which is a fundamental step of many t ...
... listing a few key applications that trajectory data can enable in each group. Secondly, before using trajectory data, we need to deal with a number of issues, such as noise filtering, segmentation, and map matching. This stage is called trajectory preprocessing, which is a fundamental step of many t ...
Progressive Skyline Computation in Database Systems
... gregory.c.fu@jpmchase.com; B. Seeger, Department of Mathematics and Computer Science, Philipps University, Hans-Meerwein-Strasse, Marburg, Germany 35032; email: seeger@mathematik.unimarburg.de. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted ...
... gregory.c.fu@jpmchase.com; B. Seeger, Department of Mathematics and Computer Science, Philipps University, Hans-Meerwein-Strasse, Marburg, Germany 35032; email: seeger@mathematik.unimarburg.de. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted ...
Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.