Data Mining Methods for Detection of New Malicious Executables

Managing Large-scale Multimedia Repositories

Data Mining on Empty Result Queries

... detect such a query from the beginning in the DBMS, before any real query evaluation is executed. This will not only provide a quick answer, but it also reduces the load on a busy DBMS. Many data mining approaches deal with mining high density regions (e.g., discovering clusters), or frequent data valu ...

Visual Exploration of High-Dimensional Data: Subspace Analysis

... than the ambient dimensions. For example, the number of pixels in an image may be large. However, we typically use only a few parameters such as the geometry or the dynamics to describe the appearance. Data models inferred with such assumptions are often simple, in the number of parameters, and inte ...

distributed incremental data stream mining for wireless sensor

... Declaration “I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person (except where explicitly defined in the acknowledgements), nor material which to a substantial extent has bee ...

TOWARD ACCURATE AND EFFICIENT OUTLIER DETECTION IN

... large amounts of data every day. Data mining is the process of discovering relationships within data. The identified relationships can be used for scientific discovery, business decision making, or data profiling. Among data mining techniques, outlier detection plays an important role. Outlier detec ...

ROUGH SETS METHODS IN FEATURE REDUCTION AND

... where µ represents the total data mean and the determinant |Sb | denotes a scalar representation of the between-class scatter matrix, and similarly, the determinant |Sw | denotes a scalar representation of the within-class scatter matrix. Criteria based on minimum concept description. Based on the m ...

K - Department of Computer Science

... address the types of these algorithms, the way neighborhoods are calculated and the number of calculations involved. K-Means ...

Distributed and Stream Data Mining Algorithms for

... In several interesting application frameworks, such as wireless network analysis and fraud detection, data are naturally distributed among several entities and/or evolve continuously. In all of the above-indicated data mining tasks, dealing with either of these peculiarities provides additional chal ...

Incrementally Maintaining Classification using an RDBMS

Proceedings of the ECMLPKDD 2015 Doctoral Consortium

... The impetus for this work came from EMSAT Corporation, which specializes in real-time environment monitoring. With the aggregation and visualization components of their software already present, they were interested in further preprocessing and knowledge discovery in these data streams, in particula ...

Mining Health Data for Breast Cancer Diagnosis Using Machine

... based on iterative k nearest neighbours and the distance functions. The approach is an iterative approach until finding the most suitable features values that satisfy classification accuracy. The proposed approach showed improvement of 0.005 of classification accuracy on the constructed dataset than ...

Automatic Document Topic Identification Using Hierarchical

... around the world has led to a greatly increased need for machine understanding of their topics, as well as for automatic grouping of related documents. This constitutes one of the main current challenges in text mining. We introduce in this thesis a novel approach for identifying document topics. In ...

Spatial outlier detection based on iterative self

Mining Moving Object Data for Discovery of Animal Movement Patterns

... collected. Moving object data could be related to human, objects (e.g., airplanes, vehicles and ships), animals, and/or natural forces (e.g., hurricanes and tornadoes). Although most human and man-made object movements are closely associated with social and economic behaviors of people and society, ...

Discovering Highly Reliable Subgraphs in

Construction of Deterministic, Consistent, and Stable Explanations from Numerical Data and Prior Domain Knowledge

... two training sets A and B taken randomly from two populations A and B, respectively, are given. The attributes of the records may be numerical or nominal, and some entries may be missing and presumably cannot be obtained for various reasons. Possibly, partial prior domain knowledge is also given. We ...

Impact of Evaluation Methods on Decision Tree Accuracy Batuhan

... Receiving large amount of data has given companies, governments and private people an opportunity to use these raw data and turn them into valuable information. For instance, companies have started improving their businesses by the help of data. Business intelligence (BI) and business analytics (BA) ...

On Biased Reservoir Sampling in the Presence

Comparative Analysis of Various Approaches Used in Frequent

... dataset, H-struct is not as efficient as FP-Tree because FP-Tree allows compression. E. Incremental Update with Apriori-based Algorithms Complete dataset is normally huge and the incremental portion is relatively small compared to the complete dataset. In many cases, it is not feasible to perform a ...

Algorithms and Applications for Spatial Data Mining

DISC: Data-Intensive Similarity Measure for Categorical Data

... factors like co-occurrence statistics that can be effectively used to define what should be considered more similar and vice-versa. This observation has motivated researchers to come up with data-driven similarity measures for categorical attributes. Such measures take into account the frequency dis ...

Chapter4 - Department of Computer Science

Data Mining with Structure Adapting Neural Networks

... the shape of the network. The new features result in reducing the possibility of twisted maps and achieves convergence with localised self organisation. The localised processing and the optimised shape helps in generating representative maps with smaller number of nodes. The GSOM is also flexible in ...

Rule extraction using Recursive-Rule extraction algorithm with

... diabetes, accounts for about 5% of all diagnosed adult cases of diabetes. Although it can occur at any age, the peak age for diagnosis of type 1 diabetes is in the mid-teens. The peak age of onset of type 2 diabetes mellitus (T2DM), which was previously known as non–insulin-dependent diabetes mellit ...


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, in that both use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
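The iterative refinement described above can be illustrated with a minimal sketch. It assumes Euclidean distance, initialization from k randomly chosen observations, and a stop condition of unchanged assignments; none of these details come from the text, and the function names (k_means, nearest_center, squared_distance) are hypothetical. The nearest_center helper doubles as the nearest centroid classifier for new data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to point (1-nearest-neighbor rule on the centers)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def k_means(points, k, max_iter=100, seed=0):
    """Heuristic iterative refinement; returns (centers, labels).

    Converges to a local optimum only; the result depends on initialization.
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct observations as starting centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its assigned observations.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Usage: cluster six 2-D points, then classify a new point by its nearest centroid.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(points, k=2)
new_label = nearest_center((9.0, 9.0), centers)
```

Because the heuristic only reaches a local optimum, practical use often repeats the run with several seeds and keeps the partition with the lowest total within-cluster squared distance.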
