
A Study of Density-Grid based Clustering Algorithms on Data Streams
... C. DD-Stream Jia et al. in [17] proposed a framework called DD-Stream for density-based clustering of data streams in grids. They developed an algorithm, DCQ-means, to improve the quality of clustering by considering the border points of the grids. The framework follows an online-offline design in which the o ...
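As a rough illustration of the density-grid idea sketched in this excerpt (arriving points are mapped online to grid cells whose densities are maintained; dense cells are grouped into clusters offline), a minimal Python sketch follows. The class name, cell size, density threshold, and decay factor are illustrative assumptions and do not reproduce DD-Stream or DCQ-means.

# Minimal sketch of a density-grid stream clusterer (illustrative only;
# grid size, threshold, and decay are assumptions, not DD-Stream/DCQ-means).
from collections import defaultdict

class GridDensityClusterer:
    def __init__(self, cell_size=1.0, density_threshold=5.0, decay=0.99):
        self.cell_size = cell_size
        self.density_threshold = density_threshold
        self.decay = decay
        self.density = defaultdict(float)   # grid cell -> decayed point count

    def insert(self, point):
        """Online phase: map the point to a grid cell and update cell densities."""
        cell = tuple(int(x // self.cell_size) for x in point)
        for c in self.density:               # age all cells (simplified decay)
            self.density[c] *= self.decay
        self.density[cell] += 1.0

    def clusters(self):
        """Offline phase: group adjacent dense cells into clusters."""
        dense = {c for c, d in self.density.items() if d >= self.density_threshold}
        clusters, seen = [], set()
        for start in dense:
            if start in seen:
                continue
            stack, component = [start], []
            while stack:
                c = stack.pop()
                if c in seen:
                    continue
                seen.add(c)
                component.append(c)
                # neighbouring cells differ by at most 1 in every coordinate
                stack.extend(n for n in dense if n not in seen
                             and all(abs(a - b) <= 1 for a, b in zip(c, n)))
            clusters.append(component)
        return clusters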
Internet Traffic Identification using Machine Learning
... The unsupervised machine learning approach is based on a classifier built from clusters that are found and labeled in a training set of data. Once the classifier has been built, classification consists of determining which cluster a connection is closest to, and using the ...
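A minimal sketch of this cluster-then-label scheme, assuming k-means clusters labeled by majority vote over the training connections and nearest-centroid assignment at classification time (function names and parameters are illustrative, not the surveyed method):

# Sketch of cluster-then-label traffic classification (illustrative assumptions).
import numpy as np
from sklearn.cluster import KMeans

def train(flows, labels, n_clusters=10, seed=0):
    """Cluster training flows, then label each cluster by majority vote.
    flows: (n, d) feature matrix; labels: integer class codes per flow."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(flows)
    cluster_label = {}
    for c in range(n_clusters):
        members = labels[km.labels_ == c]
        cluster_label[c] = int(np.bincount(members).argmax()) if len(members) else -1
    return km, cluster_label

def classify(km, cluster_label, flow):
    """Assign a new connection to its closest cluster, return that cluster's label."""
    c = km.predict(flow.reshape(1, -1))[0]
    return cluster_label[c]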
Using formal ontology for integrated spatial data mining
... 2.1 Relation between Data Mining and Ontology Construction Data mining enhances the level of understanding by extracting high-level knowledge from low-level data [3]. Ontology construction makes implicit meaning explicit by formalizing how the knowledge is conceptualized. Here we first discuss d ...
Association Rule Mining using Apriori Algorithm: A Survey
... in the last few years. Many organizations have collected massive amounts of data. Such data are usually stored in database systems. Two major problems arise in the analysis of the information system. One is reducing unnecessary objects and attributes so as to get the minimum subset of attribu ...
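Since Apriori is the algorithm under survey, a compact sketch of its level-wise frequent-itemset search is given below; the transaction format and minimum support are illustrative choices.

# Minimal Apriori-style frequent itemset mining (illustrative parameters).
def apriori(transactions, min_support=2):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent single items
    current = {frozenset([i]) for i in items
               if sum(i in t for t in transactions) >= min_support}
    frequent = set(current)
    k = 2
    while current:
        # candidate generation: join frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune candidates below the minimum support
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) >= min_support}
        frequent |= current
        k += 1
    return frequent

# Example: apriori([{'a','b','c'}, {'a','b'}, {'a','c'}, {'b','c'}], min_support=2)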
Clustering Genes using Gene Expression and Text Literature Data
... cluster. The set of parameters for the i-th model is denoted by λi. Typically, all the models are assumed to be from the same family, e.g., Gaussian or multinomial distribution. In the sample reassignment step, a data point could be assigned to a cluster using three possible approaches: maximum like ...
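To make the maximum-likelihood reassignment concrete, the sketch below assigns a data point to the model λi under which its log-likelihood is highest; the axis-aligned Gaussian models are an assumption made purely for illustration.

# Sketch: reassign a data point to the cluster whose model gives it the
# highest log-likelihood (diagonal Gaussian models assumed for illustration).
import numpy as np

def log_likelihood(x, mean, var):
    """Log-density of x under an axis-aligned Gaussian with parameters (mean, var)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def reassign(x, models):
    """models: list of (mean, var) pairs, one per cluster (the lambda_i)."""
    scores = [log_likelihood(x, mean, var) for mean, var in models]
    return int(np.argmax(scores))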
resume - Navodaya Institute of Technology, Raichur
... Cloud computing is leading the technology development of today's communication scenario. This is because of its cost-efficiency and flexibility. In cloud computing, vast amounts of data are stored in varied and distributed environments, and data security is of prime concern. The security concept to d ...
Steven F. Ashby Center for Applied Scientific Computing Month DD
... – Data streams and sensor data – Time-series data, temporal data, sequence data – Structured data, graphs, social networks and multi-linked data – Heterogeneous databases and legacy databases – Spatial, spatiotemporal, multimedia, text and Web data ...
Ant-based clustering: a comparative study of its relative performance
... and sorting of the elements on the grid is obtained. Hence, like ant colony optimisation (ACO, [6]), ant-based clustering and sorting is a distributed process that employs positive feedback. However, in contrast to ACO, no artificial pheromones are used; instead, the environment itself serves as sti ...
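The pick-up/drop decisions that drive this positive-feedback process are usually expressed as probabilities of a local neighbourhood similarity f, in the spirit of the Lumer-Faieta formulation; a sketch with illustrative constants k1 and k2 follows.

# Sketch of ant-based clustering pick/drop probabilities (constants are assumptions).
def p_pick(f, k1=0.1):
    """Probability of picking up an item given local similarity f in [0, 1]:
    high when the item is poorly surrounded (f small)."""
    return (k1 / (k1 + f)) ** 2

def p_drop(f, k2=0.15):
    """Probability of dropping a carried item: high when the neighbourhood
    already contains many similar items (f large)."""
    return (f / (k2 + f)) ** 2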
Data Discretization: Taxonomy and Big Data Challenge
... the associated learning algorithm. Good examples of classical dynamic techniques are the ID3 discretizer (73) and ITFP (31). – Univariate vs. Multivariate: Univariate discretizers operate on only a single attribute at a time. This means that they sort the attributes independently, and then, the d ...
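As a minimal illustration of a univariate discretizer operating on one attribute at a time, an equal-width binning sketch follows; the bin count is arbitrary and this is not one of the cited discretizers.

# Equal-width univariate discretization: each attribute is binned independently.
import numpy as np

def equal_width_discretize(column, n_bins=5):
    """Map one numeric attribute to bin indices 0..n_bins-1."""
    edges = np.linspace(column.min(), column.max(), n_bins + 1)
    # np.digitize with the interior edges yields indices in [0, n_bins - 1]
    return np.digitize(column, edges[1:-1])

# A multivariate discretizer would instead choose cut points using several
# attributes (or the class labels) jointly rather than one column at a time.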
10 Challenging Problems in Data Mining Research∗
... problem of “concept drift” or “environment drift”. This problem is particularly hard in the context of large streaming data. How may one compute models that are accurate and useful very efficiently? For example, one cannot presume to have a great deal of computing power and resources to store very m ...
Associative Classification Based on Incremental Mining (ACIM)
... great number of important applications in which data are often collected on a daily, weekly, or monthly basis, there is great interest in developing, or at least enhancing, current classification methods to handle the incremental learning problem. This is the primary motivation of our alg ...
Data Mining Approaches for Life Cycle Assessment
... a later time. Such an approach could reduce the cost of creating large-scale environmental databases. To illustrate this approach, we apply it to the ecoinvent 2.0 database. In order that ground truth be available to estimate the accuracy of our methods, we randomly remove up to 10% of the known imp ...
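The hold-out protocol described here (withholding a random fraction of known values so that ground truth remains available) can be sketched as below; the matrix layout, the predict_fn placeholder, and the RMSE score are assumptions for illustration, not the authors' pipeline.

# Sketch of the hold-out protocol: hide a random 10% of known entries,
# predict them back, and score against the hidden ground truth.
import numpy as np

def mask_and_score(matrix, predict_fn, frac=0.10, seed=0):
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(matrix))                 # positions of known values
    hide = known[rng.choice(len(known), int(frac * len(known)), replace=False)]
    truth = matrix[hide[:, 0], hide[:, 1]]
    masked = matrix.copy()
    masked[hide[:, 0], hide[:, 1]] = np.nan                # remove the held-out entries
    preds = predict_fn(masked)[hide[:, 0], hide[:, 1]]     # hypothetical estimator
    return np.sqrt(np.mean((preds - truth) ** 2))          # RMSE on hidden entries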
"Efficient Kernel Clustering using Random Fourier Features"
... eigenvectors of HᵀH, and the singular vectors of H can be recovered from the eigenvectors of SᵀS as HV. Using this approximation, the runtime complexity of SVD is reduced to O(s²m). The time taken to execute k-means on the singular vectors is O(nC²l). When max(m, s, l, C) ≪ n, the proposed ...
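A rough sketch of the row-sampling approximation described in this excerpt, assuming V denotes the top eigenvectors of SᵀS and the approximate singular vectors of H are taken as HV before running k-means; the matrix sizes, sample size, and parameters are illustrative.

# Sketch: approximate the top singular vectors of the feature matrix H (n x m)
# from a sample S of s of its rows, then cluster them with k-means.
import numpy as np
from sklearn.cluster import KMeans

def approx_spectral_kmeans(H, s=200, top=10, C=5, seed=0):
    rng = np.random.default_rng(seed)
    S = H[rng.choice(H.shape[0], s, replace=False)]        # s sampled rows of H
    evals, V = np.linalg.eigh(S.T @ S)                     # eigenvectors of S^T S
    V = V[:, np.argsort(evals)[::-1][:top]]                # keep the leading eigenvectors
    U_approx = H @ V                                       # approximate singular vectors of H
    return KMeans(n_clusters=C, n_init=10, random_state=seed).fit_predict(U_approx)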
Visual mining of moving flock patterns in large
... groups of ‘collective’ patterns in moving object databases: clustering for moving objects, convoy queries and flock patterns. Moving clusters (Jensen et al. 2007) and convoy queries (Jeung et al. 2008) differ from flock patterns mainly because they do not necessarily contain the same objects during ...
H0444146
... flow prioritization, traffic shaping/policing, and diagnostic monitoring. Many approaches have evolved for this purpose. Classical approaches such as port-number or payload analysis methods have their own limitations. For example, some applications use dynamic port numbers and encryption techn ...
Mining Stream Data with Data Load Shedding
... set is the number of items it contains, and an item set of length l is called an l-item set. A transaction T consists of a set of ordered items, and T supports an item set X if X ⊆ T. The support of an item set X in a group of transactions is the number of occurrences of X within the group. An item se ...
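These definitions translate directly into code; a minimal support-counting sketch follows (illustrative only, and unrelated to the paper's load-shedding scheme).

# Support of an item set X in a group of transactions: the number of
# transactions T with X ⊆ T.
def support(X, transactions):
    X = frozenset(X)
    return sum(X <= frozenset(T) for T in transactions)

# Example: support({'a', 'b'}, [{'a','b','c'}, {'a','c'}, {'a','b'}]) == 2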
Data Mining and Data Visualization
... are found, then relevant sets of three or four. These are then pruned by removing those that occur infrequently. In an environment like a grocery store, where customers commonly buy over 100 items, rules could involve as many as 10 items. ...
An integrated data mining and data presentation tool
... homogeneity is often made precise by means of a dissimilarity function on objects, which takes low values for pairs of objects in the same cluster. Similarity queries between two objects are frequently used in data exploration and mining, e.g. as a search routine in clustering algorithms, or in the iterative e ...
Classification and Supervised Learning
... Probabilistic learning: Calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems. Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with ...
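The incremental property can be illustrated with a count-based Naive Bayes sketch in which each new training example merely increments counts, so the hypothesis probabilities are revised example by example; the class structure and Laplace smoothing below are illustrative assumptions.

# Sketch of an incrementally updated (count-based) Naive Bayes classifier.
from collections import defaultdict

class IncrementalNB:
    def __init__(self):
        self.class_counts = defaultdict(int)                      # class -> count
        self.feat_counts = defaultdict(lambda: defaultdict(int))  # class -> (feature, value) -> count

    def update(self, features, label):
        """Each training example incrementally adjusts the hypothesis probabilities."""
        self.class_counts[label] += 1
        for f, v in features.items():
            self.feat_counts[label][(f, v)] += 1

    def predict(self, features):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            score = n_c / total                                    # prior P(c)
            for f, v in features.items():
                score *= (self.feat_counts[c][(f, v)] + 1) / (n_c + 2)  # Laplace smoothing
            if score > best_score:
                best, best_score = c, score
        return best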
Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Below is a summary of some of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR). Many of these non-linear dimensionality reduction methods are related to the linear methods listed below. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically, those that just give a visualisation are based on proximity data – that is, distance measurements.
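For example, a mapping method such as Isomap can serve as the feature-extraction step ahead of a downstream pattern-recognition algorithm; the sketch below uses scikit-learn's Isomap on a synthetic S-curve, with the dataset and parameters being illustrative choices.

# Nonlinear dimensionality reduction as a preliminary feature-extraction step
# (illustrative choice of method and parameters).
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, color = make_s_curve(n_samples=1000, random_state=0)         # 3-D points on a 2-D manifold
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)   # (1000, 2): low-dimensional coordinates for visualisation
# ...or input features for a downstream pattern-recognition algorithm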