Using text clustering to predict defect resolution time: a conceptual

The Data Mining and Data Usability Challenge

... ITSC and Simpson Weather Associates are applying data mining frameworks for the analysis and extraction of information from numerical model output data generated or archived at the GMAO. The team is conducting experiments focusing on the automated detection and mining of atmospheric phenomena relati ...

Finding Non-Redundant, Statistically Significant Regions in

A RESEARCH SUPPORT SYSTEM FRAMEWORK FOR WEB DATA

... comments. The databases provide storage for extracted web information. The key component of a web crawler is the parser, which includes a word extractor, a table extractor and a link extractor. The word extractor is used to extract word information. It should provide string checking functions. Table ...

View PDF - CiteSeerX

References

... new layers will improve detection of fraudulent applications because the detection system can detect more types of attacks, better account for changing legal behaviour, and remove the redundant attributes. The CD and SD algorithms, which monitor the significant increase or decrease in amount of some ...

The KDD process for extracting useful knowledge from volumes of

... The KDD process is outlined in Figure 1. (We did not show all the possible arrows to indicate that loops can, and do, occur between any two steps in the process; also not shown is the system's performance element, which uses knowledge to make decisions or take actions.) The KDD process is interactiv ...

Data Mining and Cluster Organisations

... Even though the concept of clusters has received a considerable amount of attention, the literature dedicated to cluster organisations is still very scarce. On the other hand, the wide applicability of data mining to several industries, along with the benefits that it might bring to any organisation, ...

The Great Time Series Classification Bake Off

... There is a group of algorithms based on the first-order differences of the series, $a'_i = a_i - a_{i+1}$. Various methods that have used a form of differences have been described [19], but the most successful approaches combine distance in the time domain and the difference domain. Complexity In ...
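The snippet above does not say how the time-domain and difference-domain distances are combined. The following is a minimal sketch that assumes a simple weighted sum of two Euclidean distances; the function names and the `alpha` weight are hypothetical and are not taken from the paper.

```python
import math


def first_differences(series):
    """Return a'_i = a_i - a_{i+1} for every adjacent pair in the series."""
    return [a - b for a, b in zip(series, series[1:])]


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def combined_distance(x, y, alpha=0.5):
    """Weighted sum of time-domain and difference-domain distances.

    alpha is an assumed mixing weight: 1.0 uses only the raw series,
    0.0 uses only the first-order differences.
    """
    d_time = euclidean(x, y)
    d_diff = euclidean(first_differences(x), first_differences(y))
    return alpha * d_time + (1 - alpha) * d_diff
```

Two series that differ only by a constant offset have identical differences, so only the time-domain term contributes to their combined distance.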

IJCSI International Journal of Computer Science Issues, Vol. 8, Issue

Full Text - Research Publications

The Early Warning Project: Prewarning and Avoiding Problems and Costly Downtime in Complex Industrial Processes

DATA MINING LAB MANUAL Index S.No Experiment Page no

... Step 2: Next we select the “Classify” tab and click the “Choose” button to select the “J48” classifier. Step 3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default version doe ...

Steven F. Ashby Center for Applied Scientific Computing Month DD

... Data mining provides earth scientists with tools that allow them to spend more time choosing and exploring interesting families of hypotheses. – By applying the proposed data mining techniques, some of the steps of hypothesis generation and evaluation will be automated, facilitated and improved. ...

Learning Approximate Sequential Patterns for Classification

... space. We describe a clustering method based on a 2-approximate solution of the k-center problem to achieve this goal. In addition to LSH and clustering, we also draw upon sequential statistical methods to make the search for interesting patterns more efficient. The process of identifying patterns w ...
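One well-known 2-approximation for the k-center problem is the farthest-first traversal. The sketch below illustrates that technique only; the snippet does not state which 2-approximation the authors use, so this should not be read as their method. The function name and the choice of the first point as the starting center are assumptions.

```python
import math


def farthest_first_centers(points, k):
    """Greedy farthest-first traversal for the k-center problem.

    Returns the chosen centers and the resulting covering radius, i.e. the
    largest distance from any point to its nearest center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]  # arbitrary starting center (assumption)
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Pick the point that is currently worst served.
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers, max(dist)
```

Each round adds the point farthest from all existing centers, which is what bounds the final radius to within a factor of two of the optimum.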

Spatial Data Mining by Decision Trees

... the C4.5 algorithm for spatial data, based on two different approaches: join materialization and querying the different tables on the fly. Similar work has been done on these two main approaches. The first, join materialization, favors processing time at the expense of memory space, whereas the sec ...

Cassisi et al InTech

... fundamental metric properties: non-negativity, symmetry and triangle inequality [29]. In most cases, a metric function is desired, because the triangle inequality can then be used to prune the index during search, speeding up execution for exact matching [28]. In every way, Euclidean distance ...
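To illustrate how the triangle inequality prunes a search, here is a minimal sketch of exact nearest-neighbour lookup with a single reference point (pivot). For any metric d, |d(q, p) - d(p, x)| is a lower bound on d(q, x), so a candidate x can be skipped when that bound is no better than the best distance found so far. The function name and the single-pivot layout are assumptions for illustration, not the index structure discussed in the source.

```python
import math


def nearest_with_pivot(query, points, pivot):
    """Exact nearest-neighbour search with triangle-inequality pruning.

    Returns (best_point, best_distance, number_of_distance_computations).
    """
    # In a real index these pivot distances would be precomputed once.
    pivot_dists = [math.dist(pivot, x) for x in points]
    d_qp = math.dist(query, pivot)

    best, best_d, computed = None, math.inf, 0
    for x, d_px in zip(points, pivot_dists):
        # Lower bound on d(query, x) from the triangle inequality.
        if abs(d_qp - d_px) >= best_d:
            continue  # x cannot beat the current best, skip it
        computed += 1
        d = math.dist(query, x)
        if d < best_d:
            best, best_d = x, d
    return best, best_d, computed
```

Pruning never changes the answer, because a skipped candidate is provably at least as far away as the current best.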

Massimo Poesio: Text Categorization and

... Wednesday 25th May and the details are as follows Who: Dave Robertson Title: Formal Reasoning Gets Social Abstract: For much of its history, formal knowledge representation has aimed to describe knowledge independently of the personal and social context in which it is used, with the advantage that w ...

Mining Association Rules in OLAP Cubes

... Extended association rules [8] consist of repetitive predicates by involving attributes from user defined non-item dimensions. Tjioe and Taniar [9] extract associations from multiple dimensions by focusing on summarized data. They prepare multidimensional data for the mining process by pruning rows ...

A SURVEY OF STREAM DATA MINING

Improving Classification Accuracy with Discretization on Datasets

Agathe Merceron Educational Data Mining / Learning Analytics

... Lexical features: unigram, word ordering, punctuation. Dialog-context features: position in the dialog, length, author of previous message (tutor, student), etc. Task features: task before the utterance (writing, compiling), status of most recent coding action, etc. Posture feature ...

Association Rule Mining: An Overview

... encoding of attributes in a record, the enumeration of subsets of attributes requires $m \cdot 2^n$ computational steps. For small values of n, traditional algorithms are simple and efficient, but for large values of n the computational analysis is infeasible [9]. While the general association rule model describe ...
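The cost quoted above can be made concrete with a brute-force sketch, assuming the step count is read as m times 2 to the power n, where m is the number of records and n the number of attributes. The function names below are hypothetical, and the code is an illustration of exhaustive enumeration rather than any algorithm from the source.

```python
from itertools import combinations


def all_itemsets(attributes):
    """Yield every subset of the attributes, including the empty set."""
    items = list(attributes)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def support_counts(records, attributes):
    """Count, for each of the 2**n subsets, how many records contain it.

    Every subset is checked against every record, so the work grows as
    len(records) * 2**len(attributes).
    """
    counts = {}
    for subset in all_itemsets(attributes):
        counts[subset] = sum(1 for r in records if set(subset) <= r)
    return counts
```

With 3 attributes there are only 8 subsets to check, but with 30 attributes there are more than a billion, which is why exhaustive enumeration stops being practical.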

Classification Performance Using Principal Component Analysis

... former methodology is named feature selection, while the latter is called feature extraction, and it includes linear (PCA, Independent Component Analysis (ICA), etc.) and non-linear feature extraction methods. Finding a new feature subset is usually intractable, and many problems related to feature ext ...

Epsilon Grid Order: An Algorithm for the Similarity Join on

... facilitate the search by similarity, multidimensional feature vectors are extracted from the original objects and organized in multidimensional access methods. The particular property of this feature transformation is that the Euclidean distance between two feature vectors corresponds to the (dis-) ...
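A similarity join returns all pairs of feature vectors whose Euclidean distance is at most some threshold epsilon. The sketch below is a simple in-memory, hash-grid version of that idea: points are bucketed into cells of side `eps`, and only points in the same or adjacent cells are compared. It is not the Epsilon Grid Order algorithm itself, and the function name is an assumption.

```python
import math
from collections import defaultdict
from itertools import product


def epsilon_join(points, eps):
    """Return sorted index pairs (i, j), i < j, with distance <= eps."""
    # Bucket each point index by its grid cell of side length eps.
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(c / eps) for c in p)].append(idx)

    dim = len(points[0]) if points else 0
    offsets = list(product((-1, 0, 1), repeat=dim))

    pairs = set()
    for cell, members in grid.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            # .get avoids creating empty cells while iterating the grid.
            for i in members:
                for j in grid.get(neighbor, ()):
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        pairs.add((i, j))
    return sorted(pairs)
```

Two points within `eps` of each other differ by at most `eps` in every coordinate, so their cell indices differ by at most one per dimension and no qualifying pair is missed. The number of neighbouring cells grows as 3 to the power of the dimension, so this sketch only suits low-dimensional data.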


Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It will often be necessary to modify data preprocessing and model parameters until the result achieves the desired properties.

Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences are often in the usage of the results: while in data mining the resulting groups are the matter of interest, in automatic classification the resulting discriminative power is of interest. This often leads to misunderstandings between researchers coming from the fields of data mining and machine learning, since they use the same terms and often the same algorithms, but have different goals.

Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Zubin in 1938 and Robert Tryon in 1939, and was famously used by Cattell beginning in 1943 for trait theory classification in personality psychology.
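As one concrete instance of the "small distances among the cluster members" notion, the following is a minimal sketch of k-means (Lloyd's algorithm). It is only one of the many algorithms the description covers. The function name, the requirement that initial centers be passed in explicitly, and the `max_iter` cap are assumptions made to keep the example deterministic.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps.

    points  : list of equal-length numeric tuples
    centers : initial cluster centers (their count fixes k)
    Returns the final centers and the clusters from the last assignment.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters
```

The number of clusters and the starting centers are exactly the kind of parameter settings that, as noted above, depend on the data set and usually need several rounds of adjustment.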

Top subcategories
