
Mining Interval Time Series
... If A1 and A2 and … and Ah occur within V units of time, then B occurs within time T. This rule format is different from the containment relationship defined in the current paper. The mining strategies are also different. The technique in [7] uses a sliding window to limit the comparisons to only the ...
Using text clustering to predict defect resolution time: a conceptual
... Five different algorithms were tested; logistic regression yielded the best results and the best prediction accuracy, 34.9%, for the defect reports in the test set. The author concludes that “there are other attributes or metrics that may have greater influence of the resolution t ...
New Method to Improve Mining of Multi
... Marwa Fouad Al-Rouby. Abstract: Class imbalance is one of the challenging problems for data mining and machine learning techniques. The data in real-world applications often have an imbalanced class distribution. This occurs when most examples belong to a majority class and few examples belong to a m ...
Discovering Frequent Closed Itemsets for Association Rules
... reduced to the problem of determining frequent itemsets and their support. Recent works demonstrated that the frequent itemset discovery is also the key stage in the search for episodes from sequences and in finding keys or inclusion as well as functional dependencies from a relation [12]. All existi ...
An Architecture for High-Performance Privacy-Preserving
... services to ensure flexibility and extensibility. This dissertation first develops a comprehensive example algorithm, a privacy-preserving Probabilistic Neural Network (PNN), which serves as a basis for analysis of the difficulties of DDM/PPDM development. The privacy-preserving PNN is the first such ...
Hybrid Self-Organizing Modeling System based on GMDH
... The Group Method of Data Handling (GMDH) was invented by A.G. Ivakhnenko in the late 1960s [18]. He was looking for computational instruments allowing him to model real world systems characterized by data with many inputs (dimensions) and few records. Such ill-posed problems could not be solved trad ...
Chi-square-based Scoring Function for Categorization of MEDLINE
... with the SVM penalty parameter C were optimized by nested cross-validation over d values {1, 2, 3} and C values {0.01, 1, 100} [27]. For each learning algorithm we conducted four experiments with the following inputs for each MEDLINE citation: i) title, ii) abstract, iii) title and abstract, and iv) ...
file (4.3 MB, pdf)
... Such systems are called OLTP (OnLine Transaction Processing) systems. • The systems are mostly relational database systems designed for transaction processing. • The performance of OLTP systems is usually very important, since such systems are used to support users (i.e., staff) who provide service to ...
Steven F. Ashby Center for Applied Scientific
... Partitioning of data only – large number of classification tree nodes gives high communication cost ...
On the relationships between user profiles and navigation sessions
... The profile is made of 14 fields: the first is the nickname, i.e., a personal ID that uniquely identifies each user, while the other 13 fields specify, respectively, the age, the gender, the spoken language, the job, the country, the zodiac sign, the favorite place to live, the favorite music, ...
Biclustering Algorithms for Biological Data Analysis: A Survey
... According to this criterion, a perfect bicluster is a sub-matrix with variance equal to ...
Evolutionary Model Tree Induction
... which attempt to take advantage of the unstable induction of models by growing a forest of trees from the data and later averaging their predictions. While presenting very good predictive performance, ensemble methods fail to produce a single-tree solution, operating also in a black-box fashion. We ...
Density-based Cluster Analysis for Identification of Fire Hot Spots in
... This study identified regions that are fire hot spots in Kenya’s protected areas by performing a density-based cluster analysis on the Moderate Resolution Imaging Spectroradiometer (MODIS) MCD14ML active fire data set for the 12-year period between 2003 and 2014. Feature subset selection was done usin ...
Application Of Data Mining Technology To Support Fraud Protection
... CONCLUSION AND RECOMMENDATIONS 6.1 Conclusion ...
Mining Frequent Approximate Sequential Patterns.
... REPuter [15] is the closest effort toward mining frequent approximate sequential patterns under the Hamming distance model. Unfortunately, REPuter achieves its efficiency by strictly relying on the suffix tree for constant-time longest common prefix computation in seed extension. Consequently, the t ...
Finding Cyclic Frequent Itemsets
... underlying problem is to find frequent sequential patterns in temporal databases. Mannila et al. [16] discuss the problem of recognizing frequent episodes in an event sequence, where an episode is defined as a collection of events that occur during time intervals of a specific size. The ass ...