Clustering Algorithms Meta Applier (CAMA) Toolbox
Dmitry S. Shalymov
Kirill S. Skrygan
Dmitry A. Lyubimov
Clustering
• Goals
– To detect the underlying structure in data
– To reduce data set capacity
– To extract unique objects
• Usage
– Data mining
– Machine learning
– Financial mathematics
– Optimization
– Statistics
– Pattern recognition
– Control strategies development
SYRCoSE’09
Clustering Problem
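• Sample of objects: $\{x_1, x_2, \ldots, x_n\} \subset X$
• Distance function: $\rho(x, x')$
• Clustering algorithm: $\mathrm{Alg}: X \to Y$
• Mean intra-cluster distance: $W = \dfrac{\sum_{i<j} [y_i = y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i = y_j]} \to \min$
• Mean inter-cluster distance: $B = \dfrac{\sum_{i<j} [y_i \ne y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i \ne y_j]} \to \max$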
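The two quality functionals can be checked numerically on a toy example. The sketch below assumes the usual reading of the slide: W is the average distance over pairs of objects that share a cluster label and B is the average over pairs with different labels. The function names, the Euclidean distance, and the four sample points are illustrative assumptions, not part of the CAMA toolbox.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance; stands in for the generic distance rho(x, x')."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def intra_inter_distances(points, labels, rho=euclidean):
    """Return (W, B): mean intra-cluster and mean inter-cluster pairwise distance."""
    same_sum = diff_sum = 0.0
    same_cnt = diff_cnt = 0
    for i, j in combinations(range(len(points)), 2):  # all pairs with i < j
        d = rho(points[i], points[j])
        if labels[i] == labels[j]:
            same_sum += d
            same_cnt += 1
        else:
            diff_sum += d
            diff_cnt += 1
    # A functional is undefined (NaN) when no pair of that kind exists.
    w = same_sum / same_cnt if same_cnt else float("nan")
    b = diff_sum / diff_cnt if diff_cnt else float("nan")
    return w, b


# Hypothetical data: two tight groups, five units apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
W_good, B_good = intra_inter_distances(points, [0, 0, 1, 1])  # natural grouping
W_bad, B_bad = intra_inter_distances(points, [0, 1, 0, 1])    # groups mixed up
```

A labelling that matches the visible groups gives a smaller W and a larger B than a labelling that mixes them, which is the behaviour the min and max criteria ask for.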
Clustering and Classification
Variety of Clustering Algorithms
• Hierarchical
– Agglomerative
– Partitioning
• Iterative
– Hard (K-means, SVM, SPSA)
– Fuzzy (FCM)
• Important parameters
– Distance norm
– Number of clusters
– Initial values of cluster centers
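To show where the three parameters enter an iterative hard clustering method, here is a minimal K-means sketch (Lloyd-style iterations). It is an illustration only, not the toolbox's implementation: the function name `kmeans`, its arguments, and the sample points are assumptions. The distance norm is the `dist` argument, and the number of clusters and the initial centers are both given by `centers`.

```python
import math


def kmeans(points, centers, dist=math.dist, n_iter=100):
    """Plain K-means: assign each point to its nearest center, then recompute centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest center under the chosen distance.
        labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels_good, centers_good = kmeans(points, [(0.0, 0.0), (5.0, 0.0)])
labels_bad, centers_bad = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])
```

With the first initialization the algorithm recovers the two groups. With the second it converges to a partition that splits both groups, which is why the initial values of the cluster centers appear in the list of important parameters.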
Cluster Stability Algorithms
• Indexes
• Stability (similarity, merit) functions
• Probabilistic measures assessing the likelihood of a decision
• Density estimation approaches
Stochastic Approximation
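$\theta^{*}: \ \partial L / \partial \theta = 0$
Recursive stochastic approximation
$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k\, \hat{g}_k(\hat{\theta}_k)$
$g(\theta) = \partial L / \partial \theta$
FDSA
$\hat{g}_{ki}(\hat{\theta}_k) = \dfrac{y(\hat{\theta}_k + c_k e_i) - y(\hat{\theta}_k - c_k e_i)}{2 c_k}$
SPSA
$\hat{g}_{ki}(\hat{\theta}_k) = \dfrac{y(\hat{\theta}_k + c_k \Delta_k) - y(\hat{\theta}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}}$
$\Delta_k = (\Delta_{k1}, \Delta_{k2}, \ldots, \Delta_{kp})^{T}$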
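The SPSA recursion can be sketched in a few lines. The version below is a hedged illustration rather than the authors' implementation: it assumes symmetric Bernoulli (plus or minus one) perturbations, gain sequences of the form a_k = a / (k + 1)^alpha and c_k = c / (k + 1)^gamma, and a noise-free quadratic loss chosen only for the demonstration. All names and default values are assumptions.

```python
import random


def spsa_gradient(y, theta, c_k, rng):
    """Simultaneous-perturbation gradient estimate from two evaluations of y."""
    # Assumed perturbation: each component is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c_k * d for t, d in zip(theta, delta)]
    minus = [t - c_k * d for t, d in zip(theta, delta)]
    diff = y(plus) - y(minus)
    # The same difference is reused for every component; only the divisor changes.
    return [diff / (2.0 * c_k * d) for d in delta]


def spsa_minimize(y, theta0, n_iter=2000, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Iterate theta_{k+1} = theta_k - a_k * g_k(theta_k) with decaying gains."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(n_iter):
        a_k = a / (k + 1) ** alpha  # step size (assumed gain schedule)
        c_k = c / (k + 1) ** gamma  # perturbation size (assumed gain schedule)
        g = spsa_gradient(y, theta, c_k, rng)
        theta = [t - a_k * gi for t, gi in zip(theta, g)]
    return theta


def loss(theta):
    """Hypothetical loss with its minimum at (3, -2)."""
    return (theta[0] - 3.0) ** 2 + (theta[1] + 2.0) ** 2


theta_hat = spsa_minimize(loss, [0.0, 0.0])
```

Each SPSA iteration needs two evaluations of y regardless of the dimension p, whereas the FDSA estimate needs 2p evaluations because it perturbs one coordinate at a time.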
Effectiveness of SPSA
Finding the number of clusters in data set
• Run the SPSA algorithm for different numbers of clusters, $K$, and calculate the corresponding distortions $d_K$
• Select a transformation power, $Y$
• Calculate the “jumps” in transformed distortion: $J_K = d_K^{-Y} - d_{K-1}^{-Y}$
• Estimate the number of clusters in the data set by $K^{*} = \arg\max_K J_K$
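The jump selection step can be sketched as follows, assuming the distortions have already been computed for K = 1, 2, ... by some clustering run. The distortion values are made-up illustrative numbers, the function name is hypothetical, and the convention that the transformed distortion for K = 0 equals zero is an assumption needed to define the first jump.

```python
def estimate_num_clusters(distortions, power):
    """Return (K*, jumps), where jumps[K] = d_K**(-power) - d_{K-1}**(-power).

    `distortions` maps K = 1, 2, ... (consecutive) to a positive distortion d_K.
    """
    jumps = {}
    prev = 0.0  # assumed convention: transformed distortion at K = 0 is zero
    for k in sorted(distortions):
        cur = distortions[k] ** (-power)
        jumps[k] = cur - prev
        prev = cur
    k_star = max(jumps, key=jumps.get)  # K with the largest jump
    return k_star, jumps


# Made-up distortions: a sharp drop at K = 3, then almost flat.
distortions = {1: 10.0, 2: 4.0, 3: 0.5, 4: 0.45, 5: 0.4}
k_star, jumps = estimate_num_clusters(distortions, power=1.0)
```

The negative power turns a sharp drop in distortion into a large positive jump, so the largest jump marks the K after which adding clusters stops paying off.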
Structure of data set detection
Examples
• Iris (3 clusters, 4 features, 150 instances)
• Wine (3 clusters, 13 features, 178 instances)
• Breast Cancer (2 clusters, 32 features, 569 instances)
• Image Segmentation (7 clusters, 19 features, 2310 instances)
Software Tools for Clustering Analysis
• Research
– COMPACT
– DCPR (Data Clustering & Pattern Recognition)
– FCDA (Fuzzy Clustering and Data Analysis Toolbox)
– ClusterPack Matlab Toolbox
– The Curve Clustering Toolbox
– SOM (Self-Organizing Map)
– Spectral Clustering Toolbox
– Yashil's FCM Clustering
• Licensed software
– SPSS
– STATISTICA
• Characteristics
– Visualization
– Effectiveness analysis with patterns
– Tools to check performance
• Shortcomings
– Limited number of data sets and algorithms
– No possibility to load one's own algorithm
– No on-line services
– MATLAB
Clustering Algorithms Meta Applier
CAMA. Kernel
CAMA Toolbox
http://ancient.punklan.net:8084/CAMA2/index.jsp
Thank you!