* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Clustering Time Series Data An Evolutionary
		                    
		                    
								Survey							
                            
		                
		                
                            
                            
								Document related concepts							
                        
                        
                    
						
						
							Transcript						
					
					Clustering Time Series Data An Evolutionary Approach Monica Chiş, Soumya Banerjee, Aboul Ella Hassanien Springer 2009 Outline      Introduction Related Works: Time Series Clustering Evolutionary Time Series Clustering Algorithm (ETSC) Experimental Results Conclusions and Future Work Introduction  Time series are an ordered sequence of values of a variable at equally spaced time intervals.  Traditional time series analysis and modeling tend to be based on non-automatic and trialand-error approaches. (Cont.)  Time series arise in many important areas.  Clustering of time series has attracted the interest of researchers from a wide range of fields. Related Works: Time Series Clustering  The mining of time series data has attracted great attention in the data mining community in recent years.  There are two main categories in time series clustering:  Whole clustering  Subsequence clustering the clustering performed on many individual time series to group similar series into clusters based on sliding window extractions of a single time series and aims to find similarity and differences among different time window of a single time series. (Cont.) Clustering Approach  Hierarchical Clustering  Generality  Only small datasets  K-Means  Distance is minimized (Cont.) Time-Series  Use lower dimensionality that preserves the original information  Describes the original shape of the time-series data as closely as possible. (Cont.)  Another categorization of clustering time series is the classification in model based or model free.  Model based assume some form of the underlying generating process, estimate the model from each data then cluster based on similarity between model parameters.  Markov chain  Hidden Makov Models  Model free  Fast Fourier transforms  Wavelet transforms involve the specification of a specific distance measure and/or a transformation of the data. Evolutionary Time Series Clustering Algorithm (ETSC)  A new evolutionary algorithms focused on whole clustering of time series, called Evolutionary Time Series Clustering- ETSC is presented. (Cont.)  Apply an evolutionary algorithm for solving a problem is To find : 1. A suitable representation for candidate solutions 2. By finding a good representation for clustering time series dataset.  The fitness function will be considered a similarity measure of time series (Cont.)  Two time-series are similarity  if they have enough non overlapping time ordered subsequences that are similar.  Two subsequences are similarity  if one is enclosed within an envelope of a user defined width around another Evolutionary Time Series Clustering 選擇與複製 突變 Experimental Results Conclusions and Future Work  An evolutionary approach to perform clustering on time series data set was presented.  Some different fitness functions are tested. Hierarchical Clustering Calculate the similarity between all possible combinations of two profiles Keys • Similarity • Clustering Two most similar clusters are grouped together to form a new cluster Calculate the similarity between the new cluster and all remaining clusters. 出處:IR講義 (Cont.) 出處:IR講義 The K-Means Clustering Method  Example 10 10 9 9 8 8 7 7 6 6 5 5 10 9 8 7 6 5 4 4 3 2 1 0 0 1 2 3 4 5 6 7 8 K=2 Arbitrarily choose K object as initial cluster center 9 10 Assign each objects to most similar center 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10 Update the cluster means 4 3 2 1 0 0 1 2 3 4 5 6 reassign 10 9 9 8 8 7 7 6 6 5 5 4 3 2 1 0 1 2 3 4 5 6 7 8 8 9 10 reassign 10 0 7 9 10 Update the cluster means 4 3 2 1 0 0 1 2 3 4 5 6 出處:IR講義 7 8 9 10 Evoluationary Algorithm  Evolutionary algorithms are randomized search optimization techniques guided by the principles of natural biology evolution processes (Cont.) (Cont.)  They are effective and robust search processes provide a near optimal solution of a fitness function. (Cont.)  Evolutionary algorithms are used to improve the robustness and accuracy of the more traditional techniques used in feature extraction, feature selection, classification and clustering Gene Algorithm  1975年 John Holland 學者所提出  成為非常著名的演算法  Genetic Algorithm (GA)  基因演算法  又稱: 遺傳演算法 出處:基因演算法介紹 基因演算法的概念  將問題編碼成 染色體的型式  基因: 0 or 1  染色體: x = 0 1 0 0 1 1 0  設計 適應函數 (fitness function)  f(x) :適應函數,給一 x 可得一 適應值 f(x)  適應值愈高, 表愈適應環境 出處:基因演算法介紹 基因演算法的三個基本動作  選擇(Selection)與複製 (Reproduction):  X=0100110  選擇 適應值高 之染色體  複製成多個染色體 出處:基因演算法介紹 (Cont.)  交換 (crossover):  父:0 1 0 0 1 1 0  母:1 0 1 1 0 0 1 交換 0111010 1000101 出處:基因演算法介紹 (Cont.)  突變 (mutation):  0100110 1 出處:基因演算法介紹 (Cont.) 產生初始族群 是否滿足 停止條件 否 是 產生最適個體 計算適應値 選擇 複製 交換 突變 產生下一子代 出處:基因演算法介紹 Fitness Function (1) Fitness Function (2) (Cont.)  An ordered set of m realvalued variables  The dataset could be described in matrix as follows  An individual is represented as a vector (Cont.) dj is a time series with m real-valued cj is an integer number from 1 to p  Representation indicates how time series are assigned to classes  The value cj of the gene j indicates the classes to which the object dj is assigned.  All the times series for dataset are assigned to a class (cluster).  The number of classes is determining by the evolutionary algorithms
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            