Orthogonal Decision Trees and
Beyond
Hillol Kargupta
Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County
http://www.cs.umbc.edu/~hillol
hillol@cs.umbc.edu
&
AGNIK, LLC
http://www.agnik.com
hillol@agnik.com
Acknowledgement: Haimonti Dutta, Byung-Hoon Park, Rajeev Ayyagari
Roadmap
• Introduction
• Analysis of models and ensembles
• Fourier spectrum of decision tree ensembles
• Orthogonal decision trees
• Genetic code and Fourier analysis
• Conclusions and future work
Research & Development at UMBC DIADIC Laboratory and AGNIK, LLC
• Distributed data mining and computation.
• Supported by NASA, an NSF CAREER award, the US Air Force, NSF 0083946, NSF 9803660, the TRW Research Foundation, the Maryland Technology Development Council, and others.
• Agnik, LLC: a spin-off from the DIADIC Lab, specializing in mobile and distributed data mining and management.
The Story of Two Watchmakers
• "There once were two watchmakers, named Hora and Tempus, who manufactured very fine watches. Both of them were highly regarded, and the phones in their workshops rang frequently. New customers were constantly calling them. However, Hora prospered while Tempus became poorer and poorer and finally lost his shop. What was the reason? …"
H. Simon, 1962, "The Architecture of Complexity"
The moral: one could make more stable ensembles by building them, as Hora did, from stable sub-assemblies.
Exploring and Engineering Complex Systems
• Most complex systems are ensembles
• Examples:
  – Ecology
  – Large man-made complex systems
  – Biology
Ecology: Ant Colony
• An ensemble of simpler activities produces emergent behavior
Large Engineered Complex Systems
• An ensemble of different functional modules
Biology: Gene Expression

DNA
  → alphabet transformation (transcription) →
mRNA
  → alphabet transformation (translation) →
Protein sequence
  → mapping from sequence to Euclidean space →
Folded protein

• Set of representation transformations:
  – Transcription (DNA → mRNA)
  – Translation (mRNA → protein)
  – Folding of proteins
Gene Expression: An Ensemble Effect
(Figure: one DNA molecule giving rise to Protein 1, Protein 2, and Protein 3.)

• Different portions of the DNA produce different proteins in different cells
• Distributed, ensemble-based computation of gene expression
Analysis of Model-Ensembles in Science & Engineering: A Few Examples
• The k-armed bandit problem and the allocation-of-trials problem in an ensemble of organisms (Holland, 1975)
• The schema theorem (Holland, 1975)
More Examples
• Dimensional analysis
  – Describe the physical process in terms of an ensemble of dimensionless quantities.
  – Find a way to aggregate those quantities.
Another Example
• Variational techniques and finite elements
  – Example: solving Ax = b is equivalent to minimizing P(x) = (1/2) x^T A x − x^T b
  – Rayleigh-Ritz principle: choose n trial functions and minimize over the subspace defined by the "ensemble" of trial functions
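As a quick sanity check (my own sketch, not from the slides): for a symmetric positive-definite A, the gradient of P(x) is Ax − b, so the minimizer of P is exactly the solution of Ax = b.

    # Minimal sketch (not from the slides): minimizing
    # P(x) = (1/2) x^T A x - x^T b recovers the solution of Ax = b
    # when A is symmetric positive definite.
    import numpy as np
    from scipy.optimize import minimize

    A = np.array([[4.0, 1.0], [1.0, 3.0]])  # symmetric positive definite
    b = np.array([1.0, 2.0])

    P = lambda x: 0.5 * x @ A @ x - x @ b
    x_min = minimize(P, np.zeros(2)).x      # numerical minimizer of P
    x_direct = np.linalg.solve(A, b)        # direct solution of Ax = b
    print(np.allclose(x_min, x_direct, atol=1e-4))  # True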
Fourier Analysis of Complex Models
• Decision trees
• Genetic code-like transformations
Decision Trees
(Figure: a decision tree that splits on x3 (high/low), x1 (large/small), and x2 (red/blue), with leaves labeled + and −.)

• A decision tree builds a classification tree from a labeled data set.
• Internal nodes correspond to features and links correspond to feature values.
• Leaf nodes correspond to class labels.
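For concreteness (my own sketch; the data and feature names are invented), fitting such a tree with scikit-learn:

    # Minimal sketch (not from the slides): fit a decision tree to a tiny,
    # made-up labeled data set and print its structure.
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]  # rows: [x1, x2, x3]
    y = [1, -1, -1, 1]                                 # class labels
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["x1", "x2", "x3"]))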
Ensemble of Decision Trees
• Ensemble classifiers:
  – Bagging (Breiman, 1996)
  – Random forests (Breiman, 2001)
  – Arcing (Breiman, 1997)
  – SEA, the Streaming Ensemble Algorithm (Street, 2002)
• Problems:
  – Large ensembles are difficult to interpret
  – The response time of large ensembles can be slow
  – Can we create a non-redundant yet effective ensemble?
Classification Function of an Ensemble Classifier

(Figure: individual classifiers f1(x), f2(x), f3(x), …, fn(x) feed into a weighted sum.)

f(x) = ∑i ai fi(x)

where ai is the weight of tree i and fi(x) is the classification output of tree i.
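A minimal sketch of this weighted-sum combination (my own illustration; the ±1 labels, toy "trees", and weights are made up):

    # Minimal sketch (not from the slides): combine an ensemble of
    # classifiers by a weighted sum of their +/-1 outputs.
    def ensemble_predict(classifiers, weights, x):
        """f(x) = sum_i a_i f_i(x), thresholded to a +/-1 class label."""
        score = sum(a * f(x) for a, f in zip(weights, classifiers))
        return 1 if score >= 0 else -1

    # Three toy "trees" on a 2-bit input, with made-up weights.
    trees = [lambda x: 1 if x[0] else -1,
             lambda x: 1 if x[1] else -1,
             lambda x: -1]
    print(ensemble_predict(trees, [0.5, 0.3, 0.2], [1, 0]))  # -> 1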
Decision Trees as Functions

(Figure: the "play tennis" decision tree, with Outlook ∈ {Sunny, Overcast, Rain} at the root, Humidity ∈ {High, Normal} and Wind ∈ {Strong, Weak} below it, and Yes/No leaves, shown next to the same tree relabeled numerically: Outlook ∈ {0, 1, 2}, Humidity ∈ {0, 1}, Wind ∈ {0, 1}, and 0/1 leaves.)

A decision tree can thus be viewed as a numeric function.
Fourier Representation of a Decision Tree

(Figure: the numeric tree from the previous slide; each leaf corresponds to a partition of the input space.)

f(x) = ∑j wj Ψj(x)

where wj is a Fourier coefficient (FC) and Ψj is the corresponding Fourier basis function.
Discrete Fourier Spectrum of a Decision Tree
• Very sparse representation: a polynomial number of non-zero coefficients. If k is the depth of the tree, then all coefficients involving more than k features are zero.
• Higher-order coefficients are exponentially smaller than the low-order coefficients (Kushilevitz and Mansour, 1991).
• The spectrum can therefore be approximated by the coefficients with significant magnitude.

(Figure: exponential decay of the FCs with increasing order.)
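A minimal sketch of this sparsity (my own illustration, not from the slides): enumerate the parity basis functions ψ_S(x) = (−1)^(Σ_{i∈S} x_i) over {0,1}^n and project a shallow tree-like function onto them; no non-zero coefficient involves a feature the tree never tests.

    # Minimal sketch (not from the slides): the discrete Fourier (Walsh)
    # spectrum of a Boolean function over {0,1}^3, in the parity basis
    # psi_S(x) = (-1)^(sum of x_i for i in S).
    from itertools import product, combinations

    n = 3
    # A depth-2 "tree" computing x0 OR x1 as a +/-1 label; x2 is never tested.
    f = lambda x: 1 if (x[0] or x[1]) else -1

    for size in range(n + 1):
        for S in combinations(range(n), size):
            # w_S = (1/2^n) * sum_x f(x) * psi_S(x)
            w = sum(f(x) * (-1) ** sum(x[i] for i in S)
                    for x in product([0, 1], repeat=n)) / 2 ** n
            if w != 0:
                print(S, w)
    # No printed subset S involves x2 -- matching the depth bound above.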
Aggregation of Multiple Decision Trees

F1(x) = ∑j w1,j ψj(x)
F2(x) = ∑j w2,j ψj(x)
F3(x) = ∑j w3,j ψj(x)

F(x) = a1 F1(x) + a2 F2(x) + a3 F3(x) = ∑j (a1 w1,j + a2 w2,j + a3 w3,j) ψj(x)

• The weighted average of decision trees is computed through Fourier analysis: take the same weighted average of their Fourier coefficients.
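A minimal sketch of aggregation in the coefficient domain (my own; the coefficient values are made up): represent each spectrum as a dict from basis index to coefficient and take the weighted sum.

    # Minimal sketch (not from the slides): aggregate tree spectra by a
    # weighted sum of their Fourier coefficients.
    from collections import defaultdict

    def aggregate(spectra, weights):
        """Weighted sum of spectra, each a dict {basis_index: coefficient}."""
        out = defaultdict(float)
        for a, spectrum in zip(weights, spectra):
            for j, w in spectrum.items():
                out[j] += a * w
        return dict(out)

    W1 = {(): 0.5, (0,): -0.5}   # made-up spectrum of tree 1
    W2 = {(): 0.25, (1,): 0.75}  # made-up spectrum of tree 2
    print(aggregate([W1, W2], [0.6, 0.4]))
    # {(): 0.4, (0,): -0.3, (1,): 0.3}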
Fourier Representation to Decision Tree

(Figure: the inverse problem — starting from a Fourier spectrum, reconstruct the corresponding numeric decision tree.)
Fourier Spectrum and Decision Trees

Decision Tree ↔ Fourier Spectrum

• Developed efficient algorithms to
  – compute the Fourier spectrum of a decision tree (IEEE TKDE; SIAM Data Mining Conf.; IEEE Data Mining Conf.; ACM SIGKDD Explorations)
  – compute a tree from its Fourier spectrum (IEEE Transactions on SMC, Part B)
Fourier Spectrum and Inner Product of Decision Trees
• If f1(x) and f2(x) are two decision trees and W1 and W2 are the corresponding Fourier spectra, then
  ⟨f1(x), f2(x)⟩ = ⟨W1, W2⟩
  (Parseval's relation: inner products of functions equal inner products of their coefficient vectors.)
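A quick numeric check of this identity (my own sketch) using the orthonormal parity basis on two bits, with two toy "trees":

    # Minimal sketch (not from the slides): verify <f1, f2> = <W1, W2>
    # for the parity basis over {0,1}^2.
    from itertools import product, combinations

    n = 2
    xs = list(product([0, 1], repeat=n))
    basis = [lambda x, S=S: (-1) ** sum(x[i] for i in S)
             for k in range(n + 1) for S in combinations(range(n), k)]

    f1 = lambda x: 1 if x[0] else -1            # toy "tree" 1
    f2 = lambda x: 1 if (x[0] or x[1]) else -1  # toy "tree" 2

    spectrum = lambda f: [sum(f(x) * psi(x) for x in xs) / len(xs)
                          for psi in basis]
    W1, W2 = spectrum(f1), spectrum(f2)

    inner_f = sum(f1(x) * f2(x) for x in xs) / len(xs)  # <f1, f2>
    inner_W = sum(a * b for a, b in zip(W1, W2))        # <W1, W2>
    print(inner_f, inner_W)  # both 0.5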
The Fourier Spectra Matrix and Its Eigenanalysis
• Consider the matrix W, where Wi,j is the coefficient of the i-th Fourier basis function in the spectrum of tree Tj.
• Compute the eigenvectors and eigenvalues of W^T W.
Orthogonal Decision Trees
• Compute the Fourier spectrum of each decision tree in the ensemble
• PCA: perform an eigenanalysis of the covariance matrix of the spectra
• Each eigenvector is itself the Fourier spectrum of a decision tree
• Construct a tree from each eigenvector
• The resulting trees are functionally orthogonal to each other and constitute a redundancy-free ensemble (see the sketch after this list)
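A minimal NumPy sketch of the eigenanalysis step under simplifying assumptions (coefficients are made up, the eigenanalysis is done on W^T W as on the previous slide rather than a mean-centered covariance matrix, and reconstruction of actual trees from the eigen-spectra is omitted):

    # Minimal sketch (not from the slides): eigenanalysis of the spectra
    # matrix W whose column j is the Fourier spectrum of tree T_j.
    import numpy as np

    W = np.array([[0.50, 0.45, 0.55],     # coefficient of basis 0, per tree
                  [-0.30, -0.35, -0.25],  # coefficient of basis 1, per tree
                  [0.10, 0.05, 0.15]])    # coefficient of basis 2, per tree

    evals, evecs = np.linalg.eigh(W.T @ W)  # eigenanalysis of W^T W
    order = np.argsort(evals)[::-1]         # most dominant direction first
    ortho_spectra = W @ evecs[:, order]     # spectra of the orthogonal trees

    # By Parseval, these columns are mutually orthogonal as functions:
    print(np.round(ortho_spectra.T @ ortho_spectra, 6))  # diagonal matrix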
An Ensemble of Decision Trees → An Orthogonal Tree Generated from the Ensemble

(Figure: an orthogonal tree generated from the ensemble, splitting on Attribute 15, Attribute 10, and Attribute 19, with 0/1 leaves.)
Experimental Results: SPECT Data
• Single Photon Emission Computed Tomography (SPECT) image data from the UC Irvine repository.
• 267 images, 22 binary features, Boolean classification.

Method of Classification     Tree Complexity   Error Percentage
C4.5                         13                24.59%
Bagging (40 trees)           202               20.85%
Aggregated Fourier Trees     3                 19.78%
Orthogonal Decision Trees    3                 8.02%
Comparing Random Forests and ODT Ensembles

Method of Classification                        Error Percentage   Tree Complexity (total nodes in ensemble)
Random Forest (40 trees)                        23.2               322
ODT (projection onto the first principal
component, capturing 99.67% of the variance)    9.09               7
ODTs (projection onto 40 eigenvectors)          9.09               120

The ODT formed by projection onto the most dominant eigenvector performs as well as an ensemble of 40 different orthogonal trees!
Experimental Results: NASDAQ Data
• Discretized NASDAQ data.
• 99 stocks used to predict the ups and downs of the Yahoo stock.

Method of Classification     Tree Complexity   Error Percentage
C4.5                         103               12.6%
Bagging (60 trees)           92.85             11.2%
Aggregated Fourier Trees     33                9.2%
Orthogonal Decision Trees    5                 9.2%
Experimental Results: DNA Data
• DNA data from the UC Irvine repository.

Method of Classification            Tree Complexity   Error Percentage
C4.5                                131               6.5%
Bagging (10 trees)                  34                8.9%
Aggregated Fourier Trees            3                 8.3%
Dominant Orthogonal Decision Tree   3                 10.2%
Experimental Results: House Votes Data
• House votes (congressional voting records) data from the UC Irvine repository.
• 435 instances, 16 Boolean-valued attributes.

Method of Classification     Tree Complexity   Error Percentage
C4.5                         9                 8.0%
Bagging (40 trees)           79                11.0%
Aggregated Fourier Trees     5                 11.0%
Orthogonal Decision Trees    15                11.0%
Haar DWT for Representing Classifiers
• R. Mulvaney and D. Phatak (2003). "Multiclass Multidimensional Modified Haar DWT for Classification Function Representation and its Realization via Fast Fixed-Depth Network."
Observations
• Orthogonal decision trees are
  – redundancy-free
  – functionally orthogonal to each other
  – an efficient yet meaningful representation of a large ensemble
• This brings linear-systems theory to the advanced analysis of classifier ensembles
  – e.g., stability analysis of an ensemble
Fourier Analysis of Genetic Code-like Transformations
From DNA to Protein
DNA
  → alphabet transformation (transcription) →
mRNA
  → alphabet transformation (translation) →
Protein sequence
  → mapping from sequence to Euclidean space →
Folded protein

• Set of representation transformations:
  – Transcription (DNA → mRNA)
  – Translation (mRNA → protein)
  – Folding of proteins
Genetic Code That Controls Translation
Amino acid        Codons
Alanine           GCA, GCC, GCG, GCU
Cysteine          UGC, UGU
Aspartic acid     GAC, GAU
Glutamic acid     GAA, GAG
Phenylalanine     UUC, UUU
Glycine           GGA, GGC, GGG, GGU
Histidine         CAC, CAU
Isoleucine        AUA, AUC, AUU
Lysine            AAA, AAG
Leucine           UUA, UUG, CUA, CUC, CUG, CUU
Methionine        AUG
Asparagine        AAC, AAU
Proline           CCA, CCC, CCG, CCU
Glutamine         CAA, CAG
Arginine          AGA, AGG, CGA, CGC, CGG, CGU
Serine            AGC, AGU, UCA, UCC, UCG, UCU
Threonine         ACA, ACC, ACG, ACU
Valine            GUA, GUC, GUG, GUU
Tryptophan        UGG
Tyrosine          UAC, UAU
STOP              UAA, UAG, UGA
Genetic Code-like Transformations (GCTs)
• η maps every feature in x to c features in the new representation.
• η: X^n → X^cn is defined by a code book.
• Such an η is a genetic code-like transformation (GCT).
An Example GCT
x
z1z2z3
1
1
1
1
100
011
001
110
0
0
0
0
111
101
010
000
Codon
• Every variable x in the Xn is mapped to c variables in Xcn
• Redundant Code; codon size c.
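A minimal sketch of applying this code book (my own illustration): each input bit is replaced by a randomly chosen codon from its class, so an n-bit string expands to a cn-bit string.

    # Minimal sketch (not from the slides): a randomized GCT using the
    # 3-bit code book above. Each bit becomes a randomly chosen codon.
    import random

    CODE_BOOK = {1: ["100", "011", "001", "110"],
                 0: ["111", "101", "010", "000"]}

    def gct_encode(bits):
        """Map an n-bit string to a cn-bit string (here c = 3)."""
        return "".join(random.choice(CODE_BOOK[b]) for b in bits)

    random.seed(0)
    print(gct_encode([1, 1]))  # e.g. '100011' -- one member of the TIE class of '11'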
Illustration: Translation-Induced Equivalence (TIE) Class

The input string 11 maps to the TIE class of all its codon-level encodings:
{100 100, 100 011, 100 001, 100 110, …, 110 110}
Examples

A 2-bit mapping:
x    z1 z2
1    00, 11
0    01, 10

A 3-bit mapping:
x    z1 z2 z3
1    001, 110, 100, 011
0    010, 101, 111, 000

A 4-bit mapping:
x    z1 z2 z3 z4
1    0010, 1101, 1000, 0111, 0101, 1010, 1001, 0110
0    0000, 1111, 0001, 1110, 0011, 1100, 0100, 1011
What is the Effect of a GCT?
• Consider some function f(x) in a given representation.
• What is the effect of a GCT on the representation of f(x)?
• Is f(η(x)) "interesting"?
Discrete Fourier Analysis of Randomized GCTs
• Higher-order Fourier coefficients become less significant at an exponential rate
• This is a "linearizing" effect of randomized GCTs
• GCTs appear to make functions "more linear"

For more details: H. Kargupta, R. Ayyagari, and S. Ghosh (2004). Learning Functions Using Randomized Expansions: Probabilistic Properties and Experimentations. IEEE Transactions on Knowledge and Data Engineering, Volume 16, Number 8, pages 894-908.
A Single Perceptron

(Figure: a perceptron with inputs x1 and x2, weights w1 and w2, threshold θ, and output f(x).)

• A linear classifier; its learning algorithm comes with a convergence proof.
• It can learn only functions whose spectra contain nothing beyond order-1 Fourier coefficients.
• The Fourier spectrum of the two-bit XOR has a constant and an order-2 coefficient.
• So a perceptron cannot learn XOR (a sketch of the codon experiment follows below).
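A minimal sketch of the idea behind the following plots (my own reconstruction under assumptions; the slides' actual experimental setup may differ): train a perceptron on raw 2-bit XOR and on a randomized 3-bit codon expansion of the same examples, and compare training accuracies. In the slides' experiments the codon representations reduce the perceptron's error.

    # Minimal sketch (not the authors' code): perceptron on raw 2-bit XOR
    # vs. on a randomized 3-bit codon expansion (code book from earlier).
    import random

    CODE_BOOK = {1: ["100", "011", "001", "110"],
                 0: ["111", "101", "010", "000"]}

    def encode(bits):
        return [int(c) for b in bits for c in random.choice(CODE_BOOK[b])]

    def train_perceptron(data, epochs=200, lr=0.1):
        n = len(data[0][0])
        w, theta = [0.0] * n, 0.0
        for _ in range(epochs):
            for x, y in data:  # y in {-1, +1}
                pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1
                if pred != y:  # classic perceptron update
                    w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                    theta -= lr * y
        return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1

    def accuracy(f, data):
        return sum(f(x) == y for x, y in data) / len(data)

    random.seed(1)
    xor = [([a, b], 1 if a != b else -1) for a in (0, 1) for b in (0, 1)]
    coded = [(encode(x), y) for x, y in xor for _ in range(50)]  # sampled encodings
    print(accuracy(train_perceptron(xor), xor),
          accuracy(train_perceptron(coded), coded))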
Performance of the Perceptron on XOR

(Figures: classification error vs. size of the XOR problem for the plain perceptron and for the perceptron with 2-bit, 3-bit, and 4-bit codons.)
Conclusions
• Ensembles play a fundamental role in many physical processes
• Analyzing ensemble properties may provide deeper understanding
• This may require the development of an algebra for ensembles
References
• H. Kargupta, R. Ayyagari, and S. Ghosh (2004). Learning Functions Using Randomized Expansions: Probabilistic Properties and Experimentations. IEEE Transactions on Knowledge and Data Engineering, Volume 16, Number 8, pages 894-908.
• H. Kargupta, B. Park, and H. Dutta (2004). Orthogonal Decision Trees. ICDM'04 (extended version in communication).
• H. Kargupta and B. Park (2004). Mining Data Streams from Mobile Devices Using Fourier Spectrum of Decision Trees. IEEE Transactions on Knowledge and Data Engineering, Volume 16, Number 2, pages 216-229.
Brief Bio of Hillol Kargupta
Hillol Kargupta is an Associate Professor in the Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1996. He is also a co-founder of AGNIK LLC, a ubiquitous data intelligence company. His research interests include mobile and distributed data mining and computation in gene expression.

Dr. Kargupta won a National Science Foundation CAREER award in 2001 for his research on ubiquitous and distributed data mining. He and his co-authors received the best paper award at the 2003 IEEE International Conference on Data Mining for a paper on privacy-preserving data mining. He won the 2000 TRW Foundation Award and the 1997 Los Alamos Award for Outstanding Technical Achievement. His dissertation earned him the 1996 Society for Industrial and Applied Mathematics (SIAM) annual best student paper prize.

He has published more than eighty peer-reviewed articles in journals, conferences, and books. He is an associate editor of the IEEE Transactions on Knowledge and Data Engineering and the IEEE Transactions on Systems, Man, and Cybernetics, Part B. He served as Associate General Chair of the 2003 ACM SIGKDD Conference, and is Program Co-Chair of the 2005 SIAM Data Mining Conference and Vice-Chair of the 2005 IEEE International Conference on Data Mining. He has co-edited two books: (1) Advances in Distributed and Parallel Knowledge Discovery, AAAI/MIT Press, and (2) Data Mining: Next Generation Challenges and Future Directions, AAAI/MIT Press. He serves on the program committees of almost every major data mining conference (e.g., ACM, IEEE, SIAM) and has been a member of the organizing committee of the SIAM Data Mining Conference every year from 2001 to 2005. He has hosted many workshops and journal special issues on distributed data mining and related topics, and regularly serves as an invited speaker at international conferences and workshops. More information about him can be found at http://www.cs.umbc.edu/~hillol.