Mining Multidimensional Sequential Patterns over Data Streams
Chedy Raïssi and Marc Plantevit
DaWaK 2008
Outline
• Introduction
• Problem Definition
• The MDSDS Approach
• Experimental Results
• Conclusions
2
Introduction
• We propose to consider the intrinsic multidimensionality of
the streams for the extraction of more interesting sequential
patterns.
• The search space in a multidimensional framework is huge.
• We only focus on the most specific abstraction level for items
instead of mining at all possible levels.
3
Problem Definition
• multidimensional item a = (d1, . . . , dm)
• * : wild-card value that can be interpreted as ALL.
• multidimensional itemset i = {a1, . . . , ak}
• multidimensional sequence s = <i1, . . . , il>
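The three definitions above can be sketched as plain Python types. This is only an illustrative encoding, not the authors' implementation: the names `Item`, `Itemset`, `Sequence`, `make_item` and `show` are hypothetical, and the ASCII character `*` stands in for the wild-card.

```python
from typing import FrozenSet, List, Tuple

WILDCARD = "*"                 # assumed ASCII stand-in for the wild-card value
Item = Tuple[str, ...]         # a = (d1, ..., dm)
Itemset = FrozenSet[Item]      # i = {a1, ..., ak}
Sequence = List[Itemset]       # s = <i1, ..., il>


def make_item(*values: str) -> Item:
    """Build a multidimensional item from one value per dimension."""
    return tuple(values)


def show(sequence: Sequence) -> str:
    """Render a sequence in the <{(...)} {(...)}> notation used on the slides."""
    parts = []
    for itemset in sequence:
        items = ", ".join("(" + ", ".join(item) + ")" for item in sorted(itemset))
        parts.append("{" + items + "}")
    return "<" + " ".join(parts) + ">"


example = [
    frozenset({make_item("LA", WILDCARD, "M", WILDCARD)}),
    frozenset({make_item(WILDCARD, WILDCARD, "M", "Wii")}),
]
print(show(example))
```

Immutable tuples and frozensets keep items and itemsets hashable, so they can later serve as dictionary keys or prefix-tree edges.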
4
Cont.
• We focus on the most specific frequent items to generate the
multidimensional sequential patterns.
• E.g.
▫ If items (LA, ∗, M, ∗) and (∗, ∗, M, Wii) are frequent, we do not
consider the frequent items (LA, ∗, ∗, ∗), (∗, ∗, M, ∗) and
(∗, ∗, ∗, Wii).
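A minimal sketch of this filtering step, assuming items are tuples and `*` is the wild-card. The helper names `generalizes` and `most_specific` are hypothetical, and the quadratic comparison is chosen for readability rather than speed.

```python
WILDCARD = "*"  # assumed ASCII stand-in for the wild-card value


def generalizes(general, specific):
    """True if `general` equals `specific` or replaces some of its values by the wild-card."""
    return len(general) == len(specific) and all(
        g == WILDCARD or g == s for g, s in zip(general, specific)
    )


def most_specific(frequent_items):
    """Keep only the frequent items that no other frequent item specializes."""
    items = set(frequent_items)
    return {
        a for a in items
        if not any(b != a and generalizes(a, b) for b in items)
    }


# The frequent items from the example above.
frequent = {
    ("LA", "*", "M", "*"),
    ("*", "*", "M", "Wii"),
    ("LA", "*", "*", "*"),
    ("*", "*", "M", "*"),
    ("*", "*", "*", "Wii"),
}
kept = most_specific(frequent)
```

On this input, `kept` contains only (LA, *, M, *) and (*, *, M, Wii), matching the example.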
5
Cont.
• Data stream DS = B0, B1, . . . , Bn, a sequence of batches
• Each batch Bi = {s1, s2, s3, ..., sk} is a set of multidimensional sequences
[Figure: example data stream batches; labels B0, B1, B2, B3]
6
7
Cont.
• min_sup = 50%
• <{(∗, ∗, M, ∗)}>
• <{(LA, ∗, Y, ∗)}>
<{(LA, ∗, ∗, ∗)}>
<{(LA, ∗, ∗, iPod)}>
<{(∗, ∗, M, ∗)} {(∗, ∗, M, ∗)}>
<{(∗, ∗, M, Wii)}>
• specialization
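How the support of a multidimensional sequence might be computed within one batch can be sketched as follows. The batch contents are invented toy data (the original example table is not available here), and the function names are hypothetical. One simplification is assumed: two items of a pattern itemset may be matched by the same data item.

```python
WILDCARD = "*"  # assumed ASCII stand-in for the wild-card value


def item_matches(pattern_item, data_item):
    """A pattern item matches when every non-wild-card value agrees."""
    return all(p == WILDCARD or p == d for p, d in zip(pattern_item, data_item))


def itemset_matches(pattern_itemset, data_itemset):
    """Every pattern item must be matched by some item of the data itemset."""
    return all(
        any(item_matches(p, d) for d in data_itemset) for p in pattern_itemset
    )


def occurs_in(pattern, data_sequence):
    """Greedy left-to-right check that `pattern` is a subsequence of `data_sequence`."""
    pos = 0
    for pattern_itemset in pattern:
        while pos < len(data_sequence) and not itemset_matches(
            pattern_itemset, data_sequence[pos]
        ):
            pos += 1
        if pos == len(data_sequence):
            return False
        pos += 1
    return True


def support(pattern, batch):
    """Fraction of the sequences in the batch that contain the pattern."""
    return sum(occurs_in(pattern, s) for s in batch) / len(batch)


# Invented toy batch: four sequences, each a list of itemsets of 4-dimensional items.
batch = [
    [[("LA", "Educ", "M", "Wii")], [("NY", "Educ", "M", "iPod")]],
    [[("LA", "Ret", "Y", "iPod")]],
    [[("NY", "Prof", "M", "Wii")]],
    [[("LA", "Prof", "Y", "PS3")], [("SF", "Ret", "Y", "Wii")]],
]
MIN_SUP = 0.5
single = [[("*", "*", "M", "*")]]
double = [[("*", "*", "M", "*")], [("*", "*", "M", "*")]]
print(support(single, batch) >= MIN_SUP, support(double, batch) >= MIN_SUP)
```

In this toy batch the one-itemset pattern reaches the 50% threshold while its two-itemset extension does not, which illustrates why longer or more specialized patterns are never more frequent than the patterns they extend.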
The MDSDS Approach
• MDSDS extracts the most specific multidimensional items.
• MDSDS uses a data structure consisting of a prefix-tree and
tilted-time windows tables.
• The patterns are:
(1) frequent patterns,
(2) sub-frequent patterns,
(3) infrequent patterns (not stored in the prefix-tree).
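The three pattern categories and the prefix-tree storage could be sketched as below. The slides give no thresholds, so the sketch assumes a common stream-mining convention: a pattern is frequent when its support reaches `min_sup`, sub-frequent when it only reaches a smaller error threshold `error`, and infrequent otherwise. Both parameter names, the `PrefixTreeNode` class, and the plain list used in place of a tilted-time windows table are hypothetical.

```python
def classify(support, min_sup, error):
    """Assumed rule: frequent >= min_sup > sub-frequent >= error > infrequent."""
    if support >= min_sup:
        return "frequent"
    if support >= error:
        return "sub-frequent"
    return "infrequent"


class PrefixTreeNode:
    """One node per pattern prefix; edges are labelled by itemsets."""

    def __init__(self):
        self.children = {}
        self.supports = []  # stand-in for the tilted-time windows table


def insert(root, pattern, support, min_sup, error):
    """Store a pattern unless it is infrequent; return True if it was stored."""
    if classify(support, min_sup, error) == "infrequent":
        return False
    node = root
    for itemset in pattern:  # itemsets must be hashable, e.g. frozensets
        node = node.children.setdefault(itemset, PrefixTreeNode())
    node.supports.append(support)
    return True


root = PrefixTreeNode()
m_item = frozenset({("*", "*", "M", "*")})
stored_frequent = insert(root, [m_item], 0.50, min_sup=0.5, error=0.1)
stored_sub = insert(root, [m_item, m_item], 0.25, min_sup=0.5, error=0.1)
stored_rare = insert(root, [frozenset({("SF", "*", "*", "*")})], 0.05, 0.5, 0.1)
```

Keeping sub-frequent patterns in the tree costs memory, but it preserves their history in case they become frequent in a later batch.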
8
Cont.
• Step 1 : mine the most specific multidimensional items
▫ Multidimensional representation : (LA, ∗, ∗, ∗), (∗, ∗, M, ∗)
▫ Detecting the specialization or generalization.
9
Cont.
• Step 2 :
▫ Sub-frequent sequences may become frequent in future
batches.
▫ Use the PrefixSpan algorithm to mine the
multidimensional sequences efficiently.
10
PrefixSpan algorithm
• min_sup = 2
• 1. Find length-1 sequential patterns: <a>:4, <b>:4, <c>:4, <d>:3,
<e>:3, <f>:3.
• 2. Divide the search space into six subsets: (1) the patterns having prefix <a>; …; and (6) the
ones having prefix <f>.
▫ <a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>,
<(_b)(df)cb>, <(_f)cbc>.
▫ The length-2 sequential patterns with prefix <a>: <aa>:2, <ab>:4, <(ab)>:2, <ac>:4,
<ad>:2, <af>:2.
▫ …
11
Cont.
• 3. Find subsets of sequential patterns by constructing the corresponding projected databases and mining each one recursively.
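The first two steps can be reproduced with a short sketch. It is not a full PrefixSpan implementation: it counts length-1 patterns, builds the <a>-projected database, and checks the length-2 counts by direct containment instead of recursive mining. The input database is an assumption: the slides do not show it, so the sketch uses the four-sequence example from the original PrefixSpan paper, which is consistent with the projected database listed above. All function names are hypothetical.

```python
from collections import Counter

# Assumed input: each sequence is a list of itemsets, each itemset a sorted string.
DB = [
    ["a", "abc", "ac", "d", "cf"],
    ["ad", "c", "bc", "ae"],
    ["ef", "ab", "df", "c", "b"],
    ["e", "g", "af", "c", "b", "c"],
]
MIN_SUP = 2


def length1_patterns(db, min_sup):
    """Step 1: count each item once per sequence and keep the frequent ones."""
    counts = Counter()
    for seq in db:
        counts.update({item for itemset in seq for item in itemset})
    return {item: n for item, n in sorted(counts.items()) if n >= min_sup}


def project(seq, item):
    """Step 2: suffix of `seq` after the first occurrence of `item`, or None.

    A leading '_' marks the leftover items of the itemset that contained `item`.
    """
    for i, itemset in enumerate(seq):
        if item in itemset:
            rest = "".join(x for x in itemset if x > item)
            head = ["_" + rest] if rest else []
            return head + seq[i + 1:]
    return None


def contains(seq, pattern):
    """True if `pattern` (a list of itemsets) is a subsequence of `seq`."""
    pos = 0
    for p in pattern:
        while pos < len(seq) and not set(p) <= set(seq[pos]):
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def support(db, pattern):
    """Number of sequences in `db` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in db)


singles = length1_patterns(DB, MIN_SUP)
a_projected = [p for p in (project(seq, "a") for seq in DB) if p is not None]
```

Here `singles` gives the six length-1 patterns with their counts, and `a_projected` holds the four suffixes of the <a>-projected database. A complete miner would recurse on each projected database rather than rescanning `DB` with `support`.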
12
Cont.
• Step 3 :
▫ Tilted-time windows table
▫ Update and pruning operations are performed each time
a batch is received from the data stream.
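One way a tilted-time windows table can be maintained per pattern is sketched below. The slides do not specify the scheme, so this assumes a simplified logarithmic variant: level i keeps at most two entries, each summarizing 2^i batches, and overflow merges the two oldest entries into the next level. The class name `TiltedTimeWindow` and the `per_level` parameter are hypothetical, and pruning is not covered.

```python
class TiltedTimeWindow:
    """Simplified logarithmic tilted-time window for the supports of one pattern."""

    def __init__(self, per_level=2):
        self.per_level = per_level
        # levels[i] lists supports newest first; each entry covers 2**i batches.
        self.levels = []

    def add(self, support):
        """Record the support observed in the newest batch."""
        carry, level = support, 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append([])
            bucket = self.levels[level]
            bucket.insert(0, carry)
            if len(bucket) > self.per_level:
                # Merge the two oldest entries and push them one level up.
                oldest = bucket.pop()
                second_oldest = bucket.pop()
                carry = oldest + second_oldest
            else:
                carry = None
            level += 1

    def total(self):
        """Support accumulated over every batch seen so far."""
        return sum(sum(bucket) for bucket in self.levels)


window = TiltedTimeWindow()
for _ in range(8):  # eight batches, the pattern appears once in each
    window.add(1)
```

Recent batches stay at fine granularity while older ones are merged into coarser entries, so the table grows roughly logarithmically with the number of batches.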
13
Tilted-time windows
14
Cont.
15
Experimental Results
16
Cont.
17
Cont.
18
Conclusions
• Experiments on real data gathered from TCP/IP network
traffic provide compelling evidence that it is possible to
obtain accurate and fast results for multidimensional
sequential pattern mining.
• We propose to take the multidimensional framework into
account in order to detect high-level changes such as trends.
19