Discovering Evolutionary Theme Patterns from Text
Qiaozhu Mei
University of Illinois at Urbana-Champaign
• Many text collections carry time stamps of some kind, which potentially indicate temporal patterns
• Existing work in text mining has conceptually focused on a single flat collection of text and is therefore inadequate for temporal text mining
• We aim to develop methods for discovering evolutionary theme patterns from text
Methodology
• Modeling content evolution of themes:
[Figure labels: Research in US; Aid from UN; Personalized Experiences; Aid for Children. Workflow step: Extracting global significant themes]
•Use KL Divergence
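A minimal sketch of how a KL divergence comparison between two themes might look. The poster only names KL divergence, so the details here are assumptions made for illustration: themes are stored as word-to-probability dictionaries, a small smoothing constant `eps` avoids zero probabilities, the divergence is taken from the later theme to the earlier one, and the `threshold` value and function names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two unigram language models given as {word: prob} dicts.

    eps is an illustrative smoothing constant (an assumption, not from the poster).
    """
    vocab = set(p) | set(q)
    # Smooth and renormalise both distributions over the shared vocabulary.
    p_s = {w: p.get(w, 0.0) + eps for w in vocab}
    q_s = {w: q.get(w, 0.0) + eps for w in vocab}
    p_z, q_z = sum(p_s.values()), sum(q_s.values())
    return sum(
        (p_s[w] / p_z) * math.log((p_s[w] / p_z) / (q_s[w] / q_z))
        for w in vocab
    )

def has_transition(earlier, later, threshold=1.0):
    """Hypothetical rule: link two themes when KL(later || earlier) is small."""
    return kl_divergence(later, earlier) < threshold

# Toy themes from two time spans (made-up numbers).
aid_early = {"aid": 0.5, "relief": 0.3, "donation": 0.2}
aid_late = {"aid": 0.4, "relief": 0.2, "donation": 0.4}
politics = {"government": 0.6, "criticism": 0.4}

print(has_transition(aid_early, aid_late))
print(has_transition(aid_early, politics))
```

With these toy numbers the two aid themes share most of their probability mass and are linked, while the politics theme shares no vocabulary with the earlier aid theme and is not.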
[Figure labels: Statistics; Aid from the world]
•HMM:
• SIGIR Full-Texts from 1978 to 2004
Parameters can be estimated with an EM algorithm
[Figure labels: Lessons and Research inspired; Aid and supports from the world; DC in Japan; Donation Events]
(III) Global Theme life cycles of Tsunami reports from CNN
(IV) Global Theme life cycles of Tsunami reports from XINHUA
[Axis labels: Time Offsets (days); Normalized Strength of Theme; date ticks from December to February]
• KDD Abstracts from 1999 to 2004
• Tsunami News Data from 10 sources
[Figure label: DC in Hong Kong]
• Evaluating Theme Transitions:
• Fix the HMM states and output probabilities
• Train the transition probabilities with the Baum-Welch algorithm, using the whole collection as the example sequence
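The HMM training step described here (fixed states and output probabilities, with only the transition probabilities trained by Baum-Welch) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: it assumes each state stands for one theme with a known output distribution over word ids, that every observed word has non-zero probability under at least one state, and the function name `train_transitions` and the toy numbers are hypothetical.

```python
def train_transitions(obs, emit, trans, init, n_iter=20):
    """Re-estimate only the transition matrix of an HMM with Baum-Welch.

    obs   : list of observation indices (word ids in time order)
    emit  : emit[s][o] = P(o | state s), kept fixed
    trans : initial guess, trans[i][j] = P(next = j | current = i)
    init  : init[s] = P(first state = s), kept fixed
    """
    n, T = len(emit), len(obs)
    for _ in range(n_iter):
        # Forward pass with per-step scaling to avoid underflow.
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                raw = [init[s] * emit[s][obs[0]] for s in range(n)]
            else:
                raw = [
                    sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                    * emit[j][obs[t]]
                    for j in range(n)
                ]
            c = sum(raw)
            scale.append(c)
            alpha.append([a / c for a in raw])
        # Backward pass reusing the same scaling factors.
        beta = [[1.0] * n for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [
                sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                    for j in range(n)) / scale[t + 1]
                for i in range(n)
            ]
        # Expected transition counts, summed over all time steps.
        counts = [[0.0] * n for _ in range(n)]
        for t in range(T - 1):
            for i in range(n):
                for j in range(n):
                    counts[i][j] += (alpha[t][i] * trans[i][j]
                                     * emit[j][obs[t + 1]] * beta[t + 1][j]
                                     / scale[t + 1])
        # M-step: update transitions only; emissions and init stay fixed.
        new_trans = []
        for i in range(n):
            total = sum(counts[i])
            new_trans.append([c / total for c in counts[i]] if total > 0
                             else trans[i][:])
        trans = new_trans
    return trans

# Toy example: two themes, two word ids, long runs of each word.
emit = [[0.9, 0.1], [0.1, 0.9]]
obs = ([0] * 8 + [1] * 8) * 3
learned = train_transitions(obs, emit, [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5])
print(learned)
```

Because the toy sequence stays on one word for long runs, the learned matrix favours staying in the same state over switching.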
[Figure label: Research in Japan]
• Experiments:
• Example: Theme Life Cycles
Theme Life Cycles
(II) Theme Evolution Graph and threads of Tsunami data set
[Figure labels: Jan 31st; Feb 08th; Political Criticism; Aid from U.S.]
Theme Extraction: A Mixture Model
• Assume that each document is generated by multiple themes
• Use a probabilistic mixture model and estimate its parameters with the EM algorithm (each theme is a probability distribution, i.e. a unigram language model)
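As a rough illustration of theme extraction with a mixture model and EM, the sketch below fits a small mixture of unigram themes in which every document has its own mixing weights. It is a simplified sketch under stated assumptions, not the model from the poster: the exact form of the authors' mixture is not given here, and the function name `extract_themes`, the random initialisation, and the toy documents are hypothetical.

```python
import random

def extract_themes(docs, k, n_iter=50, seed=0):
    """EM for a simple mixture of k unigram themes.

    docs : list of {word: count} dicts
    Returns (themes, weights) where themes[j][w] = p(w | theme j)
    and weights[d][j] = mixing weight of theme j in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    # Random initialisation, normalised to valid distributions.
    themes = []
    for _ in range(k):
        raw = {w: rng.random() + 0.1 for w in vocab}
        z = sum(raw.values())
        themes.append({w: v / z for w, v in raw.items()})
    weights = [[1.0 / k] * k for _ in docs]

    for _ in range(n_iter):
        word_counts = [{w: 0.0 for w in vocab} for _ in range(k)]
        doc_counts = [[0.0] * k for _ in docs]
        for d, doc in enumerate(docs):
            for w, c in doc.items():
                # E-step: posterior that word w in document d came from theme j.
                post = [weights[d][j] * themes[j][w] for j in range(k)]
                z = sum(post)
                for j in range(k):
                    r = c * post[j] / z
                    word_counts[j][w] += r
                    doc_counts[d][j] += r
        # M-step: renormalise the expected counts.
        for j in range(k):
            z = sum(word_counts[j].values())
            themes[j] = {w: v / z for w, v in word_counts[j].items()}
        for d in range(len(docs)):
            z = sum(doc_counts[d])
            weights[d] = [v / z for v in doc_counts[d]]
    return themes, weights

# Toy documents as bags of words (made-up counts).
docs = [
    {"aid": 4, "donation": 3},
    {"aid": 3, "relief": 2, "donation": 2},
    {"death": 4, "damage": 3},
    {"death": 2, "damage": 4, "loss": 2},
]
themes, weights = extract_themes(docs, k=2)
for j, theme in enumerate(themes):
    print(j, sorted(theme, key=theme.get, reverse=True)[:3])
```

Each returned theme is a probability distribution over the vocabulary, which matches the poster's description of a theme as a unigram language model.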
[Figure labels: Research in China; Aid from UK; Donation Match; Donation Concerts (DC) in UK; Personal Experience from Survivors; Statistics of death, loss and damage; New theme; dates Dec 24th, Jan 05th, Jan 15th, Jan 23rd]
• Theme Extraction:
• Strength Change of themes:
[Workflow labels: Collection with time stamps; Theme threads; Theme Evolution Graph; θ3]
• Example: Theme Evolution Graphs
[Workflow labels: Partitioning; Theme extraction; Model theme shifts with HMM; ending theme. Diagram symbols: θ1, θ2, B]
(I) Global Significant Themes in KDD Abstract Data Set
[Workflow labels: Collection with time stamps; Decoding Collection; Theme Strength]
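The decoding and strength steps named in the workflow can be sketched as follows. The poster does not say how decoding is done or how strength is normalised, so this sketch rests on assumptions: Viterbi decoding assigns each word to a theme state, and normalised strength is the share of words labelled with a theme inside fixed, non-overlapping time windows. The function names and toy numbers are hypothetical.

```python
import math

def viterbi(obs, emit, trans, init):
    """Most likely state (theme) sequence for obs under a given HMM."""
    n = len(emit)

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            # Best predecessor state for landing in state j.
            best = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            ptr.append(best)
            score.append(prev[best] + log(trans[best][j]) + log(emit[j][o]))
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def theme_strength(labels, times, n_themes, window):
    """Share of words labelled with each theme, per fixed time window."""
    buckets = {}
    for state, t in zip(labels, times):
        row = buckets.setdefault(t // window, [0] * n_themes)
        row[state] += 1
    return {
        b * window: [c / sum(row) for c in row]
        for b, row in sorted(buckets.items())
    }

# Toy example: six words with day offsets, two themes.
emit = [[0.8, 0.2], [0.2, 0.8]]
trans = [[0.9, 0.1], [0.1, 0.9]]
obs = [0, 0, 0, 1, 1, 1]
times = [0, 0, 1, 2, 3, 3]
path = viterbi(obs, emit, trans, [0.5, 0.5])
strength = theme_strength(path, times, n_themes=2, window=2)
print(path)      # [0, 0, 0, 1, 1, 1]
print(strength)  # {0: [1.0, 0.0], 2: [0.0, 1.0]}
```

Plotting each theme's value against the window start would give a curve of the kind shown in the theme life cycle figures.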
• Model Content evolution of themes:
[Workflow label: Theme Life cycles]
• What ETPs can help with:
• Summarization of topics with temporal structure
• Discovery of implicit temporal patterns
• e.g. Revealing Research Trends
• Supporting navigation and inferences with time order
• Application: News, Literature, Email, Customer Review, etc.
• Strength change of themes:
• Example: Theme Representations
[Workflow labels: Theme transitions; Theme Evolution Graph; Computing Strength and]
Experiment Results
Evolutionary Theme Patterns
[Figure legend: Biology Data; Web Information; Time Series; Classification; Association Rule; Clustering; Business. Horizontal axis: Time (year), 1999 to 2004. Vertical axis ticks: 0 to 0.018]
(V) Global Theme life cycles of KDD Abstracts
(VI) Global Theme life cycles of SIGIR full-texts