Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Knowledge discovery & data mining
Towards KD Support Environments
Fosca Giannotti and
Dino Pedreschi
Pisa KDD Lab
CNUCE-CNR & Univ. Pisa
http://www-kdd.di.unipi.it/
A tutorial @ EDBT2000
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
2
Vertical applications
We outlined three classes of vertical data
analysis applications that can be tackled
using KDD & DM techniques
Fraud detection
Market basket analysis
Customer segmentation
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
3
Why are these applications challenging?
Require manipulation and reasoning over
knowledge and data at different abstraction
levels
conceptual
semantic integration of domain knowledge, expert
(business) rules and extracted knowledge
semantic integration of different analysis paradigms
logical/physical
interoperability with external components: DBMS’s, data
mining tools, desktop tools
querying/mining optimization: loose vs. tight coupling
between query language and specialized mining tools
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
4
Why are these applications challenging?
The associated KDD process
needs to be carefully
Interpretation
specified, tuned and
and Evaluation
controlled
Data Mining
Knowledge
Selection and
Preprocessing
Data
Consolidation
p(x)=0.02
Patterns &
Models
Warehouse
Prepared Data
Consolidated
Data
Data Sources
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
5
Why are these applications challenging?
Still not properly supported by available KDD
technology
what is offered:
horizontal, customizable toolkits/suites of
data mining primitives
what is needed:
KD support environments for vertical
applications
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
6
Datamining vs. traditional Sw development process
Traditional
Data mining
Focus on knowledge
transfer, design and coding
30% - analysis and design
70% - program design,
coding and testing
Prototyping - expensive
Development process has
few loops
Maintenance requires
human analysis
EDBT2000 tutorial - KDSE
Focus on data selection,
representation and search
70% - data preparation
30% - model generation
and testing
Prototyping - cheap
Development process is
inherently iterative
Maintenance requires relearning model
Konstanz, 27-28.3.2000
7
From R. Agrawal’s invited lecture @ KDD’99
Chasm
Early Market
Mainstream Market
The greatest peril in
market lies in making
market dominated by
market dominated by
EDBT2000 tutorial - KDSE
the development of a high-tech
the transition from an early
a few visionaries to a mainstream
pragmatists.
Konstanz, 27-28.3.2000
8
Is data mining in the chasm?
Perceived to be sophisticated technology,
usable only by specialists
Long, expensive projects
Stand-alone, loosely-coupled with data
infrastructures
Difficult to infuse into existing missioncritical applications
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
9
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
10
Generation 1: data mining tools
~1980: first generation of DM systems
research-driven tools for single tasks, e.g.
build a decision tree - say C4.5
find clusters - say Autoclass (Cheeseman 88)
…
Difficult to use more than one tool on the
same data – lots of data/metadata
transformation
Intended user: a specialist, technically
sophisticated.
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
11
Generation 2: data mining suites
~1995: second generation of DM systems
toolkits for multiple tasks with support for
data preparation and interoperability with
DBMS, e.g.
SPSS Clementine
IBM Intelligent Miner
SAS Enterprise Miner
SFU DBMiner
Intended user: data analyst – suites require
significant knowledge of statistics and
databases
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
12
Growth of DM tools (source: kdnuggets.com)
From G. Piatetsky-Shapiro. The data-mining
industry coming of age. IEEE Intelligent Systems,
Dec. 1999.
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
13
Generation 3: data mining solutions
Beginning end of 1990s
vertical data mining-based applications and
solutions oriented to solving one specific
business problem, e.g.
detecting credit card fraud
customer retention
…
Address entire KDD process, and push
result into a front-end application
Intended user: business user – the
interfaces hid the data mining complexity
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
14
Emerging short-term technology trends
Tighter interoperability by means of standards
which facilitate the integration of data mining
with other applications:
KDD process, e.g. the Cross-Industry Standard
Process for Data Mining model (www.crisp-dm.org)
representation of mining models: e.g., the PMML predictive modeling markup language (www.dmg.org)
DB interoperability: the Microsoft OLE DB for data
mining interface
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
15
Approaches in data mining suites
Database-oriented approach
IBM Intelligent Miner
OLAP-based mining
DBMiner - Jiawei Han’s group @ SFU
Machine learning
CART, ID3/C4.5/C5.0, Angoss Knowledge Studio
Statistical approaches
The SAS Institute Enterprise Miner.
Visualization approach:
SGI MineSet, VisDB (Keim et al. 94).
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
16
Other approaches in data mining suites
Neural network approach:
Cognos 4thoughts, NeuroRule (Lu et al.’95).
Deductive DB integration:
KnowlegeMiner (Shen et al.’96)
Datasift (Pisa KDD Lab - see refs).
Rough sets, fuzzy sets:
Datalogic/R, 49er
Multi-strategy mining:
INLEN, KDW+, Explora
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
17
SFU DBMiner: OLAP-centric mining
Active Object
Elements
Warehouse
Workplace
Active
Object
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
18
IBM Intelligent Miner – DB-centric mining
Contents
Container
Mining Base
Container
Work
Area
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
19
IBM – IM architecture
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
20
Angoss Knowledge Studio: ML-centric mining
Work
Area
Project
Outline
Additional
Visualizations
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
21
KS project outline tool
(Limited) support to
the KDD process
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
22
Support for data consolidation step
DBMiner
ODBC databases – SQL + SmartDrives
Single database – multiple tables
Consolidation of heterogeneous sources
unsupported
Intelligent Miner
DB2 and text – SQL without SmartDrives
Multiple databases
Consolidation of heterogeneous sources supported
Knowledge Studio
ODBC databases and text
Single table
Consolidation of heterogeneous sources
unsupported
Konstanz, 27-28.3.2000
EDBT2000
tutorial - KDSE
23
Support for selection and preprocessing
DBMiner
SQL only
Intelligent Miner
SQL + standard and advanced
statistical functionalities
Knowledge Studio
descriptive statistics
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
24
Support for data mining step
Knowledge Studio
DBMiner
Decision trees
Clustering
Prediction
Association rules
Decision trees
Prediction
Intelligent Miner
Associations rules
Sequential patterns
Clustering
Classification
Prediction
Similar time series
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
25
Support for interpretation and evaluation
Predefined interestingness measures
Emphasis on visualization
Limited export capability of analysis results
Gain charts for comparison of predictive
models (KS and IM)
Limited model combination capabilities (KS)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
26
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
27
Data Mining Query Languages
A DMQL can provide the ability to support
ad-hoc and interactive data mining
Hope: achieve the same effect that SQL
had on relational databases.
Various proposals:
DMQL (Han et al 96)
mine operator (Meo et el 96)
M-SQL (Imielinski et al 99)
query flocks (Tsur et al 98)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
28
MINE operator of (Meo et al 96)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
29
References - DMQL
J. Han, Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL: A Data Mining
Query Language for Relational Databases. In Proc. 1996 SIGMOD'96 Workshop on
Research Issues on Data Mining and Knowledge Discovery (DMKD'96), pp. 27-33,
Montreal, Canada, June 1996.
R. Meo, G. Psaila, S. Ceri. A New SQL-like Operator for Mining Association Rules. In
Proc. VLDB96, 1996 Int. Conf. Very Large Data Bases, Bombay, India, pp. 122-133,
Sept. 1996.
T. Imielinski and A. Virmani. MSQL: a query language for database mining. Data
Mining and Knowledge Discovery, 3:373-408, 1999.
S. Tsur, J. Ulman, S. Abiteboul, C. Clifton, R. Motwani, S. Nestorov. Query flocks:
a generalization of association rule mining. In Proc. 1998 ACM-SIGMOD, p. 1-12,
1998.
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
30
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
31
DATASIFT - towards a logic-based KDSE
DATASIFT is LDL++ (Logic Data Language,
MCC & UCLA) extended with mining
primitives
(decision trees & association rules)
LDL++ syntax: Prolog-like deductive rules
LDL++ semantics: SQL extended with
recursion (and more)
Integration of deduction and induction
Employed to systematically develop the
methodology for MBA and audit planning
See Pisa KDD Lab references
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
32
Our position
A suitable integration of
deductive reasoning (logic database languages)
inductive reasoning (association rules & decision
trees)
provides a viable solution to high-level
problems in knowledge-intensive data
analysis applications
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
33
Our goal
Demonstrate how we support design and
control of the overall KDD process and the
incorporation of background knowledge
data preparation
knowledge extraction
post-processing and knowledge evaluation
business rules
autofocus datamining
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
34
With respect to other DMQL’s
extending logic query languages yields extra
expressiveness, needed to bridge the gap
between
data mining (e.g., association rule mining)
vertical applications (e.g., market basket
analysis)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
35
Architecture - client agent
User interface
Access to business rules and visualization
of results through
web browser to control interaction
MS Excel objects (sheets and charts) to
represent output of analysis (association rules)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
36
Architecture - server agent
A query engine (mediator)
record previous analyses
Metadata/meta knowledge
interaction with other components
LDL++ server
extended with external calls to DBMSs and to …
Inductive modules
Apriori
classifiers (decision trees)
Coupling with DBMS using the Cache-mine approach
Performance comparable with SQL-based approaches
on same mining queries (Giannotti at el 2000)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
37
Deductive rules in LDL++
A small database of cash register transactions
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
E.g.: select transactions involving milk
milk_basket(T,I) basket(T,I),basket(T,milk).
Querying ?- milk_basket(T,I)
milk_basket(2,bread).
milk_ basket(2,milk).
milk_ basket(2,onions).
milk_ basket(2,fish).
EDBT2000 tutorial - KDSE
milk_basket(3,bread).
milk_basket(3,orange).
milk_basket(3,milk).
Konstanz, 27-28.3.2000
38
Aggregates in LDL++
A small database of cash register transactions
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
E.g.:
count occurrences of pairs of distinct
aggregate
items in all transactions
pair(I1,I2,count<T>) basket(T,I1),basket(T,I2),I1 I2.
Querying ?- pair(fish,bread,N)
pair(fish,bread,2) (i.e., N=2)
Aggregates are the logical interface between
deductive and inductive environment.
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
39
Association rules in LDL++
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
E.g., compute one-to-one association rules with
at least 40% support
rules(patterns<0.4,0,{I1,I2}>)basket(T,I1),basket(T,I2).
patterns is the aggregate interfacing the
computation of association rules
patterns<min_supp, min_conf, trans_set>
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
40
Association rules in LDL++
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
Result of the query ?- rules(X,Y,S,C)
rules({milk},{bread},0.66,1)
i.e. milk bread [0.66,1]
rules({bread},{milk},0.66,0.66)
rules({fish},{bread},0.66,1)
rules({bread},{fish},0.66,0.66)
Same status for data and induced rules
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
41
Reasoning on item hierarchies
Department
Sector
Family
Product (item)
EDBT2000 tutorial - KDSE
Which rules survive/decay
up/down the item hierarchy?
rules_at_level(I,pattern<S,C,Itemset>)
itemset_abstraction(I,Tid,Itemset).
preserved_rules(Left,Right)
rules_at_level(I,Left,Right,_,_),
rules_at_level(I+1,Left,Right,_,_).
Konstanz, 27-28.3.2000
42
Business rules: reasoning on promotions
Which rules are established by a promotion?
interval(before, -, 3/7/1998).
interval(promotion, 3/8/1998, 3/30/1998).
interval(after, 3/31/1998, +).
established_rules(Left, Right)
not rules_partition(before, Left, Right, _, _),
rules_partition(promotion, Left, Right, _, _),
rules_partition(after, Left, Right, _, _).
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
43
Business rules: temporal reasoning
How does rule support change along time?
35
30
25
20
Support
Pasta => Fresh Cheese 14
15
Bread Subsidiaries => Fresh Cheese 28
Biscuits => Fresh Cheese 14
10
Fresh Fruit => Fresh Cheese 14
Frozen Food => Fresh Cheese 14
5
05/12/97
04/12/97
02/12/97
01/12/97
30/11/97
03/12/97
EDBT2000 tutorial - KDSE
29/11/97
28/11/97
27/11/97
26/11/97
25/11/97
0
Konstanz, 27-28.3.2000
44
Decision tree construction in DATASIFT
construct training and test set using rules
training_set(P,Case_list) ...
test_tuple(ID,F1,...,F20,Rec,Act_rec,CAR)
...
construct classifier using external call to C5.0
tree_rules(Tree_name,P,PF,MC,BO,Rule_list)
training_set(P,Case_list),
tree_induction(Case_list,PF,MC,BO,Rule_list).
parameters
pruning factor PF external call
misclassification costs MC
boosting BO
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
induced classifier
45
Putting decision trees at work
prediction of target variable
prediction(Tree_name,ID,CAR,Predicted_CAR)
tree_rules(Tree_name, _ ,_ , _ , Rule_list),
test_subject(ID, F1, …, F20, _, _, CAR),
classify(Rule_list ,[F1, …, F20], Predicted_CAR).
Model evaluation: actual recovery of a classifier
(=sum recovery of tuples classified as positive)
actual_recovery(Tree_name,sum<Actual_Recovery>)
prediction(Tree_name, ID, _ , pos),
test_subject(ID, F1, …, F20, _,Actual_Recovery, _).
aggregate
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
46
Combining decision trees
Model conjunction:
tree_conjunction(T1,T2,ID,CAR,pos)
prediction(T1, ID, CAR, pos),
prediction(T2, ID, CAR, pos).
tree_conjunction (T1, T2, ID, CAR, neg)
test_subject(ID, F1, …, F20, _, _, CAR),
~ tree_conjunction(T1, T2, ID, CAR, pos).
More interesting combinations readily
expressible:
e.g. meta learning (Chan and Stolfo 93)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
47
We proposed ...
a KDD methodology for audit planning:
define an audit cost model
monitor training- and test-set construction
assess the quality of a classifier
tune classifier construction to specific policies
and its formalization in a prototype logic-based
KDSE, supporting:
integration of deduction and induction
integration of domain and induced knowledge
separation of conceptual and implementation level
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
48
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
49
A data mining research agenda
1. Integration with data warehouse and relational DB
2. Scalable, parallel/distributed and incremental mining
3. Data mining query language optimization
4. Multiple, integrated data mining methods
5. KDSE and methodological support for vertical appl.
6. Interactive, exploratory data mining environments
7. Mining on other forms of data:
spatio-temporal databases
text
multimedia
web
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
Scale up!
Scaling up existing algorithms (AI, ML, IR)
Association rules
Correlation rules
Causal relationship
Classification
Clustering
Bayesian networks
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
51
Background knowledge & constraints
Incorporating background knowledge and constraints
into existing data mining techniques
Double benefit for DMQL: semantics and optimization!
traditional algorithms
Disproportionate computational cost for selective
users
Overwhelming volume of potentially useless results
need user-controlled focus in mining process
Association rules containing certain items
Sequential patterns containing certain patterns
Classification?
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
52
Vertical applications of data mining
More success stories needed!
Current data mining systems lack a thick
semantic layer (similarly to the early
relational database systems)
Verticalized data mining systems, e.g.
Market analysis systems
Fraud detection systems
Automated mining and interactive mining: how
far are they?
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
53
Autofocus data mining
policy options, business rules
selection of data mining function
fine parameter tuning of mining function
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
54
DBMS coupling
Tight-coupling with DBMS
Most data mining algorithms are based on flat
file data (i.e. loose-coupling with DBMS)
A set of standard data mining operators
(e.g. sampling operator)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
55
Web mining – why?
No standards on the web, enormous blob of
unstructured and heterogeneous info
Very dynamic
One new WWW server every 2 hours
5 million documents in 1995
320 million documents in 1998
Indices get obsolete very quickly
Better means needed for discovering
resources and extracting knowledge
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
56
Web mining: challenges
Today`s search engines are plagued by problems
– the abundance problem:
99% of info of no interest to 99% of people!
– limited coverage of the Web
– limited query interface based on keywordoriented search
– limited customization to individual users
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
57
Web mining
Web content mining
mining what Web search engines find
Web document classification (Chakrabarti et al 99)
warehousing a Meta-Web (Zaïane and Han 98)
intelligent query answering in Web search
Web usage mining
Web log mining: find access patterns and trends (Zaiane et
al 98)
customized user tracking and adaptive sites (Perkowitz et
al 97)
Web structure mining
discover authoritative pages: a page is important if
important pages point to it
(Chakrabarti et al 99, Kleinberg 98)
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
58
Warehousing a Meta-Web (Zaïane & Han 98)
Meta-Web: summarizes the contents and structure
of the Web, which evolves with the Web
Layer0: the Web itself
Layer1: the lowest layer of the Meta-Web
an entry: a Web page summary, including class,
time, URL, contents, keywords, popularity, weight,
links, etc.
Layer2 and up: summary/classification/clustering
Meta-Web is warehoused and incrementally updated
Querying and mining is performed on or assisted by
meta-Web
Is it feasible/sustainable? Is XML of any help?
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
59
Meta-Web
from Jiawei Han’s panel talk @ SIGMOD99
Layern
More Generalized Descriptions
...
Layer1
Generalized Descriptions
Layer0
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
60
Weblog mining
Web servers register a log entry for every single access.
A huge number of accesses (hits) are registered and collected in
an ever-growing web log.
Why warehousing/mining web logs?
Enhance server performance by learning access patterns of
general or particular users (guess what user will ask next and
pre-cache!)
Improve system design of web applications
Identify potential prime advertisement locations
Greatest peril: the privacy pitfall
See e.g. (Markoff 99) the rise of the Little Brother.
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
61
Some web mining references
M. Perkowitz and O. Etzioni. Adaptive sites: Automatically learning from user access patterns. In
Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
J. Pitkow. In search of reliable usage data on the www. In Proc. 6th Int. World Wide Web Conf.,
Santa Clara, California, April 1997.
T. Sullivan. Reading reader reaction : A proposal for inferential analysis of web server log files. In
Proc. 3rd Conf. Human Factors & the Web, Denver, Colorado, June 1997.
O. R. Zaiane, M. Xin, and J. Han. Discovering Web access patterns and trends by applying OLAP
and data mining technology on Web logs. In Proc. Advances in Digital Libraries Conf. (ADL'98), pages
19-29, Santa Barbara, CA, April 1998.
O. R. Zaiane, and J. Han. Resource and knowledge discovery in global information systems: a
preliminary design and experiment. In Proc. KDD’95, p.331-336, 1995.
O. R. Zaiane, and J. Han. WebML: querying the world-wide web for resources and knowledge. In
Proc. Int. Workshop on Web informtion and Data management (WIDM98), p. 9-12, 1998.
S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, et al. Mining the web’s link structure.
COMPUTER, 32:60-67, 1999.
S. Chakrabarti, B. E. Dom, P. Indik. Enhanced hypertext classification using hyperlinks. In Proc.
1998 ACM-SIGMOD, p. 307-318, 1999.
J. Kleinberg. Autohoritative sources in a hyperlinked environment. In Proc. ACM-SIAM Symp. on
Discrete Algorithms, 1998.
J. Markoff. The Rise of Little Brother. Upside, Apr. 1999;
http://www.upside.com/texis/mvm/story?id=36d4613c0
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
62
Pisa KDD Lab references
F. Giannotti and G. Manco. Making Knowledge Extraction and Reasoning Closer. In Proc. PAKDD'99,
The Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, 2000.
F. Giannotti and G. Manco. Querying Inductive Databases via Logic-Based User Defined Aggregates.
In Proc. PKDD'99, The Third Europ. Conf. on Principles and Practice of Knowledge Discovery in
Databases. Prague, Sept. 1999.
F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. Using Data Mining Techniques in Fiscal Fraud
Detection. In Proc. DaWak'99, First Int. Conf. on Data Warehousing and Knowledge Discovery.
Florence, Italy, Sept. 1999.
F. Bonchi , F. Giannotti, G. Mainetto, D. Pedreschi. A Classification-based Methodology for Planning
Audit Strategies in Fraud Detection. In Proc. KDD-99, ACM-SIGKDD Int. Conf. on Knowledge
Discovery & Data Mining, San Diego (CA), August 1999.
F. Giannotti, G. Manco, D. Pedreschi and F. Turini. Experiences with a logic-based knowledge
discovery support environment. In Proc. 1999 ACM SIGMOD Workshop on Research Issues in Data
Mining and Knowledge Discovery (SIGMOD'99 DMKD). Philadelphia, May 1999.
F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Integration of Deduction and
Induction for Mining Supermarket Sales Data. In Proc. PADD'99, Practical Application of Data
Discovery, Int. Conference, London, April 1999.
F. Giannotti, G. Manco, M. Nanni, D. Pedreschi. Nondeterministic, Nonmonotonic Logic Databases.
IEEE Trans. on Knowledge and Data Engineering. 2000.
F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Using deduction for intelligent data
analysis. Submitted, 2000. http://www-kdd.di.unipi.it/
P. Becuzzi, M. Coppola, S. Ruggieri and M. Vanneschi. Parallelisation of C4.5 as a particular divide
and conquer computation. Proc.3rd Workshop on High Performance Data Mining, Springer-Verlag
LNCS, 2000.
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
63