Investigative Data Mining in Fraud Detection
Transforming Minority Report from Science Fiction to Science Fact
Clifton Phua
Honours Student
clifton(dot)phua(at)infotech(dot)monash(dot)edu(dot)au
2003
Overview (1)
• Investigative Data Mining and Problems in Fraud Detection
  • Definitions
  • Technical and Practical Problems
• Existing Fraud Detection Methods
  • Widely used methods
• The Crime Detection Method
  • Comparisons with Minority Report
  • Classifiers as Precogs
  • Combining Output as Integration Mechanisms
  • Cluster Detection as Analytical Machinery
  • Visualisation Techniques as Visual Symbols
Overview (2)
• Implementing the Crime Detection System: Preparation Component
  • Investigation objectives
  • Collected data
  • Preparation of collected data to achieve objectives
• Implementing the Crime Detection System: Action Component
  • Which experiments generate the best predictions?
  • Which is the best insight?
  • How can the new models and insights be deployed within an organisation?
• Contributions and Recommendations
  • Significant research contributions
  • Proposed solutions
Literature and Acknowledgements
• Dick P K (1956) Minority Report, Orion Publishing Group, London, Great Britain.
• Abagnale F (2001) The Art of the Steal: How to Protect Yourself and Your Business from Fraud, Transworld Publishers, NSW, Australia.
• Mena J (2003) Investigative Data Mining for Security and Criminal Detection, Butterworth-Heinemann, MA, USA.
• Elkan C (2001) Magical Thinking in Data Mining: Lessons From CoIL Challenge 2000, Department of Computer Science and Engineering, University of California, San Diego, USA.
• Prodromidis A (1999) Management of Intelligent Learning Agents in Distributed Data Mining Systems, Unpublished PhD thesis, Columbia University, USA.
• Berry M and Linoff G (2000) Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley and Sons, New York, USA.
• Han J and Kamber M (2001) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
• Witten I and Frank E (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java, Morgan Kaufmann Publishers, CA, USA.
Investigative Data Mining and Problems in Fraud Detection
Investigative Data Mining - Definitions
• Investigative
  • Official attempt to extract some truth, or insights, about criminal activity from data
• Data Mining
  • Process of discovering, extracting, and analysing meaningful patterns, structures, models, and rules from large quantities of data
  • Spans several research areas such as databases, machine learning, neural networks, data visualisation, statistics, and distributed data mining
• Investigative Data Mining
  • Applied to law enforcement, industry, and private databases
Fraud Detection - Definitions
• Fraud
  • Criminal deception: the use of false representations to obtain an unjust advantage, or to injure the rights and interests of another
• Diversity of Fraud
  • Against organisations, governments, and individuals
  • Committed by external parties, internal management, and non-management employees
  • Caused by customers, service providers, and suppliers
  • Prevalent in insurance, credit card, and telecommunications
  • Most common in automobile, travel, and household contents insurance
• Cost of Fraud
  • Automobile insurance fraud alone costs nine Australian companies AUD$32 million
Fraud Detection Problems - Technical
• Imperfect data
  • Usually not collected for data mining
  • Inaccurate, incomplete, and irrelevant data attributes
• Highly skewed data
  • Many more legitimate than fraudulent examples
  • Higher chances of overfitting
• Black-box predictions
  • Numerical outputs incomprehensible to people
Fraud Detection Problems - Practical
• Lack of domain knowledge
  • Important attributes, likely relationships, and known patterns
  • Three types of fraud offenders and their modus operandi
• Great variety of fraud scenarios over time
  • Soft fraud – cost of investigation > cost of fraud
  • Hard fraud – circumvents anti-fraud measures
• Assessing data mining potential
  • Predictive accuracy is a useless measure for skewed data sets
Existing Fraud Detection Methods
Widely Used Methods in Fraud Detection
• Insurance Fraud
  • Cluster detection -> decision tree induction -> domain knowledge, statistical summaries, and visualisations
  • Special case: neural network classification -> cluster detection
• Credit Card Fraud
  • Decision tree and naive Bayesian classification -> stacking
• Telecommunications Fraud
  • Cluster detection -> scores and rules
The Crime Detection Method
Comparisons with Minority Report
• Precogs
  • Foresee and prevent crime
  • Each precog contains multiple classifiers
• Integration Mechanisms
  • Combine predictions
• Analytical Machinery
  • Record, study, compare, and represent predictions in simple terms
  • Single “computer”
• Visual Symbols
  • Explain the final predictions
  • Graphical visualisations, numerical scores, and descriptive rules
The Crime Detection Method
[Figure: The crime detection method. Examples and instances D feed three precogs, P1 = L1(D), P2 = L2(D), and P3 = L3(D), whose main predictions are combined by a further precog, L(P1, P2, P3), into final predictions. In parallel, analytical machinery CL = L4(D) performs attribute selection and cluster detection, and visual symbols present graphs, scores, and rules.]
Classifiers as Precogs
• Precog One: Naive Bayesian Classifiers
  • Statistical paradigm
  • Simple and fast
  • Redundant and not normally distributed attributes*
• Precog Two: C4.5 Classifiers
  • Computer metaphor
  • Explains patterns and quite fast
  • Scalability and efficiency issues*
• Precog Three: Backpropagation Classifiers
  • Brain metaphor
  • Long training times and extensive parameter tuning*
• Each precog has its own advantages and disadvantages (a training sketch follows)
*For details on how the problems were tackled, please refer to the thesis
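To make the three precog families concrete, here is a minimal sketch using assumed scikit-learn stand-ins: GaussianNB for the naive Bayesian precog, DecisionTreeClassifier as a C4.5-like learner (CART is a close relative), and MLPClassifier for backpropagation. The library, synthetic data, and parameters are illustrative assumptions, not the thesis's actual setup.

```python
# A sketch of three "precogs", one per algorithm family, with assumed
# scikit-learn stand-ins rather than the thesis's own tooling.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 14))                  # numerical inputs scaled 0-1
y = (rng.random(500) < 0.06).astype(int)   # roughly 6% fraudulent, as in the data set

precogs = {
    "naive_bayesian": GaussianNB(),
    "c45_like_tree": DecisionTreeClassifier(max_depth=5),
    "backpropagation": MLPClassifier(hidden_layer_sizes=(10,), max_iter=500),
}
for name, classifier in precogs.items():
    classifier.fit(X, y)                   # each precog trains on the same examples D
    print(name, "training accuracy:", classifier.score(X, y))
```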
Combining Output as Integration Mechanisms
• Cross Validation
  • Divides training data into eleven data partitions
  • Each data partition used for training, testing, and evaluation once*
  • Slightly better success rate
• Bagging
  • Unweighted majority voting on each example or instance (see the sketch below)
  • Combine predictions from the same algorithm or different algorithms*
  • Increases success rate

Example of majority voting over eleven classifier predictions:

Classifier:  1     2     3     4     5     6     7     8     9     10    11    | Main Prediction
Instance 1:  fraud fraud legal fraud legal fraud legal fraud fraud legal fraud | fraud
Instance 2:  fraud fraud fraud legal legal fraud legal legal legal fraud legal | legal
*For details on how the technique works, please refer to the thesis
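As a minimal sketch, the unweighted majority vote that bagging applies to each instance's eleven predictions can be written as follows; the labels mirror the example table above.

```python
# A sketch of bagging's unweighted majority vote over the eleven
# classifier predictions for one instance.
from collections import Counter

def majority_vote(predictions):
    # The class with the highest tally becomes the main prediction;
    # ties go to the class counted first.
    return Counter(predictions).most_common(1)[0][0]

instance_1 = ["fraud", "fraud", "legal", "fraud", "legal", "fraud",
              "legal", "fraud", "fraud", "legal", "fraud"]
print(majority_vote(instance_1))  # "fraud": 7 of 11 votes
```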
Combining Output as Integration Mechanisms
• Stacking
  • Meta-classifier
  • Base classifiers present predictions to meta-classifier*
  • Determines the most reliable classifiers

[Figure: Stacking, steps 1-4. (1) Partitions 1, 2, and 3 are fed to the naive Bayesian, C4.5, and backpropagation algorithms, producing 3 classifiers each. (2) Each group of classifiers outputs 3 predictions. (3) The predictions are combined into training data. (4) A naive Bayesian algorithm trains the meta-classifier on the combined training data.]
*For details on how the technique works, please refer to the thesis
Combining Output as Integration Mechanisms
• Stacking (2)

[Figure: Scoring with stacking. The score data set is fed to the 3 naive Bayesian, 3 C4.5, and 3 backpropagation classifiers; their predictions are combined and passed to the meta-classifier, which outputs the final prediction.]
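A minimal stacking sketch under the same assumed scikit-learn stand-ins as the earlier precog example: three base algorithm families feed a naive Bayesian meta-classifier, echoing the diagrams above. The synthetic data, partition count, and parameters are illustrative.

```python
# A sketch of stacking: base classifiers' predictions train a naive
# Bayesian meta-classifier, which issues the final predictions.
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((600, 14))
y = (rng.random(600) < 0.5).astype(int)    # balanced, e.g. after oversampling

stack = StackingClassifier(
    estimators=[
        ("naive_bayesian", GaussianNB()),
        ("c45_like_tree", DecisionTreeClassifier(max_depth=5)),
        ("backpropagation", MLPClassifier(hidden_layer_sizes=(10,), max_iter=500)),
    ],
    final_estimator=GaussianNB(),          # the meta-classifier
    cv=3,                                  # 3 partitions generate the base predictions
)
stack.fit(X, y)
print(stack.predict(X[:5]))                # final predictions from the meta-classifier
```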
Cluster Detection as Analytical Machinery
Visualisation Techniques as Visual Symbols
• Analytical Machinery: Self Organising Maps (a sketch follows)
  • Cluster high-dimensional elements into simpler, low-dimensional maps
  • Automatically group similar instances together
  • Do not specify an easy-to-understand model*
• Visual Symbols: Classification and Clustering Visualisations
  • Classification visualisation – confusion matrix, naive Bayesian visualisation
  • Clustering visualisation – column graph
*For details on how the problems were tackled, please refer to the thesis
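A minimal hand-rolled Self Organising Map sketch: high-dimensional instances are mapped onto a small two-dimensional grid whose best matching units act as clusters. Grid size, learning rate, and neighbourhood schedule are illustrative assumptions rather than the thesis's actual configuration.

```python
# A sketch of a Self Organising Map: each instance is assigned to the
# grid node (best matching unit) whose weights are closest, and nodes
# near the winner are pulled towards the instance.
import numpy as np

rng = np.random.default_rng(2)
data = rng.random((200, 14))               # high-dimensional instances
grid_h, grid_w = 5, 5                      # simple, low-dimensional 2D map
weights = rng.random((grid_h, grid_w, data.shape[1]))
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

epochs = 20
for epoch in range(epochs):
    lr = 0.5 * (1 - epoch / epochs)            # decaying learning rate
    radius = 0.5 + 3.0 * (1 - epoch / epochs)  # decaying neighbourhood radius
    for x in data:
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
        influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
        weights += lr * influence[..., None] * (x - weights)

# each instance's cluster is its best matching unit on the 2D grid
clusters = [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)),
                             (grid_h, grid_w)) for x in data]
print(clusters[:5])
```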
Steps in the Crime Detection Method
[Figure: The crime detection method, as diagrammed earlier: precogs P1 = L1(D), P2 = L2(D), and P3 = L3(D) generate main predictions from the examples and instances D, which are combined into final predictions, supported by analytical machinery CL = L4(D) and visual symbols presenting graphs, scores, and rules.]
Implementing the Crime Detection System: Preparation Component
The Crime Detection System
[Figure: The crime detection system. A preparation component (problem understanding, data understanding, data preparation) and an action component (modelling, evaluation, deployment) are built around the crime detection method and the data.]
The Crime Detection System: Preparation Component
• Problem Understanding
  • Determine investigation objectives
    - Choose
    - Explain
  • Assess situation
    - Available tools
    - Available data set
    - Cost model*
  • Determine data mining objectives
    - Maximise hits / minimise false alarms
  • Produce project plan
    - Time
    - Tools
*For details, refer to the thesis
The Crime Detection System: Preparation Component
• Data Understanding
  • Describe data
    - 11550 examples (1994 and 1995)
    - 3870 instances (1996)
    - 33 attributes
    - 6% fraudulent
  • Explore data
    - Claim trends by month
    - Age of vehicles
    - Age of policy holder
  • Verify data
    - Good data quality
    - One duplicate attribute, highly skewed attributes
The Crime Detection System: Preparation Component
• Data Preparation
  • Select data
    - All attributes, except one, are retained for analysis
  • Clean data
    - Missing values replaced
    - Spelling mistakes corrected
  • Format data
    - All characters converted to lowercase
    - Underscore symbol used
The Crime Detection System: Preparation Component
• Data Preparation
  • Construct data
    - Derived attributes: weeks_past*, is_holidayweek_claim*, age_price_wsum*
    - Numerical input: 14 attributes scaled between 0 and 1; 19 attributes represented by one-of-N or binary encoding* (a sketch follows)
*For details, refer to the thesis
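A minimal sketch of the numerical-input construction above: min-max scaling to the 0-1 range and one-of-N (one-hot) encoding. The attribute names are borrowed from the slides; the values below are made up.

```python
# A sketch of constructing numerical input: scaling a numeric attribute
# between 0 and 1, and one-of-N encoding a categorical attribute.
import pandas as pd

claims = pd.DataFrame({
    "age_of_policyholder": [22, 45, 17, 60],
    "vehicle_category": ["sedan", "sport", "utility", "sedan"],
})

# min-max scale a numeric attribute to the 0-1 range
col = claims["age_of_policyholder"]
claims["age_of_policyholder_scaled"] = (col - col.min()) / (col.max() - col.min())

# one-of-N (one-hot) encode a categorical attribute
one_hot = pd.get_dummies(claims["vehicle_category"], prefix="vehicle_category")
claims = pd.concat([claims, one_hot], axis=1)
print(claims)
```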
The Crime Detection System: Preparation Component
• Data Preparation
  • Partition data
    - Data multiplication or oversampling
    - For example, a 50/50 distribution (a sketch follows the figure below)

[Figure: Partitioning with oversampling. The 1994 and 1995 training data (11550 examples) is split into 923 fraud examples and 10840 legal examples; each partition pairs the 923 fraud examples with a different set of 923 legal examples to give a 50/50 distribution. The 1996 score data contains 4083 examples.]
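A minimal sketch of this partitioning, assuming each partition pairs all 923 fraud examples with a disjoint slice of 923 legal examples; the thesis's exact sampling may differ.

```python
# A sketch of the oversampled 50/50 partitioning: every partition keeps
# all 923 fraud examples and takes a fresh slice of 923 legal examples.
import numpy as np

rng = np.random.default_rng(3)
fraud = np.arange(923)                # indices of the 923 fraud examples
legal = rng.permutation(10840)        # shuffled indices of the legal examples

partitions = []
for i in range(11):                   # eleven partitions, as in cross validation
    legal_slice = legal[i * 923:(i + 1) * 923]
    partitions.append(np.concatenate([fraud, legal_slice]))

print(len(partitions), [len(p) for p in partitions[:3]])  # 11 partitions of 1846
```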
Implementing the Crime Detection System: Action Component
The Crime Detection System: Action Component
• Modelling
  • Generate experiment design (1)

Experiment Number | Technique or Algorithm | Data Distribution
I    | Naive Bayes          | 50/50
II   | Naive Bayes          | 40/60
III  | Naive Bayes          | 30/70
IV   | Backpropagation      | Determined by Experiments I, II, III
V    | C4.5                 | Determined by Experiments I, II, III
VI   | Bagging              | -
VII  | Stacking             | -
VIII | Stacking and Bagging | -
IX   | Backpropagation      | 5/95
X    | Self Organising Map  | 5/95
The Crime Detection System: Action Component
• Modelling
  • Generate experiment design (2) – a compact sketch of the rotation follows

Test                     | A | B | C | D | E | F | G | H | I  | J  | K
Training set partition   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9  | 10 | 11
Testing set partition    | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 1
Evaluation set partition | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 1 | 2

Evaluating: the success rates of tests A to K are averaged into W, and the predictions A to K are bagged into Bagged X.
Producing: classifiers 1 to 11 are applied to the scoring set; their success rates A to K are averaged into Y, and the main score predictions A to K are bagged into Bagged Z, giving the overall success rate.
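The rotation in the design table can be expressed compactly; this sketch just reproduces the partition assignment above.

```python
# A sketch of the rotating train/test/evaluation assignment: test A uses
# partitions 1, 2, 3; test B uses 2, 3, 4; and so on, wrapping around
# after partition 11.
n_partitions = 11
for i, test_label in enumerate("ABCDEFGHIJK"):
    training = i % n_partitions + 1
    testing = (i + 1) % n_partitions + 1
    evaluation = (i + 2) % n_partitions + 1
    print(test_label, training, testing, evaluation)
```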
The Crime Detection System: Action Component
• Modelling
  • Build models (1)
    - Bagged X outperformed Averaged W
    - Bagged Z performed marginally better than Averaged Y
    - Experiment II achieved higher cost savings than I and III: the 40/60 distribution is most appropriate under the cost model
    - Experiment V achieved higher cost savings than II and IV: the C4.5 algorithm is the best algorithm for the data set
The Crime Detection System: Action Component
• Modelling
  • Build models (2)
    - Experiment VIII achieved slightly better cost savings than V: combining models from different algorithms is better than using a single algorithm
    - The top 15 classifiers from stacking consisted of 9 C4.5, 4 backpropagation, and 2 naive Bayesian classifiers*
*For details, refer to the thesis
The Crime Detection System: Action Component
• Modelling
  • Build models (3)
    - No scores from the D2K software
    - Experiment IX demonstrates that sorted scores and predefined thresholds result in focused investigations*; this satisfies Pareto's Law
    - Rules did not provide insights: already covered by domain knowledge and data attribute exploration*
    - Experiment X requires 5 clusters for visualisation*, built on attributes such as age_of_policyholder; weeks_past and is_holidayweek_claim; and make, accident_area, vehicle_category, age_price_wsum, number_of_cars, and base_policy
*For details, refer to the thesis
The Crime Detection System: Action Component
• Modelling
  • Assess models (1)
    - Training and score data sets too small*
    - Student's t-test with k-1 degrees of freedom*

Rank | Experiment Number | Technique or Algorithm | Cost Savings | Overall Success Rate | Percentage Saved
1 | VIII | Stacking and Bagging  | $167,069 | 60% | 29.71%
2 | V    | C4.5 40/60            | $165,242 | 60% | 29.38%
3 | VI   | Bagging               | $127,454 | 64% | 22.66%
4 | VII  | Stacking              | $104,887 | 70% | 18.65%
5 | II   | Naive Bayes 40/60     | $94,734  | 70% | 16.85%
6 | IX   | Backpropagation 5/95  | $89,232  | 75% | 15.87%
7 | IV   | Backpropagation 40/60 | -$6,488  | 92% | -1.15%

    - McNemar's hypothesis test* (a sketch follows)
*For details, refer to the thesis
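A minimal sketch of McNemar's hypothesis test for comparing two classifiers scored on the same instances; the disagreement counts here are hypothetical, and the thesis's exact procedure may differ.

```python
# A sketch of McNemar's test: b and c count instances where exactly one
# of the two classifiers is correct.
from scipy.stats import chi2

def mcnemar(b, c):
    # continuity-corrected McNemar statistic, chi-squared with 1 d.o.f.
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    return statistic, chi2.sf(statistic, df=1)

# hypothetical counts: classifier A alone correct on 40, B alone on 20
statistic, p_value = mcnemar(40, 20)
print(f"statistic={statistic:.2f}, p-value={p_value:.4f}")
```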
The Crime Detection System: Action Component
• Modelling
  • Assess models (2)
    - Clusters 1, 2, and 3 have higher occurrences of fraud in 1996
    - Clusters 1, 3, and 5 consist of several makes of inexpensive cars: utility vehicles, rural areas, and liability policies
    - Clusters 2 and 4 contain claims submitted many weeks after the “accidents”: Toyotas, sport cars, and multiple policies

Cluster | Number of Instances | Descriptive Cluster Profile
1 | 215 | Contains a large number of 21 to 25 year olds. The insured vehicles are relatively new.
2 | 166 | Also contains a large number of 21 to 25 year olds. Claims are usually reported 10 weeks past the accident. The insured vehicles are usually sport cars.
3 | 268 | Almost all fraudsters are 16 to 17 years old. The insured vehicles are mainly Acuras, Chevrolets, and Hondas, and are usually utility cars.
4 | 103 | Claims are usually reported 20 weeks past the accident. Almost all insured cars are Toyotas, and the fraudster has a high probability of having 3 to 4 cars insured. Claims are unlikely to be submitted during holiday periods.
5 | 171 | Consists mainly of Fords, Mazdas, and Pontiacs. Higher chance of rural accidents, and the base policy type is likely to be liability.
The Crime Detection System: Action Component
• Modelling
  • Assess models (3)
    - Statistical evaluation of descriptive cluster profiles
    - Cluster 4: of 3121 Toyota car claims, 6% or 187 are fraudulent; of 2148 Toyota sedan car claims, 6% or 129 are expected to be fraudulent, with a standard deviation of about 10; the actual 171 fraudulent Toyota sedan car claims give a z-score of 3.8 standard deviations (a sketch follows the table below)
    - This is an insight because it is statistically reliable, not known previously, and actionable

Cluster | Group | Claims | No. and % of Fraud | Sub-Group | Sub-Group Claims | Expected No. of Fraud | Actual No. of Fraud | z-Score
1 | All claims         | 15420 | 923 (6%)  | 21 to 25 year olds              | 108 | 2 | 16 | 5
2 | Sport cars         | 5358  | 84 (1.6%) | 21 to 25 year olds + sport cars | 32  | 1 | 10 | 9.5
3 | 16 to 17 year olds | 320   | 31 (9.7%) | Honda + 16 to 17 year olds      | 31  | 3 | 31 | 9.3
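The Cluster 4 figures can be checked with a simple calculation; this sketch assumes a binomial model at the 6% base rate, which reproduces the z-score of 3.8 (the standard deviation comes out near 11, close to the slide's figure of about 10).

```python
# A sketch checking the Cluster 4 z-score under a binomial assumption:
# 2148 Toyota sedan claims, a 6% base fraud rate, and 171 actual frauds.
import math

n, p, actual = 2148, 0.06, 171
expected = n * p                        # about 129 expected frauds
std_dev = math.sqrt(n * p * (1 - p))    # about 11
z = (actual - expected) / std_dev
print(f"expected={expected:.0f}, sd={std_dev:.1f}, z={z:.1f}")  # z is about 3.8
```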
The Crime Detection System: Action Component
• Modelling
  • Assess models (4)
    - Append main predictions from the 3 algorithms and final predictions from bagging to the 615 fraudulent instances
    - 25 cannot be detected by any algorithm; highest lift in Clusters 1 and 2
    - All can be detected by at least 1 algorithm in Cluster 3
    - Not all fraudulent instances can be detected; domain knowledge, cluster detection, and statistics offer explanations
    - 101 cannot be detected by 2 algorithms, a weakness of bagging; other alternatives exist
The Crime Detection System: Action Component
• Evaluation
  • Evaluate results
    - Experiment VIII generates the best predictions, with cost savings of about $168,000; this is almost 30% of the total cost savings possible
    - The most statistically reliable insight is the knowledge of 21 to 25 year olds who drive sport cars
  • Review process
    - Unsupervised learning to derive clusters first
    - More training data partitions
    - More skewed distributions
    - Cost model too simplistic
    - Probabilistic Neural Networks
The Crime Detection System: Action Component
• Deployment
  • Plan deployment
    - Manage geographically distributed databases using distributed data mining
    - Take time into account
  • Plan monitoring and maintenance
    - Determined by the rate of change in the external environment and organisational requirements
    - Rebuild models when cost savings fall below a certain percentage of the maximum cost savings possible
Contributions and Recommendations
Contributions
• New Crime Detection Method
• Crime Detection System
• Cost Model
• Visualisations
• Statistics
• Score-based Feature
• Extensive Literature Review
• In-depth Analysis of Algorithms
Recommendations – Technical Problems
• Imperfect data
  • Statistical evaluation and confidence intervals
  • Preparation component of crime detection system
  • Derived attributes
• Highly skewed data
  • Cross validation
  • Partitioned data with most appropriate distribution
  • Cost model
• Black-box predictions
  • Classification and clustering visualisation
  • Sorted scores and predefined thresholds, rules
Recommendations – Practical Problems
• Lack of domain knowledge
  • Action component of crime detection system
  • Extensive literature review
  • SOM
• Great variety of fraud scenarios over time
  • Crime detection method
  • Choice of algorithms
• Assessing data mining potential
  • Quality and quantity of data
  • Cost model
  • z-scores
Transforming Minority Report from Science Fiction to Science Fact:
INVESTIGATIVE DATA MINING IN FRAUD DETECTION

1 INTRODUCTION
• The world is overwhelmed with terabytes of data, but there are only a few effective and efficient ways to analyse and interpret it.
• The purpose of the research is to simulate the Precrime System from the science fiction novel Minority Report, using data mining methods and techniques, to extract insights from enormous amounts of data to detect white-collar crime.
• The application is in uncovering fraudulent claims in automobile insurance.
• The objectives are to overcome the technical and practical problems of data mining in fraud detection.

2 THE CRIME DETECTION METHOD
• Precogs, or precognitive elements, are entities which have the knowledge to predict that something will happen. Figure 1 uses three precogs to foresee and prevent crime by stopping potentially guilty criminals.
• Each precog contains multiple classification models, or classifiers, trained with one data mining technique to extrapolate the future.
• The three precogs are different from each other because they are trained by different data mining algorithms. For example, the first, second, and third precogs are trained using the naive Bayesian, C4.5, and backpropagation algorithms.
• The precogs require numerical inputs of past examples to output corresponding predictions for new instances. Precogs can be shared between organisations to increase the accuracy of the predictions, without violating competitive and legal requirements.
• Integration mechanisms are needed. As each precog outputs its many predictions for each instance, all are counted and the class with the highest tally is chosen as the main prediction.
• Figure 1 shows that the main predictions can be combined either by majority count (bagging), or the predictions can be fed back into one of the precogs (stacking), to derive a final prediction.
• Analytical machinery, or cluster detection, records, studies, compares, and represents the precogs' predictions in easily understood terms.
• The analytical machinery is represented by the Self Organising Map (SOM), which clusters similar data into groups. It transforms multidimensional data into two-dimensional clusters containing similar data, to enable the data analyst to easily differentiate the groups of fraud, and allows the data analyst to assess the algorithms' ability to cope with evolving fraud.
• Figure 1 demonstrates that main predictions and final predictions are appended to the clustered data to determine the fraud characteristics which cannot be detected, and the most important attributes are selected for visualisation.
• Visual symbols, or visualisations, integrate human perceptual abilities into the data analysis process by presenting the data in some visual and interactive form.
• The naive Bayesian and C4.5 visualisations facilitate analysis of classifier predictions and performance, and column graphs aid the interpretation of clustering results.
• Scores are numbers within a specified range which indicate the relative risk that a particular data instance may be fraudulent, and are used to rank instances.
• Rules are expressions of the form Body → Head, where Body describes the conditions under which the rule is generated and Head is the class label.
• The black-box approach of the precogs is transformed into a semi-transparent approach by using analytical machinery and visual symbols to analyse and interpret the predictions.
• The crime detection method provides a flexible step-by-step approach to generating predictions from any three algorithms, and uses some form of integration mechanism to increase the likelihood of correct final predictions.

[Figure 1: Predictions using Precogs, Analytical Machinery, and Visual Symbols. Examples and instances D feed three precogs P1 = L1(D), P2 = L2(D), and P3 = L3(D); their main predictions are combined into final predictions, while analytical machinery CL = L4(D) performs attribute selection and visual symbols present graphs, scores, and rules.]

3 RESULTS ON AUTOMOBILE INSURANCE DATA

4 DISCUSSION
• Through the use of integration mechanisms, the highest cost savings is achieved.
• The analytical machinery facilitated the interesting discovery of 21 to 25 year old fraudsters who used sport cars as their crime tool.

5 CONCLUSION
• Other possible applications of this crime detection method are:
  - Anti-terrorism
  - Burglary
  - Customs declaration fraud
  - Drug-related homicides
  - Drug smuggling
  - Government financial transactions
  - Sexual offences

REFERENCES
Dick P K (1956) Minority Report, Orion Publishing Group, London, Great Britain.

Done by Clifton Phua for Honours 2003
Supervised by Dr. Damminda Alahakoon

Questions?