2010
Data Mining Workshop 2011
Data Mining and Predictive Modeling Workshop
Laura Anderson
4/13/2011
www.spss.com/perspectives
Purpose of Workshop
• Introduction to Data Mining
– Stimulate thinking about how data mining can be applied in your own applications
– Gain hands-on experience “doing” data mining
– Implement models in the appropriate computing environment
– Demonstrate that powerful technology can be easy to use
Welcome
• Data Mining Workshop – Hands-on
• May 3rd, Chicago
Welcome
• Laura Anderson
• Chicago, IL
• Predictive Analytics Specialist, Text Mining
SPSS
• SPSS was acquired by IBM in 2009
• SPSS is a leading global provider of predictive analytics software and solutions
• Customers use SPSS software and solutions to attract, retain and grow customers while reducing fraud and mitigating risk.
SPSS
• SPSS helps organizations, regardless of size or industry, to:
– Predict future events
– Proactively act upon that insight to drive better business outcomes
– Become a Predictive Enterprise
• Use historical data to optimize future decisions in order to meet business goals and achieve measurable competitive advantage across all relevant enterprise processes
SPSS
• Highest customer satisfaction
– Projects delivered on time and under budget
• Returning the highest ROI
– 94% achieved positive ROI in 10.7 months
– “30 Million Euro in new revenue”
– “35% reduction in mailing cost, 2X response rate, 29% more profit”
– “Reduced churn from 19 to 2%”
– “100% increase in campaign effectiveness”
What is Data Mining?
• “…the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” -- Berry & Linoff*
• “…the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” -- Gartner Group
• “Predictive analytics is a set of business intelligence technologies that uncovers relationships and patterns within large volumes of data that can be used to predict behavior and events.” -- TDWI Research**
* From Data Mining Techniques: For Marketing, Sales & Customer Support, Michael J.A. Berry & Gordon Linoff, p.5
** “Predictive Analytics,” What Works in Data Integration, TDWI Research, Vol.23, 2007, p.49
Data Mining and Text Analytics
Data Mining
• Use advanced analytical techniques on data
• Discover key relationships between variables
• Model the effect of variables on outcomes
• Determine influence on outcomes
• Predict outcomes
• Apply models to new data in real time
Text Analytics
• Extract, analyze and create structure for unstructured data
• Integrate analysis results into operational systems
• Integrate analysis results into Business Intelligence applications
• Integrate analysis results with structured data and use them as input for Data Mining, which improves model accuracy
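Text analytics can feed data mining by turning free-text comments into structured fields that a model can use alongside numeric inputs. A minimal sketch, assuming a hand-written keyword list per concept: the concept names, keywords and output fields such as `mentions_price` are hypothetical, and a real text mining tool would extract concepts linguistically rather than by simple keyword matching.

```python
import re

# Hypothetical concept dictionary; a real text analytics tool would extract
# concepts linguistically instead of matching a fixed keyword list.
CONCEPTS = {
    "price": {"expensive", "price", "cost", "bill"},
    "service": {"rude", "wait", "support", "service"},
    "cancel": {"cancel", "leave", "switch"},
}

def comment_flags(comment: str) -> dict:
    """Return a 0/1 flag for each concept mentioned in a free-text comment."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return {f"mentions_{name}": int(bool(words & terms))
            for name, terms in CONCEPTS.items()}

def enrich(record: dict) -> dict:
    """Replace the raw comment with structured flags usable as model inputs."""
    structured = {k: v for k, v in record.items() if k != "comment"}
    structured.update(comment_flags(record.get("comment", "")))
    return structured

customer = {"id": 17, "tenure_months": 8,
            "comment": "Support was rude and my bill is too expensive, so I may switch."}
print(enrich(customer))
# {'id': 17, 'tenure_months': 8, 'mentions_price': 1, 'mentions_service': 1, 'mentions_cancel': 1}
```

The enriched record now has only structured fields, so it can be passed to any of the modeling techniques alongside tenure and other numeric inputs.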
IBM SPSS Modeler
• High Performance Data Mining and Text Analytics Workbench
• Quickly Delivers Positive ROI
• Creates and Operationalizes Predictive Intelligence
• Used for the Proactive and Repeated…
– Identification of Revenue Opportunities
– Reduction of Costs
– Increase in Productivity
IBM SPSS Modeler
• Two Editions
• IBM SPSS Modeler Professional
– Modeler Professional is a data mining workbench for the analysis of structured numerical data to model outcomes and make predictions that inform business decisions with predictive intelligence.
• IBM SPSS Modeler Premium
– Modeler Premium allows organizations to tap into the predictive intelligence held in all forms of data. Modeler Premium goes beyond the analysis of structured numerical data alone and includes information from unstructured data such as web activity, blog content, customer feedback, e-mails, articles, and more to create the most accurate predictive models possible.
IBM SPSS Modeler
• Available in Multiple Deployments
– Desktop
– Client/Server
– Workgroup (in combination with IBM SPSS Collaboration and Deployment Services software)
– Enterprise (Modeler is the analytical engine of IBM SPSS Decision Management)
Hands-on Session #1
Being Predictive in 15 minutes
• Create a credit risk model for a bank
• Connect to data
• Define variable roles
• Use a modeling technique
• Review results
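The four steps of this session (connect to data, define variable roles, use a modeling technique, review results) can be sketched in a few lines. This is a minimal illustration, not what Modeler does internally: the applicant data and field names are hypothetical, and the "modeling technique" is a one-split decision tree (a decision stump) chosen because its result is easy to read.

```python
from collections import Counter

# Hypothetical applicants: income in $k, debt_ratio = debt / income.
DATA = [
    {"income": 25, "debt_ratio": 0.60, "good_risk": 0},
    {"income": 32, "debt_ratio": 0.45, "good_risk": 0},
    {"income": 40, "debt_ratio": 0.20, "good_risk": 1},
    {"income": 55, "debt_ratio": 0.35, "good_risk": 1},
    {"income": 61, "debt_ratio": 0.30, "good_risk": 1},
    {"income": 28, "debt_ratio": 0.15, "good_risk": 1},
]
# Variable roles: which fields are inputs and which one is the target.
ROLES = {"inputs": ["income", "debt_ratio"], "target": "good_risk"}

def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, roles):
    """Pick the single input/threshold split with the fewest training errors."""
    target = roles["target"]
    best = None
    for field in roles["inputs"]:
        for threshold in sorted({r[field] for r in rows}):
            left = [r[target] for r in rows if r[field] <= threshold]
            right = [r[target] for r in rows if r[field] > threshold]
            if not left or not right:
                continue  # a split must send records both ways
            lo, hi = majority(left), majority(right)
            errors = sum(y != lo for y in left) + sum(y != hi for y in right)
            if best is None or errors < best[0]:
                best = (errors, field, threshold, lo, hi)
    if best is None:
        raise ValueError("no input has more than one distinct value")
    _, field, threshold, lo, hi = best
    return {"field": field, "threshold": threshold, "left": lo, "right": hi}

def predict(stump, row):
    """Apply the stump: values at or below the threshold go left."""
    return stump["left"] if row[stump["field"]] <= stump["threshold"] else stump["right"]

stump = fit_stump(DATA, ROLES)
accuracy = sum(predict(stump, r) == r[ROLES["target"]] for r in DATA) / len(DATA)
print(stump, accuracy)
# {'field': 'debt_ratio', 'threshold': 0.35, 'left': 1, 'right': 0} 1.0
```

Reviewing results here means reading the rule (applicants with a debt ratio at or below 0.35 are predicted good risks) and checking accuracy. Accuracy on the training records is optimistic; a fair review would score held-out data.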
Data Mining Methodology and Applications
• CRISP-DM: CRoss-Industry Standard Process Model for Data Mining
• Describes the Components of a Complete Data Mining Project Cycle
• Shows the Iterative Nature of Data Mining
• Vendor and Industry Neutral
To learn more, visit: http://www.crisp-dm.org
Data Mining Methodology and Applications
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
Data Mining Methodology and Applications
• Customer Relationship Management (“analytical CRM”)
– Who are our best customers?
– Can we get more like them?
– What do they buy, and why?
– Why do they leave?
• Fraud detection
– Money laundering
– Network intrusion
• Crime analysis
• Industrial process optimization & QA
• Science:
– Genetics
– Drug discovery
– Medical research
– Food authentication
• Human Capital Management
– Who are our best employees?
– How do we keep our best employees from leaving?
– Which prospects should we recruit?
• And many more…
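Fraud detection, one of the applications listed above, is often framed as finding cases that do not follow the expected pattern. A minimal sketch using the modified z-score (median and median absolute deviation, with the commonly used 0.6745 scaling and 3.5 cutoff); this is a simple statistical stand-in, not the algorithm behind Modeler's Anomaly node, and the transaction amounts are hypothetical.

```python
import statistics

def flag_outliers(values, cutoff=3.5):
    """Return indexes whose modified z-score (median/MAD based) exceeds cutoff."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median; the score is undefined
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > cutoff]

# Hypothetical card transaction amounts for one customer.
amounts = [42, 38, 45, 40, 41, 39, 44, 43, 37, 900]
print(flag_outliers(amounts))  # [9]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself; with ten records, a plain z-score of the 900 transaction barely reaches 3.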
Break
Please Return in 15 Minutes
Data Mining Techniques
• Predict or Classify
– Algorithms: Auto Classifiers, Decision Trees, Logistic, Time Series, SVM, etc.
– Usage: predict group membership (e.g., will this employee leave?) or a number (e.g., how many widgets will I sell?)
• Group
– Algorithms: Auto Clustering, K-means, etc.
– Usage: assign data points to groups that are internally homogeneous and externally heterogeneous
• Associate
– Algorithms: Apriori, Carma, Sequence
– Usage: find events that occur together or in a sequence (e.g., market basket analysis)
• Find Outliers
– Algorithms: Anomaly
– Usage: identify cases that don’t follow expected patterns (e.g., fraud detection)
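The Associate technique rests on two measures: support (the share of baskets containing an itemset) and confidence (the share of baskets containing the left-hand item that also contain the right-hand item). The sketch below computes both for item pairs only, with Apriori-style pruning of infrequent single items. The baskets and the 0.4/0.6 thresholds are made up for illustration; full Apriori or Carma also handles larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    # Apriori pruning: a pair can only be frequent if both items are frequent.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b & frequent), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(BASKETS)
print(rules[0])  # ('beer', 'diapers', 0.6, 1.0)
```

Every basket that contains beer also contains diapers, so that rule has confidence 1.0, while the reverse rule (diapers to beer) only reaches 0.75; association rules are directional.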
Hands-on Session #2
Applying Select Data Mining Techniques
• Create a market basket analysis
• Use Auto Cluster to build banking customer clusters
• Use Auto Classifier to predict telco churn
• Add a comment variable for text analysis
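Auto Cluster builds and compares several clustering models; the sketch below shows only one of them, plain k-means, on two hypothetical banking features (balance in $k and transactions per month). Centroids are seeded with the first k records so the result is reproducible; production implementations typically use smarter seeding and several restarts.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; seeds centroids with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (balance in $k, transactions per month).
customers = [(2, 30), (3, 28), (50, 4), (55, 6), (2.5, 32), (52, 5)]
labels, centroids = kmeans(customers, k=2)
print(labels)        # [0, 0, 1, 1, 0, 1]
print(centroids[0])  # [2.5, 30.0]
```

The two clusters separate low-balance, high-activity customers from high-balance, low-activity ones. Because k-means uses raw distances, features on very different scales are usually standardized first.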
Hands-on Session #3
Deployment
• Prepare churn model for deployment
– Deploy to marketing department
– Prepare for enterprise deployment
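Deploying a model means moving it out of the modeling environment and scoring new records where decisions are made. A minimal sketch, assuming the churn model is a logistic regression whose coefficients are exported as JSON: the coefficients, field names, threshold and the `export_model`/`score` helpers are all hypothetical and are not part of any SPSS API.

```python
import json
import math

# Hypothetical coefficients standing in for a trained churn model.
MODEL = {
    "name": "telco_churn",
    "intercept": -1.5,
    "weights": {"dropped_calls": 0.4, "tenure_years": -0.6},
    "threshold": 0.5,
}

def export_model(model: dict) -> str:
    """Serialize the model so another system can load it without the modeling tool."""
    return json.dumps(model, sort_keys=True)

def score(model_json: str, record: dict) -> dict:
    """Load the exported model and return a churn probability and flag for one record."""
    model = json.loads(model_json)
    z = model["intercept"] + sum(w * record[f] for f, w in model["weights"].items())
    prob = 1 / (1 + math.exp(-z))  # logistic function
    return {"churn_prob": round(prob, 3), "churn_flag": prob >= model["threshold"]}

model_json = export_model(MODEL)
print(score(model_json, {"dropped_calls": 6, "tenure_years": 1}))
# {'churn_prob': 0.574, 'churn_flag': True}
```

Separating export from scoring mirrors the two deployment targets above: the marketing department can score a list from the exported file, while an enterprise system can load the same file inside its own process.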
Getting the most out of your analytic investment
• How do we keep up with all of the analytic requests?
• How do we ensure accuracy and consistency in our analytic projects?
• How do we easily integrate and distribute our analytic results?
Collaboration & Deployment Services: Overview
• Analytic content management repository
– Version control
– Powerful search
– Security and auditing
• Process management
– Multi-step jobs
– Conditional job flow
– Scheduling
– Automated model evaluation
– Open integration
• Integration & delivery interfaces
– Reporting
– Automatic delivery of analytical output
– Multiple IT infrastructure integration options
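Automated model evaluation typically compares the production model (the champion) with a candidate (the challenger) on recent labeled data and only promotes the challenger if it is clearly better. The sketch below shows just that evaluation step; scheduling and conditional job flow are not modeled. The two rule-based models, the records, the accuracy metric and the 0.02 minimum gain are all hypothetical choices.

```python
# Hypothetical labeled records collected after the champion went live.
RECENT = [
    {"dropped_calls": 6, "tenure_years": 1, "churned": True},
    {"dropped_calls": 4, "tenure_years": 1, "churned": True},
    {"dropped_calls": 7, "tenure_years": 6, "churned": False},
    {"dropped_calls": 1, "tenure_years": 3, "churned": False},
]

def champion(r):
    """Current production rule (hypothetical)."""
    return r["dropped_calls"] > 5

def challenger(r):
    """Candidate replacement rule (hypothetical)."""
    return r["dropped_calls"] > 3 and r["tenure_years"] < 2

def accuracy(model, rows, target="churned"):
    """Share of rows where the model's prediction matches the observed outcome."""
    return sum(model(r) == r[target] for r in rows) / len(rows)

def evaluate(champ, chall, rows, min_gain=0.02):
    """Keep the champion unless the challenger beats it by at least min_gain."""
    champ_acc, chall_acc = accuracy(champ, rows), accuracy(chall, rows)
    winner = chall if chall_acc >= champ_acc + min_gain else champ
    return winner, champ_acc, chall_acc

winner, champ_acc, chall_acc = evaluate(champion, challenger, RECENT)
print(winner.__name__, champ_acc, chall_acc)  # challenger 0.5 1.0
```

The minimum gain keeps the production model from being swapped out over differences that are within noise; on real data the comparison would use far more records than this toy set.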
IBM SPSS Decision Management Gets it Done: Focusing on Outcomes
• Set of tools to automate high-volume decision making enterprise-wide
• Injects powerful predictive analytics into core business processes
• Extends predictive insights to the business user at the point of decision
– For example: should a claim be fast-tracked or evaluated more closely, based on a calculated risk score?
Maximizes the impact of analytics in your operation
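The claim example combines a predictive score with a business rule at the point of decision. A minimal sketch, assuming the model has already produced a risk score between 0 and 1: the 0.2 score cutoff and the 5000 amount cap are hypothetical business settings, not values from any SPSS product.

```python
def route_claim(risk_score: float, amount: float,
                fast_track_below: float = 0.2,
                fast_track_max_amount: float = 5000.0) -> str:
    """Decide whether a claim is fast-tracked or sent for closer review."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be a probability between 0 and 1")
    # Predictive part: the model's risk score must be low...
    low_risk = risk_score < fast_track_below
    # ...business-rule part (hypothetical cap): and the amount small enough to skip review.
    small_claim = amount <= fast_track_max_amount
    return "fast_track" if low_risk and small_claim else "review"

for risk, amount in [(0.05, 1200.0), (0.05, 20000.0), (0.60, 500.0)]:
    print(risk, amount, route_claim(risk, amount))
```

Keeping the cutoffs as parameters lets the business adjust the policy without retraining the model that produces the score.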
How Predictive Intelligence Gets Deployed
• A call center agent submits customer information during an interaction
• Based on the predictive model, a single offer is presented to the customer
• The reaction to the offer is tracked and used to refine the model
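The call center loop (score the customer, present one offer, feed the reaction back) can be sketched with a very simple counting model. This is an illustration under stated assumptions, not how Decision Management scores offers: the customer segments, offer names and margins are hypothetical, and acceptance rates are Laplace-smoothed counts rather than a trained predictive model.

```python
from collections import defaultdict

# Hypothetical offers and their margin if accepted.
OFFER_VALUE = {"upgrade": 40.0, "discount": 15.0, "loyalty_gift": 10.0}

# (segment, offer) -> [times shown, times accepted]; this is the "model" being refined.
stats = defaultdict(lambda: [0, 0])

def acceptance_rate(segment, offer):
    """Laplace-smoothed acceptance rate; untried offers start at 0.5."""
    shown, accepted = stats[(segment, offer)]
    return (accepted + 1) / (shown + 2)

def choose_offer(customer):
    """Return the single offer with the highest expected value for this segment."""
    segment = customer["segment"]
    return max(OFFER_VALUE, key=lambda o: acceptance_rate(segment, o) * OFFER_VALUE[o])

def record_response(customer, offer, accepted):
    """Track the reaction so the next choice for this segment reflects it."""
    entry = stats[(customer["segment"], offer)]
    entry[0] += 1
    entry[1] += int(accepted)

caller = {"id": 101, "segment": "high_value"}
print(choose_offer(caller))  # upgrade
for _ in range(4):
    record_response(caller, "upgrade", accepted=False)
print(choose_offer(caller))  # discount
```

After four rejections the upgrade's expected value (40 × 1/6) falls below the discount's (15 × 0.5), so the next caller in that segment is offered the discount instead.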
Wrap Up
• Summary
• Questions
Summary of SPSS Key Differentiators
• Business results delivered for our clients
– Cost-effective solution that delivers powerful results across the organization
– Flexible licensing and deployment options
– Full range of algorithms for your business problems
• End-to-end solution
– Data preparation through real-time interactions
– Use structured, unstructured and survey data
– Full suite of products, from data collection through deployment
Summary of SPSS Key Differentiators
• Easy-to-use interface
– Does not require knowledge of a programming language
– Short time to productivity
• Flexible architecture
– Leverages investments already made in technology
• Improved performance
– Does not require data in a proprietary format or database
• Can manage and combine both structured and unstructured data
– Open architecture (both inputs and outputs)
– SQL Pushback
– Champion/challenger modeling
Questions
Appendix
Data Mining Overview
• From Amazon.com
– Paperback: 512 pages
– Publisher: Wiley; 1st edition (December 28, 1999)
– Language: English
– ISBN-10: 0471331236
– ISBN-13: 978-0471331230
• Good introductory text on data mining for marketing from two top communicators in the field
Handbook of Statistical Analysis and Data Mining Applications
• Handbook of Statistical Analysis and Data Mining Applications
• Robert Nisbet, John Elder IV, and Gary Miner
• Academic Press (2009)
• ISBN-10: 0123747651
• An excellent guide to many aspects of data mining, including text mining.
Data Mining Algorithms
• From Amazon.com
– Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
– by Eibe Frank, Ian H. Witten
– Paperback: 416 pages (October 13, 1999)
– Morgan Kaufmann Publishers
– ISBN: 1558605525
• The best book I’ve found that sits between highly technical and introductory books. Good coverage of topics, especially trees and rules, but no neural networks.
Thank You
• Laura Anderson
• Predictive Analytics Specialist, Text Mining
• IBM SPSS
• landerson@us.ibm.com
• 312.651.3844