2010 Data Mining Workshop 2011

Data Mining and Predictive Modeling Workshop
Laura Anderson
4/13/2011
www.spss.com/perspectives

Purpose of Workshop
• Introduction to Data Mining
  – Stimulate thinking about how data mining can be applied to your applications
  – Get experience in “doing” data mining
  – Implementation of models in appropriate computing environment
  – Demonstrate ease of use of powerful technology

Welcome
• Data Mining Workshop
  – Hands on
• May 3rd, Chicago

Welcome
• Laura Anderson
• Chicago, IL
• Predictive Analytics Specialist, Text Mining

SPSS
• SPSS acquired by IBM in 2009
• SPSS is a leading global provider of predictive analytics software and solutions
• Customers use SPSS software and solutions to attract, retain and grow customers, while reducing fraud and mitigating risk.

SPSS
• SPSS helps organizations, regardless of organizational size or industry, to:
  – Predict future events
  – Proactively act upon that insight to drive better business outcomes
  – Become a Predictive Enterprise
• Use historical data to optimize future decisions to meet business goals and achieve measurable competitive advantage across all relevant enterprise processes

SPSS
• Highest customer satisfaction
  – Projects delivered on time and under budget
• Returning the highest ROI
  – 94% achieved positive ROI in 10.7 months
“30 Million Euro in new revenue”
“35% reduction in mailing cost, 2X response rate, 29% more profit”
“Reduced churn from 19 to 2%”
“100% increase in campaign effectiveness”

What is Data Mining?
• “…the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” -- Berry & Linoff*
• “…the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” -- Gartner Group
• “Predictive analytics is a set of business intelligence technologies that uncovers relationships and patterns within large volumes of data that can be used to predict behavior and events.” -- TDWI Research**
* From Data Mining Techniques: For Marketing, Sales & Customer Support, Michael J.A. Berry & Gordon Linoff, p.5
** “Predictive Analytics,” What Works in Data Integration, TDWI Research, Vol.23, 2007, p.49

Data Mining and Text Analytics
Data Mining
• Use advanced analytical techniques on data
• Discover key relationships between variables
• Model effect of variables on outcomes
• Determine influence on outcomes
• Predict outcomes
• Apply models to new data in real time
Text Analytics
• Extract, analyze and create structure for unstructured data
• Integrate analysis results into operational systems
• Integrate analysis results into Business Intelligence applications
• Integrate analysis results with structured data and use as input for Data Mining
• Improves model accuracy

IBM SPSS Modeler
• High Performance Data Mining and Text Analytics Workbench
• Quickly Delivers Positive ROI
• Creates and Operationalizes Predictive Intelligence
• Used for the Proactive and Repeated…
  – Identification of Revenue Opportunities
  – Reduction of Costs
  – Increase in Productivity

IBM SPSS Modeler
• Two Editions
• IBM SPSS Modeler Professional
  – Modeler Professional is a data mining workbench for the analysis of structured numerical data to model outcomes and make predictions that inform business decisions with predictive intelligence.
• IBM SPSS Modeler Premium
  – Modeler Premium allows organizations to tap into the predictive intelligence held in all forms of data. Modeler Premium goes beyond the analysis of structured numerical data alone and includes information from unstructured data such as web activity, blog content, customer feedback, e-mails, articles, and more to create the most accurate predictive models possible.

IBM SPSS Modeler
• Available in Multiple Deployments
  – Desktop
  – Client/Server
  – Workgroup
    • in combination with IBM SPSS Collaboration and Deployment Services software
  – Enterprise
    • Modeler is the analytical engine of IBM SPSS Decision Management

Hands-on Session #1
Being Predictive in 15 minutes
• Create a credit risk model for a bank
• Connect to data
• Define variable roles
• Use a modeling technique
• Review results

Data Mining Methodology and Applications
• CRoss-Industry Standard Process Model for Data Mining
• Describes Components of Complete Data Mining Project Cycle
• Shows Iterative Nature of Data Mining
• Vendor and Industry Neutral
To learn more, visit: http://www.crisp-dm.org

Data Mining Methodology and Applications
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment

Data Mining Methodology and Applications
• Customer Relationship Management – “analytical CRM”
  – Who are our best customers?
  – Can we get more like that?
  – What/why do they buy?
  – Why do they leave?
• Fraud detection
  – Money laundering
  – Network intrusion
• Crime analysis
• Industrial process optimization & QA
• Science:
  – Genetics
  – Drug discovery
  – Medical research
  – Food authentication
• Human Capital Management
  – Who are our best employees?
  – How do we keep our best employees from leaving?
  – Which prospects should we recruit?
• And many more…

Break
Please Return in 15 Minutes

Data Mining Techniques
Technique | Algorithms | Usage
Predict or Classify | Auto Classifiers, Decision Trees, Logistic, Time Series, etc. | Used to predict group membership (e.g. will this employee leave?) or a number (e.g. how many widgets will I sell?)
Group | Auto Clustering, K-means, SVM, etc. | Used to classify data points into groups that are internally homogeneous and externally heterogeneous
Associate | APRIORI, Carma, Sequence | Used to find events that occur together or in a sequence (e.g. market basket)
Find Outlier | Anomaly | Used to identify cases that don’t follow expected patterns (e.g. fraud detection)

Hands-on Session #2
Applying Select Data Mining Techniques
• Create a market basket analysis
• Use Auto Cluster to build banking customer clusters
• Use Auto Classifier for Telco churn
• Add a comment variable for text analysis

Hands-on Session #3
Deployment
• Prepare churn model for deployment
  – Deploy to marketing department
  – Prepare for enterprise deployment

Getting the most out of your analytic investment
• How do we keep up with all of the analytic requests?
• How do we ensure accuracy and consistency in our analytic projects?
• How do we easily integrate and distribute our analytic results?
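
The market basket exercise in Hands-on Session #2 rests on the "Associate" technique: finding items that tend to occur together. As a rough illustration outside of Modeler, the sketch below counts item pairs in plain Python and reports support and confidence for each pair rule. It is not the Apriori or Carma algorithm as implemented in IBM SPSS Modeler; the function name `pair_rules`, the thresholds, and the sample baskets are all assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) rules for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets that contain both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Share of baskets containing lhs that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical baskets, invented purely for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these sample baskets the strongest rule is "butter -> bread": every basket containing butter also contains bread. Restricting the search to pairs keeps the sketch short; a full association algorithm would also consider larger item sets and prune candidates as it goes.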

Collaboration & Deployment Services: Overview
• Analytic content management repository
  – Version control
  – Powerful search
  – Security and auditing
• Process management
  – Multi-step jobs
  – Conditional job flow
  – Scheduling
  – Automated model evaluation
  – Open integration
• Integration & delivery interfaces
  – Reporting
  – Automatic delivery of analytical output
  – Multiple IT infrastructure integration options

IBM SPSS Decision Management Gets it Done: Focusing on Outcomes
• Set of tools to automate high-volume decision making enterprise-wide
• Injects powerful predictive analytics into core business processes
• Extends predictive insights to the business user at the point of decision
  – E.g. Should a claim be ‘fast tracked’ or evaluated more closely based on a calculated risk score?
• Maximizes the impact of analytics in your operation

How Predictive Intelligence Gets Deployed
• A call center agent submits customer information during an interaction
• Based on the predictive model, a single offer is presented to the customer
• The reaction to the offer is tracked and used to refine the model

Wrap Up
• Summary
• Questions

Summary of SPSS Key Differentiators
• Business results delivered for our clients
  – Cost effective solution that delivers powerful results across organization
  – Flexible licensing and deployment options
  – Full range of algorithms for your business problems
• End-to-end solution
  – Data preparation through real time interactions
  – Use structured, unstructured and survey data
  – Full suite of products, from data collection through deployment

Summary of SPSS Key Differentiators
• Easy to use interface
  – Does not require knowledge of programming language
  – Short timeframe to be productive
• Flexible architecture
  – Leverages the investments already made in technology
• Improved performance
  – Does not require data in a proprietary format or DB
    • Can manage/combine both structured and unstructured data
  – Open architecture (both inputs and outputs)
  – SQL Pushback
  – Champion-Challenger modeling

Questions

Appendix

Data Mining Overview
• From Amazon.com
  – Paperback: 512 pages
  – Publisher: Wiley; 1 edition (December 28, 1999)
  – Language: English
  – ISBN-10: 0471331236
  – ISBN-13: 978-0471331230
• Good introductory text on data mining for marketing from two top communicators in the field

Handbook of Statistical Analysis and Data Mining Applications
• Handbook of Statistical Analysis and Data Mining Applications
• Robert Nisbet, John Elder IV, and Gary Miner
• Academic Press (2009)
• ISBN-10: 0123747651
• An excellent guide to many aspects of data mining, including text mining.

Data Mining Algorithms
• From Amazon.com
  – Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
  – by Eibe Frank, Ian H. Witten
  – Paperback - 416 pages (October 13, 1999)
  – Morgan Kaufmann Publishers
  – ISBN: 1558605525
• Best book I’ve found in between highly technical and introductory books. Good coverage of topics, especially trees and rules, but no neural networks.

Thank You
• Laura Anderson
• Predictive Analytics Specialist, Text Mining
• IBM SPSS
• landerson@us.ibm.com
• 312.651.3844