Open Source Machine Learning
Open Source Probabilistic Network Library
Gary Bradski, Program Manager, Systems Technology Labs - Intel
Open Source ML

What are we announcing today?
• Intel is releasing a library of Open Source Software for Machine Learning
• The first library is the Probabilistic Network Library (PNL), comprising code for inference and learning with Bayesian Networks
• Research and development were conducted in Intel research labs in the US, Russia and China
• The software is released as part of the Intel Open Research Program
• A tool for research in many application areas
• Open Source under a BSD license: the code is free for academic and commercial use
• More info: http://www.intel.com/research/mrl/pnl

Why is Intel involved?
• Statistical computing and machine learning can change computing applications considerably
• Machine learning requires high-powered processors
• It ties into Intel’s research in other areas such as wireless networking, sensor networks and Proactive Health

What is Machine Learning?
• Machine learning allows computers to learn from their experiences and from gathered data
• We’ve known for more than 200 years that probability theory is the right tool to model systems, but it has always been too hard to compute
• Recent advances in computing allow calculation of complex models
• Machines are good at gathering data and performing complex analysis
• Machine learning is a sea change in application development, since it allows computers to be more proactive and predictive

Applications of Machine Learning
• Interface – Audio Visual Speech Recognition (AVSR), natural language processing, etc.
• AI – robotics, computer games, entertainment, etc.
• Data Analysis – information retrieval, data mining, etc.
• Biological – gene sequencing, genomics, computational pharmacology
• Computer – run time optimization
• Industrial – fault diagnosis

Applications of machine learning cover a broad range:
• Genomics – matching of protein strands
• Collaborative Filtering – a personal “Google”
• Drug Discovery – shortening of the drug discovery cycle
• Patient and elder care – wireless camera and sensor networks help monitor patients

Open ML Components & Plan
[Chart: Open ML components arranged by Supervised vs. Unsupervised and Modeless vs. Model based. OpenSL (Statistical Learning): 2004. OpenPNL (Bayesian Networks): 2003. Components shown: boosted decision trees, influence diagrams, BayesNets classification, SVM, decision trees, K-NN, Bayesnet structure learning, K-means, dependency nets, spectral clustering, agglomerative clustering, PCA, BayesNets parameter fitting. Key: optimized, implemented, not implemented.]

Model Based Machine Learning
• Machine learning can be based on models (model based) or it can be model-less
• In version 1.0 of OpenML, Intel is focusing on Bayesian Networks and Probabilistic Networks, which fall under the model-based category
• The Bayesian approach provides a mathematical rule explaining how one should change existing beliefs in the light of new evidence
• Model-less approaches are used for clustering and classification
• Intel will release libraries using model-less approaches next year

Applications of Model-less ML
[Figure: Machine 18, Fab 11: “Tolerance goes out when temperature > 87°”]
• Suitable for applications such as fault diagnosis
• The system does not have a model
• It collects data, then clusters and classifies it
• Recognition is derived from these clusters

Applications of Model-based ML
• Our research has focused on Bayesian Networks
• Hidden Markov Models (HMMs), a kind of Bayesian network, are widely used in speech recognition; coupled Hidden Markov Models are used in Audio Visual Speech Recognition (the use of visual data in speech recognition)
• Open Source PNL is an optimized infrastructure for research and development in model-based machine learning
[Figures: Audio Visual Speech Recognition; Face Recognition & Tracking]

Example: Vision Applications
• Image super resolution: use a Bayesian method to develop a clear image from a low-resolution picture

Intel Systems Technology Lab
[Map: sites in Hillsboro, OR, USA; Santa Clara, CA, USA; Nizhny Novgorod, Russia; Beijing, PR China. Groups shown: Wireless, Systems, Media, 3D Graphics, Tech. Management, Graphics Lab, Machine Learning, Architecture Lab (architecture for machine learning, media, 3D graphics), China Research Center (speech and machine learning), Computer Vision.]
• One of three major labs of the Intel Corporate Technology Group
• 300 researchers worldwide
• Focus on impact on Intel Architecture
• Drive university and industry initiatives

Why Open Source?
• Expands our research base
• Allows Intel researchers to collaborate easily with thousands of colleagues worldwide
• Removes barriers and speeds up collaboration
• Taps into a very large innovative community
• Feedback from a large number of developers helps us design future microprocessors
• A chance to explore innovative usage models
• Diffuses new technologies and usage models to a wide group of early adopters

Open Research Program
Currently four open source projects: http://www.intel.com/software/products/opensource/index.htm
• OpenCV – Computer Vision Library: http://www.intel.com/research/mrl/research/opencv/
• OpenRC – Open Research Compiler: http://ipf-orc.sourceforge.net/ORC-overview.htm
• OpenLF – Open Light Fields: http://www.intel.com/research/mrl/research/lfm/
• OpenAVSR – Audio Visual Speech Recognition: http://www.intel.com/research/mrl/research/avcsr.htm

Example: OpenCV
• Released in June 2000
• A library of 500+ computer vision algorithms, including applications such as face recognition, face tracking, stereo vision and camera calibration
• Highly tuned for IA
• Windows and Linux versions
• Over 500,000 downloads
• Broad use in academia (450) and industry (360)

More Information
Visit the Open Source ML web page and download at: http://www.intel.com/research/mrl/pnl

Backup

Modeless and Model Based ML
We’ll use an example application from our current research to describe two basic approaches to machine learning:
[Figure: Model Based: Bayesian networks, function fitters, regression, filters. Modeless: classifiers, clustering, kernel estimators. Illustrated with labeled data points (A, B, C) split into groups.]

Quick view of Bayesian networks

What is a Bayesian Network?
A Bayesian network, or belief network, is a graph in which the following holds:
• A set of random variables makes up the nodes of the network
• A set of directed links connects pairs of nodes to denote causal relations between variables
• Each node has a conditional probability distribution (CPD) that quantifies the effects that the parents have on the node
Graphical models are more general, allowing undirected links, mixed directed/undirected connections, and loops within the graph.

Computational Advantages of Bayesian Networks
• Bayesian networks graphically express conditional independence of probability distributions
• Independencies can be exploited for large computational savings
EXAMPLE: the joint probability of a system of 3 discrete variables (A, B, C), each with 5 possible values, needs a $5 \times 5 \times 5$ table for $P(A,B,C)$: 125 parameters.
But a graphical model factors the probabilities, taking advantage of the independencies. With B and C each depending only on A, $P(A,B,C) = P(A)\,P(B \mid A)\,P(C \mid A)$, which needs 5 + 25 + 25 = 55 parameters.

Causality and Bayesian Nets
Think of Bayesian networks as a “circuit diagram” of probability models.
• The links indicate causal effect, not direction of information flow
• Just as we can predict the effects of changes on a circuit diagram, we can predict the consequences of “operating” on our probability model diagram
[Figure: circuit with mains, transformer, diodes, capacitor, battery and ammeter; components marked as observed or un-observed]

Quick view of Decision Trees and Statistical Boosting

Statistical Classification
Cluster data to infer or predict properties.
Example: decision trees
• Find splits that most “purify” the labeled data
• Prune the tree to minimize complexity
• The split rules are used to classify future data
[Figure: the labeled set AACBAABBCBCC is split into AACACB and CBABBC, and splitting continues “all the way down” to purer subsets]

Statistical Classification: Boosting
• Use a weak classifier such as a 1-level tree
• Re-weight the error cases and classify again; record the weight factor “Wi” for the “ith” classifier
• Repeat until you have a “forest”
• Use the error-weighted forest to vote on the classification of new data: Weighted Sum Decision = Decision1 * W1 + Decision2 * W2 + … + DecisionN * WN
[Figure: successive re-weighted 1-level splits of the labeled set AACBAABBCBCC]

Application areas and libraries

Applications of ML
[Chart key: actively working on, ramping, past work, external activity]
• Interface: Biometric ID, Audio Models, Text Recog., Natural Lang., Vision Models, Lips+Speech AVSR, Speech
• AI: Cognitive Modeling, Game Play, Disposition, Action Planning, Robotics, Mapping, Sensor Fusion
• Data Analysis: Information Retrieval, Datamining, Info Filtering, Collaborative Filtering
• Biologic: Genomics, Proteomics, Metabolics, Gene Sequencing, Epidemiology, Computational Pharmacology
• Computer: Trace Compression, Compiler Optimization, Run Time Optimization, Binary Trans Adaptation
• Industrial: Models of Manufacturing, Supply Chain, Process Control, Fault Diagnosis
TOOLS: Neural Nets, SVM, Adaptive Filters, Reinforcement Learning, Relational Networks, Genetic Algorithms, Graphical Models/MRFs, Bayesian Networks, Decision Theory, Influence Diagrams, Statistical Regression, ANOVA, …, Stochastic Discrimination, Trees, Boosting, Random forest

Probabilistic Network Library
[Diagram: application-driven development. Applications from Intel and universities: Lips+Speech AVSR, Audio Models, Vision Models, Speech, Natural Lang., Biometric ID, Data Mining, Information Retrieval, Info Filtering, Trace Compression, Gene Sequencing, Genomics, Epidemiology, Cognitive Modeling, Disposition, Robotics, Learned Control, Game Play, Models of Manufacturing, Fault Diagnosis, Supply Chain, Process Control. Algorithms: Junction Tree, Factor Graph, Gibbs Sampling, Particle Filter, Structure Learning, Bayesian Network Engine, Decision & Utility theory, EM, Dynamic BN, Reinforcement, MRFs, Loopy Belief, Variational, Data Handling, Cross Validation, Plates. Drive into hardware: workload analysis; architecture theories & algorithms; modify existing architectures (chipset, cache, platform); drive into future hardware (CPU instructions); create new architectures.]

Open Source Computer Vision (OpenCV)

Machine Learning Library (OpenMLL)
• CLASSIFICATION / REGRESSION: CART, Statistical Boosting, MART, Random Forests, Stochastic Discrimination, Logistic, SVM, K-NN
• CLUSTERING: K-Means, Spectral Clustering, Agglomerative Clustering, LDA, SVD, Fisher Discriminant
• TUNING/VALIDATION: Cross validation, Bootstrapping, Sampling methods
Alpha Q1’04, Beta Q4’04

Optimization (Lib ?)
[Diagram: large-scale optimizations, divided into combinatorial and continuous. Problem classes: constrained and unconstrained; linear (LP), QP, nonlinear (NLP), mixed discrete nonlinear. Methods: simplex, interior point, active set, SQP, conjugate gradient, Newton, branch and bound, domain reduction, constraint propagation, simulated annealing, genetic algorithms, stochastic search, network programming, dynamic programming.]
Problems we are looking at: circuit layout; device geometry; chemical binding synthesis
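The Model Based Machine Learning slide describes the Bayesian approach as a rule for changing existing beliefs in the light of new evidence. A minimal Python sketch of that rule (Bayes' theorem) over a discrete set of hypotheses follows. The "fault"/"ok" hypotheses, all probabilities, and the function name `bayes_update` are hypothetical illustrations. This is not PNL code and does not mirror PNL's API.

```python
# Minimal sketch of Bayes' rule over a discrete set of hypotheses.
# Hypotheses and probabilities are hypothetical illustration values.
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return the posterior P(h | e) = P(e | h) P(h) / P(e) for every hypothesis h."""
    unnormalized = {h: likelihood[h] * p for h, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(e) = sum over h of P(e | h) P(h)
    return {h: v / evidence for h, v in unnormalized.items()}


prior = {"fault": 0.01, "ok": 0.99}    # belief before any evidence (hypothetical)
p_alarm = {"fault": 0.9, "ok": 0.05}   # P(alarm | h), hypothetical sensor model

after_one = bayes_update(prior, p_alarm)      # belief after one alarm
after_two = bayes_update(after_one, p_alarm)  # yesterday's posterior is today's prior
print(round(after_one["fault"], 3), round(after_two["fault"], 3))  # 0.154 0.766
```

With these made-up numbers, a single alarm raises the belief in a fault from 0.01 to about 0.15, and a second alarm raises it to about 0.77. Feeding the posterior back in as the next prior is what "changing existing beliefs in the light of new evidence" amounts to.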
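According to the Applications of Model-less ML slide, the system collects data, clusters it, and derives recognition from the clusters. One common way to do this is k-means clustering (Lloyd's algorithm) followed by nearest-centroid classification, sketched below in Python. The readings are made-up (temperature, tolerance) pairs, the function names are hypothetical, and this is not the OpenSL implementation.

```python
# Minimal k-means sketch (Lloyd's algorithm) plus nearest-centroid classification.
# Readings are hypothetical (temperature, tolerance) pairs, not data from the slide.
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p: Sequence[float], centroids: List[Point]) -> int:
    """Index of the centroid closest to p (used for clustering and for classifying)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    centroids = [tuple(c) for c in init]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        # Assignment step: relabel every point with its nearest centroid.
        new_labels = [nearest(p, new_centroids) for p in points]
        centroids = new_centroids
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


readings = [(79.0, 1.0), (80.5, 1.2), (81.0, 0.9), (89.0, 3.1), (90.5, 2.8), (91.0, 3.3)]
centroids, labels = kmeans(readings, init=[readings[0], readings[3]])
print(labels)                           # [0, 0, 0, 1, 1, 1]
print(nearest((88.0, 2.9), centroids))  # 1: the new reading falls in the "hot" cluster
```

No model of the machine is involved. The clusters come purely from the data, and a new reading is classified by the cluster it lands closest to. In practice the features would usually be rescaled first, because temperature and tolerance have very different ranges.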
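The Applications of Model-based ML slide names Hidden Markov Models as widely used in speech recognition. Their core computation is the likelihood of an observation sequence. The forward algorithm obtains it in O(T·N²) time for T observations and N states, instead of enumerating all N^T state paths. The sketch below assumes a plain discrete HMM with a hypothetical 2-state, 3-symbol model. It does not implement the coupled HMMs used for AVSR, and the names are illustrative only.

```python
# Minimal sketch of the HMM forward algorithm: P(observations | model).
# The 2-state, 3-symbol model is a hypothetical toy, not an AVSR model.
from typing import List, Sequence


def forward(pi: Sequence[float], A: Sequence[Sequence[float]],
            B: Sequence[Sequence[float]], obs: Sequence[int]) -> float:
    """pi[i]: initial probability of state i; A[i][j]: P(next = j | current = i);
    B[i][o]: P(symbol o | state i). Returns the likelihood of obs."""
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state_t = i), initialised for t = 1.
    alpha: List[float] = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Sum over every previous state j, then emit the next symbol from state i.
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o] for i in range(n)]
    return sum(alpha)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]
likelihood = forward(pi, A, B, obs)
print(likelihood)
```

For this toy model the result is about 0.0363. In a recognizer, one such likelihood would be computed per candidate word model, and the most likely model would win.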
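The Computational Advantages of Bayesian Networks slide compares a full 5×5×5 joint table (125 entries) with a factored model (55 entries). The sketch below reproduces that arithmetic. It assumes the pictured graph is A → B and A → C, which gives P(A,B,C) = P(A) P(B|A) P(C|A) and a count of 5 + 25 + 25. The helper names are hypothetical.

```python
# Count probability-table entries for a full joint table versus a Bayesian
# network factorization. The graph (A -> B, A -> C) is an assumed reading of
# the slide's figure; the helper functions are illustrative.
from math import prod
from typing import Dict, List


def joint_table_size(card: Dict[str, int]) -> int:
    """Entries in one table over all variables: the product of cardinalities."""
    return prod(card.values())


def factored_table_size(card: Dict[str, int], parents: Dict[str, List[str]]) -> int:
    """Entries in all CPDs P(X | parents(X)): sum over X of |X| * prod of |parent|."""
    return sum(card[x] * prod(card[p] for p in parents[x]) for x in card)


card = {"A": 5, "B": 5, "C": 5}
parents = {"A": [], "B": ["A"], "C": ["A"]}
print(joint_table_size(card), factored_table_size(card, parents))  # 125 55
```

Like the slide, this counts table entries. Each conditional distribution must sum to one, so the number of free parameters is smaller still: 4 + 20 + 20 = 44 versus 124. The gap widens quickly as variables are added, because the joint table grows exponentially while each CPD depends only on its node's parents.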
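The Statistical Classification slide says decision trees look for splits that most "purify" the labeled data. Gini impurity is one common purity measure (CART uses it). The sketch below scores every threshold on a single numeric feature and keeps the split with the lowest weighted impurity. The feature values, the A/B/C labels, and the function names are hypothetical. The recursive "all the way down" step and pruning are not shown.

```python
# Minimal sketch of choosing one decision-tree split by Gini impurity.
# Feature values and labels are hypothetical; recursion and pruning are omitted.
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2: 0 for a pure set, larger for mixed sets."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs: Sequence[float], ys: Sequence[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the purest split x <= threshold.
    The threshold is None if no split improves on leaving the node unsplit."""
    best: Tuple[Optional[float], float] = (None, gini(ys))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = ["A", "A", "A", "B", "B", "C"]
print(best_split(xs, ys))  # threshold 3.5, weighted impurity about 0.222 (2/9)
```

For this toy data the best threshold is 3.5. It sends the three A cases left (a pure node) and leaves B, B, C on the right. A full tree would call `best_split` again on each side until the nodes are pure or too small, then prune.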
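The Boosting slide describes a three-step procedure. First, re-weight the error cases. Second, record a weight Wi for each weak classifier. Third, let the weighted forest vote. The sketch below follows the classic AdaBoost recipe with 1-level trees (decision stumps). Several choices are assumptions: binary ±1 labels in place of the slide's A/B/C classes, the toy data, and all function names.

```python
# Minimal AdaBoost sketch with 1-level trees (decision stumps) on 1-D data.
# Binary labels (+1/-1) and the toy data set are assumptions for illustration.
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int]  # (threshold, sign): predict sign if x <= threshold, else -sign


def stump_predict(stump: Stump, x: float) -> int:
    threshold, sign = stump
    return sign if x <= threshold else -sign


def fit_stump(xs: Sequence[float], ys: Sequence[int], w: Sequence[float]) -> Tuple[Stump, float]:
    """Weak learner: the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best, best_err = (thresholds[0], 1), float("inf")
    for t in thresholds:
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict((t, sign), xi) != yi)
            if err < best_err:
                best, best_err = (t, sign), err
    return best, best_err


def adaboost(xs: Sequence[float], ys: Sequence[int], rounds: int) -> List[Tuple[float, Stump]]:
    w = [1.0 / len(xs)] * len(xs)  # start with equal case weights
    forest = []
    for _ in range(rounds):
        stump, err = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # this classifier's vote weight W_i
        forest.append((alpha, stump))
        # Re-weight: misclassified cases get heavier, correct ones lighter.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return forest


def predict(forest: List[Tuple[float, Stump]], x: float) -> int:
    """Error-weighted vote: sign of the sum over i of W_i * Decision_i(x)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in forest)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump separates these
forest = adaboost(xs, ys, rounds=3)
print([predict(forest, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

A single stump gets at most four of the six cases right on this data. After three rounds of re-weighting, the weighted vote classifies all six correctly.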
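The Optimization slide lists conjugate gradient among the methods for continuous problems. Take a quadratic objective f(x) = ½xᵀAx - bᵀx with a symmetric positive-definite matrix A. Minimizing f is then the same as solving Ax = b, and in exact arithmetic conjugate gradient does this in at most n steps. Below is a pure-Python sketch under those assumptions. The matrix, the vector, and the helper names are hypothetical.

```python
# Minimal conjugate-gradient sketch for a quadratic objective with a
# symmetric positive-definite matrix. Inputs are hypothetical toy values.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def conjugate_gradient(A: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite),
    i.e. solve A x = b, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x = -grad f(x); equals b at x = 0
    p = list(r)   # first search direction: steepest descent
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # keeps the next direction A-conjugate to p
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)  # close to [1/11, 7/11], i.e. about [0.0909, 0.6364]
```

For this 2×2 system the exact minimizer is x = (1/11, 7/11), and conjugate gradient reaches it in two iterations up to rounding. Only matrix-vector products with A are needed, which is why the method suits large sparse problems.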