Center for Complex Engineering Systems
Introduction to Applied Machine Learning
Abdullah Almaatouq (amaatouq@mit.edu)
Abdulaziz Alhassan (a.alhassan@cces-kacst-mit.org)
Ahmad Alabdulkareem (kareem@mit.edu)

Tutorial Outline
• What is machine learning
• How to learn
• How to learn well
• Practical Session

Examples of Machine Learning

What is Machine Learning
• Giving computers the ability to learn without being explicitly programmed. – Arthur Samuel (1959)
• Machine learning: a computer program that improves its performance on a certain task with experience. – Tom Mitchell (1998)

What is Machine Learning
• Making predictions based on patterns learned from historical data.
• Not to be confused with data mining
  – Data mining focuses on exploiting patterns in historical data.
• New field?
  – Statistical generalization
  – Computational statistics

Why Machine Learning?
• Huge amounts of data!
• Humans are expensive, computers are cheap.
  – 88 ECUs, 32 CPUs and 244 GB of RAM ≈ $3.50/h
  – Minimum worker wage in the US is $7.25/h
• Rule-based systems are (often):
  – Difficult to capture all relevant patterns
  – You can't add rules for patterns you don't know exist
  – Very hard to maintain
  – (often) not working well
• Psychic superpowers (sometimes!)

When to Use Machine Learning
• There is a pattern to be detected
• You have data
• You can NOT pin it down mathematically (unlike, say, distance as a function of time: d = r × t)

Types of Problems
• Supervised
  – Training on seen, labeled examples to generalize to unseen new observations.
  – Given some input x_i, predict the class/value y_i
• Unsupervised
  – Find hidden structures in unlabeled data

Types of Problems
• Machine learning problems
  – Supervised: regression, classification
  – Unsupervised: clustering, outlier detection

Supervised Learning
• Classification
  – Binary: given x_i, find y_i in {-1, 1}
  – Multicategory: given x_i, find y_i in {1, ..., K}
• Regression
  – Given x_i, find y_i in R (or R^d)
• Examples: spam vs. not spam, images to digits, beak depth prediction

Supervised Learning Framework
• An unknown function f: x → y generates the training examples (x_1, y_1), ..., (x_n, y_n).
• The machine learning algorithm, guided by an error function and optimization routines, chooses a hypothesis h(x) from the hypothesis set.
• The result is a known function g ≈ f that predicts the value/category y for a new observation x.

Linear Regression
• Linear form: y = ax + b
• Capture the collective behavior of credit officers
• Predict the credit line for new customers
• Input x = [bias = 1, age = 23 years, annual salary = $144,000, years in residence = 4, years in job = 2, ...] = [1, 23, 144000, 4, 2, ...]
• h(x) = θ_1 x_1 + ... + θ_d x_d + θ_0 x_0
• Linear regression output: h(x) = Σ_{i=0}^{d} θ_i x_i = θ^T x

Linear Regression
• The historical dataset is (x_1, y_1), (x_2, y_2), ..., (x_N, y_N)
• y_n ∈ R is the credit line for customer x_n
• How well does h(x) = θ^T x approximate f(x)?
  – Squared error: (h(x) − f(x))^2
  – E(h) = (1/N) Σ_{n=1}^{N} (h(x_n) − y_n)^2
• The goal is to choose the h that minimizes E(h)

Linear Regression
[Figure: regression line in 2D and regression plane in 3D; it works the same way in higher dimensions.]

Setting Up the Problem
• Stack the observations as rows of a feature matrix: X = [x_1^T; x_2^T; ...; x_N^T] (rows are observations, columns are features)
• E(θ) = (1/N) Σ_{n=1}^{N} (θ^T x_n − y_n)^2 = (1/N) ‖Xθ − y‖^2

Minimizing E(h)
• E(θ) = (1/N) ‖Xθ − y‖^2
• ∇E(θ) = (2/N) X^T (Xθ − y) = (2/N) (X^T X θ − X^T y) = 0
• θ = (X^T X)^{-1} X^T y (the inverse here is the pseudo-inverse)
• Python: theta = linalg.inv(X.T*X)*X.T*y
• MATLAB: theta = inv(X'*X)*X'*y
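The one-line solvers above assume X and y are already assembled. A slightly fuller sketch in NumPy, using the pseudo-inverse as the slide notes; the credit-line numbers below are made up purely to illustrate the shapes:

  import numpy as np

  # Toy historical data: each row is [bias, age, annual salary, years in residence, years in job]
  # (hypothetical numbers, only to show the shapes involved)
  X = np.array([
      [1, 23, 144000, 4, 2],
      [1, 35,  90000, 7, 5],
      [1, 41, 120000, 2, 9],
      [1, 29,  60000, 1, 3],
  ], dtype=float)
  y = np.array([20000, 15000, 25000, 8000], dtype=float)  # hypothetical historical credit lines

  # Normal equation: theta = (X^T X)^(-1) X^T y
  # pinv is the pseudo-inverse, so this still works when X^T X is singular
  theta = np.linalg.pinv(X.T @ X) @ X.T @ y

  # Predict the credit line for a new applicant
  x_new = np.array([1, 30, 100000, 3, 4], dtype=float)
  print(theta, x_new @ theta)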
Linear Regression in the Supervised Learning Framework
• Unknown function f: the credit-line officers
• Training examples (x_1, y_1), ..., (x_n, y_n): historical credit lines
• Hypothesis set: h(x) = ax + b (linear regression)
• Error function: squared error
• Optimization routine: gradient descent
• New observations x: new applications
• Output: the predicted credit line y

Support Vector Machines
• Separating plane with maximum margin
• Margins are configurable with parameters to allow for mistakes
• The gold-standard black box for many practitioners

Support Vector Machines
• Effective in high-dimensional spaces
• Effective even when #features > #observations
• Can use different kernel functions: linear, polynomial, RBF

Supervised Learning Framework
(Recap of the supervised learning framework diagram.)

Types of Problems
(Recap: supervised = regression, classification; unsupervised = clustering, outlier detection.)

Unsupervised Learning
• Trying to find hidden structure in unlabeled data.
• Clustering
  – Group similar data points into "clusters"
  – k-means, mixture models, hierarchical clustering

Unsupervised Learning: k-means
• "Birds of a feather flock together"
• Group samples around a "mean"
  – Start with random "means" and assign each point to the nearest "mean"
  – Calculate new "means"
  – Re-assign points to the new "means", and repeat...
[Figure sequence: random initial means → assignment → mean recalculation → re-assignment → mean recalculation → re-assignment.]

Unsupervised Learning: Mixture Models
• Presence of subpopulations within an overall population

Anomaly Detection
• Detect abnormal data
• Whitelisting good behavior

Anomaly Detection: Statistical Anomaly Detection
• Model each feature of the historical observations, e.g.:

  #    X1   X2   ...  Xn
  1    12   8    ...  48
  2    14   7    ...  52
  3    16   8    ...  57
  4    11   6    ...  53
  ...  ...  ...  ...  ...
  m    13   7    ...  56

• P(x) = p(x1; α1, μ1) · p(x2; α2, μ2) · ... · p(xn; αn, μn)
• Observations with low P(x) are flagged as anomalies

Anomaly Detection: Local Outlier Factor (LOF)
• Based on local density
• Locality is given by the nearest neighbors

How to Learn (Well)

Initial Critical Obstacles
• Choosing features:
  – By domain experts' knowledge and expertise
  – Or by feature-extraction algorithms
• Choosing parameters, for example:
  – Degree of the regression equation
  – SVM parameters (C, gamma)
  – Number of clusters
    • k-means (pre)
    • Agglomerative (post)

A Simple Example: Fitting a Polynomial
• The green curve is the true function (which is not a polynomial)
• We will use a loss function that measures the squared error in the prediction of y(x) from x.
• Some fits to the data: which is best?

Occam's Razor
• "The simplest explanation for some phenomenon is more likely to be accurate than more complicated explanations."
• So which fit is best?

A Simple Way to Reduce Model Complexity
• Add penalties (regularization)
[Figures from Bishop.]

Example: Mixture Models

Using a Validation Set
• Divide the total dataset into three subsets:
  – Training data: for learning the parameters of the model.
  – Validation data: for deciding what type of model and what amount of regularization work best.
  – Test data: used to get a final, unbiased estimate of how well the algorithm works. We expect this estimate to be worse than on the validation data.
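The slides do not prescribe a tool for making this split. A minimal sketch, assuming scikit-learn is available and an arbitrary 60/20/20 split chosen only for illustration:

  import numpy as np
  from sklearn.model_selection import train_test_split

  X = np.random.rand(100, 5)   # stand-in feature matrix
  y = np.random.rand(100)      # stand-in targets

  # First carve out 20% as the final test set ...
  X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
  # ... then split the remainder into 60% training / 20% validation (0.25 of the remaining 80%)
  X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

  # Fit candidate models on (X_train, y_train), pick the one that does best on (X_val, y_val),
  # and report its error on (X_test, y_test) only once, at the very end.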
PCA
• Data visualization
• Noise reduction
• Data reduction
• Can help with the curse of dimensionality
• And more

What is PCA
• Orthogonal directions of greatest variance in the data
[Figure: data plotted against Original Variable A and Original Variable B, with PC 1 and PC 2 drawn as new axes.]

Principal Components
• The first PC is the direction of maximum variance from the origin
• Subsequent PCs are orthogonal to the 1st PC and describe the maximum residual variance
[Figures: Wavelength 1 vs. Wavelength 2 scatter plots with PC 1 and PC 2 drawn through the data.]

Principal Components
[Figure: each observation (x_i1, x_i2) re-expressed in the principal-component coordinates (y_i,1, y_i,2).]

PCA on a Real Example
[Figure: an original image compared with PCA reconstructions using r = 4, 64, 256, and 3600 components.]

Thank You
• Make sure you have Python and the IPython notebook
  – The easiest way to get this running: http://continuum.io/downloads
• Download the tutorial IPython notebook: http://www.amaatouq.com/notebook.zip
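For reference before the practical session, a minimal sketch of the projection-and-reconstruction idea behind the image example above. The use of scikit-learn, the random stand-in data, and the variable names are all illustrative choices, not part of the slides:

  import numpy as np
  from sklearn.decomposition import PCA

  X = np.random.rand(200, 3600)             # stand-in data: 200 observations, 3600 original dimensions

  r = 4                                      # number of principal components to keep
  pca = PCA(n_components=r)
  scores = pca.fit_transform(X)              # each observation expressed in the r PC coordinates
  X_approx = pca.inverse_transform(scores)   # reconstruction from only r components

  # Fraction of the original variance captured by the r components
  print(pca.explained_variance_ratio_.sum())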