Lecture 19: Conclusion
CS221 / Autumn 2016 / Liang

Roadmap
• Summary of CS221
• Next courses
• History of AI
• Natural language understanding
• Computer vision
• Food for thought

Paradigm (course plan)
• Reflex ("low-level intelligence"): machine learning
• States: search problems, Markov decision processes, adversarial games
• Variables: constraint satisfaction problems, Bayesian networks
• Logic ("high-level intelligence")

Running example:
  [utterance: user input] → semantic parsing → [program] → execute → [denotation: user output]
  Induce a hidden program to accomplish the end goal.

State-based models
Key idea: state. A state is a summary of all the past actions sufficient to choose future actions optimally.

Variable-based models
(Example: map coloring with variables WA, NT, SA, Q, NSW, V, T.)
Key idea: factor graphs. The graph structure captures conditional independence.

Logic-based models
Syntax (formulas, inference rules) versus semantics (models).
Key idea: logic. Formulas enable more powerful (infinite) models.

Machine learning
Objective: loss minimization
  min_w Σ_{(x,y) ∈ Dtrain} Loss(x, y, w)
Algorithm: stochastic gradient descent
  w ← w − η_t ∇Loss(x, y, w)    (the gradient is proportional to prediction − target)
Applies to many models!
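The loss-minimization objective and SGD update above can be sketched in a few lines of Python. This is a toy least-squares linear regression; the data and the decreasing step-size schedule are invented for illustration.

```python
# Toy illustration of the SGD update w <- w - eta_t * grad Loss(x, y, w).
# Model: linear prediction w * x with squared loss; the data is made up.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (x, y) pairs, roughly y = 2x

def gradient(x, y, w):
    # d/dw (w*x - y)^2 = 2 * (prediction - target) * x
    return 2 * (w * x - y) * x

w = 0.0
for t in range(1, 101):            # 100 passes over the data
    eta = 0.5 / (10 + t)           # decreasing step size eta_t
    for x, y in data:
        w = w - eta * gradient(x, y, w)

print(round(w, 2))                 # w approaches roughly 2
```

Each update nudges the weight along the negative gradient, so the prediction drifts toward the target; the shrinking step size damps the oscillation between examples.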
Summary of CS221: tools
• CS221 provides a set of tools to solve real-world problems.
• Start with the nail, and figure out what tool to use.

Next courses

Other AI-related courses (http://ai.stanford.edu/courses/)
Foundations:
• CS228: Probabilistic Graphical Models
• CS229: Machine Learning
• CS229T: Statistical Learning Theory
• CS334A: Convex Optimization
Applications:
• CS224N: Natural Language Processing (with Deep Learning)
• CS224U: Natural Language Understanding
• CS231A: Introduction to Computer Vision
• CS231N: Convolutional Neural Networks for Visual Recognition
• CS223A: Introduction to Robotics
• CS276: Information Retrieval and Web Search
• CS227B: General Game Playing
• CS224W: Social and Information Network Analysis

Probabilistic graphical models (CS228)
• Discrete ⇒ continuous
• Forward-backward, variable elimination ⇒ belief propagation, variational inference
• Gibbs sampling ⇒ Markov chain Monte Carlo (MCMC)
• Learning the structure

Machine learning (CS229)
• Boosting, bagging, feature selection
• K-means ⇒ mixture of Gaussians
• Q-learning ⇒ policy gradient

Statistical learning theory (CS229T)
Question: what are the mathematical principles behind learning?

Cognitive science
Question: how does the human mind work?
An example guarantee from statistical learning theory (uniform convergence): with probability at least 0.95, your algorithm will return a predictor h ∈ H such that
  TestError(h) ≤ TrainError(h) + √(Complexity(H) / n)

• Cognitive science and AI grew up together.
• Humans can learn from few examples on many tasks.
Computation and cognitive science (PSYCH204):
• Cognition as Bayesian modeling over probabilistic programs [Tenenbaum, Goodman, Griffiths]

Neuroscience
• Neuroscience is the hardware; cognitive science is the software.
• Artificial neural networks serve as computational models of the brain.
• Modern neural networks (GPUs + backpropagation) are not biologically plausible.
• Analogy: birds versus airplanes; what are the principles of intelligence?

Online materials
• Online courses (Coursera, edX)
• Videolectures.net: tons of recorded talks from major leaders of AI (and other fields)
• arXiv.org: latest research (pre-prints)

Conferences
• AI: IJCAI, AAAI
• Machine learning: ICML, NIPS, UAI, COLT
• Data mining: KDD, CIKM, WWW
• Natural language processing: ACL, EMNLP, NAACL
• Computer vision: CVPR, ICCV, ECCV
• Robotics: RSS, ICRA

History of AI

Pre-AI developments
• Philosophy: intelligence can be achieved via mechanical computation (e.g., Aristotle).
• Church-Turing thesis (1930s): any computable function is computable by a Turing machine.
• Real computers (1940s): actual hardware to do it: Heath Robinson, Z-3, ABC, ENIAC.

Birth of AI
1956: workshop at Dartmouth College; attendees included John McCarthy, Marvin Minsky, Claude Shannon, etc.
Aim for general principles: "Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it."
Birth of AI: early successes
• Checkers (1952): Samuel's program learned weights and played at a strong amateur level.
• Problem solving (1955): Newell & Simon's Logic Theorist proved theorems in Principia Mathematica using search + heuristics; later, the General Problem Solver (GPS).

Overwhelming optimism...
"Machines will be capable, within twenty years, of doing any work a man can do." —Herbert Simon
"Within 10 years the problems of artificial intelligence will be substantially solved." —Marvin Minsky
"I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines." —Claude Shannon

...underwhelming results
Example: machine translation.
  Input: "The spirit is willing but the flesh is weak."
  Output (after a round trip through Russian): "The vodka is good but the meat is rotten."
1966: the ALPAC report cut off government funding for MT.

Implications of the early era
Problems:
• Limited computation: the search space grew exponentially, outpacing hardware (100! ≈ 10^157 > 10^80).
• Limited information: the complexity of AI problems (the number of words, objects, and concepts in the world).
Contributions:
• Lisp, garbage collection, time-sharing (John McCarthy)
• Key paradigm: separate modeling (declarative) from algorithms (procedural)

Knowledge-based systems (70s-80s)
Expert systems elicit specific domain knowledge from experts in the form of rules:
  if [premises] then [conclusion]
• DENDRAL: infer molecular structure from mass spectrometry.
• MYCIN: diagnose blood infections, recommend antibiotics.
• XCON: convert customer orders into parts specifications; saved DEC $40 million a year by 1986.

Contributions:
• First real applications that impacted industry.
• Knowledge helped curb the exponential growth.
Problems:
• Knowledge is not deterministic rules; we need to model uncertainty.
• Creating rules requires considerable manual effort, and rules are hard to maintain.

The Complexity Barrier
"A number of people have suggested to me that large programs like the SHRDLU program for understanding natural language represent a kind of dead end in AI programming. Complex interactions between its components give the program much of its power, but at the same time they present a formidable obstacle to understanding and extending it. In order to grasp any part, it is necessary to understand how it fits with other parts; [the program] presents a dense mass, with no easy footholds. Even having written the program, I find it near the limit of what I can keep in mind at once."
— Terry Winograd (1972)

Modern AI (90s-present)
• Probability: Pearl (1988) promoted Bayesian networks in AI to model uncertainty (based on Bayes' rule from the 1700s).
• Machine learning: Vapnik (1995) invented support vector machines to tune parameters from data (based on statistical models from the early 1900s).

Natural language understanding (my research)

Property 1: compositionality
  What is the largest city?
  What is the largest city in Europe?
  What is the second largest city in Europe?
  What is the population of the second largest city in Europe?
Build complex meanings from primitives.

Property 2: ambiguity
  What is the largest city in Europe?
  What is the largest grossing movie in 2016?
  Who's in town? Who's in?
Infer the full meaning from context.

A communication system: language as programs
[database]
  What is the largest city in Europe by population?
    → semantic parsing (handles compositionality and ambiguity) → argmax(Cities ∩ ContainedBy(Europe), Population)
    → execute → Istanbul
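The execute step in the database example above can be made concrete with a toy executor. The mini-database and the helper names below are invented for illustration; the real system runs the program against a full knowledge base.

```python
# Toy executor for the program argmax(Cities ∩ ContainedBy(Europe), Population).
# The mini-database below is invented for illustration.

population = {"Istanbul": 14025000, "Moscow": 12197000, "London": 8674000, "Tokyo": 13617000}
contained_by = {"Istanbul": "Europe", "Moscow": "Europe", "London": "Europe", "Tokyo": "Asia"}

def Cities():
    return set(population)

def ContainedBy(region):
    return {c for c, r in contained_by.items() if r == region}

def argmax(entities, score):
    # Return the entity with the highest score.
    return max(entities, key=score)

# argmax(Cities ∩ ContainedBy(Europe), Population)
answer = argmax(Cities() & ContainedBy("Europe"), lambda c: population[c])
print(answer)  # Istanbul
```

The program is just set intersection followed by an argmax over a numeric property, which is why "execute" can be a literal interpreter.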
Language as programs
[calendar]
  Remind me to buy milk after my last meeting on Monday.
    → semantic parsing → Add(Buy(Milk), argmax(Meetings ∩ HasDate(2016-07-18), EndTime))
    → execute → [reminder added]

In general: [context] + [sentence] → semantic parsing → [program] → execute → [behavior]

A brief history of semantic parsing
• Compositional semantics [Richard Montague]
• GeoQuery [Zelle & Mooney 1996]
• Inductive logic programming [Tang & Mooney 2001]
• CCG [Zettlemoyer & Collins 2005]
• String kernels [Kate & Mooney 2006]
• Synchronous grammars [Wong & Mooney 2007]
• Relaxed CCG [Zettlemoyer & Collins 2007]
• Learning from the world [Clarke et al. 2010]
• Higher-order unification [Kwiatkowski et al. 2011]
• Learning from answers [Liang et al. 2011]
• Language + vision [Matuszek et al. 2012]
• Regular expressions [Kushman et al. 2013]
• Large-scale KBs [Berant et al. 2013; Kwiatkowski et al. 2013]
• Instruction following [Artzi & Zettlemoyer 2013]
• Reduction to paraphrasing [Berant & Liang 2014]
• Compositionality on tables [Pasupat & Liang 2015]
• Datasets from logical forms [Wang et al. 2015]

Compositional semantics
  "cities in Europe" → Cities ∩ ContainedBy(Europe)
  built from: "cities" → Cities; "in" → ContainedBy; "Europe" → Europe; "in Europe" → ContainedBy(Europe)

Language variation
  "cities in Europe" / "European cities" / "cities that are in Europe" / "cities located in Europe" / "cities on the European continent"
  all map to: Cities ∩ ContainedBy(Europe)

Deep learning
• Object recognition: Krizhevsky/Sutskever/Hinton (2012), e.g., image → "car"
• Machine translation: Sutskever/Vinyals/Le (2014), e.g., "This is a blue house" → "C'est une maison bleue"
Trade-off: accuracy versus simplicity.
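The compositional-semantics idea above, building Cities ∩ ContainedBy(Europe) from the meanings of "cities", "in", and "Europe", can be sketched as function application. This is a hand-written toy, not the actual grammar formalism; the lexicon entries and the small fact table are invented.

```python
# Toy compositional semantics: each word denotes a value or a function,
# and meanings combine by function application. Hand-written for illustration.

CONTAINED_BY = {"Istanbul": "Europe", "London": "Europe", "Tokyo": "Asia"}

lexicon = {
    "cities": {"Istanbul", "London", "Tokyo"},           # a set of entities
    "Europe": "Europe",                                   # an entity
    "in": lambda region: (                                # a function: region -> set filter
        lambda entities: {e for e in entities if CONTAINED_BY.get(e) == region}
    ),
}

# "in Europe" = apply the meaning of "in" to the meaning of "Europe"
in_europe = lexicon["in"](lexicon["Europe"])

# "cities in Europe" = apply that filter to the denotation of "cities"
meaning = in_europe(lexicon["cities"])
print(sorted(meaning))  # ['Istanbul', 'London']
```

The point is that the denotation of the phrase is computed entirely from the denotations of its parts, which is what lets a parser generalize to phrases it has never seen.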
Neural semantic parsing [Robin Jia, ACL 2016]
Dataset: US Geography (Zelle & Mooney, 1996), e.g., "What is the highest point in Florida?"
• Learn semantic composition without a predefined grammar.
• Encode compositionality through data recombination: from "what's the capital of Germany?" ⇒ CapitalOf(Germany) and "what countries border France?" ⇒ Borders(France), synthesize "what's the capital of France?" ⇒ CapitalOf(France) and "what countries border Germany?" ⇒ Borders(Germany).
[Bar chart: answer accuracy for WM07, ZC07, KZGS11, RNN, and RNN+recombination; reported values include 86.1, 86.6, 88.9, and 89.3, with RNN+recombination best at 89.3: state of the art, and simpler.]

Summary so far
• Language encodes computation.
• Semantic parsing represents language as programs.
• Recurrent neural networks: conceptually simpler.

Training data for semantic parsing [Zelle & Mooney 1996; Zettlemoyer & Collins 2005; Clarke et al. 2010; Liang et al. 2011]
Heavy supervision (utterance ⇒ program):
  What's Bulgaria's capital? ⇒ CapitalOf(Bulgaria)
  When was Walmart started? ⇒ DateFounded(Walmart)
  What movies has Tom Cruise been in? ⇒ Movies ∩ Starring(TomCruise)
Light supervision (utterance ⇒ answer):
  What's Bulgaria's capital? ⇒ Sofia
  When was Walmart started? ⇒ 1962
  What movies has Tom Cruise been in? ⇒ TopGun, VanillaSky, ...

Searching...
  Greece held its last Summer Olympics in which year?
  R[Date].R[Year].argmax(Country.Greece, Index) ⇒ 2004
  Year | City      | Country | Nations
  1896 | Athens    | Greece  | 14
  1900 | Paris     | France  | 24
  1904 | St. Louis | USA     | 12
  ...  | ...       | ...     | ...
  2004 | Athens    | Greece  | 201
  2008 | Beijing   | China   | 204
  2012 | London    | UK      | 204

Training intuition
  Where did Mozart tupress?
  PlaceOfBirth(WolfgangMozart) ⇒ Salzburg
  PlaceOfDeath(WolfgangMozart) ⇒ Vienna
  PlaceOfMarriage(WolfgangMozart) ⇒ Vienna
  Observed answer: Vienna
  Where did Hogarth tupress?
  PlaceOfBirth(WilliamHogarth) ⇒ London
  PlaceOfDeath(WilliamHogarth) ⇒ London
  PlaceOfMarriage(WilliamHogarth) ⇒ Paddington

Fighting combinatorial explosion
• Search from points of certainty [Berant & Liang 2014]: "What was Tom Cruise in?" → MoviesOf(TomCruise)
• Dynamic programming [Pasupat & Liang 2015]: collapse equivalent programs, e.g., PopulationOf.CapitalOf.Colorado, PopulationOf.argmax(Type.City ⊓ ContainedBy.Colorado, Population), and PopulationOf.Denver
• Reinforcement learning [Berant & Liang 2015]: s0, a1, s1, a2, s2, ...

Freebase [Bollacker 2008; Google 2013]
100M entities (nodes), 1B assertions (edges)
Example subgraph: BarackObama (Type: Person; DateOfBirth: 1961.08.04; Profession: Politician; PlaceOfBirth: Honolulu, ContainedBy Hawaii, ContainedBy UnitedStates; PlacesLived: Chicago, Hawaii) linked by a Marriage event (StartDate: 1992.10.03) to MichelleObama (Gender: Female).

Tables [Cafarella et al. 2008]
154 million tables on the web

Web pages [Pasupat & Liang 2014]
4.7 billion web pages

WikiTableQuestions [Pasupat & Liang 2015]
• 22,000 question/answer pairs over 2,100 tables
• Answer accuracy: 12.7 (predict table cell) versus 37.1 (semantic parser)

Natural language understanding
  How long is the CS221 lecture?
  2:50 PM − 1:30 PM = 1:20
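Reading an answer out of a table, as in the WikiTableQuestions setting above, can be sketched with a toy executor for the Olympics-style query "Greece held its last Summer Olympics in which year?". The row representation is invented for illustration; the real logical-form language is much richer.

```python
# Toy execution of "Greece held its last Summer Olympics in which year?"
# over a small Olympics table (rows chosen to match the slide's example).

rows = [
    {"Year": 1896, "City": "Athens",    "Country": "Greece", "Nations": 14},
    {"Year": 1900, "City": "Paris",     "Country": "France", "Nations": 24},
    {"Year": 1904, "City": "St. Louis", "Country": "USA",    "Nations": 12},
    {"Year": 2004, "City": "Athens",    "Country": "Greece", "Nations": 201},
    {"Year": 2008, "City": "Beijing",   "Country": "China",  "Nations": 204},
    {"Year": 2012, "City": "London",    "Country": "UK",     "Nations": 204},
]

# Select rows with Country = Greece, take the one with the largest
# row index ("last"), then read off the Year column.
greece_rows = [r for r in rows if r["Country"] == "Greece"]
last = max(greece_rows, key=lambda r: rows.index(r))
print(last["Year"])  # 2004
```

The hard part in practice is not executing such a program but searching the space of candidate programs when only the answer is observed.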
Summary so far
[context] + language: What is the largest city in Europe by population?
  → semantic parsing → ? → execute → Istanbul
• Learn without annotated programs.
• Scale up to large knowledge bases, tables, and web pages.
• Search is hard, and is needed even to get a training signal.
• Programs connect language with the world.

Language game
Wittgenstein (1953): language derives its meaning from use.

SHRDLURN
The human sees the goal and has language; the computer performs actions but has no language.
  "remove red" → candidate programs include add(hascolor(red)), add(hascolor(brown)), remove(hascolor(red)), remove(hascolor(brown)); the intended one is remove(hascolor(red)).

Experiments
100 players from Amazon Mechanical Turk; 6 hours ⇒ 10K utterances

Results: top players (rank 1-20)
(3.01) Remove the center block / Remove the red block / Remove all red blocks / Remove the first orange block / Put a brown block on the first brown block / Add blue block on first blue block
(9.17) reinsert pink / take brown / put in pink / remove two pink from second layer / Add two red to second layer in odd intervals / Add five pink to second layer / Remove one blue and one brown from bottom layer
(2.78) remove the brown block / remove all orange blocks / put brown block on orange blocks / put orange blocks on all blocks / put blue block on leftmost blue block in top row

Results: average players (rank 21-50)
(2.72) rem cy pos 1 / stack or blk pos 4 / rem blk pos 2 thru 5 / rem blk pos 2 thru 4 / stack bn blk pos 1 thru 2 / fill bn blk / stack or blk pos 2 thru 6 / rem cy blk pos 2 / fill rd blk
(7.18) move second cube / double red with blue / double first red with red / triple second and fourth with orange / add red / remove orange on row two / add blue to column two / add brown on first and third
(8.37) remove red / remove 1 red / remove 2 4 orange / add 2 red / add 1 2 3 4 blue / emove 1 3 5 orange / add 2 4 orange / add 2 orange / remove 2 3 brown / add 1 2 3 4 5 red / remove 2 3 4 5 6 / remove 2 / add 1 2 3 4 6 red

Results: worst players (rank 51-100)
(12.6) add red cubes on center left center right far left and far right / remove blue blocks on row two column two row two column four
(14.15) remove red blocks in center left and center right on second row

Results: interesting players
(14.32) free-form color patterns: laugh with me / red blocks with one aqua / aqua red alternate / brown red red orange aqua orange red brown / red brown red brown space red orange red / second level red space red space red space
Concatenated tokens: holdleftmost / holdbrown / blueonblue / brownonblue1 / blueonorange / holdblue / holdorange2 / blueonred2 / holdends1 / holdrightend / hold2 / orangeonorangerightmost
Polish (translated): remove the brown blocks / put an orange block on the first block / put red blocks on the orange ones / remove the orange blocks in the top row
Polish notation: rm scat + 1 c / + 1 c / rm sh + 1 2 4 sh / - 4 o / rm 1 r / + 1 3 o / full fill c / rm o / full fill sh / - 1 3 full fill sh / rm sh / rm r / + 2 3 r / rm o / + 3 sh / + 2 3 sh

Summary so far
• The downstream goal drives language learning.
• Both the human and the computer learn and adapt.
Demos: worksheets.codalab.org

Food for thought

WaveNet for audio generation [van den Oord et al., 2016]
• Works with raw audio (16K observations per second).
• Key idea: dilated convolutions capture multiple scales of resolution, without recurrence.

Adversarial examples [Szegedy et al., 2013; Goodfellow et al., 2014]
• AlexNet predicts correctly on a clean image but predicts "ostrich" on an imperceptibly perturbed version of it.

Conditional adversarial networks [Isola et al., 2016]
Key idea: a game between
• a generator, which generates fake images, and
• a discriminator, which distinguishes fake from real images.

Safety guarantees [Mykel Kochenderfer]
• For air-traffic control, there is a threshold level of safety: probability 10^-9 of a catastrophic
failure (e.g., collision) per flight hour.
• Can we move from human-designed rules to a numeric Q-value table and still certify this?

Privacy
• Do not reveal sensitive information (income, health, communication).
• Do compute aggregate statistics (how many people have cancer?).

Randomized response [Warner, 1965]
Question: do you have a sibling?
Method:
• Flip two coins.
• If both are heads: answer yes/no randomly.
• Otherwise: answer yes/no truthfully.
Analysis: true-prob = (4/3) × (observed-prob − 1/8)

Fairness
Prediction task: personal information ⇒ grant insurance?
Fallacy 1: "it's just math, right?" No: learning algorithms ⇐ data ⇐ a reflection of our society.
Fallacy 2: "just remove race, gender, etc." No: zip code is still there, and zip code is correlated with race.

Causality
Goal: figure out the effect of a treatment on survival.
Data: 80% of untreated patients survive; 30% of treated patients survive. Does the treatment help?
The data alone tells us nothing: sick people are more likely to undergo treatment in the first place.

Feedback loops
Data → hypothesis → predictions → new data → ...

Course plan (recap)
• Reflex ("low-level intelligence"): machine learning
• States: search problems, Markov decision processes, adversarial games
• Variables: constraint satisfaction problems, Bayesian networks
• Logic ("high-level intelligence")

Please fill out course evaluations on Axess. Thanks for an exciting quarter!
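The randomized-response analysis above, true-prob = (4/3) × (observed-prob − 1/8), can be sanity-checked with a short simulation. The underlying 30% true rate is an invented parameter for illustration.

```python
import random

# Simulate Warner-style randomized response: with probability 1/4 (two heads),
# answer randomly; otherwise answer truthfully. The true rate is invented.

random.seed(0)
TRUE_RATE = 0.30
n = 200000

yes = 0
for _ in range(n):
    truth = random.random() < TRUE_RATE
    if random.random() < 0.25:             # both coins came up heads
        answer = random.random() < 0.5     # answer yes/no at random
    else:
        answer = truth                     # answer truthfully
    yes += answer

observed = yes / n
estimate = (4 / 3) * (observed - 1 / 8)    # invert: observed = 3/4 * true + 1/8
print(round(estimate, 3))                  # close to 0.30
```

Since observed-prob = (3/4) × true-prob + 1/8, inverting that line recovers the population rate while no individual answer reveals the truth with certainty.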