Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011
1
Objectives:

Understand what machine learning is

Motivate why it has become so important

Identify types of learning and salient
frameworks, algorithms, and their utility

Take a sneak peek at the next set of problems
2

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
3

Example: Learning to ride a bicycle
• T: Task of learning to ride a bicycle
• P: Performance of balancing while moving
• E: Experience of riding in many situations

Is it wise to memorize all situations and
appropriate responses by observing an
expert?
4
Improve on task, T, with respect to
performance metric, P, based on experience, E.
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
Source: Introduction to Machine Learning by Raymond J. Mooney
5


Determine f such that y_n = f(x_n) and some cost g(y, x) is
minimized for unseen pairs of x and y.
The form of f is fixed, but some parameters θ can
be tuned:
• So, y = f_θ(x), where x is observed and y needs to
be inferred
• e.g. y = 1 if mx > c, 0 otherwise, so θ = (m, c)

Machine learning is concerned with designing
algorithms that learn “better” values of θ
given “more” x (and y) for a given problem
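
A minimal sketch of the threshold example, assuming scalar x and a brute-force search over candidate values of θ = (m, c). The function names, the search strategy, and the toy data are illustrative assumptions, not taken from the slides:

```python
def predict(theta, x):
    """f_theta(x): return 1 if m*x > c, else 0, with theta = (m, c)."""
    m, c = theta
    return 1 if m * x > c else 0


def error_count(theta, xs, ys):
    """Number of training pairs that f_theta gets wrong."""
    return sum(predict(theta, x) != y for x, y in zip(xs, ys))


def fit(xs, ys):
    """Pick the theta with the fewest training errors (brute-force search)."""
    pts = sorted(set(xs))
    # Candidate cut points: one below all data, plus midpoints between neighbours.
    cuts = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    # m = +1 labels the right side as 1, m = -1 labels the left side as 1.
    candidates = [(m, m * cut) for m in (1.0, -1.0) for cut in cuts]
    return min(candidates, key=lambda theta: error_count(theta, xs, ys))


# Hypothetical toy data: small x belongs to class 0, large x to class 1.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
theta = fit(xs, ys)
print(theta)                # (1.0, 4.5)
print(predict(theta, 5.0))  # 1
```

With more (x, y) pairs the candidate cut points get denser, so the learned θ can only fit the observed data better, which is the sense of “better” θ given “more” data.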
6



What is the scope of the task?
How will performance be measured?
How should learning be approached?
• Scalability:
 ▪ How can we learn fast?
 ▪ How many resources are needed to learn?
• Generalization:
 ▪ How will it perform in unseen situations?
• Online learning:
 ▪ Can it learn and improve while performing the task?
7









Artificial Intelligence
Data Mining
Probability and Statistics
Information theory
Numerical optimization
Adaptive Control Theory
Neurobiology
Psychology (cognitive, perceptual, dev.)
Linguistics
8

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
9


Develop systems that are too difficult/expensive to construct
manually because they require specific detailed skills or
knowledge tuned to a specific task (knowledge engineering
bottleneck).

Develop systems that can automatically adapt and
customize themselves to individual users.
• Personalized news or mail filter
• Personalized tutoring

Discover new knowledge from large databases (data
mining).
• Market basket analysis (e.g. diapers and beer)
• Medical text mining (e.g. migraines to calcium channel blockers to
magnesium)
Source: Introduction to Machine Learning by Raymond J. Mooney
10

Computational studies of learning may help us
understand learning in humans and other
biological organisms.
• Hebbian neural learning
 ▪ “Neurons that fire together, wire together.”
• Power law of practice (plot: log(performance time) vs. log(number of training trials))
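
For reference, the power law of practice is commonly written as below. The symbols T (performance time), N (number of training trials), and the fitted constants a and b are notation assumed here, not taken from the slide. Taking logarithms shows why the relation is a straight line on log-log axes:

```latex
T(N) = a\,N^{-b}
\quad\Longrightarrow\quad
\log T(N) = \log a - b \log N
```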
Source: Introduction to Machine Learning by Raymond J. Mooney
11

Many basic effective and efficient algorithms
available

Large amounts of data available

Large amounts of computational resources
available
Source: Introduction to Machine Learning by Raymond J. Mooney
12
Automatic vehicle navigation
• Road recognition
• Automatic navigation
Speech recognition
• Speech to text
• Automated services over the phone
Face detection
• Facebook face tagging suggestions
• Camera autofocus for portraits
13

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
14

Remember, y = f_θ(x)?
• y can be continuous or categorical
• y may be known for some x or none at all
• f can be simple (e.g. linear) or complex
• f can incorporate some knowledge of how x was
generated or be blind to the generation
• etc…
15

Supervised learning:
• For y = f_θ(x), a set of pairs (x_i, y_i) is known (y_i are usually classes)
• Now predict y_j for a new x_j

Examples:
• Two classes of protein with given amino acid sequences
• Labeled male and female face images
16

In a nutshell (neural network with hidden layers):
• Input is non-linearly transformed by
hidden layers, where each hidden unit is usually a “fuzzy”
(soft) linear classifier of a combination of its inputs
• Output is a linear combination of the
hidden layer outputs

Use when:
• Want to model a non-linear function
• Labeled data is available
• Don’t want to write new software

Variations:
• Competitive learning for classification
• Many more…
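
A minimal sketch of the forward pass described above, assuming sigmoid hidden units and a linear output. The weights are hand-picked (hypothetical) so that the network computes XOR, a function no single linear classifier can represent. In practice the weights would be learned from labeled data:

```python
import math


def sigmoid(z):
    """Soft ("fuzzy") threshold: close to 0 for very negative z, close to 1 for very positive z."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer of sigmoid units followed by a linear output unit."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias


# Hand-picked illustrative weights (an assumption, not learned):
# hidden unit 0 behaves like OR, hidden unit 1 like AND, output = OR - AND.
HIDDEN_W = [[20.0, 20.0], [20.0, 20.0]]
HIDDEN_B = [-10.0, -30.0]
OUTPUT_W = [1.0, -1.0]
OUTPUT_B = 0.0


def xor_net(x1, x2):
    return mlp_forward([x1, x2], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b)))
```

Each hidden unit on its own is a soft linear classifier. The non-linearity comes from combining several of them.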
17

In a nutshell (support vector machine, SVM):
• Learns the optimal boundary
between two classes (the red line in the slide's figure)

Use when:
• Labeled class data is available
• Want to minimize the chance of
error in the test case

Variations:
• Non-linear mapping of the input
vectors using “kernels”
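
A minimal sketch of learning a linear boundary of this kind, assuming labels of +1 and -1 and plain subgradient descent on the regularized hinge loss. This is a simple stand-in for the quadratic-programming solvers usually used for SVMs. The hyperparameters `lam`, `lr`, `epochs` and the toy data are illustrative assumptions:

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    """Which side of the boundary w.x + b = 0 does x fall on?"""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
points = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(classify(w, b, (4.0, 4.0)))    # 1
print(classify(w, b, (-4.0, -1.0)))  # -1
```

Only points with a margin below 1 contribute to the update, so the boundary is determined by the points closest to it.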
18

Unsupervised learning:
• For y = f_θ(x), only a set of x_i
is known
• Predict y such that y is
simpler than x but retains its
essence

Examples:
• Clustering (when y is a class
label)
• Dimensionality reduction
(when y is continuous)
19

In a nutshell (clustering):
• Grouping similar objects based
on a definition of similarity
• That is, intra- vs. inter-cluster
similarity, e.g. distance from
the center of the cluster

Use when:
• Class labels are not available, but
you have a desired number of
clusters in mind

Variations:
• Different similarity measures
• Automatic detection of the number of
clusters
• Online clustering
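
A minimal sketch of clustering by distance from the cluster center, assuming the k-means procedure with squared Euclidean distance and a fixed number of clusters k. Seeding the centers with the first k points is a simplifying assumption, and the toy data is hypothetical:

```python
def sq_dist(p, q):
    """Squared Euclidean distance, used here as the (dis)similarity measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    centers = [list(p) for p in points[:k]]  # assumption: seed with the first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: sq_dist(p, centers[c]))
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignment = kmeans(points, k=2)
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

Swapping `sq_dist` for another measure gives the “different similarity measures” variation.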
20

In a nutshell (principal component analysis, PCA):
• High-dimensional data, where not
all dimensions are independent,
e.g. (x1, x2, x3), where x3 = a·x1 + b·x2 + c

Use when:
• You want to perform linear
dimensionality reduction

Variations:
• ICA
• Online PCA
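
A minimal sketch of linear dimensionality reduction, assuming a two-dimensional analogue of the example above (x2 = 2·x1 + 1) and power iteration to find the first principal component. The function names and data are illustrative assumptions:

```python
import math


def covariance(points):
    """Return the mean vector and the sample covariance matrix."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def top_component(cov, iterations=100):
    """Power iteration: repeatedly apply cov and renormalize."""
    d = len(cov)
    v = [1.0] * d  # assumption: start vector not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


# Two coordinates, but only one degree of freedom: x2 = 2*x1 + 1.
points = [(t, 2.0 * t + 1.0) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean, cov = covariance(points)
eigenvalue, direction = top_component(cov)
total_variance = cov[0][0] + cov[1][1]
# One number per point replaces the original two coordinates.
scores = [sum((p[j] - mean[j]) * direction[j] for j in range(2)) for p in points]
print(direction, eigenvalue / total_variance)
```

Here the first component carries essentially all of the variance, so the one-dimensional scores lose nothing.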
21

In a nutshell (manifold learning):
• Learning a lower-dimensional
manifold (e.g. surface) close to
which the data lies

Use when:
• You want to perform non-linear
dimensionality reduction

Variations:
• SOM
22

Generative models:
• For y = f_θ(x), we have some idea of how x was generated given
y and θ

Examples:
• HMMs: Given phonemes and {age, gender}, we know how the
speech can be generated
• Bayesian networks: Given {gender, age, race}, we have some
idea of what a face will look like for different emotions
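
A minimal sketch of the generative idea, assuming scalar x and a Gaussian model of how x is generated for each class y (much simpler than an HMM or a Bayesian network). The class labels "a" and "b", the data, and the function names are hypothetical. Classification picks the class under which the observed x is most probable, weighted by the class prior (Bayes' rule):

```python
import math


def fit_gaussian(values):
    """Estimate the mean and variance of a 1-D Gaussian from samples."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def log_density(x, mean, var):
    """Log of the Gaussian density p(x | mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def fit_generative(samples_by_class):
    """theta: for each class y, its prior p(y) and the parameters of p(x | y)."""
    total = sum(len(values) for values in samples_by_class.values())
    model = {}
    for label, values in samples_by_class.items():
        mean, var = fit_gaussian(values)
        model[label] = (len(values) / total, mean, var)
    return model


def classify(model, x):
    """Pick the y that maximizes log p(y) + log p(x | y)."""
    def score(label):
        prior, mean, var = model[label]
        return math.log(prior) + log_density(x, mean, var)
    return max(model, key=score)


samples = {"a": [1.0, 1.2, 0.8, 1.1, 0.9], "b": [3.0, 3.2, 2.8, 3.1, 2.9]}
model = fit_generative(samples)
print(classify(model, 1.3))  # a
print(classify(model, 2.6))  # b
```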
23

Discriminative models:
• Do not care about how
the data was generated
• Finding the right
features is of prime
importance
• Followed by finding the
right classifier

Examples:
• SVM
• MLP
Source: “Automatic Recognition of Facial Actions in Spontaneous Expressions” by Bartlett et al in Journal of Multimedia, Sep 2006
24

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers
25

1980s:
• Advanced decision tree and rule learning
• Explanation-based Learning (EBL)
• Learning and planning and problem solving
• Utility problem
• Analogy
• Cognitive architectures
• Resurgence of neural networks (connectionism, backpropagation)
• Valiant’s PAC Learning Theory
• Focus on experimental methodology

1990s:
• Data mining
• Adaptive software agents and web applications
• Text learning
• Reinforcement learning (RL)
• Inductive Logic Programming (ILP)
• Ensembles: Bagging, Boosting, and Stacking
• Bayes Net learning
Source: Introduction to Machine Learning by Raymond J. Mooney
26

2000s:
• Support vector machines
• Kernel methods
• Graphical models
• Statistical relational learning
• Transfer learning
• Sequence labeling
• Collective classification and structured outputs
• Computer systems applications
 ▪ Compilers
 ▪ Debugging
 ▪ Graphics
 ▪ Security (intrusion, virus, and worm detection)
• Email management
• Personalized assistants that learn
• Learning in robotics and vision
Source: Introduction to Machine Learning by Raymond J. Mooney
27
Bioinformatics
• Gene expression prediction (just scratched the surface)
• Automated drug discovery
Speech recognition
• Context recognition, e.g. for digital personal assistants (Siri?)
• Better than Google Translate; imagine visiting Brazil
Image and video processing
• Automatic event detection in video
• “Seeing” software for the blind
28
Robotics
• Where is my iRobot?
• Would you raise a “robot” child and make it learn?
Advanced scientific calculations
• Weather modeling through prediction
• Vector field or FEM calculation through prediction
Who knows…
• Always in search of new problems
29

Learning the structure of classifiers

Automatic feature discovery and active
learning

Discovering the limits of learning
• Information-theoretic bounds?

Learning that never ends

Explaining human learning

Computer languages with ML primitives
Adapted from: “The Discipline of Machine Learning” by Tom Mitchell, 2006
30
Thank you!
31






Inference: Using a system to get the output variable for a
given input variable
Learning: Changing parameters according to an algorithm
to improve performance
Training: Using a machine learning algorithm to learn
function parameters based on an input and (optionally)
output dataset known as the “training set”
Validation and Testing: Using inference (without training)
to test the performance of the learned system on data
not used for training
Offline learning: When all training happens prior to
testing, and no learning takes place during testing
Online learning: When learning and testing happen on
the same data, so the system keeps learning while it performs the task
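
A minimal sketch contrasting these terms, assuming a perceptron as the learner (chosen only for brevity) and a constant first input feature that acts as a bias. All function names and data are illustrative assumptions:

```python
def predict(w, x):
    """Inference: map an input x to an output label (+1 or -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def update(w, x, y, lr=1.0):
    """Learning: perceptron rule, change w only when the prediction is wrong."""
    if predict(w, x) != y:
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def train_offline(training_set, epochs=10):
    """Training: all learning happens here, before any testing."""
    w = [0.0] * len(training_set[0][0])
    for _ in range(epochs):
        for x, y in training_set:
            w = update(w, x, y)
    return w


def accuracy(w, test_set):
    """Testing: inference only, the parameters are not changed."""
    return sum(predict(w, x) == y for x, y in test_set) / len(test_set)


def run_online(stream):
    """Online learning: predict each sample as it arrives, then learn from it."""
    w = [0.0] * len(stream[0][0])
    mistakes = 0
    for x, y in stream:
        if predict(w, x) != y:
            mistakes += 1
        w = update(w, x, y)
    return w, mistakes


# Hypothetical data: x = (bias, value), label +1 when value is positive.
training_set = [((1.0, 2.0), 1), ((1.0, -1.0), -1), ((1.0, 3.0), 1), ((1.0, -2.0), -1)]
test_set = [((1.0, 1.5), 1), ((1.0, -2.5), -1)]

w_offline = train_offline(training_set)
print(accuracy(w_offline, test_set))  # 1.0

w_online, mistakes = run_online(training_set + test_set)
print(mistakes)  # 1
```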
32