ML course, 2016 fall
What you should know:
Weeks 1, 2 and 3: Basic issues in Probabilities and Information Theory
Read: Chapter 2 from the Foundations of Statistical Natural Language Processing book by
Christopher Manning and Hinrich Schütze, MIT Press, 2002, and/or Probability Theory Review
for Machine Learning, Samuel Ieong, November 6, 2006 (https://see.stanford.edu/materials/aimlcs229/cs229prob.pdf).
Week 1: Random events
Concepts/definitions:
• event/sample space, random event
• probability function
• conditional probabilities
• independent random events (2 forms);
conditionally independent random events (2 forms)
Theoretical results/formulas:
• elementary probability formula:
P(A) = # favorable cases / # all possible cases
• the “multiplication” rule; the “chain” rule
• “total probability” formula (2 forms)
• Bayes formula (2 forms)
Exercises illustrating the above concepts/definitions and theoretical results/formulas,
in particular: proofs for certain properties derived from the definition of the probability function
for instance: P(∅) = 0, P(Ā) = 1 − P(A), A ⊆ B ⇒ P(A) ≤ P(B)
Ciortuz et al.’s exercise book: ch. 1, ex. 1-5 [6-7], 43-46 [47-49]
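As a quick numerical illustration of the total probability and Bayes formulas, here is a minimal Python sketch; the scenario and all the numbers are invented for illustration and do not come from the exercise book.

# hypothetical test scenario: event D with prior P(D) = 0.01, a test T with
# P(T|D) = 0.95 and false-positive rate P(T|not D) = 0.08  (made-up numbers)
p_d = 0.01           # P(D)
p_t_given_d = 0.95   # P(T | D)
p_t_given_nd = 0.08  # P(T | not D)

# total probability formula: P(T) = P(T|D)P(D) + P(T|not D)P(not D)
p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)

# Bayes formula: P(D|T) = P(T|D)P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t

print(f"P(T) = {p_t:.4f}")            # ~0.0887
print(f"P(D|T) = {p_d_given_t:.4f}")  # ~0.1071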
Week 2: Random variables
Concepts/definitions:
• random variables;
random variables obtained through function composition
• discrete random variables;
probability mass function (pmf)
examples: the Bernoulli, binomial, geometric, Poisson distributions
• cumulative distribution function (cdf)
• continuous random variables;
probability density function (pdf)
examples: Gaussian, exponential, Gamma distributions
• expectation (mean), variance, standard deviation; covariance. (See definitions!)
• multi-valued random variables;
joint, marginal, conditional distributions
Theoretical results/formulas:
• for any discrete variable X: ∑x p(x) = 1, where p is the pmf of X;
for any continuous variable X: ∫ p(x) dx = 1, where p is the pdf of X
• E[X + Y ] = E[X] + E[Y ]
E[aX + b] = aE[X] + b
Var[aX] = a² Var[X]
Var[X] = E[X²] − (E[X])²
Cov(X, Y ) = E[XY ] − E[X]E[Y ]
• X, Y independent variables ⇒
Var[X + Y ] = Var[X] + Var[Y ]
• X, Y independent variables ⇒
Cov(X, Y ) = 0, i.e. E[XY ] = E[X]E[Y ]
• independence of random variables;
conditional independence of random variables
Advanced issues:
• the likelihood function (see also Week 12)
• vector of random variables;
covariance matrix for a vector of random variables;
positive [semi-]definite matrices,
negative [semi-]definite matrices
• For any vector of random variables, the covariance matrix is symmetric and positive semidefinite.
Ciortuz et al.’s exercise book: ch. 1, ex. 25
Exercises illustrating the above concepts/definitions and theoretical results/formulas, concentrating
especially on:
− identifying in a given problem’s text the underlying probabilistic distribution: either a basic one
(e.g., Bernoulli, binomial, categorical, multinomial etc.), or one derived [by function composition
or] by summation of identically distributed random variables (see for instance pr. 14, 51, 53);
− computing probabilities (see for instance pr. 51, 53a,c);
− computing means / expected values of random variables (see for instance pr. 2, 14, 52, 53b,d);
− verifying the [conditional] independence of two or more random variables (see for instance pr. 12,
17, 53, 54).
Ciortuz et al.’s exercise book: ch. 1, ex. 8-17 [18-22], 50-56 [57-59]
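A minimal Python sketch (numpy assumed available) that checks the normalisation of a pmf and the identity Var[X] = E[X²] − (E[X])² on a small, made-up discrete distribution:

import numpy as np

# a made-up discrete random variable X with values and pmf p
values = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.4, 0.3, 0.2])   # must sum to 1, as every pmf does

assert np.isclose(p.sum(), 1.0)      # ∑x p(x) = 1

e_x = np.sum(values * p)             # E[X]
e_x2 = np.sum(values**2 * p)         # E[X²]
var_x = e_x2 - e_x**2                # Var[X] = E[X²] − (E[X])²

print(e_x, var_x)                    # 1.6 and 0.84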
Week 3: Elements of Information Theory
Concepts/definitions:
• entropy;
specific conditional entropy;
average conditional entropy;
joint entropy;
information gain (mutual information)
Advanced issues:
• relative entropy;
• cross-entropy
Theoretical results/formulas:
• 0 ≤ H(X) ≤ H(1/n, 1/n, . . . , 1/n), with 1/n taken n times
(i.e., the entropy of the uniform distribution over n outcomes)
• IG(X; Y ) ≥ 0
• H(X, Y ) = H(X) + H(Y |X) = H(Y ) + H(X|Y )
(generalisation: the chain rule)
• IG(X; Y ) = H(X) − H(X|Y ) = H(Y ) − H(Y |X)
• H(X, Y ) = H(X) + H(Y ) iff X and Y are independent
• IG(X; Y ) = 0 iff X and Y are independent
Exercises illustrating the above concepts/definitions and theoretical results/formulas, concentrating especially on:
− computing different types of entropies (see ex. 35 and 37);
− proofs of some basic properties (see ex. 34a, 67), including the functional analysis of the
entropy of the Bernoulli distribution, as a basis for drawing its plot;
− finding the best IG(Xi ; Y ), given a dataset with X1 , . . . , Xn as input variables and Y
as output variable (see pr. LDP-104, i.e. CMU, 2013 fall, E. Xing, W. Cohen, Sample
Questions, pr. 4).
Ciortuz et al.’s exercise book: ch. 1, ex. 33ab-35, 37, 66-70 [33c-g, 36, 38-41, 71]
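The entropy and information-gain formulas above translate directly into code; below is a minimal Python sketch with a made-up toy dataset (the helper names entropy and information_gain are illustrative, not taken from the exercise book).

import numpy as np
from collections import Counter

def entropy(labels):
    """H(Y) = -∑y p(y) log2 p(y), estimated from label counts."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    """IG(X; Y) = H(Y) - H(Y|X), with H(Y|X) the average conditional entropy."""
    n = len(y)
    h_cond = 0.0
    for v in set(x):
        idx = [i for i in range(n) if x[i] == v]
        h_cond += len(idx) / n * entropy([y[i] for i in idx])
    return entropy(y) - h_cond

# toy data: x is a binary attribute, y the class label
x = ['a', 'a', 'b', 'b', 'b', 'a']
y = [ 1,   1,   0,   0,   1,   1 ]
print(information_gain(x, y))   # ~0.459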
Weeks 4 and 5: Decision Trees
Read: Chapter 3 from Tom Mitchell’s Machine Learning book.
Important Note:
See the Overview (Rom.: “Privire de ansamblu”) document for Ciortuz et al.’s exercise book,
chapter 2. It is in fact a “road map” for what we will be doing here. (This note also applies to
all the other chapters.)
Week 4: decision trees and their properties; application of the ID3 algorithm, properties of ID3:
Ciortuz et al.’s exercise book, ch. 2, ex. 1-7, 20-28
Week 5: extensions to the ID3 algorithm:
− handling of continuous attributes: Ciortuz et al.’s ex. book, ch. 2, ex. 8-9 [10], 29-30 [31-33]
− overfitting: Ciortuz et al.’s ex. book, ch. 2, ex. 8, 19bc, 29, 37b
− pruning strategies for decision trees: Ciortuz et al.’s ex. book, ch. 2, ex. 17, 18, 34, 35
− using other impurity measures as the [local] optimality criterion in ID3: Ciortuz et al.’s ex. book,
ch. 2, ex. 13
− other issues: Ciortuz et al.’s ex. book, ch. 2, ex. 11, 12, 14-16, 19ad, 36, 37a
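For the handling of continuous attributes listed above, here is a minimal Python sketch of the usual ID3-style preprocessing step: candidate thresholds are placed at midpoints between consecutive sorted values where the class label changes. The toy Temperature data mirrors the example in Mitchell’s chapter 3; the helper name is illustrative.

def candidate_thresholds(values, labels):
    """Midpoints between consecutive sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            thresholds.append((v1 + v2) / 2)
    return thresholds

# toy continuous attribute (Temperature) and binary class
temps  = [40, 48, 60, 72, 80, 90]
labels = ['no', 'no', 'yes', 'yes', 'yes', 'no']
print(candidate_thresholds(temps, labels))   # [54.0, 85.0]

Each candidate threshold then defines a Boolean attribute (e.g. Temperature > 54), and ID3 picks the one with the best information gain, exactly as for discrete attributes.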
Weeks 6 and 7: Bayesian Classifiers
Read: Chapter 6 from Tom Mitchell’s Machine Learning book (except subsections 6.11 and
6.12.2).
Week 6: application of Naive Bayes and Joint Bayes algorithms:
Ciortuz et al.’s exercise book, ch. 3, ex. 1-4, 11-14
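A minimal Python sketch of Naive Bayes on categorical data, using plain maximum-likelihood (relative-frequency) estimates and no smoothing; the toy dataset and the helper names are made up for illustration.

from collections import Counter, defaultdict

# toy training set: each row is (x1, x2, label)
data = [('sunny', 'hot', 'no'),  ('sunny', 'cool', 'yes'),
        ('rainy', 'cool', 'yes'), ('rainy', 'hot', 'no'),
        ('sunny', 'hot', 'no'),  ('rainy', 'cool', 'yes')]

class_counts = Counter(y for *_, y in data)
# feature_counts[(j, v, y)] = #examples of class y whose j-th attribute equals v
feature_counts = defaultdict(int)
for *xs, y in data:
    for j, xj in enumerate(xs):
        feature_counts[(j, xj, y)] += 1

def predict(xs):
    """argmax_y P(y) * prod_j P(x_j | y), with relative-frequency estimates."""
    n = len(data)
    best, best_score = None, -1.0
    for y, cy in class_counts.items():
        score = cy / n
        for j, xj in enumerate(xs):
            score *= feature_counts[(j, xj, y)] / cy
        if score > best_score:
            best, best_score = y, score
    return best

print(predict(('sunny', 'cool')))   # 'yes'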
Week 7:
computation of the [training] error rate of Naive Bayes:
Ciortuz et al.’s exercise book, ch. 3, ex. 5-7, 15-17;
some properties of Naive Bayes and Joint Bayes algorithms:
Ciortuz et al.’s exercise book, ch. 3, ex. 8, 20;
comparisons with other classifiers:
Ciortuz et al.’s exercise book, ch. 3, ex. 18-19.
Advanced issues:
classes of hypotheses: MAP (maximum a posteriori) hypotheses vs. ML (maximum likelihood) hypotheses
Ciortuz et al.’s exercise book, ch. 1 (Probabilities and Statistics), ex. 32, 63, and ch. 3, ex. 9-10.
Week 7++: The Gaussian distribution:
Read: “Pattern Classification”, R. Duda, P. Hart, D. Stork, 2nd ed. (Wiley-Interscience, 2000),
Appendices A.4.12, A.5.1 and A.5.2.
The uni-variate Gaussian distribution: pdf, mean and variance, and cdf;
the variable transformation that takes the non-standard (general) case to the standard case;
[Advanced issue: the Central Limit Theorem – the i.i.d. case]
The multi-variate Gaussian distribution: pdf, the case of diagonal covariance matrix
Ciortuz et al.’s exercise book: ch. 1, ex. 23, 24, [27,] 26;
(parameter estimation: 31, 61, 64)
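A minimal Python sketch of the uni-variate Gaussian pdf, the standard-normal cdf (written with the error function), and the standardisation z = (x − μ)/σ that reduces the general case to the standard one; all numbers are illustrative.

import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """pdf of N(mu, sigma²): exp(-(x-mu)² / (2 sigma²)) / (sigma sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def standard_normal_cdf(z):
    """cdf of N(0, 1), expressed through the error function erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, x = 5.0, 2.0, 7.0
z = (x - mu) / sigma    # standardisation: X ~ N(mu, sigma²)  ->  Z ~ N(0, 1)
# P(X <= x) for X ~ N(mu, sigma²) equals P(Z <= z) for Z ~ N(0, 1)
print(gaussian_pdf(x, mu, sigma), standard_normal_cdf(z))   # ~0.1210 and ~0.8413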
Advanced issues:
Gaussian Bayes classifiers and their relationship to logistic regression
Read: the draft of an additional chapter to T. Mitchell’s Machine Learning book: Generative
and discriminative classifiers: Naive Bayes and logistic regression (especially section 3)
Ciortuz et al.’s exercise book, ch. 3, ex. 21-26.
Week 8: midterm EXAM
Week 9: Instance-Based Learning
Read: Chapter 8 from Tom Mitchell’s Machine Learning book.
application of the k-NN algorithm:
Ciortuz et al.’s exercise book, ch. 4, pr. 1-7, 15-21, 23a
comparisons with the ID3 algorithm:
Ciortuz et al.’s exercise book, ch. 4, pr. 11, 13, 23b.
Shepard’s algorithm: Ciortuz et al.’s exercise book, ch. 4, pr. 8.
Advanced issues:
some theoretical properties of the k-NN algorithm:
Ciortuz et al.’s exercise book, ch. 4, pr. 9, 10, 22, 24.
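A minimal Python sketch of k-NN classification with Euclidean distance and majority vote; the training points are invented, and ties simply go to whichever label is counted first.

import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); majority vote over the k nearest points."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# toy 2-D training set
train = [((1.0, 1.0), 'A'), ((1.5, 2.0), 'A'), ((3.0, 4.0), 'B'),
         ((5.0, 7.0), 'B'), ((3.5, 4.5), 'B')]
print(knn_predict(train, (3.0, 3.5), k=3))   # 'B'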
Weeks 10-11, 12-14: Clustering
Read: Chapter 14 from Manning and Schütze’s Foundations of Statistical Natural Language
Processing book.
Week 10: Hierarchical Clustering:
see section B in the overview (Rom.: “Privire de ansamblu”) of the Clustering chapter in Ciortuz
et al.’s exercise book;
pr. 1-6, 25-28, 46-47.
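A minimal Python sketch of bottom-up (agglomerative) clustering with single linkage on made-up 1-D points; it repeatedly merges the two closest clusters until k clusters remain.

def single_linkage(c1, c2):
    """Single-link distance: minimum pairwise distance between the two clusters."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerative(points, k, linkage=single_linkage):
    clusters = [[p] for p in points]          # start: every point is its own cluster
    while len(clusters) > k:
        # find the closest pair of clusters and merge it
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

print(agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], k=3))   # [[1.0, 1.2], [5.0, 5.1], [9.0]]

Replacing single_linkage with a complete-link or average-link distance gives the other standard agglomerative variants.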
Week 11: The k-Means Algorithm
see section C1 in the overview of the Clustering chapter in Ciortuz et al.’s exercise book;
application: pr. 7-11, 29-30,
properties (convergence and other issues): pr. 12-13, 31-34,
a “kernelized” version of k-Means: pr. 35,
comparison with the hierarchical clustering algorithms: pr. 14, 36,
implementation: pr. 48.
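A minimal Python sketch of the k-Means (Lloyd) iteration, assuming numpy, a fixed number of iterations and a user-supplied initialisation; it is only meant to show the alternation of the assignment and update steps, not to solve any particular exercise.

import numpy as np

def kmeans(X, centroids, n_iter=20):
    """Alternate the two k-Means steps: assign points to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(n_iter):
        # assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step (an empty cluster keeps its old centroid)
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
init = np.array([[0.0, 0.0], [6.0, 6.0]])
print(kmeans(X, init.copy()))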
Week 12:
Estimating the parameters of probabilistic distributions: the MAP and MLE methods.
Read: Tom Mitchell, Estimating Probabilities, additional chapter to the Machine Learning book
(draft of 24 January 2016).
Ciortuz et al.’s exercise book, ch. 1, pr. 28-31, 60-62, 64, 65.
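A minimal numerical sketch contrasting the MLE and MAP estimates of a Bernoulli parameter θ, assuming a Beta(a, b) prior for the MAP case; the counts and hyper-parameters are made up.

# observed coin flips: 7 heads out of 10 tosses (made-up counts)
heads, n = 7, 10

# MLE: theta_hat = #heads / #tosses
theta_mle = heads / n

# MAP with a Beta(a, b) prior on theta:
# theta_hat = (#heads + a - 1) / (n + a + b - 2)
a, b = 3, 3          # illustrative prior "pseudo-counts"
theta_map = (heads + a - 1) / (n + a + b - 2)

print(theta_mle, theta_map)   # 0.7 and ~0.643 (pulled toward the prior mean 0.5)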
Week 13:
Using the EM algorithm to solve GMMs (Gaussian Mixture Models), the uni-variate case.
Read: Tom Mitchell, Machine Learning, sections 6.12.1 and 6.12.3;
see section C2 in the overview of the Clustering chapter in Ciortuz et al.’s exercise book;
pr. 15bc-18, 38.
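A minimal Python sketch (numpy assumed) of EM for a two-component uni-variate Gaussian mixture; the data and initial parameters are invented, and the loop runs for a fixed number of iterations instead of testing convergence.

import numpy as np

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, pi, mu, var, n_iter=50):
    """EM for a 2-component uni-variate GMM with mixing weights pi, means mu, variances var."""
    for _ in range(n_iter):
        # E step: responsibilities gamma[i, j] = P(component j | x_i)
        gamma = np.stack([pi[j] * gauss(x, mu[j], var[j]) for j in range(2)], axis=1)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means and variances from the responsibilities
        nj = gamma.sum(axis=0)
        pi = nj / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / nj
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nj
    return pi, mu, var

x = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])
print(em_gmm_1d(x, pi=np.array([0.5, 0.5]), mu=np.array([0.0, 6.0]), var=np.array([1.0, 1.0])))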
Week 14: EM for GMMs (Gaussian Mixture Models), the multi-variate case:
Ciortuz et al.’s exercise book, the Clustering chapter: pr. 19-22, 23 (the general version), 39-45;
implementation: pr. 49.
Advanced issues: The EM Algorithmic Schemata
Read: Tom Mitchell, Machine Learning, sect. 6.12.2
convergence: Ciortuz et al.’s exercise book, ch. 6, pr. 1-2;
EM for solving mixtures of certain distributions: Bernoulli: pr. 3, 11, categorical: pr. [4,] 5, 12,
Gamma: pr. 14, exponential: pr. 9;
EM for solving a mixture (linear combination) of unspecified distributions: pr. 8;
application to text categorization: pr. [6,] 13;
a “hard” version of the EM algorithm: pr. 15.
Weeks 15-16: final EXAM