Deep Learning Basics Lecture 1: Feedforward
Princeton University COS 495
Instructor: Yingyu Liang

Motivation I: representation learning

Machine learning 1-2-3
• Collect data and extract features
• Build model: choose hypothesis class H and loss function ℓ
• Optimization: minimize the empirical loss

Features
• Extract features from the raw input x (e.g., a color histogram over the red, green, and blue channels), then build the hypothesis on top of them: y = w^T φ(x)
• Features are part of the model: a nonlinear feature map φ followed by a linear model y = w^T φ(x)
• Example: polynomial kernel SVM, y = sign(w^T φ(x) + b), with a fixed feature map φ(x)

Motivation: representation learning
• Why don't we also learn φ(x)?
• Learn φ(x), then learn w in y = w^T φ(x)

Feedforward networks
• View each dimension of φ(x) as something to be learned
• Linear functions φ_i(x) = θ_i^T x don't work: we need some nonlinearity
• Typically, set φ_i(x) = r(θ_i^T x), where r(·) is some nonlinear function

Feedforward deep networks
• What if we go deeper? Stack learned representations h_1, h_2, ..., h_L between the input x and the output y
[Figure from Deep Learning, by Goodfellow, Bengio, Courville. Dark boxes are things to be learned.]

Motivation II: neurons

Motivation: neurons
[Figure from Wikipedia]

Motivation: abstract neuron model
• A neuron is activated when the correlation between the input and a pattern θ exceeds some threshold b
• y = threshold(θ^T x - b) or y = r(θ^T x - b)
• r(·) is called the activation function

Motivation: artificial neural networks
• Put the neurons into layers: feedforward deep networks

Components in feedforward networks

Components
• Representations:
  • Input
  • Hidden variables
• Layers/weights:
  • Hidden layers
  • Output layer
• The input x enters the first layer, the hidden variables h_1, h_2, ..., h_L are computed layer by layer, and the output layer produces y

Input
• Represented as a vector
• Sometimes requires some preprocessing, e.g.,
  • Subtract the mean
  • Normalize to [-1, 1]
  • Expand

Output layers
• Regression: y = w^T h + b; linear units, no nonlinearity
• Multi-dimensional regression: y = W^T h + b; linear units, no nonlinearity
• Binary classification: y = σ(w^T h + b); corresponds to using logistic regression on h
• Multi-class classification: y = softmax(z) where z = W^T h + b; corresponds to using multi-class logistic regression on h

Hidden layers
• Each neuron takes a weighted linear combination of the previous layer's values, so it can be thought of as outputting one value to the next layer: y = r(w^T x + b)
• Typical activation functions r:
  • Threshold: t(z) = 1[z ≥ 0]
  • Sigmoid: σ(z) = 1/(1 + exp(-z))
  • Tanh: tanh(z) = 2σ(2z) - 1
• Problem: saturation, where the gradient becomes too small
  [Figure borrowed from Pattern Recognition and Machine Learning, Bishop]
• Activation function ReLU (rectified linear unit): ReLU(z) = max{z, 0}
  • The gradient is 0 for z < 0 and 1 for z > 0
  [Figure from Deep Learning, by Goodfellow, Bengio, Courville.]
• Generalizations of ReLU: gReLU(z) = max{z, 0} + α min{z, 0}
  • Leaky-ReLU(z) = max{z, 0} + 0.01 min{z, 0}
  • Parametric-ReLU(z): α is learnable
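
Code sketch: activation functions
As a quick numerical companion to the activation functions above, here is a minimal NumPy sketch (my own illustration; the function names and test values are not from the lecture). It implements the threshold, sigmoid, ReLU, and generalized ReLU as defined on the slides, checks the identity tanh(z) = 2σ(2z) - 1, and contrasts the vanishing sigmoid gradient with the 0/1 gradient of ReLU.

```python
import numpy as np

def threshold(z):
    # t(z) = 1[z >= 0]
    return (z >= 0).astype(float)

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU(z) = max{z, 0}
    return np.maximum(z, 0.0)

def g_relu(z, alpha):
    # gReLU(z) = max{z, 0} + alpha * min{z, 0}
    # alpha = 0.01 gives Leaky-ReLU; making alpha learnable gives Parametric-ReLU
    return np.maximum(z, 0.0) + alpha * np.minimum(z, 0.0)

z = np.linspace(-5, 5, 11)

# Check the slide identity: tanh(z) = 2*sigma(2z) - 1
assert np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1)

# Saturation: the sigmoid gradient sigma(z)(1 - sigma(z)) vanishes for large |z|,
# while the ReLU gradient is exactly 1 for z > 0 and 0 for z < 0.
sig_grad = sigmoid(z) * (1 - sigmoid(z))
relu_grad = threshold(z)          # subgradient choice of 1 at z = 0
print(np.round(sig_grad, 4))
print(relu_grad)
print(g_relu(z, alpha=0.01))      # Leaky-ReLU values
```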
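
Code sketch: output layers
The output-layer variants differ only in the nonlinearity applied to the affine map of the last hidden representation h. Below is a minimal sketch, assuming an illustrative hidden vector and randomly drawn weights (the dimensions and variable names are my own, not from the lecture).

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=4)            # hidden representation (illustrative dimension)

# Regression head: y = w^T h + b, linear units, no nonlinearity
w, b = rng.normal(size=4), 0.1
y_reg = w @ h + b

# Binary classification head: y = sigma(w^T h + b), i.e. logistic regression on h
y_bin = 1.0 / (1.0 + np.exp(-(w @ h + b)))

# Multi-class head: y = softmax(z) with z = W^T h + b,
# i.e. multi-class logistic regression on h
W, b_vec = rng.normal(size=(4, 3)), np.zeros(3)
z = W.T @ h + b_vec
z = z - z.max()                   # subtract the max for numerical stability
y_multi = np.exp(z) / np.exp(z).sum()

print(y_reg, y_bin, y_multi, y_multi.sum())   # class probabilities sum to 1
```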
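
Code sketch: forward pass
Putting the components together, a forward pass of a feedforward deep network alternates an affine map with the activation, h_k = r(W_k^T h_{k-1} + b_k), and finishes with one of the output heads above. The sketch below is a minimal illustration under my own assumptions (layer sizes, random initialization, ReLU hidden units, softmax output); it is not the lecture's code.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

def forward(x, weights, biases):
    """Forward pass: hidden layers compute h = ReLU(W^T h_prev + b);
    the last layer is a softmax output head for multi-class classification."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W.T @ h + b)
    z = weights[-1].T @ h + biases[-1]
    return softmax(z)

rng = np.random.default_rng(0)
sizes = [5, 8, 8, 3]              # input dim 5, two hidden layers, 3 classes
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.normal(size=5)            # an input vector
y = forward(x, weights, biases)
print(y, y.sum())                 # class probabilities summing to 1
```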