Kernel methods - overview
Data Mining and Statistical Learning, 2008
• Kernel smoothers
• Local regression
• Kernel density estimation
• Radial basis functions
Introduction
Kernel methods are regression techniques used to estimate a response function $y = f(X)$, $X \in \mathbb{R}^d$, from noisy data.
Properties:
• Different models are fitted at each query point, and only those
observations close to that point are used to fit the model
• The resulting function is smooth
• The models require only a minimum of training
A simple one-dimensional kernel smoother
$$\hat f(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$$

where

$$K_\lambda(x_0, x) = \begin{cases} 1 - \dfrac{|x - x_0|}{\lambda}, & \text{if } |x - x_0| \le \lambda \\ 0, & \text{otherwise} \end{cases}$$

[Figure: observed data and the fitted kernel smoother]
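A minimal NumPy sketch of this smoother, using the triangular kernel above; the data, the bandwidth and the function names are made up for illustration:

```python
import numpy as np

def triangular_kernel(x0, x, lam):
    """K_lambda(x0, x) = 1 - |x - x0|/lambda inside the window, 0 outside."""
    u = np.abs(x - x0) / lam
    return np.where(u <= 1.0, 1.0 - u, 0.0)

def kernel_smoother(x0, x, y, lam):
    """Kernel-weighted average of the responses y at the query point x0."""
    w = triangular_kernel(x0, x, lam)
    return np.sum(w * y) / np.sum(w)

# Toy example (made-up data): smooth a noisy sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 25, 100))
y = 5.5 + 0.4 * np.sin(x / 4) + rng.normal(0, 0.1, size=x.size)
fitted = np.array([kernel_smoother(x0, x, y, lam=2.0) for x0 in x])
```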
Kernel methods, splines and ordinary least squares
regression (OLS)
• OLS: A single model is fitted to all data
• Splines: Different models are fitted to different
subintervals (cuboids) of the input domain
• Kernel methods: Different models are fitted at each
query point
Kernel-weighted averages and moving averages
The Nadaraya-Watson kernel-weighted average
$$\hat f(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)},
\qquad
K_\lambda(x_0, x) = D\!\left(\frac{x - x_0}{\lambda}\right)$$

where λ indicates the window size and the function D shows how the weights change with distance within this window.
The estimated function is smooth!
K-nearest neighbours
$$\hat f(x) = \mathrm{Ave}\big(y_i \mid x_i \in N_k(x)\big)$$
The estimated function is piecewise constant!
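For contrast, a sketch of the k-nearest-neighbour average, which produces a piecewise-constant estimate (the function name is my own):

```python
import numpy as np

def knn_average(x0, x, y, k):
    """Average the responses of the k training points closest to x0
    (a piecewise-constant estimate of f)."""
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()
```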
Examples of one-dimensional kernel smoothers
• Epanechnikov kernel
$$D(t) = \begin{cases} \tfrac{3}{4}\,\big(1 - t^2\big), & \text{if } |t| \le 1 \\ 0, & \text{otherwise} \end{cases}$$

• Tri-cube kernel
$$D(t) = \begin{cases} \big(1 - |t|^3\big)^3, & \text{if } |t| \le 1 \\ 0, & \text{otherwise} \end{cases}$$
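A small sketch of these two kernels and their use in the Nadaraya-Watson average (function names are assumptions, not from the slides):

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    """D(t) = (1 - |t|^3)^3 for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def nadaraya_watson(x0, x, y, lam, D=epanechnikov):
    """Kernel-weighted average with K_lambda(x0, x) = D((x - x0)/lambda)."""
    w = D((x - x0) / lam)
    return np.sum(w * y) / np.sum(w)
```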
Issues in kernel smoothing
• The smoothing parameter λ has to be defined
• When there are ties at x_i: compute an average y value and introduce weights representing the number of points (see the sketch after this list)
• Boundary issues
• Varying density of observations:
– bias is constant
– the variance is inversely proportional to the density
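A possible way to handle ties, as suggested above: collapse the tied observations into one weighted observation (a sketch; the helper name is hypothetical):

```python
import numpy as np

def collapse_ties(x, y):
    """Replace tied observations at the same x by their average y,
    with a weight equal to the number of points at that x."""
    ux, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    y_mean = np.bincount(inverse, weights=y) / counts
    return ux, y_mean, counts
```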
Boundary effects of one-dimensional
kernel smoothers
Locally weighted averages can be badly biased at the boundaries if the response function has a significant slope there.
Remedy: apply local linear regression.
Local linear regression
Find the intercept and slope parameters $\alpha(x_0)$ and $\beta(x_0)$ by solving

$$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\big[y_i - \alpha(x_0) - \beta(x_0)\,x_i\big]^2$$

and estimate $\hat f(x_0) = \alpha(x_0) + \beta(x_0)\,x_0$. The solution is a linear combination of the $y_i$:

$$\hat f(x_0) = \sum_{i=1}^{N} l_i(x_0)\, y_i$$
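A sketch of local linear regression at a single query point, solving the weighted least-squares problem above with an Epanechnikov kernel (the kernel choice and the names are illustrative assumptions):

```python
import numpy as np

def local_linear(x0, x, y, lam):
    """Fit intercept and slope at x0 by kernel-weighted least squares
    and return the fitted value alpha(x0) + beta(x0) * x0."""
    t = (x - x0) / lam
    w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)   # Epanechnikov weights
    B = np.column_stack([np.ones_like(x), x])               # design matrix with rows (1, x_i)
    W = np.diag(w)
    # Solve the normal equations (B^T W B) theta = B^T W y for theta = (alpha, beta)
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return theta[0] + theta[1] * x0
```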
Kernel smoothing vs local linear regression
Kernel smoothing
Solve the minimization problem
$$\min_{\alpha(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\big[y_i - \alpha(x_0)\big]^2$$
Local linear regression
Solve the minimization problem
$$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\big[y_i - \alpha(x_0) - \beta(x_0)\,x_i\big]^2$$
Properties of local linear regression
• Automatically modifies the kernel weights to correct for bias
• The bias depends only on terms of order higher than one in the Taylor expansion of f
Local polynomial regression
• Fitting polynomials instead of straight lines
Behavior of the estimated response function: [figure]
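A sketch of a local polynomial (e.g., quadratic) fit, obtained by adding higher-order columns to the local design matrix (names and kernel choice are illustrative):

```python
import numpy as np

def local_polynomial(x0, x, y, lam, degree=2):
    """Kernel-weighted polynomial fit at x0; returns the fitted value there."""
    t = (x - x0) / lam
    w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)        # Epanechnikov weights
    B = np.vander(x, N=degree + 1, increasing=True)              # columns 1, x, x^2, ...
    Wsqrt = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(Wsqrt[:, None] * B, Wsqrt * y, rcond=None)
    x0_row = np.vander(np.array([x0]), N=degree + 1, increasing=True)
    return (x0_row @ theta)[0]
```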
Local polynomial vs local linear regression
Advantages:
• Reduces the ”Trimming of hills and filling of valleys”
Disadvantages:
• Higher variance (tails are more wiggly)
Selecting the width of the kernel
Bias-variance tradeoff: selecting a narrow window leads to high variance and low bias, while selecting a wide window leads to high bias and low variance.
Selecting the width of the kernel
$$\hat{\mathbf{f}} = \mathbf{S}_\lambda\, \mathbf{y}, \qquad \big(\mathbf{S}_\lambda\big)_{ij} = l_j(x_i)$$

1. Automatic selection (cross-validation)
2. Fixing the degrees of freedom
$$\mathrm{df} = \mathrm{trace}\big(\mathbf{S}_\lambda\big)$$
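Because local linear regression is linear in y, the rows of S_λ can be computed explicitly; a sketch that builds S_λ with Epanechnikov weights, from which df = trace(S_λ) follows (all names are my own):

```python
import numpy as np

def smoother_matrix(x, lam):
    """Build S_lambda for local linear regression: row i contains the
    weights l_j(x_i) such that fhat(x_i) = sum_j l_j(x_i) * y_j."""
    N = x.size
    S = np.zeros((N, N))
    B = np.column_stack([np.ones_like(x), x])
    for i, x0 in enumerate(x):
        t = (x - x0) / lam
        w = np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)
        # l(x0)^T = b(x0)^T (B^T W B)^{-1} B^T W, with b(x0) = (1, x0)
        M = np.linalg.solve(B.T @ (w[:, None] * B), B.T * w)
        S[i] = np.array([1.0, x0]) @ M
    return S

# Effective degrees of freedom: df = trace(S_lambda)
# df = np.trace(smoother_matrix(x, lam=2.0))
```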
Local regression in $\mathbb{R}^p$
The one-dimensional approach is easily extended to p
dimensions by
• Using the Euclidean norm as a measure of distance in the kernel
• Modifying the polynomial basis; e.g., for two inputs and degree two,
$$b(X) = \big(1,\; X_1,\; X_2,\; X_1^2,\; X_1 X_2,\; X_2^2\big)$$
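A sketch of local linear regression in $\mathbb{R}^p$, with Euclidean-distance kernel weights and the linear basis $b(X) = (1, X_1, \dots, X_p)$ (names are my own):

```python
import numpy as np

def local_linear_rp(x0, X, y, lam):
    """Local linear regression in R^p at the query point x0 (shape (p,)).
    Kernel weights are based on the Euclidean distance ||x_i - x0||."""
    d = np.linalg.norm(X - x0, axis=1) / lam
    w = np.where(d <= 1, 0.75 * (1 - d**2), 0.0)          # Epanechnikov weights
    B = np.column_stack([np.ones(len(X)), X])              # basis b(X) = (1, X_1, ..., X_p)
    Wsqrt = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(Wsqrt[:, None] * B, Wsqrt * y, rcond=None)
    return np.concatenate(([1.0], x0)) @ theta
```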
Local regression in $\mathbb{R}^p$
”The curse of dimensionality”
• The fraction of points close to the boundary of the input
domain increases with its dimension
• Observed data do not cover the whole input domain
Structured local regression models
Structured kernels (standardize each variable): introduce a matrix A that weights the coordinates in the kernel,
$$K_{\lambda,A}(x_0, x) = D\!\left(\frac{(x - x_0)^{T} A\,(x - x_0)}{\lambda}\right)$$
Note: A is positive semidefinite.
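A sketch of such a structured kernel; the choice A = diag(1/Var(X_j)), which standardizes each variable, is only one illustrative option:

```python
import numpy as np

def structured_kernel(x0, X, A, lam,
                      D=lambda t: np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)):
    """K_{lambda,A}(x0, x_i) = D((x_i - x0)^T A (x_i - x0) / lambda) for each row x_i of X."""
    diff = X - x0
    q = np.einsum('ij,jk,ik->i', diff, A, diff)   # quadratic forms (x_i - x0)^T A (x_i - x0)
    return D(q / lam)

# Example choice of A: standardize each coordinate by its variance
# A = np.diag(1.0 / X.var(axis=0))
```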
Structured local regression models
Structured regression functions
• ANOVA decompositions (e.g., additive models); backfitting algorithms can be used for fitting
• Varying coefficient models: partition the predictors into $(X_1, \dots, X_q)$ and $Z$, and let the coefficients depend on $Z$:
$$f(X) = \alpha(Z) + \beta_1(Z)\,X_1 + \dots + \beta_q(Z)\,X_q$$
Structured local regression models
Varying coefficient
models (example)
Local methods
• Assumption: the model is locally linear; maximize the log-likelihood locally at $x_0$ with kernel weights:
$$l\big(\beta(x_0)\big) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, l\big(y_i,\, x_i^T \beta(x_0)\big)$$
• Autoregressive time series: $y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + e_t$, i.e. $y_t = z_t^T \beta + e_t$ with lag vector $z_t = (1, y_{t-1}, \dots, y_{t-k})$; fit by local least squares with kernel $K(z_0, z_t)$ (the sketch below illustrates this)
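A rough sketch of the autoregressive case: build the lag vectors z_t, weight them with a kernel around a query vector z_0, and solve local least squares (the Gaussian kernel and all names are assumptions for illustration):

```python
import numpy as np

def local_ar_fit(z0, y, k, lam):
    """Fit y_t = z_t^T beta + e_t locally around the lag vector z0,
    with z_t = (1, y_{t-1}, ..., y_{t-k}) and Gaussian kernel weights."""
    T = len(y)
    # Lag matrix Z: one row per t = k, ..., T-1, columns (1, y_{t-1}, ..., y_{t-k})
    Z = np.column_stack([np.ones(T - k)] + [y[k - j - 1:T - j - 1] for j in range(k)])
    target = y[k:]
    # Gaussian kernel weights K(z0, z_t) based on Euclidean distance
    w = np.exp(-0.5 * (np.linalg.norm(Z - z0, axis=1) / lam) ** 2)
    Wsqrt = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Wsqrt[:, None] * Z, Wsqrt * target, rcond=None)
    return beta
```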
Kernel density estimation
• Straightforward estimates of the density are bumpy
• Instead, Parzen's smooth estimate is preferred:
$$\hat f_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)$$
Normally, Gaussian kernels are used.
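A minimal sketch of the Parzen estimate with a Gaussian kernel (the bandwidth lam is assumed given):

```python
import numpy as np

def parzen_density(x0, x, lam):
    """Parzen kernel density estimate at x0 with a Gaussian kernel:
    fhat(x0) = (1 / (N * lam)) * sum_i phi((x0 - x_i) / lam)."""
    u = (x0 - x) / lam
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # standard normal density
    return phi.sum() / (len(x) * lam)
```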
Radial basis functions and kernels
Using the idea of basis expansion, we treat kernel functions as basis functions:
$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^{M} D\!\left(\frac{\|x - \xi_j\|}{\lambda_j}\right) \beta_j$$
where $\xi_j$ is a prototype (location) parameter and $\lambda_j$ a scale parameter.
Radial basis functions and kernels
Choosing the parameters:
• Estimate $\{\lambda_j, \xi_j\}$ separately from $\beta_j$ (often by using the distribution of X alone), and then solve a least-squares problem for $\beta_j$
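A sketch of this two-stage approach with Gaussian radial basis functions: the prototypes ξ_j are taken as a random subset of the data (standing in for any unsupervised choice), a common scale λ is fixed, and β is found by least squares (all choices here are illustrative):

```python
import numpy as np

def rbf_design(x, centers, lam):
    """Matrix of Gaussian radial basis functions D(|x_i - xi_j| / lam)."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / lam) ** 2)

def fit_rbf(x, y, n_centers=10, lam=1.0, seed=0):
    """Pick prototypes from the data, then solve least squares for beta."""
    rng = np.random.default_rng(seed)
    centers = np.sort(rng.choice(x, size=n_centers, replace=False))
    H = rbf_design(x, centers, lam)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, beta

def predict_rbf(x_new, centers, beta, lam=1.0):
    """f(x) = sum_j D(|x - xi_j| / lam) * beta_j."""
    return rbf_design(x_new, centers, lam) @ beta
```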