Neural Computation and Applications in Time Series and Signal Processing
Georg Dorffner
Dept. of Medical Cybernetics and Artificial Intelligence, University of Vienna,
and Austrian Research Institute for Artificial Intelligence
June 2003

Neural Computation
• Originally biologically motivated (information processing in the brain)
• Simple mathematical model of the neuron → neural network
• Large number of simple "units"
• Massively parallel (in theory)
• Complexity through the interplay of many simple elements
• Strong relationship to methods from statistics
• Suitable for pattern recognition

A Unit (Neuron)
• Propagation rule for the (net) input $x_j$: weighted sum $x_j = \sum_i w_{ij} y_i$ or Euclidean distance $x_j = \sum_i (w_{ij} - y_i)^2$
• Activation/output: $y_j = f(x_j)$
• Transfer function $f$: threshold function (McCulloch & Pitts), linear, sigmoid, or Gaussian

Multilayer Perceptron (MLP), Radial Basis Function Network (RBFN)
• 2 (or more) layers (= sets of connections); output units typically linear, hidden units typically nonlinear
• MLP: $x_j^{out} = \sum_{l=1}^{k} v_{lj}\, f\big(\sum_{i=1}^{n} w_{il} x_i^{in}\big)$ with sigmoid $f(x) = \frac{1}{1+e^{-x}}$
• RBFN: $x_j^{out} = \sum_{l=1}^{k} v_{lj}\, f\big(\sum_{i=1}^{n} (w_{il} - x_i^{in})^2\big)$ with Gaussian $f(x) = e^{-x^2}$

MLP as Universal Function Approximator
• E.g.: 1 input, 1 output, 5 hidden units
• An MLP can approximate arbitrary functions (Hornik et al. 1990) through the superposition of weighted sigmoids:
  $x_k^{out} = g_k(\vec{x}^{in}) = \sum_{j=1}^{m} w_{jk}^{out}\, f\big(\sum_{i=1}^{n} w_{ij}^{hid} x_i^{in} + w_{j0}^{hid}\big) + w_{k0}^{out}$
• The hidden weights stretch and mirror each sigmoid; the bias terms move it
• Similar results hold for the RBFN

Training (Model Estimation)
• Iterative optimisation based on the gradient (gradient descent, conjugate gradient, quasi-Newton):
  $\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial x^{out}} \cdot \frac{\partial x^{out}}{\partial w_i}$ (contribution of the error function times contribution of the network)
• Typical error function: summed squared error over all patterns and all outputs ($t$ = target):
  $E = \sum_{i=1}^{n} \sum_{k=1}^{m} \big(x_k^{out,(i)} - t_k^{(i)}\big)^2$
• "Backpropagation" (application of the chain rule):
  $\delta_j^{out} = f'(y_j^{out})\,(t_j - x_j^{out})$, $\quad \delta_j^{hid} = f'(y_j^{hid}) \sum_{k=1}^{n} w_{jk}^{out}\, \delta_k^{out}$
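As a concrete illustration of the training rule above, the following is a minimal sketch (not from the original slides) of a one-hidden-layer MLP trained by plain gradient descent on the summed squared error; the toy target function, network size and learning rate are illustrative assumptions.

```python
import numpy as np

# Sketch: one-hidden-layer MLP (sigmoid hidden, linear output) trained by
# gradient descent on the summed squared error. Sizes/learning rate assumed.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 1, 5, 1
W_hid = rng.normal(scale=0.5, size=(n_in, n_hid))   # hidden weights
b_hid = np.zeros(n_hid)                              # hidden biases
W_out = rng.normal(scale=0.5, size=(n_hid, n_out))  # output weights
b_out = np.zeros(n_out)                              # output biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: approximate a nonlinear function of one input
X = np.linspace(-2, 2, 200).reshape(-1, 1)
T = np.sin(X)

eta = 0.05
for epoch in range(2000):
    # forward pass: nonlinear hidden layer, linear output layer
    hid = sigmoid(X @ W_hid + b_hid)
    out = hid @ W_out + b_out
    err = out - T                                    # (x_out - t)
    # backpropagation: chain rule through the linear output and sigmoid hidden layer
    delta_out = err                                  # derivative of linear output is 1
    delta_hid = (delta_out @ W_out.T) * hid * (1.0 - hid)
    # gradient descent step on the (mean) squared error
    W_out -= eta * hid.T @ delta_out / len(X)
    b_out -= eta * delta_out.mean(axis=0)
    W_hid -= eta * X.T @ delta_hid / len(X)
    b_hid -= eta * delta_hid.mean(axis=0)
```

With the sigmoid hidden layer and linear output used here, the delta terms reduce to the backpropagation formulas on the training slide (up to a constant factor absorbed into the learning rate).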
Recurrent Perceptrons
• Recurrent connection = feedback loop
• From the hidden layer ("Elman") or from the output layer ("Jordan"), copied back into a state or context layer at the input
• Learning: "backpropagation through time"

Time Series Processing
• Given: time-dependent observables $x_t,\ t = 0, 1, \ldots$
• Scalar: univariate; vector: multivariate
• Typical tasks for time series (minutes to days): forecasting, noise modeling, pattern recognition, modeling
• Typical tasks for signals (milliseconds to seconds): filtering, source separation

Examples
• Standard & Poor's index, preprocessed as returns: $r_t = x_t - x_{t-1}$
• Sunspot numbers, preprocessed (de-seasoned): $s_t = x_t - x_{t-11}$

Autoregressive Models
• Forecasting: making use of past information to predict (estimate) the future
• AR: past information = past observations $X_{t,p} = (x_{t-1}, x_{t-2}, \ldots, x_{t-p})$:
  $x_t = F(X_{t,p}) + \epsilon_t$, where $\epsilon_t$ is noise (a "random shock")
• Best forecast: the expected value $\hat{x}_t = F(X_{t,p})$

Linear AR Models
• Most common case: $x_t = \sum_{i=1}^{p} a_i x_{t-i} + \epsilon_t$
• Simplest form: the random walk $x_t = x_{t-1} + \epsilon_t$, $\epsilon_t \sim N(0,1)$
• For the random walk, a nontrivial forecast is impossible

MLP as NAR
• A neural network can approximate a nonlinear AR model
• The past $p$ observations are fed in through a "time window" (or "time delay") input; a code sketch is given below

Noise Modeling
• Regression is density estimation of $p(x, t) = p(t \mid x)\, p(x)$ (Bishop 1995), where the target $t$ is the future and $x$ is the past
• Likelihood: $L = \prod_{i=1}^{n} p(t^{(i)} \mid x^{(i)})\, p(x^{(i)})$, with $p(t \mid x)$ a distribution whose expected value is $F(x^{(i)})$

Gaussian Noise
• Likelihood:
  $L(W) = \prod_{t=1}^{n} p(x_t \mid X_{t,p}) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{(F(X_{t,p}; W) - x_t)^2}{2\sigma^2}\Big)$
• Maximization = minimization of $-\log L$ (constant terms, including $p(x)$, can be dropped):
  $E = \sum_{t=1}^{n} \big(F(X_{t,p}; W) - x_t\big)^2$
• This corresponds to the summed squared error (typical backpropagation)

Complex Noise Models
• Assumption: arbitrary distribution $\epsilon \sim D(\theta)$
• The parameters $\theta$ are time dependent (dependent on the past): $\theta_t = g(X_{t,p})$
• Likelihood: $L = \prod_{i=1}^{N} d\big(g(X_{t,p}^{(i)})\big)$, where $d$ is the probability density function of $D$

Heteroskedastic Time Series
• Assumption: the noise is Gaussian with time-dependent variance $\sigma_t^2$:
  $L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_t(X_{t,p}^{(i)})} \exp\Big(-\frac{\big(x_t^{(i)} - \mu(X_{t,p}^{(i)})\big)^2}{2\sigma_t^2(X_{t,p}^{(i)})}\Big)$
• ARCH model: $\sigma_t^2 = \sum_{i=1}^{p} a_i r_{t-i}^2$
• An MLP is a nonlinear ARCH model (when applied to returns/residuals): $\sigma_t^2 = F(r_{t-1}, r_{t-2}, \ldots, r_{t-p}) = F'(r_{t-1}^2, r_{t-2}^2, \ldots, r_{t-p}^2)$

Non-Gaussian Noise
• Other parametric pdfs (e.g. the t-distribution)
• Mixture of Gaussians (mixture density network, Bishop 1994):
  $d(x) = \sum_{i=1}^{k} \frac{\alpha_i(X_{t,p})}{\sqrt{2\pi}\,\sigma_i(X_{t,p})} \exp\Big(-\frac{(x - \mu_i(X_{t,p}))^2}{2\sigma_i^2(X_{t,p})}\Big)$
• Network with $3k$ outputs (or separate networks) for the mixing coefficients, means and variances

Identifiability Problem
• Mixture models (like neural networks) are not identifiable: their parameters cannot be interpreted
• No distinction between model and noise (illustrated e.g. by fits to the sunspot data)
• Models have to be treated with care
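To make the time-window idea from the "MLP as NAR" slide tangible, here is a minimal sketch (not part of the original slides) that builds lagged inputs $X_{t,p}$ from a toy series and fits an MLP as a nonlinear AR model. The generating process, the order $p$, and the use of scikit-learn's MLPRegressor are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy nonlinear AR(1) process with Gaussian random shocks (assumption)
rng = np.random.default_rng(1)
T, p = 500, 3
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * np.tanh(2.0 * x[t - 1]) + 0.1 * rng.normal()

# Time-window inputs X_{t,p} = (x_{t-1}, ..., x_{t-p}) and targets x_t
X = np.column_stack([x[p - i - 1:T - i - 1] for i in range(p)])
y = x[p:]

# Fit the MLP as a nonlinear AR model on the first part of the series,
# then make one-step-ahead forecasts on the remainder
split = 400
nar = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
nar.fit(X[:split], y[:split])
x_hat = nar.predict(X[split:])
print("one-step-ahead MSE:", np.mean((x_hat - y[split:]) ** 2))
```

Feeding windows of squared returns or residuals into the same kind of network would give the nonlinear ARCH variant mentioned on the heteroskedasticity slide.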
Recurrent Networks: Moving Average
• Second model class: moving average (MA) models
• Past information: past random shocks: $x_t = \sum_{i=0}^{q} b_i \epsilon_{t-i}$
• A recurrent (Jordan) network is a nonlinear MA model, with the estimated shock $\epsilon_t = \hat{x}_t - x_t$ fed back
• However, convergence is not guaranteed

GARCH
• Extension of ARCH: $\sigma_t^2 = \sum_{i=1}^{p} a_i r_{t-i}^2 + \sum_{i=1}^{p} b_i \sigma_{t-i}^2$
• Explains "volatility clustering"
• A neural network can again be a nonlinear version; using past estimates makes it a recurrent network

State Space Models
• Observables depend on a (hidden) time-variant state:
  $x_t = C s_t + \epsilon_t$, $\quad s_t = A s_{t-1} + B \eta_t$
• Strong relationship to recurrent (Elman) networks
• A nonlinear version requires additional hidden layers

Symbolic Time Series
• Here $x_t = s_i$, a symbol from an alphabet
• Examples: DNA, text, quantised time series (e.g. "up" and "down")
• Past information: the past $p$ symbols define a probability distribution $p(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-p})$ → Markov chains
• Problem: long substrings are rare

Fractal Prediction Machines
• Similar subsequences are mapped to points close to each other in space
• Clustering = extraction of a stochastic automaton

Relationship to Recurrent Networks
• Corresponds to a network of 2nd order

Other Topics
• Filtering: corresponds to ARMA models; neural networks as nonlinear filters
• Source separation → independent component analysis
• Relationship to stochastic automata

Practical Considerations
• Stationarity is an important issue
• Preprocessing (trends, seasonalities)
• N-fold cross-validation must be done time-wise: the validation set must come after the training set, and the test set after both (a sketch follows after the summary)
• Mean and standard deviation over the folds → model selection

Summary
• Neural networks are powerful semi-parametric models for nonlinear dependencies
• They can be considered nonlinear extensions of classical time series and signal processing techniques
• Applying semi-parametric models to noise modeling adds another interesting facet
• Models must be treated with care, and much data is necessary
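As a closing illustration of the time-wise validation scheme from the practical-considerations slide, here is a minimal sketch (not from the original slides); the expanding-window layout, fold count and split fractions are assumptions chosen for illustration.

```python
import numpy as np

# Sketch of time-wise N-fold validation: in every fold the validation block
# comes strictly after the training block (no shuffling), and a test block
# is held out at the very end of the series. Fractions are assumptions.
def time_wise_folds(n, n_folds=5, val_frac=0.2, test_frac=0.1):
    n_test = int(n * test_frac)
    usable = n - n_test                       # last n_test points = test set
    ends = np.linspace(usable // 2, usable, n_folds, dtype=int)
    for end in ends:
        n_val = int(end * val_frac)
        train_idx = np.arange(0, end - n_val)
        val_idx = np.arange(end - n_val, end)
        yield train_idx, val_idx

n = 1000
for k, (tr, va) in enumerate(time_wise_folds(n), start=1):
    print(f"fold {k}: train 0..{tr[-1]}, validation {va[0]}..{va[-1]}")
print(f"test set: {n - int(n * 0.1)}..{n - 1}")
# Model selection: compare the mean and standard deviation of the
# validation errors collected across these folds.
```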