LECTURER NOTE
MATH-IV
BSCM1210
NUMERICAL ANALYSIS AND THEORY OF PROBABILITY AND
STATISTICS
J.K.PATI
Lecturer note -1
Introduction, approximation, round-off errors:
All numerical methods involve approximations due to either limits in the
algorithm or physical limits in the computer hardware.
Errors associated with measurements or calculations can be characterized with
reference to accuracy and precision.
Accuracy refers to how closely a computed or measured value agrees with the
true value.
Precision refers to how closely measured or computed values agree with each
other after repeated sampling.
Floating point numbers:
Computer memory can only store a finite number of digits. Therefore, a question
becomes apparent: given a fixed number of digits how can we define a representation
so that it gives the largest coverage of real numbers? An obvious method to use is the
scientific notation, i.e., a number of very large or very small magnitude is represented as
a truncated number multiplied by an appropriate power of 10. For example, 2.597E-03
represents 2.597 × 10^(-3).
General form of a floating point number:
x = ±0.m × 10^e, where m is called the mantissa (stored in M bits) and e is called the
exponent (stored in E bits).
Base β
A base-β floating point number consists of a fraction f containing the significant figures
of the number and an exponent e. The value of the number is f · β^e.
The floating point number is said to be normalized if β^(-1) ≤ |f| < 1.
So obviously 2.597 × 10^(-3) is not normalized, while 0.2597 × 10^(-2) is.
Commonly used bases:
• binary — base 2, used by most computer systems.
• decimal — base 10, used in most hand calculators.
• hex — base 16, used by IBM mainframes and clones.
Approximation of Numbers. In approximating a number, only a finite number of
digits (sometimes called bits) after the decimal point are retained.
For example, let x = 0.1234 × 10^3 be approximated to three-digit floating point
form.
Then the result is x* = 0.123 × 10^3.
Mainly we do the approximation in following two ways Rounding and Chopping
Process for rounding off numbers:
The following rules are used to round off numbers. Suppose we desire to
retain digits up to the k-th decimal place of a number x, and let
x = a_n a_(n-1) …… a_0 . a_(-1) a_(-2) …… a_(-k) a_(-(k+1))
(i) If a_(-(k+1)) < 5, then a_(-k) is not changed.
(ii) If a_(-(k+1)) > 5, then a_(-k) is increased by one.
(iii) If a_(-(k+1)) = 5, increase a_(-k) by one if a_(-k) is odd and leave it
unchanged if it is even (round half to even).
In the case of chopping there is no such rule: we simply discard all digits of the
mantissa beyond the place up to which we want to approximate.
The number 3.42543 after rounding off to 3 decimal places becomes 3.425
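The rounding rules (i)-(iii) and chopping can be sketched in Python as below; the helper names `chop` and `round_half_even` are illustrative, not from the notes, and the sketch assumes positive numbers.

```python
def chop(x, k):
    """Discard all digits after the k-th decimal place (positive x)."""
    factor = 10 ** k
    return int(x * factor) / factor

def round_half_even(x, k):
    """Round positive x to k decimal places, sending ties to the even digit."""
    factor = 10 ** k
    shifted = x * factor
    floor = int(shifted)
    frac = shifted - floor
    if frac > 0.5:
        floor += 1
    elif frac == 0.5 and floor % 2 == 1:  # tie: move to the even digit
        floor += 1
    return floor / factor
```

For 3.42543 to 3 decimal places, the digit after the cutoff is 4 < 5, so both routines give 3.425.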
Significant digits:
The significant digits of a number are its first nonzero digit and all the digits to
its right.
For example, the significant digits of the number 2.103 are 2, 1, 0, 3, and those of
0.0103 are 1, 0 (the one between 1 and 3) and 3.
Floating point arithmetic:
Consider the addition of two numbers using 4-digit floating point arithmetic:
x = 0.1234501 × 10^3 and y = 0.132045 × 10^3.
Using chopping we have x + y = (0.1234 + 0.1320) × 10^3 = 0.2554 × 10^3.
Similarly we also have difference and multiplication of numbers using floating
point arithmetic.
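A toy model of this K-digit chopping arithmetic can be written as below; `fl_chop` is an illustrative name, and the small `round(…, 9)` guard is an implementation detail to absorb binary representation noise before truncating.

```python
import math

def fl_chop(x, k=4):
    """Represent x as 0.m * 10**e with the mantissa chopped to k digits."""
    if x == 0:
        return 0.0
    e = math.floor(math.log10(abs(x))) + 1        # exponent so that 0.1 <= |m| < 1
    m = x / 10 ** e
    m = math.trunc(round(m * 10 ** k, 9)) / 10 ** k  # chop mantissa to k digits
    return m * 10 ** e

x, y = 0.1234501e3, 0.132045e3
s = fl_chop(fl_chop(x) + fl_chop(y))              # (0.1234 + 0.1320) * 10**3
```

With the operands of the example above, the chopped sum is 0.2554 × 10^3 = 255.4.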
Lecturer note -2
Floating point arithmetic continued and errors:
Drawbacks of floating point arithmetic
The following problems are usually encountered in k-digit arithmetic.
(i) Loss of accuracy: This is explained through the following example.
Consider the addition of the numbers x = 1/3 and y = 1234 using chopping in
four-digit floating point arithmetic.
Now x* = 0.3333 × 10^0 and y* = 0.1234 × 10^4.
In this case the two numbers in floating point form have different
exponents. The operand with the larger exponent is kept as it is, and the
mantissa of the operand with the smaller exponent is shifted so as to make its
exponent equal to the larger one.
So x* = 0.00003333 × 10^4 = 0.0000 × 10^4.
We conclude that x is effectively zero compared to y, which is a
loss of accuracy.
(ii) Loss of significance: When two nearly equal numbers are subtracted
there is a loss of significant figures.
Algebraic manipulation to avoid the loss of significance:
There are no specific methods for algebraic manipulations to avoid loss of
significance. The type of manipulation required differs from problem to
problem and can’t be predicted before hand.
Example: Solve the quadratic equation x^2 - 40x + 2 = 0 using 4 significant digits
in the computation, and draw your conclusion regarding loss of significance.
Solution:
The roots of this equation are
x1 = (-b + √(b^2 - 4ac)) / (2a),  x2 = (-b - √(b^2 - 4ac)) / (2a).
Now √(b^2 - 4ac) = √1592 = 20√3.98 ≈ 20(1.995) = 39.90, so
x1 = (40 + 39.90)/2 = 39.95,  x2 = (40 - 39.90)/2 = 0.05.
The value of x2 is poor, since the subtraction 40 - 39.90 involves loss of
significant digits.
If instead we compute x2 from the product of the roots, x2 = c/(a·x1), we have
x2 = 2.000/39.95 = 0.05006.
This is a better result than the previous value of x2.
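The cancellation can be reproduced by simulating 4-significant-digit arithmetic; the helper `sig4` is an illustrative stand-in for a 4-digit machine, not part of the notes.

```python
import math

def sig4(x):
    """Round x to 4 significant digits (stand-in for 4-digit arithmetic)."""
    if x == 0:
        return 0.0
    d = 4 - 1 - math.floor(math.log10(abs(x)))
    return round(x, d)

a, b, c = 1.0, -40.0, 2.0
disc = sig4(math.sqrt(sig4(b * b - 4 * a * c)))   # sqrt(1592) -> 39.90
x1 = sig4((-b + disc) / (2 * a))                  # 39.95
x2_naive = sig4((-b - disc) / (2 * a))            # 0.05   (cancellation)
x2_stable = sig4(c / (a * x1))                    # 0.05006 (product of roots)
```

The naive x2 keeps only one significant digit, while the product-of-roots form keeps four.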
Errors:
Numerically computed solutions are subject to certain errors. Mainly there
are the following types of errors: inherent errors, truncation errors and
round-off errors.
1. Inherent errors or experimental errors arise due to the assumptions made in the
mathematical modeling of problem. It can also arise when the data is obtained from
certain physical measurements of the parameters of the problem. i.e., errors arising
from measurements.
2. Truncation errors are those errors corresponding to the fact that a finite (or infinite)
sequence of computational steps necessary to produce an exact result is “truncated”
prematurely after a certain number of steps.
3. Round-off errors are errors arising from the process of rounding off during
computation. These also cover chopping, i.e. discarding all decimals from some
decimal place on.
Simple error definitions: There are a number of ways to describe errors in
measurements and calculations. The simplest is the absolute error, this is the
difference between the measured or calculated value and the true value
i.e.
ε = |True value - approximation|,
so that, up to sign,
True value = approximate value + error.
A shortcoming of the absolute error is that it doesn’t take into account the order
of magnitude of the value under consideration. One way to account for the
magnitude is to consider instead the relative error.
So the relative error is
ε_r = |True value - approximation| / |True value| × 100%.
For example, consider the value of √2 = 1.414213… truncated to four decimal
places, 1.4142. Taking 1.41421 as the true (exact) value, the absolute error is
ε = |1.41421 - 1.4142| = 0.00001,
and the relative error is
0.00001 / 1.41421 ≈ 7 × 10^(-6).
Rounding errors originate from the fact that computers can only represent
numbers using a fixed and limited number of significant figures. Thus, numbers
such as 𝜋 or √2 cannot be represented exactly in computer memory. The
discrepancy introduced by this limitation is called round-off error. Even simple
addition can result in round-off error.
Truncation errors in numerical analysis arise when approximations are used to
estimate some quantity.
Often a Taylor series is used to approximate a solution which is then truncated.
The figure below shows a function f(xi) being approximated by a Taylor series that
has been truncated at different levels.
The more terms that are retained in the Taylor series the better the
approximation and the smaller the truncation error.
Taylor series
Truncating a Taylor series gives approximations with the following truncation
errors. Writing h = x_(i+1) - x_i:
For the zero order approximation we have f(x_(i+1)) ≈ f(x_i).
For the first order approximation we have f(x_(i+1)) ≈ f(x_i) + f'(x_i) h.
Similarly, for the second order approximation we have
f(x_(i+1)) ≈ f(x_i) + f'(x_i) h + (f''(x_i)/2!) h^2.
Lecturer note -3
Roots of an equation:
Numerical Iteration Method:
A numerical iteration method or simply iteration method is a mathematical
procedure that generates a sequence of improving approximate solutions for a class of
problems.
A specific way of implementation of an iteration method, including the
termination criteria, is called an algorithm of the iteration method. In the problems of
finding the solution of an equation an iteration method uses an initial guess to generate
successive approximations to the solution.
Since the iteration methods involve repetition of the same process many times,
computers can act well for finding solutions of equation numerically. Some of the
iteration methods for finding solution of equations involves
(1) Bisection method
(2) Method of false position (Regula-falsi Method)
(3) Newton-Raphson method.
(4) Fixed point iteration method.
(5) Muller’s method
A numerical method to solve equations may be a long process in some cases.
If the method leads to value close to the exact solution, then we say that the method is
convergent. Otherwise, the method is said to be divergent.
Solution of Algebraic and Transcendental Equations:
One of the most common problem encountered in engineering analysis is that given a
function f (x), find the values of x for which f(x) = 0.
The solution (values of x) are known as the roots of the equation f(x) = 0, or the zeroes
of the function f (x).
The roots of equations may be real or complex. In general, an equation may have any
number of (real) roots, or no roots at all.
For example, sin x - x = 0 has a single root, namely x = 0, whereas tan x - x = 0 has
infinitely many roots (x = 0, ±4.493, ±7.725, …).
Algebraic and Transcendental Equations:
f(x) = 0 is called an algebraic equation if the corresponding f(x) is a polynomial.
An example is 7x^2 + 6x + 8 = 0.
f(x) = 0 is called a transcendental equation if f(x) contains trigonometric,
exponential or logarithmic functions.
Examples of transcendental equations are sin x - x = 0 and tan x - x = 0.
There are two types of methods available to find the roots of algebraic and
transcendental equations of the form f (x) = 0.
1. Direct Methods: Direct methods give the exact value of the roots in a finite number
of steps. We assume here that there are no round off errors. Direct methods determine
all the roots at the same time.
2. Indirect or Iterative Methods: Indirect or iterative methods are based on the
concept of successive approximations.
The general procedure is to start with one or more initial approximation to the root and
obtain a sequence of iterates xk which in the limit converges to the actual or true
solution to the root. Indirect or iterative methods determine one or two roots at a time.
The indirect or iterative methods are further divided into two categories: bracketing and
open methods.
The bracketing methods require the limits between which the root lies, whereas the
open methods require the initial estimation of the solution.
Bisection and False position methods are two known examples of the bracketing
methods.
Among the open methods, the Newton-Raphson is most commonly used.
The most popular method for solving a non-linear equation is the Newton-Raphson
method and this method has a high rate of convergence to a solution.
Intermediate value theorem for continuous functions:
If f is a continuous function and f (a) and f (b) have opposite signs, then at least one root
lies in between a and b. If the interval (a, b) is small enough, it is likely to contain a single
root.
The interval [a, b] must contain a zero of a continuous function f if the product
f(a) f(b) < 0.
Geometrically, this means that if f(a) f(b) < 0,
then the curve y = f(x) has to cross the x-axis at some point between a and b.
Bisection method:
We are looking for a root of a function f(x) which we assume is continuous on the
interval [a, b].
We also assume that it has opposite signs at both edges of the interval,
i.e., f(a)f(b) < 0.
We then know that f(x) has at least one zero in [a, b].
Of course f(x) may have more than one zero in the interval.
The bisection method is only going to converge to one of the zeros of f(x).
There will also be no indication as of how many zeros f(x) has in the interval, and no
hints regarding where can we actually hope to find more roots, if indeed there are
additional roots.
The first step is to divide the interval into two equal subintervals, i.e.
c = (a + b)/2.
This generates two subintervals, [a, c] and [c, b], of equal lengths.
We want to keep the subinterval that is guaranteed to contain a root. Of course, in the
rare event where f(c) = 0 we are done.
Otherwise, we check if f(a)f(c) < 0. If yes, we keep the left subinterval [a, c].
If f(a)f(c) > 0, we keep the right subinterval [c, b].
This procedure repeats until the stopping criterion is satisfied: we fix a small parameter
ε > 0 and stop when |f(c)| < ε, where ε is the accuracy tolerance.
Note: How many iterations are needed in order that the interval length is less than ε?
Let L0 = b - a. From the construction of the bisection method we see that after k
iterations the length becomes L_k = L0 / 2^k.
We require L_k ≤ ε.
This implies 2^k ≥ L0/ε, i.e.
k ≥ log2(L0/ε).
We choose k = ⌈log2(L0/ε)⌉,
where ⌈·⌉ denotes the ceiling function.
Example: If b - a = 1 and ε = 10^(-6), then k = ⌈log2 10^6⌉ = 20.
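The procedure above can be sketched as a short routine; the function and parameter names are illustrative, and the stopping test here uses the half-interval length rather than |f(c)|.

```python
def bisect(f, a, b, eps=1e-6, max_iter=100):
    """Bisection: halve [a, b] while keeping the subinterval with a sign change."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2
        if f(c) == 0 or (b - a) / 2 < eps:
            return c
        if f(a) * f(c) < 0:      # root is in the left half
            b = c
        else:                    # root is in the right half
            a = c
    return (a + b) / 2
```

For the example that follows, `bisect(lambda x: x**3 - 9*x + 1, 2, 4)` converges to the root near 2.9428.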
Solved problem:
Example: Solve x^3 - 9x + 1 = 0 for the root between x = 2 and x = 4 by the bisection
method.
Solution:
Given f(x) = x^3 - 9x + 1. Now f(2) = -9 and f(4) = 29, so that f(2) f(4) < 0 and hence a
root lies between 2 and 4.
Set a = 2 and b = 4. Then x0 = (a + b)/2 = 3.
Since f(3) = 1 and f(2) f(3) < 0, a root lies between 2 and 3; hence we set a1 = a = 2 and
b1 = x0 = 3.
Then the next approximation is x1 = (2 + 3)/2 = 2.5.
Now f(2.5) = -5.875.
Since f(2) f(2.5) > 0, the root lies between 2.5 and 3; hence we set a2 = x1 = 2.5 and
b2 = b1 = 3.
Continuing this process until the desired accuracy, the approximation after a few more
steps is 2.9375.
Example: Find a real root of the equation f(x) = x^3 - x - 1 = 0.
Solution:
Since f(1) = -1 is negative and f(2) = 5 is positive, a root lies between 1 and 2, and
therefore we take x0 = 3/2 = 1.5.
Then f(x0) = 0.875 is positive, hence f(1) f(1.5) < 0 and the root lies between 1 and
1.5.
We obtain x1 = (1 + 1.5)/2 = 1.25.
Now f(x1) = -19/64, which is negative, hence f(1.25) f(1.5) < 0 and a root lies
between 1.25 and 1.5.
The procedure is repeated and the successive approximations are
x2 = 1.375, x3 = 1.3125, x4 = 1.34375, etc.
Merits of bisection method
a) The iteration using bisection method always produces a root, since the method
brackets the root between two values.
b) As iterations are conducted, the length of the interval is halved at each step, so
convergence to a root of the equation is guaranteed.
Demerits of bisection method
a) The convergence of the bisection method is slow as it is simply based on
halving the interval.
b) Bisection method cannot be applied over an interval where there is a
discontinuity.
c) Bisection method cannot be applied over an interval where the function takes
always values of the same sign.
d) The method fails to determine complex roots.
Lecturer note -4
Method of false position and introduction to NR method:
Regula Falsi method or Method of False Position
This method is also based on the intermediate value theorem. In this method also, as
in bisection method, we choose two points a and b such that f( a) and f (b) are of
opposite signs i.e f(a). f(b)<0
Then, intermediate value theorem suggests that a zero of f lies in between a and b if f
is a continuous function.
Since f(a). f(b)<0, the curve y=f(x) crosses the x axis only once at the point x=m in
between a and b
Consider the points A(a, f(a)) and B(b,f(b)) on the curve y= f(x). Then the equation of the
chord AB is
y - f(a) = n(x - a), where n = (f(b) - f(a)) / (b - a).
At the point C where the chord AB crosses the x-axis we have y = 0, and the above
equation leads to
x = a - f(a)/n,
where n = (f(b) - f(a)) / (b - a).
This gives the x co-ordinate of the approximate root C.
If the interval [a,b] is sufficiently small, the x co-ordinate of the point c is sufficiently
close to the point x=m which is the exact root.
In otherwords x given by the above equation serves as an approximate value of m
when b-a is sufficiently small.
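The chord-intercept step above can be iterated like bisection; this is a minimal sketch with illustrative names, keeping the endpoint at which f changes sign.

```python
def false_position(f, a, b, eps=1e-6, max_iter=100):
    """Regula falsi: replace one endpoint by the chord's x-intercept."""
    if f(a) * f(b) >= 0:
        raise ValueError("root must be bracketed: f(a) f(b) < 0")
    x = a
    for _ in range(max_iter):
        n = (f(b) - f(a)) / (b - a)   # slope of the chord AB
        x = a - f(a) / n              # x-intercept of the chord
        if abs(f(x)) < eps:
            return x
        if f(a) * f(x) < 0:           # root is in [a, x]
            b = x
        else:                         # root is in [x, b]
            a = x
    return x
```

For the worked example that follows, `false_position(lambda x: x**3 + x - 1, 0.5, 1)` converges toward 0.6823.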
Related problems:
Example:
Find an approximate value of the root of the equation x^3 + x - 1 = 0 near x = 1 using
the method of false position (regula falsi) three times.
Solution: f(x) = x^3 + x - 1, f(1) = 1, f(0.5) = -0.375.
So the root lies between 0.5 and 1.
Let x1 = 0.5 and x2 = 1.
So x = (x1 f(x2) - x2 f(x1)) / (f(x2) - f(x1)) = (0.5 + 0.375)/1.375 ≈ 0.64.
Now f(0.64) = -0.0979 and f(1) = 1.
The root lies between 0.64 and 1. Hence x1 = 0.64, x2 = 1.
So x = [0.64(1) - 1(-0.0979)] / (1 + 0.0979) ≈ 0.672.
Now f(0.672) = -0.0245 and f(1) = 1.
So x1 = 0.672, x2 = 1, and
x = (0.672 + 0.0245)/1.0245 ≈ 0.6798.
Hence the approximate root is x ≈ 0.68.
Note: The bisection and regula falsi methods are always convergent.
Since these methods bracket the root, they are guaranteed to converge.
The main disadvantage is that if it is not possible to bracket the root, the methods
are not applicable.
For example, if f(x) always takes values of the same sign, say always
positive or always negative, then we cannot work with the bisection
method.
Some examples of such functions are
f(x) = x^2, which takes only non-negative values, and
f(x) = -x^2, which takes only non-positive values.
Newton Raphson method:
The Newton-Raphson method, or Newton Method, is a powerful technique for solving
equations numerically. Like so much of the differential calculus, it is based on the simple
idea of linear approximation.
Consider f (x) 0 , where f has continuous derivative f .
Let at x=a, y=f(a)=0 , which means that a is a solution to the equation f (x) 0 .
In order to find the value of a, we start with any arbitrary point x0
Let the tangent to the curve y=f(x) at the point (x0, f(x0)) with slope f′(x0) touches x axis
at x1
f ( x0 )  f ( x1 )
So tan𝛽= f′(x0)=
x0  x1
f ( x0 )
As f(x1)=0 , the above simplifies to x1  x0 
f ( x0 )
f ( x1 )
f ( x1 )
Proceeding likewise we have the final iteration for n+1 th approximation is
f ( xn )
xn 1  xn 
f ( xn )
In the second step , we compute x2  x1 
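The iteration above translates directly into code; the derivative is supplied explicitly and the names are illustrative.

```python
def newton(f, df, x0, eps=1e-10, max_iter=50):
    """Newton-Raphson: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)
        if abs(x_new - x) < eps:   # successive iterates agree: stop
            return x_new
        x = x_new
    return x
```

For instance, `newton(lambda x: x*x - 2, lambda x: 2*x, 1.0)` converges to √2.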
Lecturer note -5
NR method continued and Fixed point iteration method:
Geometrical interpretation of NR method:
So the Newton-Raphson method is sometimes called the tangent method.
Related problem based on NR method:
Example:
Set up a Newton iteration for computing the square root of a given positive
number. Using the same find the square root of 2 exact to six decimal places.
Solution: Let c be a given positive number and let x be its positive square root, so
that x = √c. Then x^2 = c, so we have
f(x) = x^2 - c = 0,
f'(x) = 2x.
Using Newton's iteration formula we have
x_(n+1) = x_n - (x_n^2 - c) / (2 x_n) = (1/2)(x_n + c/x_n).
Now to find the square root of 2 we put c = 2 in the above formula:
x_(n+1) = (1/2)(x_n + 2/x_n).
Choosing x0 = 1, the successive approximations are
x1 = 1.500000, x2 = 1.416667, x3 = 1.414216, x4 = 1.414214.
Hence the square root of 2 correct to six decimal places is 1.414214.
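The square-root iteration x_(n+1) = (x_n + c/x_n)/2 derived above can be run step by step; `sqrt_newton` is an illustrative name.

```python
def sqrt_newton(c, x0=1.0, steps=5):
    """Newton iteration for sqrt(c); returns the list of iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append((xs[-1] + c / xs[-1]) / 2)
    return xs

approximations = sqrt_newton(2)   # x0 = 1, c = 2 as in the example
```

The iterates match the table above: 1.5, 1.416667, 1.414216, 1.414214, ….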
Example: Apply Newton's method to solve the algebraic equation f(x) = x^3 + x - 1 = 0,
correct to six decimal places, starting with x0 = 1.
Solution: Now f(x) = x^3 + x - 1 and f'(x) = 3x^2 + 1.
Substituting these in Newton's formula we have
x_(n+1) = x_n - (x_n^3 + x_n - 1)/(3 x_n^2 + 1) = (2 x_n^3 + 1)/(3 x_n^2 + 1),
n = 0, 1, 2, …
Starting from x0 = 1.000000, we have x1 = 0.750000, x2 = 0.686047, x3 = 0.682340,
x4 = 0.682328.
So we accept 0.682328 as an approximate solution of x^3 + x - 1 = 0 correct to six
decimal places.
Note: Newton’s formula converges provided the initial approximation x0 is
choosen sufficiently close to the exact root.
Proper choice of initial guess is very important for the success of Newton’s
method.
It is applicable to find the solution of both algebraic and transcendental
equation and can also be used when the roots are complex.
Fixed point iteration method:
Consider the equation f(x) = 0. ……(1)
Transform the equation f(x) = 0 into the form x = ∅(x). ……(2)
Take an arbitrary x0 and then compute a sequence x1, x2, x3, …… recursively from a
relation of the form x_(n+1) = ∅(x_n), n = 0, 1, 2, …… ……(3)
A solution of (2) is called a fixed point of ∅. To a given equation (1) there may
correspond several equations (2), and the behavior, especially as regards speed of
convergence of the iterative sequence x0, x1, ……, may differ accordingly.
Conditions for a suitable iteration function ∅(x):
Let x = δ be a root of f(x) = 0 and let I be an interval containing the point x = δ.
Let ∅(x) be continuous in I, where ∅(x) is defined by the equation x = ∅(x), which is
equivalent to f(x) = 0. Then if |∅'(x)| < 1 for all x in I, the sequence of
approximations x0, x1, …… defined by x_(n+1) = ∅(x_n) converges to the root δ,
provided that the initial approximation x0 is chosen in I.
Related problems:
Example: Solve f(x) = x^2 - 3x + 1 = 0 by the fixed point iteration method.
Solution: Write the given equation as x^2 = 3x - 1, i.e. x = 3 - 1/x, and choose
∅(x) = 3 - 1/x.
Then ∅'(x) = 1/x^2, and |∅'(x)| < 1 on the interval (1, 2).
Hence the iteration formula can be applied.
The iterative formula is
x_(n+1) = 3 - 1/x_n,  n = 0, 1, 2, ……
Starting with x0 = 1 we obtain the successive approximations
x1 = 2.00, x2 = 2.5, x3 = 2.60, ……
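The fixed-point iteration of this example can be sketched as below; `fixed_point` is an illustrative name.

```python
def fixed_point(phi, x0, n_steps):
    """Iterate x_{n+1} = phi(x_n) and return the whole sequence."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(phi(xs[-1]))
    return xs

seq = fixed_point(lambda x: 3 - 1 / x, 1.0, 10)   # phi(x) = 3 - 1/x, x0 = 1
```

The first iterates reproduce the example (2.00, 2.5, 2.60, …), and the sequence approaches the root (3 + √5)/2 ≈ 2.618 of x^2 - 3x + 1 = 0.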
Lecturer note -6
Muller’s method:
Newton’s method uses a local linear approximation to the function f, and the
secant method starts with two initial guesses. In Muller’s method we instead use
3 initial guesses and determine the intersection with the x-axis of a parabola.
Note that this is done by finding the root of an explicit quadratic equation. The
case where the roots are not real is handled as well, though the geometric
interpretation is more complicated. (This is where the analyticity of the function is
important, it makes the value of the function for a complex argument
meaningful).
Muller’s method is based on a quadratic approximation. The steps of Muller’s
method are:
Step-1: Given 3 points x_(k-2), x_(k-1) and x_k, find a quadratic function g(x)
such that g(x_i) = f(x_i) for i = k-2, k-1, k.
Step-2: Solve g(x) = 0 for the root x_(k+1) that lies nearest x_k.
How to get the coefficients of the quadratic function:
Given 3 points (x_i, f(x_i)), i = 0, 1, 2, on the curve y = f(x), we find a quadratic
function of the form g(x) = a(x - x2)^2 + b(x - x2) + c which passes through the
three points, i.e. g satisfies
g(x0) = a(x0 - x2)^2 + b(x0 - x2) + c = f(x0) = f0
g(x1) = a(x1 - x2)^2 + b(x1 - x2) + c = f(x1) = f1
g(x2) = c = f2
Solving this system of equations, with h1 = x1 - x0, h2 = x2 - x1,
δ1 = (f1 - f0)/h1 and δ2 = (f2 - f1)/h2, we have
a = (δ2 - δ1)/(h2 + h1),  b = a·h2 + δ2,  c = f2.
To find x3, i.e. the zero of g, apply the quadratic formula to g. There will be two
roots; the root we are interested in is the one that is closer to x2.
To avoid round-off errors due to subtraction of two nearly equal numbers we use
x3 = x2 - 2c / (b ± √(b^2 - 4ac)),
choosing the sign that agrees with the sign of b, i.e. the one that gives the larger
denominator and hence a result closer to x2:
x3 = x2 - 2c / (b + sgn(b)·√(b^2 - 4ac)).
Once x3 is determined, set x0 = x1, x1 = x2, x2 = x3 and repeat the same process
till we get the root up to the desired accuracy.
Related problem:
Example: Use Muller’s method to solve the equation f(x) = e^x + 1 = 0, taking the
initial approximations x0 = 1, x1 = 0, x2 = -1.
Solution: The quadratic polynomial is g(x) = a(x + 1)^2 + b(x + 1) + c.
Matching g with f at the three points gives
a ≈ 0.5431, b ≈ 0.0890, c = f(-1) = e^(-1) + 1 ≈ 1.3679.
Now x3 is found as the root of g(x) closer to x2, that is
x3 ≈ -1.0820 + 1.5850i.
We use the positive sign in front of the square root in the denominator, matching the
sign of b, in order to make |x3 - x2| smallest.
Of course, in this case, since the term under the square root is negative, the two
candidate roots have the same absolute value, but we choose this one even so, since
that is the way the algorithm is defined.
This raises the issue of how to pick the sign of the square root when b^2 - 4ac is
not real. The guiding principle is always to make the choice that picks the
root of the quadratic that is closest to our most recent estimate. The iteration then
continues with the points x1, x2, x3.
Note: Speed and rate of convergence of numerical methods:
A numerical method is said to have rate of convergence p if we have the relation
ε_(k+1) = c · ε_k^p,
where ε_(k+1) is the error present in the (k+1)-th approximation and ε_k is the error
present in the k-th approximation. The rate of convergence of the regula falsi
method is 1.62, of Muller’s method 1.84, and of the NR method 2.
Lecturer note -7
System of linear equations and LU decomposition:
A system of m linear equations in n unknowns x1, x2, . . . , xn is a set of equations of
the form
a11 x1 + a12 x2 + . . . + a1n x n = b1
a21 x1 + a22 x2 + . . . + a2n x n = b2
..............................
a m1 x1 + a m2 x2 + . . . + a mn x n = bm
where the coefficients ajk and the bj are given numbers. The system is said to be
homogeneous if all the bj are zero; otherwise, it is said to be non-homogeneous.
The system of linear equations is equivalent to the matrix equation (or the single vector
equation)
Ax = b
where the coefficient matrix A = [a_jk] is the m × n matrix and x and b are the column
matrices (vectors) given by:

A = | a11 a12 …… a1n |      x = | x1 |      b = | b1 |
    | a21 a22 …… a2n |          | x2 |          | b2 |
    | ……             |          | …… |          | …… |
    | am1 am2 …… amn |          | xn |          | bm |
A solution of the system is a set of numbers x1, x2, ……, xn which satisfy all the m
equations, and a solution vector of the system is a column matrix whose components
constitute a solution of the system. Solving such a system using methods like Cramer’s
rule is impracticable for large systems. Hence, we use other methods like Gauss
elimination, the matrix method, LU decomposition, etc.
Triangulation Method (LU Decomposition Method):
In linear algebra, LU decomposition (also called LU factorization) factorizes a
matrix as the product of a lower triangular matrix and an upper triangular matrix
Let A be a non-singular square matrix. LU decomposition is a decomposition of the
Form A=LU
where L is a lower triangular matrix and U is an upper triangular matrix. This means that
L has only zeros above the diagonal and U has only zeros below the diagonal
Let us have a system of linear equations in the three variables x1, x2, x3 in the above
form, so A is a matrix of order 3.
To solve the system of equations by LU decomposition, first we decompose A as LU,
where

L = | 1   0   0 |      U = | u11 u12 u13 |
    | l21 1   0 |          | 0   u22 u23 |
    | l31 l32 1 |          | 0   0   u33 |
This gives LUx = b.
Let Ux = y. This implies Ly = b, that is,

| 1   0   0 |   | y1 |   | b1 |
| l21 1   0 | × | y2 | = | b2 |
| l31 l32 1 |   | y3 |   | b3 |

Hence y1 = b1, l21·y1 + y2 = b2, l31·y1 + l32·y2 + y3 = b3.
This gives the y values by forward substitution, which means: substitute the value
of y1 given by the first equation into the second and solve for y2, then use these values
of y1 and y2 in the third equation to get y3.
Then the system of equations Ux = y is

| u11 u12 u13 |   | x1 |   | y1 |
| 0   u22 u23 | × | x2 | = | y2 |
| 0   0   u33 |   | x3 |   | y3 |
It gives the required values of x1, x2, x3 as the solution of the original system of linear
equations by backward substitution.
LU factorization of a matrix:
Let A be a matrix of order 3,

A = | a11 a12 a13 |
    | a21 a22 a23 |
    | a31 a32 a33 |

and let A = LU, where

L = | 1   0   0 |      U = | u11 u12 u13 |
    | l21 1   0 |          | 0   u22 u23 |
    | l31 l32 1 |          | 0   0   u33 |

By simple matrix multiplication and equating the corresponding entries of A,
we get the values of the entries of L as well as U.
Lecturer note -8
LU problem continued and inverse of a matrix:
An LU decomposition of a matrix need not be unique.
It is not necessary to take the diagonal elements of L or of U to be 1.
Related problems:
Example: Solve the following system of equations by LU decomposition.
2x+3y+z=9
x+2y+3z=6
3x+y+2z=8.
Solution: The above system of equations is written as

| 2 3 1 |   | x |   | 9 |
| 1 2 3 | × | y | = | 6 |
| 3 1 2 |   | z |   | 8 |

Let A = LU:

| 2 3 1 |   | 1   0   0 |   | u11 u12 u13 |
| 1 2 3 | = | l21 1   0 | × | 0   u22 u23 |
| 3 1 2 |   | l31 l32 1 |   | 0   0   u33 |
Equating the corresponding terms of A and LU, we obtain
u11 = 2, u12 = 3, u13 = 1,
l21 = a21/u11 = 1/2,  l31 = a31/u11 = 3/2,
u22 = a22 - l21·u12 = 2 - 3/2 = 1/2,  u23 = a23 - l21·u13 = 3 - 1/2 = 5/2,
l32 = (a32 - l31·u12)/u22 = (1 - 9/2)/(1/2) = -7,
u33 = a33 - (l31·u13 + l32·u23) = 2 - (3/2 - 35/2) = 18.
So

| 2 3 1 |   | 1    0   0 |   | 2  3    1   |
| 1 2 3 | = | 1/2  1   0 | × | 0  1/2  5/2 |
| 3 1 2 |   | 3/2  -7  1 |   | 0  0    18  |

and the system LUx = b becomes

| 1    0   0 |   | 2  3    1   |   | x |   | 9 |
| 1/2  1   0 | × | 0  1/2  5/2 | × | y | = | 6 |
| 3/2  -7  1 |   | 0  0    18  |   | z |   | 8 |
Consider

| 2  3    1   |   | x |   | y1 |
| 0  1/2  5/2 | × | y | = | y2 |
| 0  0    18  |   | z |   | y3 |

Then

| 1    0   0 |   | y1 |   | 9 |
| 1/2  1   0 | × | y2 | = | 6 |
| 3/2  -7  1 |   | y3 |   | 8 |

Solving this by forward substitution we get y1 = 9, y2 = 3/2, y3 = 5.
2

Again  0

0

1
9
 x   
1 5    3 
y 
2 2     2 
z
0 18     5 
3
Now, solving the above expression we obtain the values of x, y and z as a solution
of the given system of equations as,
 35 
 
 x   18 
   29 
 y    18 
z  
 
 5 
 18 
To find the inverse of a matrix by using LU decomposition:
As per the decomposition we have A = LU. Keep in mind that the inverse
of an upper triangular matrix is upper triangular, and likewise the inverse of a lower
triangular matrix is lower triangular. Using this we can find L^(-1) and U^(-1).
Then
A^(-1) = U^(-1) L^(-1).
Lecturer note -9
Solution of System of Equations by process of iteration:
The methods discussed in the previous section belong to the direct methods for
solving systems of linear equations; these are methods that yield solutions after an
amount of computations that can be specified in advance.
In this section, we discuss indirect or iterative methods in which we start from an
initial value and obtain better and better approximations from a computational cycle
repeated as often as may be necessary, for achieving a required accuracy, so that the
amount of arithmetic depends upon the accuracy required.
Gauss Seidel iteration method:
Consider a linear system of n linear equations in n unknowns x1, x2, ……, xn of the form
a11 x1 + a12 x2 + …… + a1n xn = b1
a21 x1 + a22 x2 + …… + a2n xn = b2
……
an1 x1 + an2 x2 + …… + ann xn = bn
in which the diagonal elements aii do not vanish.
A sufficient condition for obtaining a solution by this method is diagonal
dominance, i.e.
|aii| > Σ_(j≠i) |aij|,  i = 1, 2, ……, n,
i.e. in each row of A the modulus of the diagonal element exceeds the sum of the moduli
of the off-diagonal elements, and also aii ≠ 0. If a diagonal element is 0, the equations
can often be re-arranged to satisfy this condition.
The above system can be written as
x1 = b1/a11 - (a12/a11) x2 - (a13/a11) x3 - …… - (a1n/a11) xn
x2 = b2/a22 - (a21/a22) x1 - (a23/a22) x3 - …… - (a2n/a22) xn
……
xn = bn/ann - (an1/ann) x1 - (an2/ann) x2 - …… - (a_(n,n-1)/ann) x_(n-1)
Suppose we start with x1^(0), x2^(0), ……, xn^(0) as initial values of the variables
x1, x2, ……, xn.
Step-1: The next approximations are
x1^(1) = b1/a11 - (a12/a11) x2^(0) - (a13/a11) x3^(0) - …… - (a1n/a11) xn^(0)
x2^(1) = b2/a22 - (a21/a22) x1^(1) - (a23/a22) x3^(0) - …… - (a2n/a22) xn^(0)
……
xn^(1) = bn/ann - (an1/ann) x1^(1) - (an2/ann) x2^(1) - …… - (a_(n,n-1)/ann) x_(n-1)^(1)
Note that each new value is used as soon as it is available. The process is repeated
in this manner for as many steps as required.
Related problems:
Example: Using Gauss-Seidel iteration solve the following system of equations in three
steps, starting from (1, 1, 1):
10x + y + z = 6
x + 10y + z = 6
x + y + 10z = 6
Solution: The system is diagonally dominant, and we have
x = 0.6 - 0.1y - 0.1z
y = 0.6 - 0.1x - 0.1z
z = 0.6 - 0.1x - 0.1y
Starting with the initial approximation
x^(0) = 1, y^(0) = 1, z^(0) = 1, the next approximations are:
Step-1: x^(1) = 0.6 - 0.1 y^(0) - 0.1 z^(0) = 0.4
y^(1) = 0.6 - 0.1 x^(1) - 0.1 z^(0) = 0.46
z^(1) = 0.6 - 0.1 x^(1) - 0.1 y^(1) = 0.514
Step-2: Using x^(1) = 0.4, y^(1) = 0.46, z^(1) = 0.514 we have
x^(2) = 0.6 - 0.1 y^(1) - 0.1 z^(1) = 0.5026
y^(2) = 0.6 - 0.1 x^(2) - 0.1 z^(1) = 0.49834
z^(2) = 0.6 - 0.1 x^(2) - 0.1 y^(2) = 0.499906
Step-3 is obtained by the same process from the results of Step-2.
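A Gauss-Seidel sweep for this example can be sketched as below; new values are used as soon as they are computed, and `gauss_seidel` is an illustrative name.

```python
def gauss_seidel(A, b, x0, n_sweeps):
    """Perform n_sweeps Gauss-Seidel sweeps, updating in place."""
    n = len(b)
    x = list(x0)
    for _ in range(n_sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]     # uses the freshest values of x
    return x

A = [[10, 1, 1], [1, 10, 1], [1, 1, 10]]    # diagonally dominant
b = [6, 6, 6]
x = gauss_seidel(A, b, [1.0, 1.0, 1.0], 3)
```

The first sweep reproduces (0.4, 0.46, 0.514), and after three sweeps the iterates are close to the exact solution (0.5, 0.5, 0.5).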
Lecturer note -10
Interpolation and finite differences operators:
Consider a single-valued continuous function y = f(x) defined over [a, b], where
f(x) is known explicitly. It is easy to find the values of y for a given set of values of x in
[a, b]; i.e., it is possible to get information about all the points (x, y) where a ≤ x ≤ b.
But the converse is not so easy. That is, using only the points (x0, y0), (x1, y1), ……,
(xn, yn), where a ≤ xi ≤ b for 0 ≤ i ≤ n,
it is not so easy to find the relation between x and y in the form y = f(x) explicitly.
That is one of the problems we face in numerical differentiation or integration.
Now we have first to find a simpler function, say g(x) such that f (x) and g(x) agree at the
given set of points and accept the value of g(x) as the required value of f (x) at some
point x in between a and b.
Such a process is called interpolation. If g(x) is a polynomial, then the process is called
polynomial interpolation.
When a function f(x) is not given explicitly and only values of f (x) are given at a
set of distinct points called nodes or tabular points, using the interpolated function g(x) to
the function f(x), the required operations intended for f (x) , like determination of roots,
differentiation and integration etc. can be carried out.
The approximating polynomial g(x) can be used to predict the value of f (x) at a nontabular point.
The deviation of g(x) from f(x), that is f(x) - g(x), is called the error of approximation.
Consider a continuous single valued function f (x) defined on an interval [a, b].
Given the values of the function for n + 1 distinct tabular points x0, x1, x2, ..., xn such that
a ≤ x0 < x1 < ... < xn ≤ b.
The problem of polynomial interpolation is to find a polynomial g(x) or pn(x)
of degree n, which fits the given data.
The interpolation polynomial fitted to a given data is unique.
If we are given two points satisfying the function such as (x0,y0), (x1,y1) where
y0=f(x0)and y1=f(x1), it is possible to fit a unique polynomial of degree 1.
If three distinct points are given, a polynomial of degree not greater than two can be
fitted uniquely.
In general, if n+ 1 distinct points are given, a polynomial of degree not greater than n
can be fitted uniquely.
FINITE DIFFERENCES OPERATORS:
For a function y=f(x), it is given that y0, y1,------,yn are the values of the variable y
corresponding to the equidistant arguments x0, x1,x2,-----,xn where x1=x0+h, x2=x0+2h, ---xn=x0+nh
In this case, even though interpolation polynomials can be used for interpolation,
some simpler interpolation formulas can be derived. For this, we have to be familiar
with some finite difference operators and finite differences,
.
Finite differences deal with the changes that take place in the value of a function f(x)
due to finite changes in x.
Finite difference operators include, forward difference operator, backward difference
operator, shift operator, central difference operator and mean operator.
Forward difference operator ( ) :
For the values y0, y1,------,yn of a function y=f(x), for the equidistant values x0, x1,x2,----,xn where x1=x0+h, x2=x0+2h, ----xn=x0+nh.
The forward difference operator with table is defined by
f ( xi )  f ( xi  h)  f ( xi )  f ( xi 1 )  f ( xi )
yi  yi 1  yi
So ∆y0, ∆y1,-------∆yn are known as first order forward differences.
The second order forward difference is given by
∆^2 f(xi) = ∆[f(xi + h) - f(xi)]
          = f(xi + 2h) - 2 f(xi + h) + f(xi)
          = y(i+2) - 2 y(i+1) + yi
In general the forward difference of nth order is given by
∆^n f(xi) = ∆^(n-1) f(xi + h) - ∆^(n-1) f(xi)
Example: Construct the forward difference table for the data
X: -2  0  2  4
Y:  4  9 17 22
Solution: The table is

x     y        ∆y          ∆^2 y        ∆^3 y
-2    4
               ∆y0 = 5
 0    9                    ∆^2 y0 = 3
               ∆y1 = 8                  ∆^3 y0 = -6
 2   17                    ∆^2 y1 = -3
               ∆y2 = 5
 4   22
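The table construction is mechanical, so it is easy to sketch in code; this helper (my own naming, not from the notes) builds each difference row from the previous one.

```python
def forward_differences(y):
    """Return the forward difference table as a list of rows: row 0 is y
    itself, and row k holds the k-th order forward differences."""
    table = [list(y)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

# The table from the example above: y = 4, 9, 17, 22 at x = -2, 0, 2, 4
table = forward_differences([4, 9, 17, 22])
```

Each row is one column of the staggered table: `table[1]` holds ∆y0, ∆y1, ∆y2 and `table[3]` the single third difference.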
Properties of forward difference operator:
(1) Forward difference of a constant function is zero.
(2) ∆(f(x)+g(x))=∆f(x)+ ∆g(x)
(3) ∆(f(x).g(x)) = f(x+h) ∆g(x) + g(x) ∆f(x)
(4) ∆(f(x)/g(x)) = [g(x) ∆f(x) - f(x) ∆g(x)] / [g(x+h) g(x)]
Backward Difference Operator: For the values y0, y1,y2,--------yn of a function y=f(x) for
the equidistant values x0, x1,-------,xn where x1=x0+h, x2=x0+2h, ----xn=x0+nh.
The backward difference operator ∇ is defined on the function f as
∇f(xi) = f(xi) - f(xi - h) = yi - y(i-1)
which is the first backward difference.
The second backward difference is ∇^2 f(xi) = yi - 2 y(i-1) + y(i-2)
Similarly the third backward difference is ∇^3 f(xi) = yi - 3 y(i-1) + 3 y(i-2) - y(i-3)
The following table is the backward difference table
Example: construct the backward difference table for the following data
X: -2 0 2 4
Y: -8 3 1 12
Solution:

x     y        ∇y          ∇^2 y         ∇^3 y
-2   -8
               ∇y1 = 11
 0    3                    ∇^2 y2 = -13
               ∇y2 = -2                  ∇^3 y3 = 26
 2    1                    ∇^2 y3 = 13
               ∇y3 = 11
 4   12
Lecturer note -11
Shift operator, Central operator, average operator:
Shift operator, E
Let y = f (x) be a function of x, and let x takes the consecutive values x, x + h, x + 2h,
etc.
We then define an operator E, called the shift operator having the property
E f(x) = f (x + h)
Thus, when E operates on f (x), the result is the next value of the function. If we apply
the operator twice on f (x), we get
E2f(x)= E [E f (x)] = f (x+ 2h).
Thus, in general, if we apply the shift operator n times on f (x), we arrive at
En f (x) = f (x+ nh) for all real values of n.
The inverse operator E-1 is defined as E-1f(x)= f(x-h)
Average Operator:
The average operator μ is defined as
μ f(x) = (1/2) [ f(x + h/2) + f(x - h/2) ]
Central Differences:
Central difference operator δ for a function is defined as
δ f(x) = f(x + h/2) - f(x - h/2), where h is the interval of differencing.
Let y(1/2) = f(x0 + h/2). Then
δ y(1/2) = δ f(x0 + h/2) = f(x0 + h/2 + h/2) - f(x0 + h/2 - h/2)
         = f(x0 + h) - f(x0) = f(x1) - f(x0) = y1 - y0
Central difference table:
Relation between all operators:
(1) ∆ = E - 1
∆f(x) = f(x + h) - f(x) = E f(x) - f(x) = (E - 1) f(x)
So ∆ = E - 1
(2) ∇ = 1 - E^(-1)
∇f(x) = f(x) - f(x - h) = f(x) - E^(-1) f(x) = (1 - E^(-1)) f(x)
So ∇ = 1 - E^(-1)
1
1
(3)   E 2  E 2
h
2
h
2
 f ( x)  f ( x  )  f ( x  )
1
1
1
1
 E 2 f ( x)  E 2 f ( x)  ( E 2  E 2 ) f ( x)
So the proof follows
(4) 1 + μ^2 δ^2 = (1 + δ^2/2)^2
From the definitions of the operators we have
μδ = (1/2)(E^(1/2) + E^(-1/2))(E^(1/2) - E^(-1/2)) = (1/2)(E - E^(-1))
So 1 + μ^2 δ^2 = 1 + (1/4)(E - E^(-1))^2 = 1 + (1/4)(E^2 - 2 + E^(-2)) = (1/4)(E + E^(-1))^2
Now 1 + δ^2/2 = 1 + (1/2)(E^(1/2) - E^(-1/2))^2 = 1 + (1/2)(E - 2 + E^(-1)) = (1/2)(E + E^(-1))
So (1 + δ^2/2)^2 = (1/4)(E + E^(-1))^2 and the proof follows.
Differences of a Polynomial
Let us consider the polynomial of degree n in the form
f ( x)  a0 x n  a1 x n 1  a2 x n 2      an
Where a0, a1,a2,------,an are constants with a0≠0
h is the interval of difference
Then
 n f ( x)  a0 n(n  1)(n  2)(n  3)    (2)(1)h n
 a0 (n !)hn  const
Since ∆^n f(x) is constant, ∆^(n+1) f(x) is zero.
Hence the (n+1)th and higher order differences of a polynomial of degree n are 0.
Conversely, if the nth differences of a tabulated function are constant and the (n+1)th,
(n+2)th, ... differences all vanish, then the tabulated function represents a polynomial of
degree n.
It should be noted that these results hold good only if the values of x are equally
spaced.
The converse is important in numerical analysis since it enables us to approximate
a function by a polynomial if its differences of some order become nearly constant.
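A quick numerical check of this property, using the binomial expansion ∆^n f(x) = Σ_k (-1)^(n-k) C(n,k) f(x + kh); the test polynomial below is my own example, not from the notes.

```python
from math import comb

def nth_forward_difference(f, x, n, h):
    """n-th forward difference of f at x with spacing h, computed from the
    binomial formula  Δ^n f(x) = sum_k (-1)^(n-k) C(n,k) f(x + k h)."""
    return sum((-1) ** (n - k) * comb(n, k) * f(x + k * h) for k in range(n + 1))

f = lambda x: 2 * x**3 - x + 4        # degree 3, leading coefficient a0 = 2
h = 0.5
d3 = nth_forward_difference(f, 1.0, 3, h)   # a0 * 3! * h^3 = 2 * 6 * 0.125 = 1.5
d4 = nth_forward_difference(f, 1.0, 4, h)   # the (n+1)-th difference vanishes
```

The third difference comes out as the constant a0 · 3! · h^3 regardless of the point x, and the fourth difference is zero, exactly as stated above.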
Lecturer note -12
Different interpolation formulae:
Linear interpolation
In linear interpolation we are given two pivotal values f0 = f(x0) and f1 = f(x1),
and we approximate the curve of f by a chord (straight line) P1 passing through the points
(x0,f0) and (x1,f1). Hence the approximate value of f at the intermediate point x = x0 + rh
is given by the linear interpolation formula, i.e.
f ( x)  P1 ( x)  f 0  r ( f1  f 0 )  f 0  r f 0
x  x0
Where
r
,0  r 1
h
Example: Evaluate ln 9.2, given that ln 9.0 = 2.197 and ln 9.5 = 2.251.
Solution:
Here x0 = 9.0, x1 = 9.5, h = x1 - x0 = 0.5, f0 = ln 9.0 = 2.197, f1 = ln 9.5 = 2.251.
Now to calculate ln 9.2 = f(9.2), take x = 9.2, so that r = (x - x0)/h = 0.4. So
ln 9.2 ≈ f(9.2) ≈ P1(9.2) = f0 + r (f1 - f0) = 2.197 + 0.4 (2.251 - 2.197) = 2.219
Quadratic Interpolation
In quadratic interpolation we are given with three pivotal values f 0=f(x0), f1=f(x1), f2=f(x2)
and we approximate the curve of the function f between x0 and x2 = x0 +2h by the
quadratic parabola which passes through the points (x0,f0), (x1,f1), (x2,f2).
The quadratic interpolation formula becomes
f(x) ≈ P2(x) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0
where r = (x - x0)/h, 0 ≤ r ≤ 2
Newton’s Forward Difference Interpolation Formula
Using Newton’s forward difference interpolation formula we find the n degree
polynomial Pn which approximates the function f(x) in such a way that Pn and f agrees at
n+1 equally spaced x values, so that pn(x0) = f0, pn(x1) = f1, ..., pn(xn) = fn
The Newton's forward difference interpolation formula is
f(x) ≈ Pn(x) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0 + ... + [r(r-1)...(r-n+1)/n!] ∆^n f0
where x = x0 + rh, r = (x - x0)/h, 0 ≤ r ≤ n
Derivation of Newton's forward formula for interpolation:
Let us have n+1 tabular points at which f is defined, so the degree of the interpolating polynomial is ≤ n. Let pn(x) be the polynomial of nth degree which agrees with f at the tabular points, and let the values of x be equidistant.
Let pn(x) = a0 + a1 (x - x0) + a2 (x - x0)(x - x1) + ... + an (x - x0)(x - x1)...(x - x(n-1))
Imposing now the condition that f(x) and pn(x) should agree at the set of tabulated points, we obtain
a0 = f0, a1 = (f1 - f0)/(x1 - x0) = ∆f0/h, ..., an = ∆^n f0/(n! h^n)
Setting x = x0 + rh and substituting for a0, a1, ..., an we obtain the expression.
Note: Newton's forward difference interpolation formula is useful for interpolation near
the beginning of a set of tabular values, and for extrapolating values of y a short
distance backward, that is, to the left of y0. The process of finding the value of y for some
value of x outside the given range is called extrapolation.
Related problem:
Example: Using Newton’s forward difference interpolation formula and the following
table evaluate f(15) .
x     y      ∆f      ∆^2 f    ∆^3 f    ∆^4 f
10    46
             20
20    66             -5
             15                2
30    81             -3                -3
             12               -1
40    93             -4
              8
50   101
Solution: Here x0 = 10, x1 = 20, x = 15, h = 10, r = (x - x0)/h = 0.5.
Now f0 = 46, ∆f0 = 20, ∆^2 f0 = -5, ∆^3 f0 = 2, ∆^4 f0 = -3.
Substituting these values in the Newton's forward difference interpolation formula for n = 4, we obtain
f(x) ≈ P4(x) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0 + [r(r-1)(r-2)/3!] ∆^3 f0 + [r(r-1)(r-2)(r-3)/4!] ∆^4 f0
Putting the value of r we obtain f(15)=56.8672
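The whole procedure (build the leading differences ∆^k f0, then accumulate the terms of the formula) can be sketched as follows; the function name is my own, and the result is checked against the worked example.

```python
def newton_forward(x0, h, ys, x):
    """Newton's forward difference interpolation at x for equally spaced
    data ys given at x0, x0+h, x0+2h, ..."""
    # leading forward differences: lead[k] = Δ^k y0
    row, lead = list(ys), [ys[0]]
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        lead.append(row[0])
    r = (x - x0) / h
    term, total = 1.0, 0.0
    for k, d in enumerate(lead):
        total += term * d          # term is r(r-1)...(r-k+1)/k!
        term *= (r - k) / (k + 1)
    return total

# The worked example: f(15) from the table at x = 10, 20, ..., 50
f15 = newton_forward(10, 10, [46, 66, 81, 93, 101], 15)   # 56.8671875
```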
Example: Using the Newton’s forward difference interpolation formula evaluate f(2.05)
where f(x)=√𝑥, using these values.
x:        2.0     2.1     2.2     2.3     2.4
f(x)=√x:  1.414   1.449   1.483   1.516   1.549

Solution: The forward difference table is

x     y = √x       ∆f           ∆^2 f         ∆^3 f        ∆^4 f
2.0   1.414214
                   0.034924
2.1   1.449138                  -0.000822
                   0.034102                   0.000055
2.2   1.483240                  -0.000767                  -0.000005
                   0.033335                   0.000050
2.3   1.516575                  -0.000717
                   0.032618
2.4   1.549193
Here r = (x - x0)/h = 0.5, with x = 2.05, x0 = 2.0, x1 = 2.1, h = 0.1.
So by substituting the values in Newton's formula we obtain
f(2.05) ≈ P4(2.05) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0 + [r(r-1)(r-2)/3!] ∆^3 f0 + [r(r-1)(r-2)(r-3)/4!] ∆^4 f0
So f(2.05)=1.431783
Example: Find the missing term in the following table:
X: 0 1 2 3 4
Y: 1 3 9 -- 81
Solution: Since four values are given, the given data can be approximated by a third
degree polynomial in x.
Hence ∆^4 f0 = 0. Substituting ∆ = E - 1 we get (E - 1)^4 f0 = 0, which implies
E^4 f0 - 4 E^3 f0 + 6 E^2 f0 - 4 E f0 + f0 = 0.
Since E^r f0 = fr, we obtain f4 - 4 f3 + 6 f2 - 4 f1 + f0 = 0.
Substituting the values of f0, f1, f2, f4 we obtain f3 = 31.
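The same trick can be coded directly: force the highest-order difference of the tabulated values to vanish and solve for the unknown entry. The helper below and its naming are mine.

```python
from math import comb

def missing_term(ys, m):
    """Recover the value at index m, marked None in ys, by forcing the
    highest-order forward difference of the data to vanish:
    Δ^n y0 = sum_j (-1)^(n-j) C(n,j) y_j = 0, with n = len(ys) - 1."""
    n = len(ys) - 1
    coeff = [(-1) ** (n - j) * comb(n, j) for j in range(n + 1)]
    known = sum(c * y for j, (c, y) in enumerate(zip(coeff, ys)) if j != m)
    return -known / coeff[m]

# The worked example: y = 1, 3, 9, ?, 81 gives y3 = 31
y3 = missing_term([1, 3, 9, None, 81], 3)
```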
Lecturer note -13
Newton’s Backward difference interpolating formula:
Newton’s backward difference interpolation formula is
f ( x)  Pn ( x)  f n  rf n 

Where
r (r  1) 2
 fn     
2!
r (r  1)  (r  n  1) n
 fn
n!
x  xn  rh, r 
x  xn
, n  r  0
h
Derivation of Newton’s Backward Formulae for Interpolation:
Given the set of n+1 values i.e (x0,f0), (x1,f1),------ (xn,fn) of x and f, it is required to find
pn(x) a polynomial of the nth degree such that f (x) and pn(x) agree at the tabulated
points.
Let the values of x be equidistant, i.e., let xi = x0 + ih, i = 0, 1, 2, ..., n.
Let pn(x) be the polynomial of nth degree which agrees with f at the tabular points.
Let pn(x) = a0 + a1 (x - xn) + a2 (x - xn)(x - x(n-1)) + ... + an (x - xn)(x - x(n-1))...(x - x1)
Putting x=xn we have pn(xn) = fn, so a0=fn
Similarly Imposing the condition that f (x) and pn(x) should agree at the set of
tabulated points
we obtain (after some simplification) the above formula.
Note: (1) Since ∆^n yi = ∇^n y(i+n), a forward difference table may be
derived from a backward difference table and vice versa. So the result will be the same
whether we approximate by the forward or the backward interpolation formula.
(2) The backward difference interpolation formula is commonly used for
interpolation near the end of a set of tabular values and for extrapolating values of y a
short distance forward that is right from yn.
Related problem:
Example: For the following table of values, estimate f(7.5) using Newton's
backward difference interpolation formula.
x    y = f(x)    ∇f      ∇^2 f    ∇^3 f    ∇^4 f
1       1
                7
2       8               12
               19                 6
3      27               18                 0
               37                 6
4      64               24                 0
               61                 6
5     125               30                 0
               91                 6
6     216               36
              127
7     343               42
              169
8     512
Solution: Since the fourth and higher order differences are 0, the Newton's
backward interpolation formula is
f(xn + rh) = yn + r ∇yn + [r(r+1)/2!] ∇^2 yn + [r(r+1)(r+2)/3!] ∇^3 yn
Now r = (x - xn)/h, so r = 7.5 - 8.0 = -0.5.
We have ∇yn = 169, ∇^2 yn = 42, ∇^3 yn = 6, ∇^4 yn = 0, and
f(7.5) = 512 + (-0.5)(169) + [(-0.5)(0.5)/2!](42) + [(-0.5)(0.5)(1.5)/3!](6)
       = 512 - 84.5 - 5.25 - 0.375 = 421.875

INTERPOLATION - Arbitrarily Spaced x values
In the previous sections we have discussed interpolation when the x-values are
equally spaced.
These interpolation formulae cannot be used when the x-values are not equally
spaced.
In the following sections, we consider formulae that can be used even if the x-values
are not equally spaced.
Newton’s Divided Difference Interpolation Formula
If x0, x1, . . . , xn are arbitrarily spaced (i.e. if the difference between x0 and x1, x1 and x2
etc. may not be equal),
Then the polynomial of degree n through (x0,f0), (x1,f1),---(xn,fn) is given by the
Newton’s divided difference interpolation formula (also known as Newton’s general
interpolation formula) given by
f ( x)  f0  ( x  x0 ) f [ x0 , x1 ]  ( x  x0 )( x  x1 ) f [ x0 , x1 , x2 ]      ( x  x0 )( x  x1 )  ( x  xn1 ) f [ x0 , x1, x2 , , xn ]
With the remainder term is given by ( x  x0 )( x  x1 )  ( x  xn ) f [ x, x0 , x1, x2 , , xn ]
Where
f ( x1 )  f ( x0 )
f [ x , x ]  f [ x0 , x1 ]
and f [ x0 , x1 , x2 ]  1 2
x1  x0
x1  x0
f [ x1 ,   , xn ]  f [ x0 , , xn 1 ]
f [ x0 , x1 ,   xn ] 
xn  x0
f [ x0 , x1 ] 
Note: If x0, x1, ..., xn are equispaced, i.e. when xk = x0 + kh, then f[x0,x1,...,xk] = ∆^k f0/(k! h^k),
and Newton's divided difference interpolation formula takes the form of Newton's
forward difference interpolation formula.
Properties of divided difference:
1.The divided differences are symmetrical about their arguments.
That is f[x0,x1] = (f(x1) - f(x0))/(x1 - x0) = (f(x0) - f(x1))/(x0 - x1) = f[x1,x0]
So the order of the arguments has no importance.
When we are considering the nth divided difference also, we can write
f[x0,x1,...,xn] = f(x0)/[(x0-x1)(x0-x2)...(x0-xn)] + f(x1)/[(x1-x0)(x1-x2)...(x1-xn)] + ... + f(xn)/[(xn-x0)(xn-x1)...(xn-x(n-1))]
From this expression it is clear that, whatever be the order of the arguments,
the expression is the same.
Hence the divided differences are symmetrical about their arguments.
2. Divided difference operator is linear.
For example, consider two polynomials f(x) and g(x).
Let h(x) = a f(x) + b g(x),
where 'a' and 'b' are any two real constants. The first divided difference of h(x)
corresponding to the arguments x0 and x1 is h[x0,x1], and
we find h[x0,x1] = a f[x0,x1] + b g[x0,x1].
3. The nth divided difference of a polynomial of degree n is its leading coefficient.
Now we consider a general polynomial of degree n as g(x) = a0 x^n + a1 x^(n-1) + ... + an.
Since the nth divided difference of x^n is 1 and that of every lower power is 0, linearity of
the divided difference operator gives the nth divided difference of g(x) as a0, which is the
leading coefficient of g(x).
Lecturer note -14
Problems continued and Lagrange's interpolating formula:
Related problem on Divided difference formula:
Example: Use the following data find the Newton’s divided difference
interpolating polynomial.
X:  -1   0   3    6     7
Y:   3  -6  39  822  1611
Solution: The divided difference table is

x     y        1st             2nd                3rd                  4th
-1     3
               f[x0,x1] = -9
 0    -6                      f[x0,x1,x2] = 6
               f[x1,x2] = 15                     f[x0,x1,x2,x3] = 5
 3    39                      f[x1,x2,x3] = 41                        f[x0,x1,x2,x3,x4] = 1
               f[x2,x3] = 261                    f[x1,x2,x3,x4] = 13
 6   822                      f[x2,x3,x4] = 132
               f[x3,x4] = 789
 7  1611

So the required polynomial is
f(x) = 3 + (x+1) f[x0,x1] + (x+1)x f[x0,x1,x2] + (x+1)x(x-3) f[x0,x1,x2,x3] + (x+1)x(x-3)(x-6) f[x0,x1,x2,x3,x4]
     = 3 - 9(x+1) + 6(x+1)x + 5(x+1)x(x-3) + (x+1)x(x-3)(x-6)
     = x^4 - 3x^3 + 5x^2 - 6
Example: Obtain the Newton's divided difference interpolating polynomial satisfied by
(-4, 1245), (-1, 33), (0, 5), (2, 9) and (5, 1335).
Solution:
Newton's divided difference interpolating polynomial is given by
f(x) = f(x0) + (x - x0) f[x0,x1] + ... + (x - x0)(x - x1)...(x - x(n-1)) f[x0,x1,...,xn]
Here the x values are given as -4, -1, 0, 2 and 5. The corresponding f(x) values are
1245, 33, 5, 9 and 1335.
Hence the divided differences are as shown in the following table:

x     y        1st               2nd                3rd                   4th
-4  1245
               f[x0,x1] = -404
-1    33                        f[x0,x1,x2] = 94
               f[x1,x2] = -28                      f[x0,x1,x2,x3] = -14
 0     5                        f[x1,x2,x3] = 10                         f[x0,x1,x2,x3,x4] = 3
               f[x2,x3] = 2                        f[x1,x2,x3,x4] = 13
 2     9                        f[x2,x3,x4] = 88
               f[x3,x4] = 442
 5  1335
Hence the interpolating polynomial is
f(x) = 1245 - 404(x+4) + 94(x+4)(x+1) - 14(x+4)(x+1)x + 3(x+4)(x+1)x(x-2)
     = 3x^4 - 5x^3 + 6x^2 - 14x + 5
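Both examples can be checked with a short routine that builds the divided differences by the standard triangular recursion and evaluates the Newton form; the function names are mine, not from the notes.

```python
def divided_diff_coeffs(xs, ys):
    """Leading divided differences f[x0], f[x0,x1], ..., f[x0,...,xn],
    built column by column: each entry divides a difference of the
    previous column by the spread of the arguments it spans."""
    n, col = len(xs), list(ys)
    coeffs = [col[0]]
    for k in range(1, n):
        col = [(col[i + 1] - col[i]) / (xs[i + k] - xs[i]) for i in range(n - k)]
        coeffs.append(col[0])
    return coeffs

def newton_divided(xs, ys, x):
    """Evaluate Newton's divided difference polynomial at x."""
    total, prod = 0.0, 1.0
    for c, xi in zip(divided_diff_coeffs(xs, ys), xs):
        total += c * prod
        prod *= (x - xi)
    return total

# Second worked example: the interpolant is 3x^4 - 5x^3 + 6x^2 - 14x + 5
xs, ys = [-4, -1, 0, 2, 5], [1245, 33, 5, 9, 1335]
coeffs = divided_diff_coeffs(xs, ys)     # [1245, -404, 94, -14, 3]
p1 = newton_divided(xs, ys, 1)           # 3 - 5 + 6 - 14 + 5 = -5
```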
Derivation of Newton's interpolation formula with divided differences:
Consider two arguments x and x0. The first divided difference between x and x0 is
f[x,x0] = (f(x) - f(x0))/(x - x0), so f(x) = f(x0) + (x - x0) f[x,x0]
Considering x, x0 and x1, we have
f[x,x0,x1] = (f[x0,x1] - f[x,x0])/(x1 - x), so f[x,x0] = f[x0,x1] + (x - x1) f[x,x0,x1]
So f(x) = f(x0) + (x - x0) f[x0,x1] + (x - x0)(x - x1) f[x,x0,x1]
Proceeding in this way we obtain
f(x) = f(x0) + (x - x0) f[x0,x1] + (x - x0)(x - x1) f[x0,x1,x2] + ... + (x - x0)(x - x1)...(x - x(n-1)) f[x0,x1,...,xn] + (x - x0)(x - x1)...(x - xn) f[x,x0,x1,...,xn]
If f(x) is a polynomial of degree n, the last term vanishes, since then f[x,x0,x1,...,xn] = 0.
Lagrangian Interpolation
Another method of interpolation in the case of arbitrarily spaced pivotal values x0, x1, . . .
, xn is Lagrangian interpolation.
This method is based on Lagrange's (n+1) point interpolation formula
f(x) ≈ Ln(x) = Σ (k=0 to n) [lk(x)/lk(xk)] fk
where lk(x) = (x - x0)(x - x1)...(x - x(k-1))(x - x(k+1))...(x - xn); for example
l0(x) = (x - x1)(x - x2)...(x - xn) and ln(x) = (x - x0)(x - x1)...(x - x(n-1)).
Derivation of the formula:
Given the set of (n 1) points, (x0,f0), (x1,f1),---------(xn,fn) of x and f(x)
it is required to fit the unique polynomial pn(x) of maximum degree n, such that f
and pn(x) agree at the given set of points. The values x0,x1,--,xn may not be
equidistant.
Since the interpolating polynomial must use all the ordinates f(x0), f(x1),---f(xn) , it
can be written as a linear combination of these ordinates. That is, we can write the
polynomial as
pn ( x)  l0 ( x) f ( x0 )  l1 ( x) f ( x1 )    ln ( x) f ( xn )
At x=x0 as f(x) and pn(x) coincides we get
f ( x0 )  pn ( x0 )  l0 ( x0 ) f ( x0 )  l1 ( x0 ) f ( x1 )    ln ( x0 ) f ( xn )
This equation is satisfied only when
l0 ( x0 )  1, li ( x0 )  0, i  0
At a general point x=xi we get
f ( xi )  pn ( xi )  l0 ( xi ) f ( x0 )      ln ( xi ) f ( xn )
li ( xi )  1, l j ( xi )  0, i  j
Since li(x)=0 at x=x0,x1,-----,xn so (x-x0)-------(x-xn) are the factors of li(x)
The product of these factors is a polynomial of degree n.
Therefore, we can write li ( x)  c( x  x0 )( x  x1 )    ( x  xn ) where c is constant
Lecturer note -15
Lagrange's interpolating formula derivation and problems:
Now since li(xi) = 1, we get 1 = c (xi - x0)(xi - x1)...(xi - x(i-1))(xi - x(i+1))...(xi - xn)
So c = 1/[(xi - x0)(xi - x1)...(xi - x(i-1))(xi - x(i+1))...(xi - xn)]
Hence
li(x) = [(x - x0)(x - x1)...(x - x(i-1))(x - x(i+1))...(x - xn)] / [(xi - x0)(xi - x1)...(xi - x(i-1))(xi - x(i+1))...(xi - xn)]
Now the polynomial
pn(x) = l0(x) f(x0) + l1(x) f(x1) + ... + ln(x) f(xn)
with li(x) as defined above is called the Lagrange interpolating polynomial, and the li(x) are called the Lagrange fundamental polynomials.
Related problems:
Example: Given f(2) = 9, and f(6) = 17. Find an approximate value for f(5) by the
method of Lagrange’s interpolation.
Solution: For the given two points (2,9) and (6,17), the Lagrangian polynomial of
degree 1 is p(x) = l0(x) f(x0) + l1(x) f(x1), where
l0(x) = (x - x1)/(x0 - x1), l1(x) = (x - x0)/(x1 - x0)
So the required polynomial is
p(x) ≈ f(x) = [(x - 6)/(2 - 6)] 9 + [(x - 2)/(6 - 2)] 17
Hence f(5) = (1/4)(9) + (3/4)(17) = 15
Example: Use Lagrange’s formula, to find the quadratic polynomial that takes the
values
X: 0  1  3
Y: 0  1  0
Solution: For the given three points (0,0) , (1,1) and (3,0), the quadratic
polynomial by Lagrange’s interpolation is
P(x)=l0(x)f(x0)+l1(x)f(x1)+l2(x)f(x2)
where
l0(x) = (x - x1)(x - x2)/[(x0 - x1)(x0 - x2)], l1(x) = (x - x0)(x - x2)/[(x1 - x0)(x1 - x2)], l2(x) = (x - x0)(x - x1)/[(x2 - x0)(x2 - x1)]
We are considering the given x values 0, 1 and 3 as x0, x1, x2. Now f(x0) = f(x2) = 0 and f(x1) = 1.
So the required polynomial is
P(x) = l1(x) f(x1) = [(x - 0)(x - 3)/((1 - 0)(1 - 3))] · 1 = (3x - x^2)/2
Example: Find ln 9.2 with n = 3, using Lagrange's interpolation formula with the given
table:

x:        9.0    9.5    10.0   11.0
y = ln x: 2.197  2.251  2.302  2.397

Solution:
ln 9.2 ≈ f(9.2) ≈ L3(9.2) = Σ (k=0 to 3) [lk(9.2)/lk(xk)] fk
= [(9.2-9.5)(9.2-10.0)(9.2-11.0)/((9.0-9.5)(9.0-10.0)(9.0-11.0))] 2.19722
+ [(9.2-9.0)(9.2-10.0)(9.2-11.0)/((9.5-9.0)(9.5-10.0)(9.5-11.0))] 2.25129
+ [(9.2-9.0)(9.2-9.5)(9.2-11.0)/((10.0-9.0)(10.0-9.5)(10.0-11.0))] 2.30259
+ [(9.2-9.0)(9.2-9.5)(9.2-10.0)/((11.0-9.0)(11.0-9.5)(11.0-10.0))] 2.39790
= 2.21920
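A direct transcription of the formula (my own naming), checked against the ln 9.2 example:

```python
def lagrange(xs, ys, x):
    """Lagrange interpolation p(x) = sum_k y_k prod_{j != k} (x - x_j)/(x_k - x_j);
    the nodes need not be equally spaced."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        lk = 1.0
        for j, xj in enumerate(xs):
            if j != k:
                lk *= (x - xj) / (xk - xj)
        total += yk * lk
    return total

# ln 9.2 from the four tabulated points; the notes obtain 2.21920
approx = lagrange([9.0, 9.5, 10.0, 11.0],
                  [2.19722, 2.25129, 2.30259, 2.39790], 9.2)
```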
Inverse Lagrangian Interpolation Formula
Interchanging x and y in the Lagrangian interpolation formula, we obtain the
inverse Lagrangian interpolation formula given by
x ≈ Ln(y) = Σ (k=0 to n) [lk(y)/lk(yk)] xk
Example: If y(1) = 4, y(3) = 12, y(4) = 19 and y(x) = 7, find x.
Solution: Using the inverse interpolation formula, we have
x ≈ L2(7) = Σ (k=0 to 2) [lk(7)/lk(yk)] xk
where x0 = 1, x1 = 3, x2 = 4, y0 = 4, y1 = 12, y2 = 19 and y = 7. So
x = [(7 - y1)(7 - y2)/((y0 - y1)(y0 - y2))] x0 + [(7 - y0)(7 - y2)/((y1 - y0)(y1 - y2))] x1 + [(7 - y0)(7 - y1)/((y2 - y0)(y2 - y1))] x2
Putting the values we get x = 1.86
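Swapping the roles of x and y gives the code directly (naming mine); for the example it reproduces x ≈ 1.857, which rounds to the 1.86 quoted above.

```python
def inverse_lagrange(xs, ys, y):
    """Inverse interpolation: apply the Lagrange formula with the roles
    of x and y interchanged, treating x as a function of y."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        lk = 1.0
        for j, yj in enumerate(ys):
            if j != k:
                lk *= (y - yj) / (yk - yj)
        total += xk * lk
    return total

# The worked example: y(1) = 4, y(3) = 12, y(4) = 19; find x with y = 7
x_at_7 = inverse_lagrange([1, 3, 4], [4, 12, 19], 7)   # about 1.857
```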
Lecturer note -16
Error in interpolation and numerical integration:
Error or remainder term in interpolation: In this section we would like to provide
estimates on the “error” we make when interpolating data that is taken from sampling
an underlying function f(x). While the interpolant and the function agree with each
other at the interpolation points, there is, in general, no reason to expect them to be
close to each other elsewhere. Nevertheless we can estimate the difference between
them, a difference which we refer to as the interpolation error.
Let f(x) be a function defined in the interval (a,b) and suppose that f^(n+1)(x) exists in (a,b). Then
the error is
f(x) - pn(x) = [(x - x0)(x - x1)...(x - xn)/(n+1)!] f^(n+1)(ξ)
where ξ depends upon x, x0, ..., xn and f, and min(x, x0, ..., xn) < ξ < max(x, x0, ..., xn).
Numerical Integration:
In this chapter we are going to explore various ways for approximating the integral of
a function over a given domain.
Since we can not analytically integrate every function, the need for approximate
integration formulas is obvious. In addition, there might be situations where the given
function can be integrated analytically, and still, an approximation formula may end up
being a more efficient alternative to evaluating the exact expression of the integral.
We want to construct numerical algorithms that can perform definite integrals
of the form ∫(a to b) f(x) dx.
Calculating these definite integrals numerically is called numerical integration,
numerical quadrature, or more simply quadrature.
THE TRAPEZOIDAL RULE
In this method, to evaluate ∫(a to b) f(x) dx, we partition the interval of integration [a, b]
into n subintervals of equal width and replace f by a straight line segment on each subinterval.
The vertical lines from the ends of the segments to the partition points create a
collection of trapezoids that approximate the region between the curve and the x-axis.
We add the areas of the trapezoids, counting area above the x-axis as positive and area
below the axis as negative, and denote the sum by T.
Then T = (1/2)(y0 + y1)h + (1/2)(y1 + y2)h + ... + (1/2)(y(n-1) + yn)h
       = (h/2)[y0 + yn + 2(y1 + y2 + ... + y(n-1))]
where y0 = f(a), y1 = f(x1), ..., y(n-1) = f(x(n-1)), yn = f(b).
SIMPSON'S 1/3 RULE
Simpson's rule for approximating the integral ∫(a to b) f(x) dx is
based on approximating f with quadratic polynomials instead of linear polynomials. We
approximate the graph with parabolic arcs instead of line segments .
The integral of the quadratic polynomial y = Ax^2 + Bx + C from x = -h to x = h is
∫(-h to h) (Ax^2 + Bx + C) dx = (h/3)(y0 + 4y1 + y2)
where y0, y1, y2 are the values of the polynomial at x = -h, 0, h.
Simpson's rule follows from partitioning [a, b] into an even number of subintervals of
equal length h, applying this formula to successive interval pairs, and adding the results.
Algorithm: Simpson’s 1/3 Rule
To approximate the integral we use
S = (h/3)(y0 + 4y1 + 2y2 + 4y3 + ... + 2y(n-2) + 4y(n-1) + yn)
The y's are the values of f at the partition points x0 = a, x1 = a + h, x2 = a + 2h, ..., xn = b.
In particular we have
S = (h/3)(s0 + 4s1 + 2s2), where s0 = y0 + yn, s1 = y1 + y3 + ... + y(n-1), s2 = y2 + y4 + ... + y(n-2)
where h = (b - a)/n and n, the number of subintervals of the partition, is even.
Derivation of Trapezoidal and Simpson’s 1/3 rules of integration from Lagrangian
Interpolation
Integrating the formula in Lagrangian interpolation, we obtain
∫(a to b) f(x) dx ≈ ∫(a to b) Ln(x) dx = Σ (k=0 to n) [fk/lk(xk)] ∫(a to b) lk(x) dx
For n=1 we have only one interval [x0, x1] such that a = x0 and b = x1 and then the
above integration formula gives trapezoidal rule.
For n = 2, we have two subintervals [x0, x1] and [x1, x2] of equal width h such that a = x0 and b = x2, and then the above integration formula becomes
∫(a to b) f(x) dx = ∫(x0 to x2) f(x) dx ≈ (h/3)(f0 + 4f1 + f2)
and this is the Simpson's 1/3 rule of integration.
For n = 3 the above integration formula becomes
∫(a to b) f(x) dx = ∫(x0 to x3) f(x) dx ≈ (3h/8)(f0 + 3f1 + 3f2 + f3)
and is known as Simpson's 3/8 rule of integration.
Lecturer note -17
Theory continued and related problems on numerical integration:
Note: Using Lagrange's interpolation we know Pn(x) = Σ (i=0 to n) li(x) f(xi). So we can approximate
∫(a to b) f(x) dx ≈ ∫(a to b) pn(x) dx = Σ (i=0 to n) f(xi) ∫(a to b) li(x) dx = Σ (i=0 to n) Ai f(xi)
which is called a Newton-Cotes formula.
Note: General integration formulas
We recall that a weight function is a continuous, non-negative function with a positive
mass. We assume that such a weight function w(x) is given and would like to write a
quadrature of the form
∫(a to b) f(x) w(x) dx ≈ Σ (i=0 to n) Ai f(xi)
Such quadratures are called general (weighted) quadratures.
Such quadratures are called general (weighted) quadratures.
Note: Composite Integration Rules
In a composite quadrature, we divide the interval into subintervals and apply an
integration rule to each subinterval.
Note:
Throughout this section we assumed that all functions we are interested in
integrating are actually integrable in the domain of interest. We also assumed that they
are bounded and that they are defined at every point, so that whenever we need to
evaluate a function at a point, we can do it. We will go on and use these assumptions
throughout the chapter.
Note: Simpson's one third rule requires an even number of subintervals, that is, the
number of subintervals should be a multiple of 2. Similarly, Simpson's three eighth rule is
applied when the number of subintervals is a multiple of three.
Note: The degree of precision of a rule is the maximum degree of polynomial which the
rule integrates exactly.
The degree of precision of the trapezoidal rule is 1 and that of Simpson's one third rule is 3.
Gauss quadrature has the maximum degree of precision for a given number of points.
Related problem:
Example: Use the trapezoidal rule with n = 4 to estimate ∫(1 to 2) x^2 dx.
Compare the estimate with the exact value of the integral.
Compare the estimate with the exact value of the integral
Solution: To find the trapezoidal approximation, we divide the interval of integration into
four subintervals of equal length and list the values of y=x2 at the endpoints and
partition points.
j    xj     yj = xj^2
0    1.00   1.0000
1    1.25   1.5625
2    1.50   2.2500
3    1.75   3.0625
4    2.00   4.0000
With n = 4 and h = (b - a)/n = 0.25, the approximate value of the integral is
T = (h/2)[y0 + y4 + 2(y1 + y2 + y3)] = (1/8)[1 + 4 + 2(6.875)] = 2.34375
The exact value of the integral is ∫(1 to 2) x^2 dx = [x^3/3] from 1 to 2 = 7/3 ≈ 2.33333
2
The approximation is a slight overestimate. Each trapezoid contains slightly more than
the corresponding strip under the curve.
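The rule is a few lines of code; the sketch below (naming mine) reproduces T = 2.34375 for this example.

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals:
    T = (h/2) [y0 + yn + 2 (y1 + ... + y_{n-1})]."""
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    return h / 2 * (ys[0] + ys[-1] + 2 * sum(ys[1:-1]))

# The worked example: x^2 on [1, 2] with n = 4; the exact value is 7/3
T = trapezoid(lambda x: x * x, 1.0, 2.0, 4)   # 2.34375
```

As the notes observe, T slightly overestimates the exact value because each chord lies above this convex curve.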
Example: Using the trapezoidal rule evaluate the integral ∫(0 to 1) dx/(x^2 + 6x + 10) with four
subintervals.
Solution: For four subintervals the trapezoidal rule is
∫(a to b) f(x) dx ≈ (h/2)[y0 + 2y1 + 2y2 + 2y3 + y4]
The range of integration [0,1] is divided into four equal subintervals of width h = 0.25 by the
points 0.0, 0.25, 0.50, 0.75 and 1.
Considering them as the x values, the corresponding values of the integrand, denoted by
y0, y1, y2, y3, y4, are 0.10, 0.08649, 0.07547, 0.06639 and 0.05882.
Hence ∫(0 to 1) dx/(x^2 + 6x + 10) ≈ (0.25/2)[0.10 + 2(0.08649) + 2(0.07547) + 2(0.06639) + 0.05882] = 0.07694
Example: Find an approximate value of log_e 5 by calculating ∫(0 to 5) dx/(4x + 5) by Simpson's
1/3 rule of integration.
Solution: We note that
∫(0 to 5) dx/(4x + 5) = [(1/4) log(4x + 5)] from 0 to 5 = (1/4)[log 25 - log 5] = (1/4) log 5
Now to calculate the value of ∫(0 to 5) dx/(4x + 5) by Simpson's rule of integration, divide the interval
[0, 5] into n = 10 equal subintervals, each of length h = (b - a)/n = 0.5.
Since x0 = 0, y0 = 0.2; x1 = 0.5, y1 = 1/7; similarly we find the values at the other points.
Hence
∫(0 to 5) dx/(4x + 5) ≈ S = (0.5/3)[(y0 + y10) + 4(y1 + y3 + y5 + y7 + y9) + 2(y2 + y4 + y6 + y8)]
= (0.5/3)[0.24 + 4(0.3963) + 2(0.2944)] = 0.4023
and log_e 5 ≈ 4(0.4023) = 1.6092.
Example: Find ∫(0 to 10) dx/(1 + x^2) using Simpson's one third rule by taking 10 subintervals.
Solution: By Simpson's one third rule we have
∫(a to b) f(x) dx ≈ (h/3)[y0 + 4(y1 + y3 + ...) + 2(y2 + y4 + ...) + yn]
Let the range [0,10] be subdivided into 10 equal intervals of width h = 1 by the x values
0, 1, 2, ..., 10. The corresponding y values of the function f(x) = 1/(1 + x^2) are listed below:

x:  0    1    2    3    4       5       6       7     8       9       10
y:  1    0.5  0.2  0.1  0.0588  0.0385  0.0270  0.02  0.0154  0.0122  0.0099

Hence
∫(0 to 10) dx/(1 + x^2) ≈ (1/3)[1 + 4(0.5 + 0.1 + 0.0385 + 0.02 + 0.0122) + 2(0.2 + 0.0588 + 0.027 + 0.0154) + 0.0099]
= (1/3)(4.2951) = 1.4317
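A composite Simpson's 1/3 routine (naming mine), checked against this example; note the guard for an even number of subintervals, which the earlier note requires.

```python
def simpson(f, a, b, n):
    """Composite Simpson's 1/3 rule; n must be even.
    S = (h/3) [y0 + 4 (y1 + y3 + ...) + 2 (y2 + y4 + ...) + yn]."""
    if n % 2:
        raise ValueError("Simpson's 1/3 rule needs an even number of subintervals")
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    return h / 3 * (ys[0] + ys[-1] + 4 * sum(ys[1:-1:2]) + 2 * sum(ys[2:-1:2]))

# The worked example: 1/(1 + x^2) on [0, 10] with 10 subintervals
S = simpson(lambda x: 1 / (1 + x * x), 0.0, 10.0, 10)   # about 1.4317
```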
Example: Evaluate ∫(0 to 6) dx/(3 + x^2) using Simpson's three eighth rule.
Solution: Let the range of integration [0,6] be divided into six equal parts of width h = 1,
using the x values 0, 1, 2, 3, 4, 5 and 6. The corresponding y values of the integrand are:

x:  0       1     2       3       4       5       6
y:  0.3333  0.25  0.1429  0.0833  0.0526  0.0357  0.0256

∫(0 to 6) dx/(3 + x^2) ≈ (3h/8)[y0 + 3(y1 + y2 + y4 + y5) + 2y3 + y6] = (3/8)(1.9691) = 0.7384
Lecturer note -18
Gauss quadrature:
So far, all the quadratures we encountered were of the form ∫(a to b) f(x) dx ≈ Σ (i=0 to n) Ai f(xi). An
approximation of this form was shown to be exact for polynomials of degree ≤ n for an
appropriate choice of the quadrature coefficients Ai. In all cases, the quadrature points
x0, x1, ..., xn were given up front. In other words, given a set of nodes x0, x1, ..., xn, the
coefficients Ai, i = 0, 1, 2, ..., n, were determined such that the approximation was
exact in the respective set of polynomials.
We are now interested in investigating the possibility of writing more accurate
quadratures without increasing the total number of quadrature points. This will be
possible if we allow for the freedom of choosing the quadrature points.
The quadrature problem becomes now a problem of choosing the quadrature points in
addition to determining the corresponding coefficients in a way that the quadrature is
exact for polynomials of a maximal degree. Quadratures that are obtained that way are
called Gaussian quadratures.
Gaussian integral formula and Gauss-Legendre two-point formula:
This formula is based on unequally spaced nodes.
Suppose we have to integrate ∫ₐᵇ f(x)dx.
Change the limits of the integral from a, b to −1, 1 (in order to use the orthogonality of the Legendre polynomials) by the transformation
x = ((b − a)/2)u + (a + b)/2
So we are looking for a quadrature of the form
∫₋₁¹ f(x)dx ≈ A0 f(x0) + A1 f(x1)
A straightforward computation will amount to making this quadrature exact for polynomials of degree ≤ 3. The linearity of the quadrature means that it is sufficient to make the quadrature exact for 1, x, x², x³. Hence we write the system of equations
∫₋₁¹ xⁱ dx = A0 x0ⁱ + A1 x1ⁱ,  i = 0, 1, 2, 3
From this we can write
A0 + A1 = 2,  A0x0 + A1x1 = 0,  A0x0² + A1x1² = 2/3,  A0x0³ + A1x1³ = 0
Solving, we get A0 = A1 = 1 and x0 = −x1 = −1/√3, so that the desired quadrature is
∫₋₁¹ f(x)dx ≈ f(−1/√3) + f(1/√3)
Similarly, for the Gauss-Legendre three-point rule the coefficients are
A0 = A2 = 5/9 ≈ 0.5556, A1 = 8/9 ≈ 0.8889
x0 = −0.7746, x1 = 0, x2 = 0.7746 (i.e. xᵢ = 0, ±√(3/5))
Related problem:
Example: Evaluate ∫₀¹ dx/(1 + x) using the Gauss-Legendre 2-point and 3-point rules.
Solution: Here a = 0, b = 1, so the transformation x = ((b − a)/2)u + (a + b)/2 gives x = (u + 1)/2 and dx = du/2.
When x = 0, u = −1; when x = 1, u = 1. Also 1/(1 + x) = 2/(u + 3), so
∫₀¹ dx/(1 + x) = ∫₋₁¹ du/(u + 3)
By the 2-point formula, with f(u) = 1/(u + 3),
∫₋₁¹ du/(u + 3) ≈ A0 f(−1/√3) + A1 f(1/√3) = 1/(3 − 1/√3) + 1/(3 + 1/√3) ≈ 0.6923
Now by the 3-point formula we have
∫₋₁¹ du/(u + 3) ≈ 0.5556 f(−0.7746) + 0.8889 f(0) + 0.5556 f(0.7746) ≈ 0.6931
(The exact value is loge 2 ≈ 0.6931.)
Quadrature error. If the trapezoidal rule is
∫ₐᵇ f(x)dx ≈ ((b − a)/2)[f(a) + f(b)]
then the interpolation error is
E = (f″(ξ)/2) ∫ₐᵇ (x − a)(x − b)dx = −(f″(ξ)/12)(b − a)³,  ξ ∈ (a, b)
Surprisingly, Simpson's quadrature is exact for polynomials of degree ≤ 3 and not only for polynomials of degree ≤ 2. Let h = (b − a)/2.
This means that the quadrature error for Simpson's rule is
F(a + 2h) − (h/3)[f(a) + 4f(a + h) + f(a + 2h)] = −(1/90)h⁵ f⁽⁴⁾(a) + ...
where F(a + 2h) = ∫ₐ^(a+2h) f(x)dx.
Hence the error is
E = −(1/90)((b − a)/2)⁵ f⁽⁴⁾(ξ),  ξ ∈ [a, b]
Since the fourth derivative of any polynomial of degree ≤ 3 is identically zero, the quadrature error formula implies that Simpson's quadrature is exact for polynomials of degree ≤ 3.
Hence the error is E  
Lecturer note -19
Numerical solution of Ordinary differential equation:
There are differential equations that cannot be solved using the standard methods
even though they possess solutions. In such situations, we apply numerical methods for
obtaining approximate solutions, where the accuracy is sufficient. These methods yield
the solution in one of the following forms:
(i) Single-step method: A series for y in terms of powers of x, from which the value
of y at a particular value of x can be obtained by direct substitution.
(ii) Multi-step method: In multi step methods, the solution at any point x is obtained
using the solution at a number of previous points.
Taylor's, Picard's, Euler's and Modified Euler's methods come under the single-step
method of solving an ordinary differential equation.
The need for finding the solution of the initial value problems occur frequently in
Engineering and Physics. There are some first order differential equations that cannot
be solved using the standard methods. In such situations we apply numerical methods.
These methods yield the solution in one of two forms:
(iii) A series for y in terms of powers of x, from which the value of y can be obtained by direct substitution.
(iv) A set of tabulated values of x and y.
The methods of Taylor and Picard yield solutions of form (iii), whereas those of Euler, Runge-Kutta, etc., yield form (iv).
EULER METHOD:
Consider the initial value problem of first order y′=f(x,y), y(x0)=y0----------------(1)
Starting from the given x0, with the step size h chosen suitably small, we let x0, x1, x2, ... be equally spaced x values (called mesh points) with spacing h.
i.e. x1 = x0 + h, x2 = x1 + h, ... Also denote y(x0) = y0, y(x1) = y1, ...
By separating variables, the differential equation in (1) becomes dy = f(x,y)dx ----------(2)
Integrating (2) from x0 to x1 with respect to x (while y changes from y0 to y1), we get
∫ from y0 to y1 of dy = ∫ from x0 to x1 of f(x,y)dx
This gives
y1 = y0 + ∫ from x0 to x1 of f(x,y)dx
Assuming that f(x,y) ≈ f(x0,y0) in the range x0 < x < x1, we have
y1 = y0 + (x1 − x0) f(x0,y0) = y0 + h f(x0,y0)
Proceeding in this way, we obtain the general formula
yn+1= yn+hf(xn,yn)
The above is called the Euler method or Euler-Cauchy method.
Related problems:
Example: Use Euler's method with h = 0.1 to solve the initial value problem
dy/dx = x² + y², with y(0) = 0, in the range 0 ≤ x ≤ 0.5.
Solution: Here f(x,y) =x2+y2, x0=0, y0=0, h=0.1
Hence x1=x0+h=0.1, x2=x1+h=0.2, x3=x2+h=0.3, x4=x3+h=0.4, x5=x4+h=0.5
We know the iterative formula for Euler's method is yn+1 = yn + h f(xn, yn).
So yn+1 = yn + h(xn² + yn²), giving
y1 = y0 + 0.1(x0² + y0²) = 0
y2 = y1 + 0.1(x1² + y1²) = 0.001
y3 = y2 + 0.1(x2² + y2²) = 0.005
y4 = y3 + 0.1(x3² + y3²) = 0.014
Continuing this way we get y5, i.e. y(0.5).
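Euler's recurrence is a one-liner in code. This sketch (the function name `euler` is ours) finishes the table and reports y(0.5):

```python
def euler(f, x0, y0, h, n):
    """Advance y' = f(x, y) from (x0, y0) by n Euler steps of size h."""
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)   # y_{n+1} = y_n + h f(x_n, y_n)
        x += h
    return y

y5 = euler(lambda x, y: x**2 + y**2, 0.0, 0.0, 0.1, 5)
print(round(y5, 4))  # y(0.5), approximately 0.0300
```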
Example: Using Euler's method, solve the equation dy/dx = 2xy + 1, y(0) = 0, with h = 0.02, for x = 0.1.
Solution: Here f(x,y) =2xy+1, x0=0, y0=0, h=0.02
Hence x1=x0+h=0.02, x2=x1+h=0.04, x3=x2+h=0.06, x4=x3+h=0.08, x5=x4+h=0.1
We know the iterative formula for Euler's method is yn+1 = yn + h f(xn, yn).
So yn+1 = yn + h(2xnyn + 1), giving (rounded to two decimal places)
y1 = y0 + 0.02(2x0y0 + 1) = 0.02
y2 = y1 + 0.02(2x1y1 + 1) = 0.04
y3 = y2 + 0.02(2x2y2 + 1) = 0.06
y4 = y3 + 0.02(2x3y3 + 1) = 0.08
y5 = y(0.1) = y4 + 0.02(2x4y4 + 1) = 0.10
Modified Euler Method: The modified Euler method is given by the iteration formula
y1^(n+1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(n))]
where y1^(n) is the nth approximation to y1. The iteration can be started by choosing y1^(0) from Euler's formula: y1^(0) = y0 + h f(x0, y0).
Lecturer note -20
Improved Euler method continued and Related problems: As per theory
of previous section we conclude that in each step we modify or improve the
approximation by Euler’s method by the process recommended to minimize the
error in numerical computation.
Example: Using the modified Euler method, determine the value of y when x = 0.1 given that dy/dx = x² + y, y(0) = 1; take h = 0.05.
Solution: Here f(x,y) = x² + y, x0 = 0, y0 = 1, h = 0.05.
The predicted value is y1^(0) = y0 + h f(x0, y0) = 1 + 0.05 = 1.05.
The corrected value is
y1^(1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(0))]
= 1 + 0.025[f(0, 1) + f(0.05, 1.05)]
= 1 + 0.025[1 + ((0.05)² + 1.05)]
= 1.0513
Hence we take y1 = 1.0513, which is correct to four decimal places.
For the next step the formula takes the form
y2^(n+1) = y1 + (h/2)[f(x1, y1) + f(x2, y2^(n))]
where we first evaluate y2^(0) using the Euler formula:
y2^(0) = y1 + h f(x1, y1) = 1.0513 + 0.05[(0.05)² + 1.0513] = 1.1040
y2^(1) = y1 + (h/2)[f(x1, y1) + f(x2, y2^(0))]
= 1.0513 + 0.025[((0.05)² + 1.0513) + ((0.1)² + 1.1040)]
= 1.1055
Hence we take y2 = 1.1055. So the value of y when x = 0.1 is 1.1055, correct to four decimal places.
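The predictor-corrector steps above can be scripted. In this sketch (names ours), each step predicts with Euler and then applies one trapezoidal correction, using h = 0.05, the step actually used in the computation:

```python
def modified_euler(f, x0, y0, h, n, iters=1):
    """Modified Euler: Euler predictor, then `iters` trapezoidal corrections per step."""
    x, y = x0, y0
    for _ in range(n):
        x_next = x + h
        y_next = y + h * f(x, y)  # predictor (Euler)
        for _ in range(iters):
            y_next = y + h / 2 * (f(x, y) + f(x_next, y_next))  # corrector
        x, y = x_next, y_next
    return y

y_01 = modified_euler(lambda x, y: x**2 + y, 0.0, 1.0, 0.05, 2)
print(round(y_01, 4))  # y(0.1), compare 1.1055
```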
Example: Using the modified Euler method, determine the value of y when x = 0.2 given that dy/dx = x + √y, y(0) = 1; take h = 0.2.
Solution: Here f(x,y) = x + √y, x0 = 0, y0 = 1, h = 0.2.
The predicted value is y1^(0) = y0 + h f(x0, y0) = 1 + 0.2 = 1.2.
The corrected value is
y1^(1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(0))]
= 1 + 0.1[1 + (0.2 + √1.2)] = 1.2295
Iterating once more, we find
y1^(2) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(1))]
= 1 + 0.1[1 + (0.2 + √1.2295)] = 1.2309
Hence we take y(0.2) ≈ y1 = 1.2309.
Note: The Taylor series method has desirable features, particularly in its ability to keep
the errors small, but that it also has the strong disadvantage of requiring the evaluation
of higher derivatives of the function f(x,y) .
We observed that the Euler method could be improved by computing the function f(x,y)
at a predicted point at the far end of the step in x
The Runge-Kutta approach is to aim for the desirable features of the Taylor series method, but with the requirement for the evaluation of higher-order derivatives replaced by the requirement to evaluate f(x,y) at some points within the step xi to xi+1.
RUNGE KUTTA METHODS
We use the fact that the Runge-Kutta method of rth order agrees with the Taylor series solution up to terms of h^r.
Second Order Runge-Kutta Method
Computationally, most efficient methods in terms of accuracy were developed by two
German mathematicians, Carl Runge and Wilhelm Kutta. These methods are well
known as Runge-Kutta methods (R-K methods). In this and the coming section we
consider second and fourth order R-K methods.
There are several second order Runge-Kutta formulas and we consider one among
them.
Working Method (Second Order Runge-Kutta Method)
Given the initial value problem. Suppose x0, x1, x2,----be equally spaced x values with
space length h.
xn1  xn  h
Alogrithm:
kn  hf ( xn , yn ), ln  hf ( xn 1 , yn  kn )
1
yn1  yn  (kn  ln )
2
Lecturer note -21
Runge kutta method continued and Related problems:
Note: Modified Euler method is a special case of second order Runge-Kutta method
Fourth Order Runge-Kutta method:
The Runge-Kutta method of fourth order (also known as classical Runge-Kutta
method) gives greater accuracy and is most widely used for finding the approximate
solution of first order ordinary differential equations.
The method is well suited for computers.
The method is shown in the following algorithm.
Algorithm (The Runge-Kutta method)
Given the initial value problem. Suppose x0, x1, x2,----be equally spaced x values with
space length h.
Also denote y(x0) = y0, y(x1) = y1, ... For n = 0, 1, ... until termination do:
xn+1 = xn + h
An = h f(xn, yn)
Bn = h f(xn + h/2, yn + An/2)
Cn = h f(xn + h/2, yn + Bn/2)
Dn = h f(xn + h, yn + Cn)
yn+1 = yn + (1/6)(An + 2Bn + 2Cn + Dn)
Related problem:
Example: Use the Runge-Kutta method with h = 0.1 to find y(0.2) given dy/dx = x² + y² with y(0) = 0.
Solution: Here f(x,y) = x² + y², x0 = 0, y0 = 0, h = 0.1.
Hence x1 = x0 + h = 0.1 and x2 = x1 + h = 0.2.
Now, as per the process, we have
An = 0.1(xn² + yn²)
Bn = 0.1[(xn + 0.05)² + (yn + An/2)²]
Cn = 0.1[(xn + 0.05)² + (yn + Bn/2)²]
Dn = 0.1[xn+1² + (yn + Cn)²]
yn+1 = yn + (1/6)(An + 2Bn + 2Cn + Dn)
For x1 = x0 + 0.1 = 0.1, we have
A0 = 0.1(0) = 0
B0 = 0.1[(0.05)² + 0] = 0.00025
C0 = 0.1[(0.05)² + (0.000125)²] = 0.00025
D0 = 0.1[(0.1)² + (0.00025)²] = 0.001
y1 = y0 + (1/6)(A0 + 2B0 + 2C0 + D0) = 0.00033
i.e. y(0.1) = 0.00033
Similarly,
A1 = 0.1(x1² + y1²) = 0.1[(0.1)² + (0.00033)²] = 0.001
B1 = 0.1[(x1 + 0.05)² + (y1 + A1/2)²] = 0.00225
C1 = 0.1[(x1 + 0.05)² + (y1 + B1/2)²] = 0.00225
D1 = 0.1[x2² + (y1 + C1)²] = 0.004
y2 = y(0.2) = y1 + (1/6)(A1 + 2B1 + 2C1 + D1) = 0.002663
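The classical fourth-order scheme is compact in code. The sketch below (name `rk4` is ours) reproduces the two steps of the example:

```python
def rk4(f, x0, y0, h, n):
    """Classical fourth-order Runge-Kutta for y' = f(x, y)."""
    x, y = x0, y0
    for _ in range(n):
        A = h * f(x, y)
        B = h * f(x + h / 2, y + A / 2)
        C = h * f(x + h / 2, y + B / 2)
        D = h * f(x + h, y + C)
        y += (A + 2 * B + 2 * C + D) / 6
        x += h
    return y

y_02 = rk4(lambda x, y: x**2 + y**2, 0.0, 0.0, 0.1, 2)
print(round(y_02, 6))  # y(0.2), compare 0.002663
```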
Lecturer note -22
Probability :
Probability theory is the branch of mathematics that studies the possible
outcomes of given events together with the outcomes' relative likelihoods
and distributions.
In common usage, the word "probability" is used to mean the chance that a particular event (or set of events) will occur, expressed on a linear scale from 0 (impossibility) to 1 (certainty), or equivalently as a percentage between 0 and 100%.
The analysis of data (possibly generated by probability models) is called statistics.
Probability is a way of summarizing the uncertainty of statements or events.
It gives a numerical measure for the degree of certainty (or degree of uncertainty) of
the occurrence of an event.
Another way to define probability is the ratio of the number of favorable
outcomes to the total number of all possible outcomes.
This is true if the outcomes are assumed to be equally likely.
The collection of all possible outcomes is called the sample space.
Example: When we flip a coin then sample space is
S={H,T},
Where H denotes that the coin lands ”Heads up” and T denotes that the coin lands
”Tails up”.
For a ”fair coin ” we expect H and T to have the same ”chance ” of
occurring, i.e., if we flip the coin many times then about 50 % of the
outcomes will be H.
We say that the probability of H to occur is 0.5 (or 50 %) .
The probability of T to occur is then also 0.5.
If there are n total possible outcomes in a sample space S, and m of those
are favorable for an event A, then probability of event A is given as
P(A) = m/n = (number of favorable outcomes)/(total number of possible outcomes) = n(A)/n(S)
Example: Find the probability of getting a 3 or 5 while throwing a die.
Solution. Sample space S = {1,2,3,4,5,6} and event A = {3,5}.
We have n(A) = 2 and n(S) = 6.
So, P(A) = n(A)/n(S) = 2/6 = 0.3333.
Experiments and random events. In probability theory, random experiment
means a repeatable process that yields a result or an observation.
Tossing a coin, rolling a die, extracting a ball from a box are random experiments.
When tossing a coin, we get one of the following elementary results:(heads); (tails):
Event: Any subset of the sample space is known as event
A random event is an event that either happens or fails to happen as a
result of an experiment.
When tossing a coin, the event (heads) may happen or may fail to happen, so this is a
random event.
A random experiment is the process of observing the outcome of a chance event.
Complement
The complement of event A is the set of all outcomes in a sample that are
not included in the event A. The complement of event A is denoted by A′
If the probability that an event occurs is p, then the probability that the event does not occur is q = 1 − p; i.e. the probability of the complement of an event = 1 − the probability of the event, so P(A′) = 1 − P(A).
Intersections of Events
The event A ∩ 𝐵 is the intersection of the events A and B and consists of
outcomes that are contained within both events A and B. The probability of
this event, is the probability that both events A and B occur.
Mutually Exclusive Events
Two events are said to be mutually exclusive if A ∩ 𝐵 = ∅ ; (i.e. they have
empty intersection) so that they have no outcomes in common.
Unions of Events
The event A∪ 𝐵 is the union of events A and B and consists of the
outcomes that are contained within at least one of the events A and B.
Types of Probability
There are three ways to define probability, namely classical, empirical and
subjective probability.
Classical probability
Classical or theoretical probability is used when each outcome in a sample
space is equally likely to occur. Roll a die and observe that P(A) = P(rolling a 3) = 1/6.
Empirical probability
Empirical (or statistical) probability is based on observed data. The empirical
probability of an event A is the relative frequency of event A
Subjective Probability: Subjective probabilities result from intuition, educated
guesses, and estimates. For example: given a patient's health and
extent of injuries a doctor may feel that the patient has a 90% chance of a
full recovery.
Laws of Probability
As we have seen in the previous section, the probabilities are not always
based on the assumption of equal outcomes.
Axioms of Probability
For an experiment with a sample space S ={e1,e2,---,en} we can assign probabilities
P(e1), P(e2),---P(en) provided that
(1) 0 ≤ P(ei) ≤ 1
(2) P(S) = Σᵢ₌₁ⁿ P(ei) = 1
(3) If a set (event) A consists of outcomes {e1, e2, ..., ek}, then P(A) = Σᵢ₌₁ᵏ P(ei)
Complement Rule
For any event A, we have P(A′) = 1 − P(A)
Addition Law
If A and B are two different events then
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Example: The probability that John passes a Math exam is 4/5 and that he passes a Chemistry exam is 5/6. If the probability that he passes both exams is 3/4, find the probability that he will pass at least one exam.
Solution: Let M = John passes the Math exam, and C = John passes the Chemistry exam.
P(John passes at least one exam) = P(M ∪ C) = P(M) + P(C) − P(M ∩ C) = 4/5 + 5/6 − 3/4 = 53/60
Note: If two events A and B are mutually exclusive, then
P(A ∪ B) = P(A) + P(B):
This follows immediately since A and B are mutually exclusive, P(A ∩ B) = 0.
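The addition-law arithmetic can be checked with exact fractions (the variable names are ours):

```python
from fractions import Fraction

p_m = Fraction(4, 5)      # P(M): passes Math
p_c = Fraction(5, 6)      # P(C): passes Chemistry
p_both = Fraction(3, 4)   # P(M and C)

# Addition law: P(M or C) = P(M) + P(C) - P(M and C)
p_at_least_one = p_m + p_c - p_both
print(p_at_least_one)  # 53/60
```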
Conditional probability and independence:
Conditional probability is the probability of an event occurring given the knowledge that
another event has occurred
The conditional probability of event A occurring, given that event B has occurred is
denoted by P(A|B) and is read that probability of A given B.
The conditional probability of event A given B is
P(A|B) = P(A ∩ B)/P(B),  P(B) ≠ 0
Lecturer note -23
Conditional Probability and related problems :
Note: In the case when all the outcomes are equally likely, it is sometimes easier to find conditional probabilities directly, without applying the above formula. If we already know that B has happened, we need only consider outcomes in B, thus reducing our sample space to B. Then
P(A|B) = (number of outcomes in A ∩ B)/(number of outcomes in B)
Example: Let A = {a family has two boys} and B = {a family of two has at least one boy}. Find P(A|B).
Solution: The event B contains the following outcomes: (B,B), (B,G) and (G,B). Only one of these is in A. Thus, P(A|B) = 1/3.
However, if I know that the family has two children, and I see one of
the children and it's a boy, then the probability suddenly changes to 1/2.
There is a difference in the language and this changes the conditional
probability
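The two-child computation can be reproduced by brute-force enumeration of the reduced sample space (names ours):

```python
from itertools import product

families = list(product("BG", repeat=2))          # sample space: BB, BG, GB, GG
b = [fam for fam in families if "B" in fam]       # B: at least one boy (3 outcomes)
a_and_b = [fam for fam in b if fam == ("B", "B")] # A within B: exactly the family BB

p_a_given_b = len(a_and_b) / len(b)
print(p_a_given_b)  # 1/3
```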
Multiplication rule for probabilities
Using the concept of conditional probability, we have the multiplication rule
P(A ∩ B) = P(A|B) P(B)
Statistical independence of events:
The events A and B are called (statistically) independent if P( A  B)  P( A) P( B)
Another way to express independence is to say that the knowledge of B occurring
does not change our assessment of P(A).
This means that P(A|B) =P(A). (The probability that a person is female given that he or
she was born in March is just the same as the probability that the person is female.)
Example: For a coin tossed twice, denote H1 the event that we got Heads on the first
toss, and H2 is the Heads on the second. Clearly, P(H1) = P(H2) = 1/2.
Then, counting the outcomes, P(H1 ∩ H2) = 1/4 = P(H1)P(H2); therefore H1 and H2 are independent events. This agrees with our intuition that the result of the first toss should not affect the chances for H2 to occur.
Example: Three bits (0 or 1 digits) are transmitted over a noisy channel, so each will be flipped independently with probability 0.1. What is the probability that at least one bit is flipped?
Solution: Using the complement rule, P(at least one) = 1 − P(none).
If we denote by Fk the event that the kth bit is flipped, then P(no bits are flipped) = P(F1′ ∩ F2′ ∩ F3′) = (1 − 0.1)³ due to independence. Then
P(at least one) = 1 − (0.9)³ = 0.271
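The complement-rule computation in two lines (names ours):

```python
# P(at least one of 3 independent bits is flipped), flip probability 0.1 each
p_flip = 0.1
p_none = (1 - p_flip) ** 3      # independence: multiply the "not flipped" probabilities
p_at_least_one = 1 - p_none
print(p_at_least_one)  # 0.271
```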
Note: If an object is selected and then replaced before the next object is selected,
this is known as sampling with replacement. Otherwise, it is called sampling without
replacement.
Rolling a die is equivalent to sampling with replacement, whereas dealing
a deck of cards to players is sampling without replacement.
Example:
If we randomly pick two television sets in succession from a
shipment of 240 television sets of which 15 are defective, what is the probability
that they will be both defective?
Solution: Let A denote the event that the first television picked was defective, and let B denote the event that the second television picked was defective. Then A ∩ B denotes the event that both televisions picked were defective. Using conditional probability (and assuming that we are sampling without replacement), we calculate
P(A ∩ B) = P(A) P(B|A) = (15/240)(14/239) = 7/1912
Example:
A box of fuses contains 20 fuses, of which 5 are defective. If
3 of the fuses are selected at random and removed from the box in succession
without replacement, what is the probability that all three fuses are defective?
Solution: Let A be the event that the first fuse selected is defective, B the event that the second fuse selected is defective, and C the event that the third fuse selected is defective. The probability that all three fuses selected are defective is P(A ∩ B ∩ C). Hence
P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B) = (5/20)(4/19)(3/18) = 1/114
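The chain of conditional probabilities for sampling without replacement can be checked exactly (names ours):

```python
from fractions import Fraction

# P(A) P(B|A) P(C|A and B) for 3 draws without replacement from 20 fuses, 5 defective
p_all_defective = Fraction(5, 20) * Fraction(4, 19) * Fraction(3, 18)
print(p_all_defective)  # 1/114
```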
Note: The concept of independence is fundamental. In fact, it is this
concept that justifies the mathematical development of probability as a separate
discipline from measure theory. “independence of events
is not a purely mathematical concept.” It can, however, be made plausible that it should
be interpreted by the rule of multiplication of probabilities and this leads to the
mathematical definition of independence.
Example: Flip a coin and then independently cast a die. What is the
probability of observing heads on the coin and a 2 or 3 on the die?
Solution: Let A denote the event of observing a head on the coin and let B be the event of observing a 2 or 3 on the die. Then
P(A ∩ B) = P(A) P(B) = (1/2)(2/6) = 1/6
Example: Two possible mutually exclusive events are always dependent (that is, not independent).
Solution: Suppose not. Then
P(A ∩ B) = P(A) P(B)
P(∅) = P(A) P(B)
0 = P(A) P(B)
Hence we get either P(A) = 0 or P(B) = 0. This is a contradiction to the fact that A and B are possible events. This completes the result.
Example:
Two possible independent events are not mutually exclusive.
Solution: Let A and B be two independent events and suppose A and B are
mutually exclusive. Then P(A) P(B) = P(A ∩B)
= P(∅)
= 0.
Therefore, we get either P(A) = 0 or P(B) = 0.
This is a contradiction to the fact that A and B are possible events.
The possible events A and B exclusive implies A and B are not independent;
and A and B independent implies A and B are not exclusive.
Example: If A and B are independent events, then A′ and B are independent. Similarly, A and B′ are independent.
Solution:
We know that A and B are independent, that is
P(A ∩ B) = P(A) P(B)
and we want to show that A′ and B are independent, that is
P(A′ ∩ B) = P(A′) P(B).
Since
P(A′ ∩ B) = P(A′|B) P(B)
= [1 − P(A|B)] P(B)
= P(B) − P(A|B)P(B)
= P(B) − P(A ∩ B)
= P(B) − P(A) P(B)
= P(B) [1 − P(A)]
= P(B)P(A′),
the events A′ and B are independent. Similarly, it can be shown that A and B′ are independent.
Note: If the events {Bi}, i = 1, ..., m constitute a partition of the sample space S and P(Bi) ≠ 0 for i = 1, 2, ..., m, then for any event A in S we have
P(A) = Σᵢ₌₁ᵐ P(Bi) P(A|Bi)
Lecturer note -24
Random variable and Probability distributions:
Random variables: A variable whose numerical value is determined by the
outcome(or result) of a random experiment is called a random variable or chance
variable.
In probability theory and statistics it would be extremely useful to be able to work
with symbols representing “the numeric outcome that the chance experiment
will provide when carried out”.
Such symbols are random variables; random variables are most frequently denoted by
capital letters (such as X, Y , Z).
Note:
Consider a random experiment whose sample space is S. A random variable X is a function from the sample space S into the set of real numbers R such that for each interval I in R, the set {s ∈ S | X(s) ∈ I} is an event in S.
Example: Consider the random experiment of tossing 3 coins. Define a random variable for this experiment.
Solution: Since we toss 3 coins. So the sample space S is given by
S={HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
Let X denote the number of heads observed. Then X=0 if the outcome is TTT
Similarly X=1 provided the outcome is HTT or THT or TTH
So X is the random variable whose values are determined by the outcomes of random
experiment of tossing three coins, and it is a function with domain S and range {0,1,2,3}
Hence X(TTT)=0, X(HTH)=2 etc
Note: Given a random experiment, there can be many random variables. This is due to the fact that given two (finite) sets A and B, the number of distinct functions from A to B is |B|^|A|. Here |A| means the cardinality of the set A.
Note: A random variable is neither random nor variable, it is simply a function.
The values it takes on are both random and variable.
Note: The set {x ∈ R| x = X(s), s ∈ S} is called the space of the
random variable X.
There are three types of random variables: discrete, continuous, and
mixed. However, in most applications we encounter either discrete or continuous
random variable.
Discrete Random variable: If the space of random variable X is countable, then X is
called a discrete random variable. This means that , in practice, we may consider a list
of possible outcomes x1,x2,-------xn even if (n→ ∞) for any discrete random variable X
Distribution Functions of Discrete Random Variables
Every random variable is characterized through its probability density function.
Let RX be the space of the random variable X.
The function f : RX→ 𝑅 defined by f(x) = P(X = x) is called the probability density
function (pdf) of X.
Example:
A fair coin is tossed 3 times. Let the random variable X
denote the number of heads in 3 tosses of the coin. Find the sample space,
the space of the random variable, and the probability density function of X.
Solution: The sample space S of this experiment consists of all binary sequences
of length 3, that is S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
The space of this random variable is given by RX = {0, 1, 2, 3}
Therefore, the probability density function of X is given by
f(0) = P(X = 0) = 1/8
f(1) = P(X = 1) = 3/8
f(2) = P(X = 2) = 3/8
f(3) = P(X = 3) = 1/8
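These pdf values can be confirmed by enumerating the 8 equally likely outcomes (names ours):

```python
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))              # the 8 outcomes HHH, HHT, ...
heads = Counter(seq.count("H") for seq in outcomes)   # how many outcomes give x heads

pdf = {x: heads[x] / 8 for x in range(4)}
print(pdf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```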
Note: If X is a discrete random variable with space RX and probability
density function f(x), then
(a) f(x) ≥ 0 for all x in RX, and (b) Σ over x in RX of f(x) = 1
Example:
If the probability of a random variable X with space RX = {1, 2, 3, ..., 12} is
given by f(x) = k (2x − 1), then, what is the value of the constant k?
Solution:
1 = Σₓ₌₁¹² k(2x − 1) = k[2 Σₓ₌₁¹² x − 12] = k[12·13 − 12] = 144k
So k = 1/144.
Note:
The cumulative distribution function F(x) of a random variable X is defined as
F(x) = P(X ≤ x) for all real numbers x.
Note: If X is a random variable with the space RX, then F(x) = Σ over t ≤ x of f(t), for x ∈ RX.
Example: If the probability density function of the random variable X is given by f(x) = (2x − 1)/144 for x = 1, 2, 3, ..., 12, find the cumulative distribution function F(x).
Solution: The space of the random variable X is given by RX = {1, 2, 3, ..., 12}.
F(1) = f(1) = 1/144
F(2) = f(1) + f(2) = 1/144 + 3/144 = 4/144
F(3) = f(1) + f(2) + f(3) = 1/144 + 3/144 + 5/144 = 9/144
Similarly, F(12) = f(1) + f(2) + ... + f(12) = 1
t 12
Note:
Let X be a random variable with cumulative distribution
function F(x). Then the cumulative distribution function satisfies the followings:
(a) F(−∞) = 0,
(b) F(∞) = 1, and
(c) F(x) is an increasing function, that is if x < y, then F(x) ≤ F(y) for
all reals x, y.
Example: Find the probability density function of the random variable X whose cumulative distribution function is
F(x) = 0 for x < −1;  0.25 for −1 ≤ x < 1;  0.50 for 1 ≤ x < 3;  0.75 for 3 ≤ x < 5;  1 for x ≥ 5.
Also find P(X ≤ 3), P(X = 3), and P(X < 3).
Solution: The space of this random variable is given by
RX = {−1, 1, 3, 5}.
The probability density function of X is given by
f(−1) = 0.25
f(1) = 0.50 − 0.25 = 0.25
f(3) = 0.75 − 0.50 = 0.25
f(5) = 1.00 − 0.75 = 0.25.
The probability P(X ≤ 3) can be computed by using the definition of F.
Hence
P(X ≤ 3) = F(3) = 0.75.
The probability P(X = 3) can be computed from
P(X = 3) = F(3) − F(1) = 0.75 − 0.50 = 0.25.
Finally, we get P(X < 3) from
P(X < 3) = P(X ≤ 1) = F(1) = 0.5.
Lecturer note -25
Continuous random variable and Probability distributions:
All of the random variables discussed previously were discrete, meaning they can take only a finite (or, at most, countable) number of values. However,
many of the random variables seen in practice have more than a countable
collection of possible values. For example, the proportions of impurities in
ore samples may run from 0.10 to 0.80. Such random variables can take any
value in an interval of real numbers. Since the random variables of this type
have a continuum of possible values, they are called continuous random
variables.
Distribution Functions of Continuous Random Variables
A random variable X is said to be continuous if its space is either an
interval or a union of intervals.
The function f(x) is a probability density function (PDF) for the
continuous random variable X, defined over the set of real numbers R, if
(a) f(x) ≥ 0 for all x,
(b) ∫ from −∞ to ∞ of f(x)dx = 1, and
(c) P(a ≤ X ≤ b) = ∫ₐᵇ f(x)dx
Example: Is the real-valued function f : R → R defined by
f(x) = 2x for 0 < x < 1, and f(x) = 0 otherwise,
a probability density function for some random variable X?
Solution: We have to show that f is nonnegative and that the area under f(x) is unity. Since f vanishes outside the interval (0, 1) and f(x) = 2x ≥ 0 there, f is nonnegative. Next, we calculate
∫ from −∞ to ∞ of f(x)dx = ∫₀¹ 2x dx = [x²]₀¹ = 1
Thus f is a probability density function.
Example: For what value of the constant c is the real-valued function f : R → R given by
f(x) = c for a ≤ x ≤ b, and f(x) = 0 otherwise,
where a, b are real constants, a probability density function for a random variable X?
Solution: Since f is a pdf, c must be nonnegative. Further, since the area under f is unity, we get
∫ from −∞ to ∞ of f(x)dx = ∫ₐᵇ c dx = c[x]ₐᵇ = c(b − a) = 1
So c = 1/(b − a).
Note:
Let f(x) be the probability density function of a continuous random variable X. The cumulative distribution function F(x) of X is defined as
F(x) = P(X ≤ x) = ∫ from −∞ to x of f(t)dt
The cumulative distribution function F(x) represents the area under the
probability density function f(x) on the interval (−∞, x)
Note: If F(x) is the cumulative distribution function of a continuous random variable X, the probability density function f(x) of X is the derivative of F(x); that is, by the Fundamental Theorem of Calculus,
(d/dx) F(x) = f(x)
This tells us that if the random variable is continuous, then we can find the pdf from the cdf by taking the derivative of the cdf. Recall that for a discrete random variable, the pdf at a point in the space of the random variable can be obtained from the cdf by taking the difference between the cdf at the point and the cdf immediately below the point.
Example: What is the probability density function of the random variable whose cdf is
F(x) = 1/(1 + e^(−x)),  −∞ < x < ∞?
Solution: The pdf of the random variable is given by
f(x) = (d/dx) F(x) = (d/dx) [1/(1 + e^(−x))] = e^(−x)/(1 + e^(−x))²
Note: Let X be a continuous random variable whose cdf is F(x).
Then followings are true:
(a) P(X < x) = F(x),
(b) P(X > x) = 1 − F(x),
(c) P(X = x) = 0 , and
(d) P(a < X < b) = F(b) − F(a).
Note: We will say that the random variable X is symmetric with respect to the
point c if the following conditions are satisfied:
i) if c + a is a value of the random variable X, then c − a is also a value of the random variable X;
ii) P(X = c + a) = P(X = c − a).
Condition (ii) can be rewritten as P(X − c = a) = P(c − X = a), which shows that "X is symmetric with respect to the point c" means that X − c and c − X have the same distribution.
Expected values of Random Variables:
One of the most important things we'd like to know about a random variable
is: what value does it take on average? What is the average price of a
computer? What is the average value of a number that rolls on a die?
Expected value (mean)
The mean or expected value of a discrete random variable X with probability mass function p(x) is given by
E(X) = Σₓ x p(x)
We will sometimes use the notation E(X)=𝜇
Note: Expected value of a function
If X is a discrete random variable with probability mass function p(x) and if g(x) is a real-valued function of x, then
E(g(X)) = Σₓ g(x) p(x)
Note: Variance
The variance of a random variable X with expected value μ is given by
V(X) = σ² = E(X − μ)² = E(X²) − μ²
where E(X²) = Σₓ x² p(x).
The variance is the average (or expected) value of the squared deviation from the mean.
If we use V(X) = E(X − μ)² as the definition, we can see that
V(X) = E(X − μ)² = E(X² − 2μX + μ²) = E(X²) − 2μE(X) + μ² = E(X²) − μ²
Lecturer note -26
Mean, variance continued and Probability distributions:
Note: Standard deviation
The standard deviation of a random variable X is the square root of the variance, and is given by
σ = √σ² = √(E(X − μ)²)
The mean describes the center of the probability distribution, while standard
deviation describes the spread. Larger values of 𝜎 signify a distribution
with larger variation. This will be undesirable in some situations, e.g. industrial
process control, where we would like the manufactured items to have
identical characteristics.
Note: For any random variable X and constants a and b, we have
E(aX + b) = aE(X) + b,  V(aX + b) = a²V(X) = a²σ²
And for several random variables X1, X2, ..., Xk we have
E(X1 + X2 + ... + Xk) = E(X1) + E(X2) + ... + E(Xk)
Example: The number of fire emergencies at a rural county in a week, has the
following distribution
x:        0     1     2     3     4
P(X = x): 0.52  0.28  0.14  0.04  0.02
Find E(X), V (X) and 𝜎
Solution: From Definition , we see that
E(X) = 0(0.52) + 1(0.28) + 2(0.14) + 3(0.04) + 4(0.02) = 0.76 = 𝜇
and from the definition of E(X²), we get
E(X²) = 0²(0.52) + 1²(0.28) + 2²(0.14) + 3²(0.04) + 4²(0.02) = 1.52
Hence, from the definition, we get V(X) = E(X²) − μ² = 1.52 − (0.76)² = 0.9424
Now, from the definition, the standard deviation is σ = √0.9424 ≈ 0.971
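The arithmetic in this example is easy to check with a short script; a minimal sketch computing E(X), E(X²), V(X) and σ directly from the definitions:

```python
# Check of the fire-emergency example above, computed directly from
# the definitions E(X) = sum x*p(x) and V(X) = E(X^2) - mu^2.

pmf = {0: 0.52, 1: 0.28, 2: 0.14, 3: 0.04, 4: 0.02}

mean = sum(x * p for x, p in pmf.items())       # E(X)
ex2 = sum(x**2 * p for x, p in pmf.items())     # E(X^2)
var = ex2 - mean**2                             # V(X)
sd = var ** 0.5                                 # sigma

print(mean, ex2, var, sd)
```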
Example: Let X be a random variable having the probability mass function given in the above Example. Calculate the mean and variance of g(X) = 4X + 3.
Solution: In the above Example, we found E(X) = μ = 0.76 and V(X) = 0.9424. Hence
E(g(X)) = 4E(X) + 3 = 4(0.76) + 3 = 3.04 + 3 = 6.04
and V(g(X)) = 4² V(X) = 16(0.9424) = 15.0784
Note: Moments of Random Variables
The nth moment about the origin of a random variable X, denoted by E(X^n), is defined to be
E(X^n) = Σ_x x^n p(x) in the discrete case (with the corresponding integral in the continuous case).
If n = 1, then E(X) is called the first moment about the origin. If n = 2, then E(X²) is
called the second moment of X about the origin. In general, these moments may or may
not exist for a given random variable. If for a random variable, a particular moment does
not exist, then we say that the random variable does not have that moment.
Note: Let X be a random variable with space RX and probability density function f(x). The mean μX of the random variable X is defined as
μX = E(X) = ∫_{RX} x f(x) dx
if the right hand side exists.
The mean of a random variable is a composite of its values weighted by the
corresponding probabilities. The mean is a measure of central tendency
Example: If the probability density function of the random variable X is as given, then what is the expected value of X?
Solution: The expected value of X is E(X) = ∫ x f(x) dx, evaluated over the support of f.
Example: Let X have the density function
For what value of k is the variance of X equal to 2
Solution: Computing E(X) and E(X²) from the given density, the variance is
var(X) = E(X²) − (E(X))² = k²/18
Setting k²/18 = 2 gives k² = 36, so k = 6.
Bernoulli Distribution:
A Bernoulli trial is a random experiment in which there are precisely two
possible outcomes, which we conveniently call ‘failure’ (F) and ‘success’ (S).
We can define a random variable from the sample space {S, F} into the set
of real numbers as follows:
X(F) = 0 X(S) = 1.
Note: The random variable X is called the Bernoulli random variable if its probability density function is of the form
f(x) = p^x (1 − p)^(1−x),  x = 0, 1
where p is the probability of success.
Note: If X is a Bernoulli random variable with parameter p, then the mean and variance are respectively given by
μX = μ = p,  σX² = σ² = p(1 − p)
The mean of the Bernoulli random variable is
μX = Σ_{x=0}^{1} x f(x) = Σ_{x=0}^{1} x p^x (1 − p)^(1−x) = p
Similarly, the variance of X is given by
σX² = Σ_{x=0}^{1} (x − μX)² f(x) = Σ_{x=0}^{1} (x − p)² p^x (1 − p)^(1−x) = p(1 − p)
Lecturer note -27
Binomial and Poisson distributions:
Binomial Distribution
Consider a fixed number n of mutually independent Bernoulli trails. Suppose
these trials have same probability of success, say p. A random variable
X is called a binomial random variable if it represents the total number of
successes in n independent Bernoulli trials.
Now we determine the probability density function of a binomial random
variable. Recall that the probability density function of X is defined as
f(x) = P(X = x).
Thus, to find the probability density function of X we have to find the probability
of x successes in n independent trails.
If we have x successes in n trails, then the probability of each n-tuple
with x successes and n − x failures is px (1 − p)n-x.
However, there are C(n, x) tuples with x successes and n − x failures in n trials, where C(n, x) denotes the binomial coefficient. Hence
P(X = x) = C(n, x) p^x (1 − p)^(n−x)
Therefore, the probability density function of X is
f(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x),  x = 0, 1, ..., n
Note:
The Bernoulli trials are formally defined by the following properties:
a) The result of each trial is either a success or a failure
b) The probability of success p is constant from trial to trial.
c) The trials are independent
d) The random variable X is defined to be the number of successes in n
repeated trials
The mean and variance of the binomial distribution are
E(X) = μ = np and V(X) = σ² = npq, where q = 1 − p.
Example: On a five-question multiple-choice test there are five possible
answers, of which one is correct. If a student guesses randomly and independently,
what is the probability that she is correct only on questions 1 and 4?
Solution: Here the probability of success is p = 1/5 , and thus 1 − p = 4/5
.
Therefore, the probability that she is correct on questions 1 and 4 is
P(correct on questions 1 and 4) = p²(1 − p)³ = 0.02048
Example: On a five-question multiple-choice test there are five possible
answers, of which one is correct. If a student guesses randomly and independently,
what is the probability that she is correct only on two questions?
Solution: Here the probability of success is p = 1/5, and thus 1 − p = 4/5. There are C(5, 2) different ways she can be correct on two questions. Therefore, the probability that she is correct on two questions is
P(correct on two questions) = C(5, 2) p²(1 − p)³ = 0.2048
Example: What is the probability of rolling two sixes and three non sixes
in 5 independent casts of a fair die?
Solution: Let the random variable X denote the number of sixes in 5 independent casts of a fair die. Then X is a binomial random variable with probability of success p and n = 5. The probability of getting a six is p = 1/6. Hence
f(2) = P(X = 2) = C(5, 2) (1/6)² (5/6)³ ≈ 0.160751
Example:
What is the probability of rolling at most two sixes in 5
independent casts of a fair die?
Solution: Let the random variable X denote number of sixes in 5 independent
casts of a fair die. Then X is a binomial random variable with probability
of success p and n = 5. The probability of getting a six is p = 1/6
. Hence, the probability of rolling at most two sixes is
P(X ≤ 2) = F(2) = f(0) + f(1) + f(2)
= C(5, 0)(1/6)^0(5/6)^5 + C(5, 1)(1/6)^1(5/6)^4 + C(5, 2)(1/6)^2(5/6)^3 ≈ 0.9645
Poisson Distribution:
It is often useful to define a random variable that counts the number of events
that occur within certain specified boundaries.
For example, the average number of telephone calls received by customer service
within a certain time limit.
The Poisson distribution is often appropriate to model such situations.
A random variable X is said to have a Poisson distribution if its probability density function is given by
f(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...
where λ > 0 is a parameter.
Mean and variance of the Poisson RV
For a Poisson RV with parameter λ,
E(X) = Σ_{x≥0} x e^(−λ) λ^x / x! = λ
and similarly V(X) = E(X²) − (E(X))² = λ, so that E(X) = V(X) = λ.
Example: A random variable X has a Poisson distribution with a
mean of 3. What is the probability that X is bounded by 1 and 3, that is,
P(1 ≤ X ≤ 3)?
Solution: Here μX = 3 = λ, so
f(x) = e^(−λ) λ^x / x! = e^(−3) 3^x / x!,  x = 0, 1, 2, ...
Hence
P(1 ≤ X ≤ 3) = f(1) + f(2) + f(3) = 3e^(−3) + (9/2)e^(−3) + (9/2)e^(−3) = 12e^(−3) ≈ 0.5974
Note: Poisson distribution is the limiting form of binomial distribution when the number
of trials n becomes sufficiently large and the probability p of success in a trial is very
small.
Example: The number of traffic accidents per week in a small city
has a Poisson distribution with mean equal to 3. What is the probability of
exactly 2 accidents occur in 2 weeks?
Solution: The mean number of traffic accidents per week is 3. Thus, the mean number of accidents in two weeks is λ = (3)(2) = 6. Since
f(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...
we get f(2) = (6²/2!) e^(−6) = 18 e^(−6) ≈ 0.0446
Example:
During a laboratory experiment, the average number of radioactive particles
passing through a counter in one millisecond is 4. What is the probability
that 6 particles enter the counter in a given millisecond? What is the
probability of at least 6 particles?
Solution: Using the Poisson distribution with x = 6 and μ = 4, we get
P(X = 6) = F(6) − F(5) = e^(−4) 4^6/6! ≈ 0.1042,
and P(at least 6 particles) = P(X ≥ 6) = 1 − F(5) ≈ 0.2149.
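The Poisson examples above (λ = 3, λ = 6 and λ = 4) can be checked numerically; a sketch:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# P(1 <= X <= 3) for lam = 3 (should equal 12*e^-3):
p_1_to_3 = sum(poisson_pmf(x, 3) for x in range(1, 4))

# P(X = 6) and P(X >= 6) for lam = 4 (radioactive-particle example):
p_six = poisson_pmf(6, 4)
p_at_least_six = 1 - sum(poisson_pmf(x, 4) for x in range(6))

print(p_1_to_3, p_six, p_at_least_six)
```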
Lecturer note -28
Poisson distribution continued and Hypergeometric distribution:
Poisson approximation for Binomial
Poisson distribution was originally derived as a limit of Binomial when n → ∞
while p = 𝜇/n, with fixed 𝜇. We can use this fact to estimate Binomial
probabilities for large n and small p.
Example: At a certain industrial facility, accidents occur infrequently. It is known that
the probability of an accident on any given day is 0.005 and the accidents
are independent of each other. For a given period of 400 days, what is the
probability that
(a) there will be an accident on only one day?
(b) there are at most two days with an accident?
Solution: Let X be a binomial random variable with n = 400 and p = 0.005.
Thus 𝜇 = np = (400)(0.005) = 2.
Using the Poisson approximation with λ = 2, we have
(a) P(X = 1) ≈ f(1) = e^(−2) 2¹/1! ≈ 0.271
(b) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) ≈ f(0) + f(1) + f(2) = 0.1353 + 0.2707 + 0.2707 = 0.6767
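The quality of the approximation can be checked by computing the exact Binomial answer alongside it; a sketch:

```python
from math import comb, exp, factorial

n, p = 400, 0.005
lam = n * p  # mu = np = 2

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    return exp(-lam) * lam**x / factorial(x)

exact = sum(binom_pmf(x) for x in range(3))     # exact Binomial P(X <= 2)
approx = sum(poisson_pmf(x) for x in range(3))  # Poisson approximation

print(exact, approx)
```

For n this large and p this small the two values agree to several decimal places.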
Hypergeometric distribution:
Consider the Hypergeometric experiment, that is, one that possesses the
following two properties:
a) A random sample of size n is selected without replacement from N
items.
b) Of the N items overall, k may be classified as successes and N- k are
classified as failures.
We will be interested, as before, in the number of successes X, but now
the probability of success is not constant
For a hypergeometric random variable X, the number of successes in a random sample of size n selected from N items of which k are labeled success and N − k labeled failure, the probability density function is
f(x) = C(k, x) C(N − k, n − x) / C(N, n)
Note: The mean and variance of a hypergeometric distribution are
μ = n(k/N)  and  σ² = n(k/N)(1 − k/N)((N − n)/(N − 1))
Example: Lots of 40 components each are called unacceptable if they contain as many
as 3 defectives or more. The procedure for sampling the lot is to select 5 components at
random and to reject the lot if a defective is found. What is the probability that exactly 1
defective is found in the sample if there are 3 defectives in the entire lot?
Solution: Using the above distribution with n = 5,N = 40, k = 3 and x = 1,
we can find the probability of obtaining one defective to be
f(1; 40, 5, 3) = C(3, 1) C(37, 4) / C(40, 5) ≈ 0.3011
Example: A shipment of 20 tape recorders contains 5 that are defective. If 10 of them
are randomly chosen for inspection, what is the probability that 2 of the 10 will be
defective?
Solution: Substituting x = 2, n = 10, k = 5, and N = 20 into the formula, we get
f(2) = P(X = 2) = C(5, 2) C(15, 8) / C(20, 10) ≈ 0.348
Example: Suppose there are 3 defective items in a lot of 50 items. A
sample of size 10 is taken at random and without replacement. Let X denote
the number of defective items in the sample. What is the probability that
the sample contains at most one defective item?
Solution: Clearly, X ~ HYP(3, 47, 10). Hence the probability that the sample contains at most one defective item is
P(X ≤ 1) = P(X = 0) + P(X = 1) = C(3, 0)C(47, 10)/C(50, 10) + C(3, 1)C(47, 9)/C(50, 10) ≈ 0.504 + 0.400 = 0.904
Example: A radio supply house has 200 transistor radios, of which
3 are improperly soldered and 197 are properly soldered. The supply house
randomly draws 4 radios without replacement and sends them to a customer.
What is the probability that the supply house sends 2 improperly soldered
radios to its customer?
Solution: The probability that the supply house sends 2 improperly soldered radios to its customer is
P(X = 2) = C(3, 2) C(197, 2) / C(200, 4) ≈ 0.000895
Example: A random sample of 5 students is drawn without replacement
from among 300 seniors, and each of these 5 seniors is asked if she/he
has tried a certain drug. Suppose 50% of the seniors actually have tried the
drug. What is the probability that two of the students interviewed have tried
the drug?
Solution: Let X denote the number of students interviewed who have tried the drug. Hence the probability that two of the students interviewed have tried the drug is
P(X = 2) = C(150, 2) C(150, 3) / C(300, 5) ≈ 0.3146
Lecturer note -29
Continuous probability distributions , Uniform and normal distribution:
All of the random variables discussed previously were discrete, meaning they
can take only a finite (or, at most, countable) number of values.
However, many of the random variables seen in practice have more than a countable
collection of possible values.
For example, the proportions of impurities in ore samples may run from 0.10 to 0.80.
Such random variables can take any value in an interval of real numbers.
Since the random variables of this type have a continuum of possible values, they are
called continuous random variables.
The function f(x) is a probability density function (PDF) for the continuous random variable X, defined over the set of real numbers R, if
(a) f(x) ≥ 0 for all x, and
(b) ∫_{−∞}^{∞} f(x) dx = 1.
Note: The cumulative distribution function (CDF) F(x) of a continuous random variable X, with density function f(x), is
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
We have the following two results: P(a < X < b) = F(b) − F(a), and f(x) = dF(x)/dx wherever the derivative exists.
Uniform distribution:
One of the simplest continuous distributions is the continuous uniform distribution.
This distribution is characterized by a density function that is flat
and thus the probability is uniform in a definite interval, say [a, b].
The density function of the continuous uniform random variable X on the interval [a, b] is
f(x) = 1/(b − a), a ≤ x ≤ b, and f(x) = 0 otherwise.
The cumulative distribution function for the uniform distribution is
F(x) = ∫_{a}^{x} 1/(b − a) dt = (x − a)/(b − a),  a ≤ x ≤ b
Mean and variance of the Uniform distribution:
Mean = E(X) = ∫_{a}^{b} x f(x) dx = ∫_{a}^{b} x/(b − a) dx = (a + b)/2
Variance = Var(X) = E(X²) − (E(X))² = (b² + ab + a²)/3 − (a + b)²/4 = (b − a)²/12
where E(X²) = ∫_{a}^{b} x² f(x) dx = ∫_{a}^{b} x²/(b − a) dx = (b² + ab + a²)/3
Example: If X has a uniform distribution on the interval from 0 to 10, then what is P(X + 10/X ≥ 7)?
Solution: The probability density function of X is f(x) = 1/10 for 0 ≤ x ≤ 10. Hence
P(X + 10/X ≥ 7) = P(X² + 10 ≥ 7X) = P(X ≤ 2 or X ≥ 5)
= 1 − P(2 < X < 5) = 1 − ∫_{2}^{5} (1/10) dx = 7/10
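Since the event reduces to {X ≤ 2 or X ≥ 5}, the answer 7/10 can also be checked numerically by integrating the density over the event with a simple midpoint rule; a sketch:

```python
# Numerical check of the uniform example: X ~ Uniform(0, 10),
# P(X + 10/X >= 7) should equal 7/10.

N = 100_000        # number of subintervals of [0, 10]
dx = 10 / N
prob = 0.0
for i in range(N):
    x = (i + 0.5) * dx        # midpoint of the subinterval
    if x + 10 / x >= 7:       # the event {X + 10/X >= 7}
        prob += dx / 10       # f(x) dx with f(x) = 1/10
print(prob)
```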
Normal distribution:
The most widely used of all the continuous probability distributions is the
normal distribution (also known as Gaussian). It serves as a popular model
for measurement errors, particle displacements under Brownian motion, stock market fluctuations, human intelligence and many other things. It is also used as an approximation for Binomial (for large n) and Gamma (for large α) distributions.
The normal density follows the well-known symmetric bell-shaped curve.
The curve is centered at the mean value 𝜇 and its spread is, of course, measured
by the standard deviation 𝜎. These two parameters, 𝜇 and 𝜎 2 completely
determine the shape and center of the normal density function.
Note:
A random variable X is said to have a normal distribution if its probability density function is given by
f(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²),  −∞ < x < ∞
It will be denoted as X ~ N(μ, σ²)
Note: The normal random variable Z with 𝜇= 0 and 𝜎 = 1 is said to have the
standard normal distribution.
Direct integration would show that E(Z) = 0 and V(Z) = 1:
Usefulness of Z
We are able to transform the observations of any normal random
variable X to a new set of observations of a standard normal random variable Z.
This can be done by means of the transformation Z = (X − μ)/σ
1 x 2
 (
)
1
Example: Is the real valued function defined by f ( x) 
e 2  ,   x  
 2
a probability density function of some random variable X?
Solution: To answer this question, we must check that f is nonnegative
and it integrates to 1. The nonnegative part is trivial since the exponential
function is always positive. Hence, using a property of the gamma function, one can show that f integrates to 1 on R.
Note:
A normal random variable is said to be standard normal, if
its mean is zero and variance is one. We denote a standard normal random
variable X by X ~ N(0, 1).
The probability density function of the standard normal distribution is the following:
f(x) = (1/√(2π)) e^(−x²/2),  −∞ < x < ∞
Example: If X ~ N(0, 1), what is the probability of the random
variable X less than or equal to −1.72?
Solution: P(X ≤ −1.72) = 1 − P(X ≤ 1.72) = 1 − 0.9573 = 0.0427
Lecturer note -30
Normal distribution continued and joint distribution:
Example: If Z ~ N(0, 1), what is the value of the constant c such
that P(| z | c)  0.95
0.95  P(| z | c)  P(c  z  c)
Solution:
 P ( z  c )  P ( z  c )  2 P ( z  c )  1
So P(z≤c)=0.975
Note: If X ~ N(μ, σ²), then the random variable Z = (X − μ)/σ ~ N(0, 1).
We will show that Z is standard normal by finding the probability
density function of Z. We compute the probability density of Z by cumulative
distribution function method.
Example: If X ~ N(3, 16), then what is P(4 ≤ X ≤ 8)?
43 x 3 83
P(4  x  8)  P(


)  P( z  1.25)  P( z  0.25)
Solution:
4
4
4
 0.8944  0.5987  0.2957
Example: If X ~ N(25, 36), then what is the value of the constant c
such that P(|x-25|≤c)=0.9544
Solution:
0.9544 = P(|X − 25| ≤ c) = P(−c/6 ≤ (X − 25)/6 ≤ c/6)
= P(Z ≤ c/6) − P(Z ≤ −c/6) = 2P(Z ≤ c/6) − 1
So P(Z ≤ c/6) = 0.9772, which gives c/6 = 2, that is, c = 12.
Example: Given a random variable X having a normal distribution with 𝜇 = 300 and
𝜎 = 50. Find the probability that X is greater than 362.
Solution: To find P(X > 362), we need to evaluate the area under the normal
curve to the right of x = 362. This can be done by transforming x = 362 to
the corresponding Z-value. We get
z = (x − μ)/σ = (362 − 300)/50 = 1.24
Hence P(X > 362) = P(Z > 1.24) = P(Z < −1.24) = 0.1075
Example: A diameter X of a shaft produced has a normal distribution with parameters
μ = 1.005, σ = 0.01. The shaft will meet specifications if its diameter is between 0.98 and 1.02 cm. What percent of shafts will not meet specifications?
Solution:
1 − P(0.98 < X < 1.02) = 1 − P((0.98 − 1.005)/0.01 < Z < (1.02 − 1.005)/0.01)
= 1 − P(−2.5 < Z < 1.5) = 1 − (0.4938 + 0.4332) = 0.0730
So about 7.3% of shafts will not meet specifications.
Note: The famous 68% - 95% rule
For a Normal population, 68% of all values lie in the interval [𝜇 − 𝜎, 𝜇 + 𝜎],
and 95% lie in [𝜇 − 2𝜎, 𝜇 + 2𝜎].
In addition, 99.7% of the population lies in [𝜇 − 3𝜎, 𝜇 + 3𝜎].
Example: Let X = monthly sick leave time have normal distribution with parameters
𝜇 = 200 hours and 𝜎 = 20 hours.
a) What percentage of months will have sick leave below 150 hours?
b) What amount of time x0 should be budgeted for sick leave so that the
budget will not be exceeded with 80% probability?
Solution: (a) P(X < 150) = P(Z < −2.5) = 0.5 − 0.4938 = 0.0062
(b) P(X < x0) = P(Z < z0) = 0.8, which leaves a table area for z0 of 0.3.
Thus, z0 = 0.84 and hence x0 = 200 + 20(0.84) = 216.8 hours
Normal approximation to Binomial:
As another example of using the Normal distribution, consider the Normal
approximation to Binomial distribution. This will be also used when discussing
sample proportions.
Note: If X is a Binomial random variable with mean μ = np and variance σ² = npq, then the random variables
Zn = (X − np)/√(npq)
approach the standard Normal as n gets large.
We already know one Binomial approximation (by Poisson). It mostly applies when the Binomial distribution in question has a skewed shape, that is, when p is close to 0 or 1. When the shape of the Binomial distribution is close to symmetric, the Normal approximation will work better. Practically, we will require that both np and n(1 − p) ≥ 5.
Lecturer note -31
Joint Probability distribution:
Example: The probability that a patient recovers from a rare blood disease is 0.4. If
100 people are known to have contracted this disease, what is the probability
that at most 30 survive?
Solution: Let the binomial variable X represent the number of patients that
survive. Since n = 100 and p = 0.4, we have
𝜇 = np = (100)(0.4) = 40
And 𝜎 2 = npq = (100)(0.4)(0.6) = 24;
To obtain the desired probability, we compute the z-value for x = 30.5. Thus,
Z = (x − μ)/σ = (30.5 − 40)/√24 ≈ −1.94
and the probability of fewer than 30 of the 100 patients surviving is P(X < 30) ≈ P(Z < −1.94) = 0.5 − 0.4738 = 0.0262.
Example: Suppose X is Binomial with parameters n = 15, and p = 0.4,
we are interested in the probability that X assumes a value from 7 to 9 inclusive, that is,
P(7 ≤X ≤ 9):
Solution: The exact probability is given by
P(7 ≤ X ≤ 9) = Σ_{x=7}^{9} bin(x; 15, 0.4) = 0.1771 + 0.1181 + 0.0612 = 0.3564
For the Normal approximation we find the area between x1 = 6.5 and x2 = 9.5 using the z-values
z1 = (x1 − np)/√(npq) = (6.5 − 6)/1.897 = 0.26,  z2 = (9.5 − 6)/1.897 = 1.85
Adding or removing 0.5 is called the continuity correction. It arises when we try to approximate a distribution with integer values (here, Binomial) through the use of a continuous distribution (here, Normal). The sum over the discrete set 7 ≤ X ≤ 9 is approximated by the integral of the continuous density from 6.5 to 9.5.
P(7 ≤ X ≤ 9) ≈ P(0.26 < Z < 1.85) = 0.4678 − 0.1026 = 0.3652
Therefore, the normal approximation provides a value that agrees very closely
with the exact value of 0.3564. The degree of accuracy depends on both n
and p. The approximation is very good when n is large and if p is not too
near 0 or 1.
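The comparison in this example can be reproduced directly; a sketch computing the exact Binomial probability and the continuity-corrected Normal approximation:

```python
from math import comb, erf, sqrt

n, p = 15, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact Binomial probability P(7 <= X <= 9)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(7, 10))

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Normal approximation with continuity correction: area from 6.5 to 9.5
approx = phi((9.5 - mu) / sigma) - phi((6.5 - mu) / sigma)

print(exact, approx)
```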
Distribution of several random variables:
There are many random experiments that involve more than one random
variable. For example, an educator may study the joint behavior of grades
and time devoted to study; a physician may study the joint behavior of blood
pressure and weight. Similarly an economist may study the joint behavior of
business volume and profit. In fact, most real problems we come across will
have more than one underlying random variable of interest.
Bivariate Discrete Random Variables
A discrete bivariate random variable (X, Y ) is an ordered pair of discrete random
variables.
If X and Y are two discrete random variables, the probability that X equals
x while Y equals y is described by p(x, y) = P(X = x, Y = y). That is, the
function p(x, y) describes the probability behavior of the pair ( X, Y)
Note: A real valued function f of two variables is a joint probability
density function of a pair of discrete random variables X and Y (with range
spaces RX and RY , respectively) if and only if
(a) f ( x, y )  0, ( x, y )  RX  Ry
 f ( x, y )  1
yR
X
Y
Bivariate Continuous Random Variables:
The joint probability density function of the random variables
X and Y is an integrable function f(x, y) such that
(a) f ( x, y )  0 forall ( x, y)  R 2
(b)

xR
 
(b) 

f ( x, y )dxdy  1
 
Example: For what value of the constant k is the function
f(x, y) = kxy,  x = 1, 2, 3;  y = 1, 2, 3
a joint probability density function of some random variables X and Y?
Solution: 1 = Σ_{x=1}^{3} Σ_{y=1}^{3} f(x, y) = Σ_{x=1}^{3} Σ_{y=1}^{3} kxy = k(1 + 2 + 3 + 2 + 4 + 6 + 3 + 6 + 9) = 36k
So k = 1/36
Example: Let the joint density function of X and Y be
f(x, y) = kxy²,  0 < x < y < 1 (and 0 otherwise).
What is the value of the constant k?
Solution:
1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{0}^{1} ∫_{0}^{y} k x y² dx dy = k/10
So k = 10
Note:
If we know the joint probability density function f of the random variables X and Y, then we can compute the probability of an event A from
P(A) = ∫∫_A f(x, y) dx dy
Example: Let the joint density of the continuous random variables X and Y be as given. What is the probability of the event X ≤ Y?
Solution: Let A = (X ≤ Y). We want to find P(A) = ∫∫_A f(x, y) dx dy.
Note: Let (X, Y) be a continuous bivariate random variable, and let f(x, y) be the joint probability density function of X and Y. The function
f1(x) = ∫_{−∞}^{∞} f(x, y) dy
is called the marginal probability density function of X. Similarly, the function
f2(y) = ∫_{−∞}^{∞} f(x, y) dx
is called the marginal probability density function of Y.
Example: If the joint density function for X and Y is given by
then what is the marginal density function of X, for 0 < x < 1?
Solution: The domain of the f consists of the region bounded by the curve
x = y2 and the vertical line x = 1.
Lecturer note -32
Joint Probability distribution continued and related problems:
Note: Let X and Y be the continuous random variables with
joint probability density function f(x, y). The joint cumulative distribution
function F(x, y) of X and Y is defined as
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(u, v) du dv
From the fundamental theorem of calculus, we again obtain f(x, y) = ∂²F/∂x∂y
Example: If the joint cumulative distribution function of X and Y is given by
F(x, y) = (1/5)(2x³y + 3x²y²),  0 ≤ x, y ≤ 1
then what is the joint density of X and Y?
Solution: f(x, y) = ∂²/∂x∂y [(1/5)(2x³y + 3x²y²)] = (6/5)(x² + 2xy)
Hence, the joint density of X and Y is given by
f(x, y) = (6/5)(x² + 2xy),  0 ≤ x, y ≤ 1
Covariance of Bivariate Random Variables
First, we define the notion of product moment of two random variables
and then using this product moment, we give the definition of covariance
between two random variables.
Let X and Y be any two random variables with joint density function f(x, y). The product moment of X and Y, denoted by E(XY), is defined as
E(XY) = Σ_{x∈RX} Σ_{y∈RY} xy f(x, y) in the discrete case, and E(XY) = ∫∫ xy f(x, y) dx dy in the continuous case.
Here, RX and RY represent the range spaces of X and Y respectively.
Note:
Let X and Y be any two random variables with joint density
function f(x, y). The covariance between X and Y , denoted by Cov(X, Y )
(or 𝜎 XY), is defined as
Cov(X, Y ) = E( (X – μX) (Y – μY ) ),
where μX and μY are mean of X and Y , respectively
The covariance helps us assess the relationship between two variables.
Positive covariance means positive association between X and Y meaning
that, as X increases, Y also tends to increase. Negative covariance means
negative association.
Note: The mean μX is given by
μX = E(X) = ∫_{−∞}^{∞} x f1(x) dx = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy
Similarly, the mean μY is given by
μY = E(Y) = ∫_{−∞}^{∞} y f2(y) dy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy
Note:
Let X and Y be any two random variables. Then
Cov(X, Y ) = E(XY ) − E(X)E(Y ).
Cov(X, Y ) = E((X – μX) (Y – μY ))
= E(XY – μX Y – μY X + μX μY )= E(XY ) – μX E(Y ) – μY E(X) + μX μY
= E(XY ) – μX μY – μY μX + μX μY
= E(XY ) – μX μY
= E(XY ) − E(X)E(Y ).
Example: Let X and Y be discrete random variables with joint density
f(x, y) = (x + 2y)/18,  x = 1, 2;  y = 1, 2.
What is the covariance σXY between X and Y?
Solution: The marginal of X is
f1(x) = Σ_{y=1}^{2} (x + 2y)/18 = (1/18)(2x + 6)
Hence the expected value of X is E(X) = Σ_{x=1}^{2} x f1(x) = 28/18.
Similarly, the marginal of Y is
f2(y) = Σ_{x=1}^{2} (x + 2y)/18 = (1/18)(3 + 4y)
Hence the expected value of Y is E(Y) = Σ_{y=1}^{2} y f2(y) = 29/18.
Further, the product moment of X and Y is given by
E(XY) = Σ_{x=1}^{2} Σ_{y=1}^{2} xy f(x, y) = f(1, 1) + 2f(1, 2) + 2f(2, 1) + 4f(2, 2) = 45/18
Hence, the covariance between X and Y is given by
Cov(X, Y) = E(XY) − E(X)E(Y) = 45/18 − (28/18)(29/18) ≈ −0.00617
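The covariance example above can be checked by brute force over the four points of the joint density; a sketch:

```python
# Check of the covariance example: f(x, y) = (x + 2y)/18, x, y in {1, 2}.
pairs = [(x, y) for x in (1, 2) for y in (1, 2)]
f = {(x, y): (x + 2 * y) / 18 for x, y in pairs}

ex = sum(x * f[(x, y)] for x, y in pairs)        # E(X) = 28/18
ey = sum(y * f[(x, y)] for x, y in pairs)        # E(Y) = 29/18
exy = sum(x * y * f[(x, y)] for x, y in pairs)   # E(XY) = 45/18
cov = exy - ex * ey                              # Cov(X, Y)
print(ex, ey, exy, cov)
```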
Note:
For an arbitrary random variable, the product moment and
covariance may or may not exist. Further, note that unlike variance, the
covariance between two random variables may be negative.
Note:
If X and Y are independent random variables, then
E(XY ) = E(X)E(Y ).
Recall that X and Y are independent if and only if
f(x, y) = f1(x) f2(y).
Let us assume that X and Y are continuous. Therefore
E(XY) = ∫∫ xy f(x, y) dx dy = ∫∫ xy f1(x) f2(y) dx dy = (∫ x f1(x) dx)(∫ y f2(y) dy) = E(X)E(Y)
If X and Y are discrete, then replace the integrals by appropriate sums to prove the same result.
Note: If X and Y are independent random variables, then the
covariance between X and Y is always zero, that is
Cov(X, Y ) = 0.
Of course, if covariance is 0, then so is the correlation coefficient. Such
random variables are called uncorrelated. The inverse of this Note is
not true, meaning that zero covariance does not necessarily imply
independence.
Variance of sums
If X and Y are random variables and U = aX + bY + c, then
V(U) = V(aX + bY + c) = a² V(X) + b² V(Y) + 2ab Cov(X, Y)
If X and Y are independent, then V(U) = V(aX + bY) = a² V(X) + b² V(Y)
Example: If X and Y are random variables with variances V(X) = 2, V(Y) = 4, and covariance Cov(X, Y) = −2, find the variance of the random variable Z = 3X − 4Y + 8.
Solution: V(Z) = V(3X − 4Y + 8) = 9V(X) + 16V(Y) − 24Cov(X, Y)
so V(Z) = (9)(2) + (16)(4) − 24(−2) = 130.
Lecturer note -33
Mathematical statistics:
What is Statistics?
Statistics and probabilities are two strongly connected, but still distinct fields of
mathematics. It
is said that "probabilities are the vehicle of statistics". This is true, meaning that if it
weren't for the probabilistic laws, statistics wouldn't be possible.
To illustrate the difference between probabilities and statistics, let us consider two
boxes: a probabilistic and a statistical one.
For the probabilistic box we know that it contains 5 white, 5 black and 5 red balls; the probabilistic problem is: if we take a ball, what is the chance that it is white? For a statistical box we do not know the combination of balls in the box.
The probability sets the question of the chance that something (an event) happens
when we know the probabilities (we know the population).
Statistics asks us to make a sample, to analyze it and then make a prediction
concerning the population, based on the information provided by the sample.
Basics:
The population is a collection (a set) of individuals, objects or numerical
data obtained by measurements, whose properties need to be analyzed.
Note: The population is the complete collection of individuals, objects or numerical
data obtained by measurements and which are of interest (to the one collecting the
sample).
In statistics the population concept is fundamental. The population has to be carefully
defined and is considered completely defined only if the member list is specified.
The set of the Mathematics and Informatics' students is a well defined population.
Usually if we hear the word population, we think of a set of people. In statistics, the
population can be a set of animals, of manufactured objects or of numerical data
obtained through measurements.
For example, the set of the "heights" of the students of the Faculty
of Mathematics and Informatics is a population.
Note: The sample is a subset of a population.
A sample is made of individuals, objects or measured data selected from a
Population
Note:
A response variable (or simply variable) is a characteristic (usually a
numerical one) which is of interest for each element (individual) of a population.
Example: The age of a student, his grade point average, his hair color and his height are response variables for the population of students from the Faculty of Mathematics and Informatics.
Note:
A simple random sample (SRS) is a sample for which each object in the
population has the same probability to be picked as any other object, and is picked
independently of any other object.
Sample mean and variance:
The easiest and most popular summary for a data set is its mean X̄. The mean is a measure of location for the data set. We often need also a measure of spread. One such measure is the sample standard deviation.
Sample variance and standard deviation:
The sample variance is denoted by S² and equals
S² = Σ_{i=1}^{n} (Xi − X̄)² / (n − 1) = (Σ_{i=1}^{n} Xi² − n X̄²) / (n − 1)
Sample standard deviation S is the square root of S².
A little algebra shows that the two expressions in the above formula are equivalent. The denominator in the formula is n − 1, which is called the degrees of freedom. A simple explanation is that the calculation starts with n numbers and is then constrained by finding X̄, thus n − 1 degrees of freedom are left. Note that if n = 1 then the calculation of the sample variance is not possible.
Example: The heights of the last 8 US presidents are (in cm): 185, 182, 188, 188, 185,
177, 182, 193. Find the mean and standard deviation of these heights.
Solution: The average height is X̄ = 185. To make the calculations more
compact, subtract 180 from each number, as this does not affect the standard
deviation: 5, 2, 8, 8, 5, −3, 2, 13, with mean 5. Then Σ Xᵢ² = 364, and we get
S² = (364 − 8 × 25)/7 ≈ 23.43 and S ≈ 4.84.
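The calculation above can be sketched in a few lines of Python, a minimal illustration of the sample mean and variance formulas of this section using the same data:

```python
# Sample mean and sample variance, as defined in this section,
# applied to the presidents' heights example above.
heights = [185, 182, 188, 188, 185, 177, 182, 193]

n = len(heights)
mean = sum(heights) / n                               # X-bar
s2 = sum((x - mean) ** 2 for x in heights) / (n - 1)  # S^2 with n-1 df
s = s2 ** 0.5                                         # sample std dev

print(mean)          # 185.0
print(round(s2, 2))  # 23.43
print(round(s, 2))   # 4.84
```

Note that dividing by n − 1 rather than n is exactly the degrees-of-freedom point made above.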
Statistical inference:
In previous sections we emphasized properties of the sample mean. In this
section we discuss the problem of estimating population parameters in
general. A point estimate of a population parameter 𝜃 is a single value
𝜃̂ of a statistic. For example, the value X̄ is a point estimate of the population parameter
𝜇.
One of the basic problems is how to find an estimator of a population
parameter 𝜃. The numerical value of this statistic is called
an estimate of 𝜃, and the estimator of the parameter 𝜃 is denoted by 𝜃̂.
Maximum Likelihood Method: The maximum likelihood method was first used by
Sir Ronald Fisher in 1922 (see Fisher (1922)) for finding an estimator of an unknown
parameter. However, the method originated in the works of Gauss and Bernoulli.
Lecturer note -34
Maximum likelihood method:
Let X1,X2, ...,Xn be a random sample from a population
X with probability density function f(x; 𝜃), where 𝜃 is an unknown parameter.
The likelihood function, L(𝜃), is the joint distribution of the sample, that is,
L(𝜃) = ∏ᵢ₌₁ⁿ f(xᵢ; 𝜃)
This definition says that the likelihood function of a random sample
X1,X2, ...,Xn is the joint density of the random variables X1,X2, ...,Xn.
The 𝜃 that maximizes the likelihood function L(𝜃) is called the maximum
likelihood estimator of 𝜃, and it is denoted by 𝜃̂.
Example:
If X1,X2, ...,Xn is a random sample from a distribution
with density function as follows. What is the maximum likelihood estimate of the
parameter 𝜃.
Solution: The likelihood function of the sample is given by
L(𝜃) = ∏ᵢ₌₁ⁿ f(xᵢ; 𝜃)
Example: What is the basic principle of maximum likelihood estimation?
Solution: To choose a value of the parameter for which the observed data
have as high a probability or density as possible. In other words a maximum
likelihood estimate is a parameter value under which the sample data have
the highest probability.
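This principle can be sketched numerically. The following is a minimal illustration assuming a Bernoulli(p) model with made-up data (not the unspecified density of the example above): the likelihood is evaluated on a grid of candidate values of p, and the maximizer turns out to be the sample proportion, as theory predicts.

```python
# Hedged illustration of the maximum likelihood principle, assuming
# a Bernoulli(p) model; the sample below is made up for demonstration.
data = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 successes out of 8 trials

def likelihood(p, xs):
    # L(p) = product of p^x * (1-p)^(1-x) over the sample
    L = 1.0
    for x in xs:
        L *= p ** x * (1 - p) ** (1 - x)
    return L

# Search a fine grid of candidate p values for the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: likelihood(p, data))

print(p_hat)  # 0.75, the sample proportion 6/8
```

In practice the maximizer is found analytically by differentiating log L(𝜃), but the grid search makes the "pick the parameter under which the data are most probable" idea concrete.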
The simple binomial model:
Lecturer note -35
Confidence interval:
The confidence interval (CI) or interval estimate is an interval within
which we would expect to find the “true" value of the parameter.
Interval estimates, say for the population mean, are often desirable because
the point estimate X̄ varies from sample to sample. Instead of a single
estimate for the mean, a confidence interval provides a lower and an upper
bound for the mean.
The interval estimate provides a measure of uncertainty in our estimate of the true
mean 𝜇. The narrower the interval, the more precise is our estimate.
Confidence limits are evaluated in terms of a confidence level.
Although the choice of confidence level is somewhat arbitrary, in practice 90%, 95%,
and 99% intervals are often used, with 95% being the most commonly used.
Interval Estimators and Confidence Intervals for Parameters:
The interval estimation problem can be stated as follows: Given a random
sample X1, X2, ..., Xn and a probability value 1 − 𝛼, find a pair of statistics
L = L(X1, X2, ..., Xn) and U = U(X1, X2, ..., Xn) with L ≤ U such that the
probability of 𝜃 being in the random interval [L, U] is 1 − 𝛼.
That is
P (L ≤ 𝜽 ≤ U) = 1 − 𝜶.
The random variable L is called the lower confidence limit and U is called the
upper confidence limit. The number (1−𝛼) is called the confidence coefficient
or degree of confidence.
The interval [l, u] will be denoted as an interval estimate of 𝜃 whereas the
random interval [L,U] will denote the interval estimator of 𝜃. Notice that
the interval estimator of 𝜃 is the random interval [L, U]. Next, we define the
100(1 − 𝛼)% confidence interval for the unknown parameter 𝜃.
CI for the mean:
If X̄ is the mean of a random sample of size n from a normal population
with known variance 𝜎², an approximate (1 − 𝛼)100% confidence interval for
𝜇 is given by
X̄ − C ≤ 𝜇 ≤ X̄ + C,  where C = z_{𝛼/2} 𝜎/√n
and z_{𝛼/2} is the z-value leaving an area of 𝛼/2 to the right.
Note: a 95% confidence interval does not mean that there is a 95%
probability that the interval contains the true mean. The interval computed from a given
sample either contains the true mean or it does not. Instead, the level of confidence is
associated with the method of calculating the interval. For example, for a 95%
confidence interval, if many samples are collected and a confidence interval is
computed for each, in the long run about 95% of these intervals would contain the true
mean.
Note: Using the central limit theorem we have
Z = (X̄ − 𝜇)/(𝜎/√n)
So
P(−z_{𝛼/2} ≤ (X̄ − 𝜇)/(𝜎/√n) ≤ z_{𝛼/2}) = 1 − 𝛼
Note: If 𝜎 is unknown, it can replaced by S, the sample standard deviation,
with no serious loss in accuracy for the large sample case. Later, we will
discuss what happens for small samples.
Example: The drying times, in hours, of a certain brand of latex paint are
3.4 2.5 4.8 2.9 3.6 2.8 3.3 5.6
3.7 2.8 4.4 4.0 5.2 3.0 4.8
Compute the 95% confidence interval for the mean drying time. Assume that
𝜎 = 1.
Solution: We compute X̄ = 3.79 and z_{0.025} = 1.96
(𝛼 = 0.05, upper-tail probability = 0.025, table area = 0.5 − 0.025 = 0.475).
Then the 95% C.I. for the mean is 3.79 ± (1.96)(1)/√15, i.e. (3.28, 4.29).
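As a sketch, the interval in this example can be computed directly, assuming (as the example does) that 𝜎 = 1 is known:

```python
# z-based confidence interval from this section, applied to the
# drying-time data above (sigma = 1 assumed known).
times = [3.4, 2.5, 4.8, 2.9, 3.6, 2.8, 3.3, 5.6,
         3.7, 2.8, 4.4, 4.0, 5.2, 3.0, 4.8]
sigma = 1.0
z = 1.96                          # z_{alpha/2} for a 95% interval

n = len(times)
mean = sum(times) / n
margin = z * sigma / n ** 0.5     # C = z_{alpha/2} * sigma / sqrt(n)

print(round(mean, 2))                                   # 3.79
print(round(mean - margin, 2), round(mean + margin, 2))  # 3.28 4.29
```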
Example: An important property of plastic clays is the amount of shrinkage on drying.
For a certain type of plastic clay, 45 test specimens showed an average shrinkage
percentage of 18.4 and a standard deviation of 1.2. Estimate the "true"
average shrinkage 𝜇 for clays of this type with a 95% confidence interval.
Solution: For these data, a point estimate of 𝜇 is X̄ = 18.4. The sample
standard deviation is S = 1.2. Since n is fairly large, we can replace 𝜎 by S.
Hence, the 95% confidence interval for 𝜇 is
18.4 − 1.96 × 1.2/√45 ≤ 𝜇 ≤ 18.4 + 1.96 × 1.2/√45, i.e. (18.05, 18.75).
Thus we are 95% confident that the true mean lies between 18.05 and 18.75.
Example: The average zinc concentration recovered from a sample of zinc
measurements at 36 different locations in a river is found to be 2.6 milligrams per liter.
Find the 95% and 99% confidence intervals for the mean zinc concentration
𝜇. Assume that the population standard deviation is 0.3.
Solution: The point estimate of 𝜇 is X̄ = 2.6. For 95% confidence, z_{0.025} = 1.96.
Hence, the 95% confidence interval is
2.6 − 1.96 × 0.3/√36 ≤ 𝜇 ≤ 2.6 + 1.96 × 0.3/√36, i.e. (2.50, 2.70).
Similarly, the 99% confidence interval is
2.6 − 2.575 × 0.3/√36 ≤ 𝜇 ≤ 2.6 + 2.575 × 0.3/√36, i.e. (2.47, 2.73).
Lecturer note -36
Confidence interval continued and testing of hypothesis:
Sample size calculations:
In practice, another problem often arises: how much data should be collected
to determine an unknown parameter with a given accuracy? Let m be
the desired size of the margin of error; for a given confidence level 100(1 − 𝛼)% we have
m = z_{𝛼/2} 𝜎/√n
since the CI has the structure X̄ ± m, where m is called the margin of error.
The question is: what sample size n achieves this goal?
Assuming that some estimate of 𝜎 is available and solving for n, we have
n = (z_{𝛼/2} 𝜎/m)²
Example: We would like to estimate the pH of a certain type of soil to within 0.1,
with 99% confidence. From past experience, we know that the soils of this
type usually have pH in the 5 to 7 range. Find the sample size necessary to
achieve our goal.
Solution: Let us take the reported 5 to 7 range as the ±2𝜎 range. This
way, a crude estimate of 𝜎 is (7 − 5)/4 = 0.5. For 99% confidence, we
find the upper tail area 𝛼/2 = (1 − 0.99)/2 = 0.005, thus z_{0.005} = 2.576, and
n = (2.576 × 0.5/0.1)² ≈ 166
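The sample-size formula above can be sketched as follows, using the numbers of the pH example (rounding up, since n must be an integer at least as large as the computed value):

```python
# Sample-size formula n = (z * sigma / m)^2 from this section,
# applied to the pH example above.
import math

z = 2.576      # z_{alpha/2} for 99% confidence
sigma = 0.5    # crude estimate from the +/- 2 sigma range
m = 0.1        # desired margin of error

n = math.ceil((z * sigma / m) ** 2)   # always round up
print(n)  # 166
```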
Example: In sampling from a nonnormal distribution with a variance
of 25, how large must the sample size be so that the length of a 95% confidence
interval for the mean is 1.96?
Solution: The confidence interval when the sample is taken from a population
with variance 25 is X̄ − C ≤ 𝜇 ≤ X̄ + C, where C = z_{𝛼/2} 𝜎/√n.
Thus the length of the confidence interval is
l = 2 z_{𝛼/2} 𝜎/√n = 2(1.96)(5)/√n = 1.96  ⟹  √n = 10  ⟹  n = 100
So far, we have discussed the construction of a confidence interval
for the population mean when the variance is known. It is
very unlikely that one will know the variance without knowing the population
mean, so what we have treated so far in this section is not very
realistic. We now treat the case of constructing the confidence interval for the population
mean when the population variance is also unknown.
Note: Suppose X1, X2, ..., Xn is a random sample from a normal population X
with mean 𝜇 and variance 𝜎² > 0. Let the sample mean and sample variance
be X̄ and S² respectively. Then the 100(1 − 𝛼)% confidence interval for 𝜇 when the
population X is normal with unknown variance 𝜎² is given by
[X̄ − (S/√n) t_{𝛼/2}(n − 1), X̄ + (S/√n) t_{𝛼/2}(n − 1)]
where t_{𝛼/2}(n − 1) is the critical value of the t distribution with (n − 1) degrees of freedom.
Example:
A random sample of 9 observations from a normal population yields the observed
statistics x̄ = 5 and (1/8) Σᵢ₌₁⁹ (xᵢ − x̄)² = 36. What is the 95% confidence interval for 𝜇?
Solution: Since n = 9, x̄ = 5, S² = 36 and 1 − 𝛼 = 0.95,
the 95% confidence interval for 𝜇 is given by
[5 − (6/√9) t_{0.025}(8), 5 + (6/√9) t_{0.025}(8)] = [5 − 2(2.306), 5 + 2(2.306)] = [0.388, 9.612]
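The t-based interval above can be sketched directly; the critical value t_{0.025}(8) = 2.306 is taken from the t table quoted in the text rather than computed:

```python
# t-based confidence interval from this section, using the example
# above (n = 9, x-bar = 5, S^2 = 36).
n = 9
xbar = 5.0
s = 36 ** 0.5          # S = 6
t_crit = 2.306         # t_{alpha/2}(n - 1) for 95% confidence

margin = t_crit * s / n ** 0.5
lower, upper = xbar - margin, xbar + margin
print(round(lower, 3), round(upper, 3))  # 0.388 9.612
```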
Confidence Interval for Population Variance
Let X1, X2, ..., Xn be a random sample from a normal population X
with known mean 𝜇 and unknown variance 𝜎². We would like to construct
a 100(1 − 𝛼)% confidence interval for the variance 𝜎², that is, we would like
to find estimates L and U such that P(L ≤ 𝜎² ≤ U) = 1 − 𝛼.
Since Σᵢ₌₁ⁿ (Xᵢ − 𝜇)²/𝜎² has a chi-square distribution with n degrees of freedom,
the (1 − 𝛼)% confidence interval for 𝜎² when the mean is known is
[Σᵢ₌₁ⁿ (Xᵢ − 𝜇)²/χ²_{𝛼/2}(n), Σᵢ₌₁ⁿ (Xᵢ − 𝜇)²/χ²_{1−𝛼/2}(n)]
Example: A random sample of 9 observations from a normal population.
Solution:
Lecturer note -37
Testing of hypothesis:
Statistical hypotheses
A Statistical hypothesis is an assertion or conjecture concerning one or more
populations.
The goal of a statistical hypothesis test is to make a decision about an
unknown parameter (or parameters). This decision is usually expressed in
terms of rejecting or accepting a certain value of parameter or parameters.
Some common situations to consider:
Is the coin fair? That is, we would like to test if p = 1/2 where
p = P(Heads).
Is the new drug more effective than the old one? In this case, we would
like to compare two parameters, e.g. the average effectiveness of the
old drug versus the new one.
In making the decision, we will compare the statement (say, p = 1/2) with
the available data and will reject the claim p = 1/2 if it contradicts the data.
In the subsequent sections we will learn how to set up and test the hypotheses
in various situations.
Null and alternative hypotheses
A statement like p = 1/2 is called the Null hypothesis (denoted by H0). It
expresses the idea that the parameter (or a function of parameters) is equal
to some fixed value. For the coin example, it's
H0 : p = 1/2
and for the drug example it's
H0 : 𝜇1 = 𝜇2
where 𝜇1 is the mean effectiveness of the old drug and 𝜇2 that of the
new one. The alternative hypothesis (denoted by HA) seeks to disprove the
null.
For example, we may consider two-sided alternatives
HA : p ≠ 1/2 or, in the drug case, HA : 𝜇1 ≠ 𝜇2
Hypothesis tests of a population mean
A null hypothesis H0 for the population mean 𝜇 is a statement that designates
the value 𝜇0 for the population mean to be tested. It is associated with
an alternative hypothesis HA, which is a statement incompatible with the
null.
A two-sided (or two-tailed) hypothesis setup is
H0 : 𝜇 = 𝜇0 versus HA : 𝜇 ≠ 𝜇0 for a specified value of 𝜇0 and a one-sided (or one-tailed)
hypothesis setup is either
H0 : 𝜇 = 𝜇0 versus HA : 𝜇 > 𝜇0 (right-tailed test)
or
H0 : 𝜇 = 𝜇0 versus HA : 𝜇 < 𝜇0 (left-tailed test)
Error in testing Hypothesis
There are two sources of error in hypothesis testing. We can commit either an error of
type I or an error of type II. The first arises if we reject the null hypothesis even though it is
true, whereas the second refers to the event of failing to reject a false null.
So we have
A type I error: occurs when H0 is true and H0 is rejected
A type II error: occurs when H0 is false but we fail to reject it
The probability of a type I error is called 𝛼 and that of a type II error, 𝛽.
The level of significance is the probability 𝛼 of committing a type I error, that is, of rejecting H0
when it is true. Usually 𝛼 is given from the beginning, and it determines the critical
region.
For example, if 𝛼 = 0.033, then from P(x ≥ 5) = 0.0327 we have that the critical
region is x = 5, 6, 7, 8, 9, 10.
Note: The critical region is the set of values W for which P(X ∈ W) ≤ 𝛼; observing a
value in W leads us to reject the null hypothesis H0.
The critical value is the boundary value of the critical region.
Hypothesis test:
A classical approach
In this section we present hypothesis testing for assertions regarding the mean 𝜇 of
a population. To simplify the presentation, we first suppose that the standard deviation
𝜎 of the population is known.
The following three examples refer to different formulations of the null hypothesis H0
and of the alternative hypothesis HA
Example: An ecologist claims that Timișoara has an air pollution problem.
Specifically, he claims that the mean level of carbon monoxide in downtown air is
higher than 4.9/10⁶, the normal mean value.
Solution: To formulate the hypotheses H0 and HA, we have to identify the population,
the population parameter in question and the value to which it is being compared.
In this case, the population can be the set of the downtown Timișoara inhabitants.
The variable X is the carbon monoxide concentration, whose values x vary according to
the location, and the population parameter is the mean value 𝜇 of this variable.
The ecologist makes an assertion concerning
the value of 𝜇. This value can be: 𝜇 < 4.9/10⁶, or 𝜇 > 4.9/10⁶, or 𝜇 = 4.9/10⁶.
The ecologist claims that 𝜇 > 4.9/10⁶. To formulate H0 and HA, we recall that:
1) generally, H0 states that the mean 𝜇 (the parameter in question) has a specified value;
2) the inference regarding the mean 𝜇 of the population is based on the mean of a
sample, and sample means are approximately normally distributed (according to
the central limit theorem);
3) a normal distribution is completely determined if the mean value and the standard
deviation of the distribution are known.
All this suggests that 𝜇 = 4.9/10⁶ should be the null hypothesis and 𝜇 > 4.9/10⁶ the
alternative hypothesis.
Recall that once the null hypothesis is stated, we proceed with the hypothesis test
assuming that H0 is true. So H0: 𝜇 = 4.9/10⁶.
If we admit that the statement 𝜇 = 4.9/10⁶ or 𝜇 < 4.9/10⁶ is the null hypothesis H0,
then:
H0: 𝜇 ≤ 4.9/10⁶
HA: 𝜇 > 4.9/10⁶
The equal sign must always be present in the null hypothesis.
Steps of a Hypothesis Test
The value of 𝛼 is called the level of significance, and it represents the risk (probability) of
rejecting H0 when it is actually true. We cannot determine whether H0 is true or
false; we can only decide whether to reject it or to accept it.
The probability with which we reject a true hypothesis is 𝛼, but we do not know the
probability with which we make a wrong decision. A type I error and a decision error are
two different things.
Lecturer note -38
Testing of hypothesis Continued and Chi square test:
Example: It has been claimed that the mean weight of women students at a college is
𝜇 = 54.4 kg, with standard deviation 𝜎 = 5.4 kg. The sports professor does not
believe this statement. To test the claim, he takes a random sample of size 100 among
the women students and finds the mean X̄ = 53.75 kg. Is this sufficient evidence to
reject the statement at the significance level 𝛼 = 0.05?
Solution: The test statistic is
z = (X̄ − 𝜇0)/(𝜎/√n) = (53.75 − 54.4)/(5.4/√100) = −0.65/0.54 ≈ −1.20
Since |−1.20| < z_{0.025} = 1.96, we fail to reject H0: the sample does not provide
sufficient evidence to reject the claim at the 0.05 level.
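The two-sided z test for this example can be sketched as follows, using the numbers stated in the problem:

```python
# Two-sided z test for the weight example above
# (mu0 = 54.4, sigma = 5.4, n = 100, x-bar = 53.75, alpha = 0.05).
mu0 = 54.4
sigma = 5.4
n = 100
xbar = 53.75
z_crit = 1.96                     # z_{alpha/2} for alpha = 0.05

z = (xbar - mu0) / (sigma / n ** 0.5)
reject = abs(z) > z_crit

print(round(z, 2))  # -1.2
print(reject)       # False: not enough evidence to reject H0
```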
Statistical inference about the population mean when the
standard deviation is not known
This section deals with inferences about the mean 𝜇 when the standard deviation
𝜎 is unknown.
If the sample size is sufficiently large (generally speaking, samples of size greater than n =
30 are considered sufficiently large), the sample standard deviation s is a good estimate
of the standard deviation of the population, and we can substitute 𝜎 with s in the
procedure already discussed.
If the population investigated is approximately normal and n < 30, we base our
procedure on the Student's t distribution.
The Student's t distribution (or simply, the t distribution) is the distribution of the t
statistic, which is defined as
t = (x̄ − 𝜇)/(s/√n)
The degrees of freedom, df, is a parameter that is difficult to define. It is an index used to
identify the correct distribution to be used. In our considerations df = n − 1, where n is
the sample size. The critical value of t that we should use, either in the
estimation of the confidence interval or in the hypothesis test, is obtained from the
t table. In order to obtain this value we need to know:
1) df, the degrees of freedom;
2) the area 𝛼 under the distribution curve situated to the right of the critical
value.
We denote this value t(df; 𝛼).
Example: Let us return to the example concerning air pollution and the ecologist's claim
that the level of carbon monoxide in the air is higher than 4.9/10⁶. Does a sample
of 25 readings with mean X̄ = 5.1/10⁶ and s = 2.1/10⁶ present sufficient evidence
to sustain the statement? We use the level of significance 𝛼 = 0.05.
Solution: The test statistic is
t = (X̄ − 𝜇0)/(s/√n) = (5.1 − 4.9)/(2.1/√25) = 0.2/0.42 ≈ 0.476
With df = 24, the critical value is t(24; 0.05) = 1.711. Since 0.476 < 1.711, we fail to
reject H0: the sample does not provide sufficient evidence to support the claim.
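The one-sided t test for this example can be sketched as follows; the units of 10⁻⁶ are dropped because they cancel in the t statistic, and the critical value t(24; 0.05) = 1.711 is taken from the t table:

```python
# Right-tailed t test for the air-pollution example above.
mu0 = 4.9
xbar = 5.1
s = 2.1
n = 25
t_crit = 1.711                 # t(24; 0.05) from the t table

t = (xbar - mu0) / (s / n ** 0.5)
reject = t > t_crit            # right-tailed test

print(round(t, 3))  # 0.476
print(reject)       # False
```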
Example:
Solution:
Lecturer note -39
Chi square distribution and chi square test for goodness of fit:
Since the normal population is very important in statistics, the sampling
distributions associated with the normal population are also very important. One of the
most important of these is the chi-square distribution.
Chi-square distribution
A continuous random variable X is said to have a chi-square
distribution with r degrees of freedom if its probability density function
is of the form
f(x) = x^{r/2 − 1} e^{−x/2} / (Γ(r/2) 2^{r/2}) for x > 0, and f(x) = 0 otherwise.
Example:
If X ~ χ²(7), then what are the values of the constants a and
b such that P(a < X < b) = 0.95?
Solution: Since 0.95 = P(a < X < b) = P(X < b) − P(X < a),
we get P(X < b) = 0.95 + P(X < a).
We choose a = 1.690, so that P(X < 1.690) = 0.025.
From this, we get P(X < b) = 0.95 + 0.025 = 0.975.
Thus, from the chi-square table, we get b = 16.01.
GOODNESS OF FIT TESTS:
In point estimation, interval estimation or hypothesis testing we always
started with a random sample X1, X2, ..., Xn of size n from a known distribution.
Goodness of fit tests are performed to validate the experimenter's opinion
about the distribution of the population from which the sample is drawn.
The most commonly known and most frequently used goodness of fit tests
are the Kolmogorov-Smirnov (KS) test and the Pearson chi-square (χ²) test.
A goodness-of-fit test asks how well the observed counts Xi fit a given
distribution.
Chi-square goodness-of-fit test
This is a test for the fit of the sample proportions to given numbers. Suppose
that we have observations that can be classified into each of k groups (categorical
data). We would like to test
Example:
We reject H0 and claim that the earthquake frequency does change during
the week.
Example: A die was rolled 30 times with the results shown below:
Number of spots:  1  2  3  4  5  6
Frequency (xᵢ):   1  4  9  9  2  5
If a chi-square goodness of fit test is used to test the hypothesis that the die
is fair at significance level 𝛼 = 0.05, what is the value of the chi-square
statistic and what decision is reached?
Solution: In this problem, the null hypothesis is
H0: p1 = p2 = · · · = p6 = 1/6
The alternative hypothesis is that not all pi's are equal to 1/6. The test is
based on 30 trials, so n = 30 and each expected count is npᵢ = 5. The test statistic is
χ² = Σᵢ₌₁⁶ (xᵢ − 5)²/5 = (16 + 1 + 16 + 16 + 9 + 0)/5 = 11.6
Since 11.6 > χ²_{0.05}(5) = 11.07, we reject H0 and conclude that the die is not fair.
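The Pearson chi-square statistic of this test can be sketched as follows, using the die frequencies above:

```python
# Pearson chi-square goodness-of-fit statistic for the fair-die example.
observed = [1, 4, 9, 9, 2, 5]
n = sum(observed)                    # 30 rolls
expected = [n / 6] * 6               # 5 per face under H0: fair die

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 1))                # 11.6

chi2_crit = 11.07                    # chi-square table, 5 df, alpha = 0.05
print(chi2 > chi2_crit)              # True: reject H0
```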
Example: It is hypothesized that an experiment results in outcomes
K, L, M and N with probabilities 1/5, 3/10, 1/10 and 2/5 respectively. Forty
independent repetitions of the experiment gave the following results:
Outcome:    K   L  M   N
Frequency: 11  14  5  10
If a chi-square goodness of fit test is used at significance level 𝛼 = 0.01, what is
the value of the chi-square statistic and the decision reached?
Solution: The expected counts are 40 × (1/5, 3/10, 1/10, 2/5) = (8, 12, 4, 16), so
χ² = (11 − 8)²/8 + (14 − 12)²/12 + (5 − 4)²/4 + (10 − 16)²/16 ≈ 3.958
Since 3.958 < χ²_{0.01}(3) = 11.35, we accept H0.
Lecturer note -40
Correlation:
In statistics, problems of the following type occur: for the same population we
have two sets of data corresponding to two distinct variables, and the question arises
whether there is a relationship between those two variables. If the answer is yes, what
is that relationship? How are these variables correlated? The relationships discussed
here are not necessarily of the cause-and-effect type. They are mathematical relationships
which predict the behavior of one variable from knowledge of the second variable.
Here we have some examples:
The students spend their time at the university, learning or taking exams. The question
arises whether the more they study, the higher grades they will have.
The problems from the example require the analysis of the correlation between two
variables.
When for a population we have two sets of data corresponding to two distinct variables,
we form the pairs (x; y), where x is the value of the first variable and y is the value of the
second one.
For example, x is the height and y is the weight.
An ordered pair (x; y) is called bivariate data.
Traditionally, the variable X (having the values x) is called input variable (independent
variable), and the variable Y (having the values y) is called output variable (dependent
variable).
The input variable X is the one measured or controlled to predict the variable Y .
In problems that deal with the analysis of the correlation between two variables, the
sample data are presented as a scatter diagram.
Note: A scatter diagram is the graphical representation of the pairs of data in
an orthogonal coordinate system. The values x of the input variable X are represented
on the X axis, and the values y of the output variable Y are represented on the y axis.
Note: The primary purpose of the correlation analysis is to establish a relationship
between the two variables.
Note: If for the increasing values x of the input variable X there is no definite
displacement of the values y of the variable Y , we then say that there is no correlation
or no relationship between X and Y .
Note: If for the increasing values x of the input variable X there is a definite
displacement of the values y of the variable Y , we then say that there is a correlation.
We have a positive correlation if y tends to increase, and we have a negative correlation
if y tends to decrease while x increases.
Note: If the pairs (x; y) tend to follow a line, we say that we have a linear
correlation. If all the pairs (x; y) are on a line (that is not horizontal nor vertical) we
say that we have a perfect linear correlation.
Coefficient of correlation
The coefficient of linear correlation r measures the strength of the
linear correlation between the two variables. It reflects the consistency of the effect that
a change in one variable has on the other.
Note: The value of the coefficient of linear correlation r allows us to formulate an
answer to the question: is there a linear correlation between the two considered
variables? The coefficient of linear correlation r has a value between -1 and +1. The
value r = +1 signifies a perfect positive correlation, and the value r = -1 signifies a
perfect negative correlation.
Note:
The coefficient of linear correlation r for a sample is, by definition,
r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²)
Note: Equivalently, the correlation coefficient can be computed as
r = (n Σxy − Σx Σy) / √((n Σx² − (Σx)²)(n Σy² − (Σy)²))
or as r = Sxy/(Sx Sy), where Sxy is the sample covariance and Sx, Sy are the sample
standard deviations.
Example: Determine the linear correlation coefficient r for a random sample of size 10 if
the data table is
Solution:
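A hedged sketch of the computational formula for r; the (x, y) pairs below are made up for illustration, since the data table of the example above is not reproduced in these notes:

```python
# Sample correlation coefficient via the computational formula
# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
# The data are made up for demonstration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)
sxy = sum(x * y for x, y in zip(xs, ys))

r = (n * sxy - sx * sy) / ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
print(round(r, 3))  # 0.775: a fairly strong positive linear correlation
```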
Lecturer note -41
Regression and fitting of straight lines:
If the value of the linear correlation coefficient r indicates a high linear correlation, there
arises the problem of the establishment of an exact numerical relationship. This exact
relationship is obtained by linear regression.
Generally, the statistician looks for an equation to describe the relationship between the
two variables. The chosen equation is the best fitting of the scatter diagram. The
equations found are called prediction equations. Here are some examples of such
equations:
y = b0 + b1 x  (linear)
y = a + b x + c x²  (quadratic)
The linear regression establishes the mean linear dependency of y in
terms of x.
Next, we shall describe how to establish the best linear dependency for a set of data
(x;y).
If a straight-line relationship seems appropriate, the best-fitting straight line is found by
using the method of least squares.
Suppose that ŷ = b0 + b1 x is the best linear relationship. The least squares method
requires that b0 and b1 are such that Σ(y − ŷ)² is minimum.
From Fermat's theorem we have that the minimum value of the function
F(b0, b1) = Σ(y − b0 − b1 x)²
is obtained for
b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²  and  b0 = ȳ − b1 x̄
where b1 is the slope and b0 is the y intercept.
To determine the slope b1 we can also use the equivalent formula
b1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
Example:
Solution:
Example: For a sample of 10 individuals let us consider following set of data
Find the regression line
Solution:
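The least-squares fit described above can be sketched as follows; the data points are made up for illustration, since the example's data table is not reproduced in these notes:

```python
# Least-squares line y = b0 + b1*x using the slope/intercept formulas
# from this section.  The data are made up for demonstration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1*xbar
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

print(round(b1, 2), round(b0, 2))  # slope ~1.99, intercept ~0.05
```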