LECTURER NOTE
MATH-IV
BSCM1210
NUMERICAL ANALYSIS AND THEORY OF PROBABILITY AND
STATISTICS
J.K.PATI
Lecturer note -1
Introduction, approximation, round-off errors:
All numerical methods involve approximations due to either limits in the
algorithm or physical limits in the computer hardware.
Errors associated with measurements or calculations can be characterized with
reference to accuracy and precision.
Accuracy refers to how closely a computed or measured value agrees with the
true value.
Precision refers to how closely measured or computed values agree with each
other after repeated sampling.
Floating point numbers:
Computer memory can only store a finite number of digits. Therefore, a question
becomes apparent: given a fixed number of digits how can we define a representation
so that it gives the largest coverage of real numbers? An obvious method to use is the
scientific notation, i.e., a number of very large or very small magnitude is represented as
a truncated number multiplied by an appropriate power of 10. For example, 2.597E-03
represents 2.597 × 10^(-3).
General form of a floating point number:
x = ±0.m × 10^e, where m is called the mantissa (stored in M bits) and e is called the
exponent (stored in E bits).
Base β
A base-β floating point number consists of a fraction f containing the significant figures
of the number and an exponent e. The value of the number is f · β^e.
The floating point number is said to be normalized if β^(-1) ≤ |f| < 1.
So obviously 2.597 × 10^(-3) is not normalized, while 0.2597 × 10^(-2) is.
Commonly used bases:
• binary — base 2, used by most computer systems.
• decimal — base 10, used in most hand calculators.
• hex — base 16, used by IBM mainframes and clones.
Approximation of Numbers. In approximating a number, only a finite number of
digits (sometimes called bits) after the decimal point are retained.
For example, let x = 0.1234 × 10^3 be approximated to three-digit floating point
form.
Then the result is x* = 0.123 × 10^3.
Mainly we do the approximation in following two ways Rounding and Chopping
Process for rounding off numbers:
The following rules are used to round off numbers. Suppose we desire to
retain digits up to the k-th decimal place of a number x, and let
x = a_n a_(n-1) …… a_0 . a_(-1) a_(-2) …… a_(-k) a_(-(k+1))
(i) If a_(-(k+1)) < 5, then a_(-k) is not changed.
(ii) If a_(-(k+1)) > 5, then a_(-k) is increased by one.
(iii) If a_(-(k+1)) = 5, increase a_(-k) by one if a_(-k) is odd and leave it
unchanged if it is even (round half to even).
In the case of chopping there is no such rule: we simply discard all digits of the
mantissa beyond the place up to which we want to approximate.
The number 3.42543 after rounding off to 3 decimal places becomes 3.425
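The rounding rules (i)-(iii) and chopping can be sketched in Python as below; the helper names `chop` and `round_half_even` are illustrative, not from the notes, and the sketch assumes positive numbers.

```python
def chop(x, k):
    """Discard all digits after the k-th decimal place (positive x)."""
    factor = 10 ** k
    return int(x * factor) / factor

def round_half_even(x, k):
    """Round positive x to k decimal places, sending ties to the even digit."""
    factor = 10 ** k
    shifted = x * factor
    floor = int(shifted)
    frac = shifted - floor
    if frac > 0.5:
        floor += 1
    elif frac == 0.5 and floor % 2 == 1:  # tie: move to the even digit
        floor += 1
    return floor / factor
```

For 3.42543 to 3 decimal places, the digit after the cutoff is 4 < 5, so both routines give 3.425.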
Significant digits:
The significant digits of a number are its first nonzero digit and all the digits to
its right.
For example, the significant digits of the number 2.103 are 2, 1, 0, 3, and those of
0.0103 are 1, 0 (the one between 1 and 3) and 3.
Floating point arithmetic:
Consider the addition of two numbers using 4-digit floating point arithmetic:
x = 0.1234501 × 10^3 and y = 0.132045 × 10^3.
Using chopping we have x + y = (0.1234 + 0.1320) × 10^3 = 0.2554 × 10^3.
Similarly we also have difference and multiplication of numbers using floating
point arithmetic.
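A toy model of this K-digit chopping arithmetic can be written as below; `fl_chop` is an illustrative name, and the small `round(…, 9)` guard is an implementation detail to absorb binary representation noise before truncating.

```python
import math

def fl_chop(x, k=4):
    """Represent x as 0.m * 10**e with the mantissa chopped to k digits."""
    if x == 0:
        return 0.0
    e = math.floor(math.log10(abs(x))) + 1        # exponent so that 0.1 <= |m| < 1
    m = x / 10 ** e
    m = math.trunc(round(m * 10 ** k, 9)) / 10 ** k  # chop mantissa to k digits
    return m * 10 ** e

x, y = 0.1234501e3, 0.132045e3
s = fl_chop(fl_chop(x) + fl_chop(y))              # (0.1234 + 0.1320) * 10**3
```

With the operands of the example above, the chopped sum is 0.2554 × 10^3 = 255.4.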
Lecturer note -2
Floating point arithmetic continued and errors:
Drawbacks of floating point arithmetic
The following problems are usually encountered in k-digit arithmetic.
(i) Loss of accuracy: This is explained through the following example.
Consider the addition of the numbers x = 1/3 and y = 1234 using chopping in
four-digit floating point arithmetic.
Now x* = 0.3333 × 10^0 and y* = 0.1234 × 10^4.
In this case the two numbers in floating point form have different
exponents. The operand with the larger exponent is kept as it is, and the
mantissa of the operand with the smaller exponent is shifted so as to make its
exponent equal to the larger one.
So x* = 0.00003333 × 10^4 = 0.0000 × 10^4.
We conclude that x is effectively zero compared to y, which is a
loss of accuracy.
(ii) Loss of significance: When two nearly equal numbers are subtracted
there is a loss of significant figures.
Algebraic manipulation to avoid the loss of significance:
There are no specific methods for algebraic manipulations to avoid loss of
significance. The type of manipulation required differs from problem to
problem and can’t be predicted before hand.
Example: Solve the quadratic equation x^2 - 40x + 2 = 0 using 4 significant digits
in the computation, and draw your conclusion regarding loss of significance.
Solution:
The roots of this equation are
x1 = (-b + √(b^2 - 4ac)) / (2a),  x2 = (-b - √(b^2 - 4ac)) / (2a).
Now √(b^2 - 4ac) = √1592 = 20√3.98 ≈ 20(1.995) = 39.90, so
x1 = (40 + 39.90)/2 = 39.95,  x2 = (40 - 39.90)/2 = 0.05.
The value of x2 is poor, since the subtraction 40 - 39.90 involves loss of
significant digits.
If instead we compute x2 from the product of the roots, x2 = c/(a·x1), we have
x2 = 2.000/39.95 = 0.05006.
This is a better result than the previous value of x2.
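The cancellation can be reproduced by simulating 4-significant-digit arithmetic; the helper `sig4` is an illustrative stand-in for a 4-digit machine, not part of the notes.

```python
import math

def sig4(x):
    """Round x to 4 significant digits (stand-in for 4-digit arithmetic)."""
    if x == 0:
        return 0.0
    d = 4 - 1 - math.floor(math.log10(abs(x)))
    return round(x, d)

a, b, c = 1.0, -40.0, 2.0
disc = sig4(math.sqrt(sig4(b * b - 4 * a * c)))   # sqrt(1592) -> 39.90
x1 = sig4((-b + disc) / (2 * a))                  # 39.95
x2_naive = sig4((-b - disc) / (2 * a))            # 0.05   (cancellation)
x2_stable = sig4(c / (a * x1))                    # 0.05006 (product of roots)
```

The naive x2 keeps only one significant digit, while the product-of-roots form keeps four.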
Errors:
Numerically computed solutions are subject to certain errors. Mainly there
are the following types of errors: inherent errors, truncation errors and
round-off errors.
1. Inherent errors or experimental errors arise due to the assumptions made in the
mathematical modeling of problem. It can also arise when the data is obtained from
certain physical measurements of the parameters of the problem. i.e., errors arising
from measurements.
2. Truncation errors are those errors corresponding to the fact that a finite (or infinite)
sequence of computational steps necessary to produce an exact result is “truncated”
prematurely after a certain number of steps.
3. Round-off errors are errors arising from the process of rounding off during
computation. These also cover chopping, i.e. discarding all decimals from some
decimal place on.
Simple error definitions: There are a number of ways to describe errors in
measurements and calculations. The simplest is the absolute error, this is the
difference between the measured or calculated value and the true value
i.e.
ε = |True value - approximation|,
so that, up to sign,
True value = approximate value + error.
A shortcoming of the absolute error is that it doesn’t take into account the order
of magnitude of the value under consideration. One way to account for the
magnitude is to consider instead the relative error.
So the relative error is
ε_r = |True value - approximation| / |True value| × 100%.
For example, consider the value of √2 = 1.414213… truncated to four decimal
places, 1.4142. Taking 1.41421 as the true (exact) value, the absolute error is
ε = |1.41421 - 1.4142| = 0.00001,
and the relative error is
0.00001 / 1.41421 ≈ 7 × 10^(-6).
Rounding errors originate from the fact that computers can only represent
numbers using a fixed and limited number of significant figures. Thus, numbers
such as 𝜋 or √2 cannot be represented exactly in computer memory. The
discrepancy introduced by this limitation is called round-off error. Even simple
addition can result in round-off error.
Truncation errors in numerical analysis arise when approximations are used to
estimate some quantity.
Often a Taylor series is used to approximate a solution which is then truncated.
The figure below shows a function f(xi) being approximated by a Taylor series that
has been truncated at different levels.
The more terms that are retained in the Taylor series the better the
approximation and the smaller the truncation error.
Taylor series
Truncating a Taylor series gives approximations with the following truncation
errors. Writing h = x_(i+1) - x_i:
For the zero order approximation we have f(x_(i+1)) ≈ f(x_i).
For the first order approximation we have f(x_(i+1)) ≈ f(x_i) + f'(x_i) h.
Similarly, for the second order approximation we have
f(x_(i+1)) ≈ f(x_i) + f'(x_i) h + (f''(x_i)/2!) h^2.
Lecturer note -3
Roots of an equation:
Numerical Iteration Method:
A numerical iteration method or simply iteration method is a mathematical
procedure that generates a sequence of improving approximate solutions for a class of
problems.
A specific way of implementation of an iteration method, including the
termination criteria, is called an algorithm of the iteration method. In the problems of
finding the solution of an equation an iteration method uses an initial guess to generate
successive approximations to the solution.
Since the iteration methods involve repetition of the same process many times,
computers can act well for finding solutions of equation numerically. Some of the
iteration methods for finding solution of equations involves
(1) Bisection method
(2) Method of false position (Regula-falsi Method)
(3) Newton-Raphson method.
(4) Fixed point iteration method.
(5) Muller’s method
A numerical method to solve equations may be a long process in some cases.
If the method leads to value close to the exact solution, then we say that the method is
convergent. Otherwise, the method is said to be divergent.
Solution of Algebraic and Transcendental Equations:
One of the most common problem encountered in engineering analysis is that given a
function f (x), find the values of x for which f(x) = 0.
The solution (values of x) are known as the roots of the equation f(x) = 0, or the zeroes
of the function f (x).
The roots of equations may be real or complex. In general, an equation may have any
number of (real) roots, or no roots at all.
For example, sin x - x = 0 has a single root, namely x = 0, whereas tan x - x = 0 has
infinitely many roots (x = 0, ±4.493, ±7.725, …).
Algebraic and Transcendental Equations:
f(x) = 0 is called an algebraic equation if the corresponding f(x) is a polynomial.
An example is 7x^2 + 6x + 8 = 0.
f(x) = 0 is called a transcendental equation if f(x) contains trigonometric,
exponential or logarithmic functions.
Examples of transcendental equations are sin x - x = 0 and tan x - x = 0.
There are two types of methods available to find the roots of algebraic and
transcendental equations of the form f (x) = 0.
1. Direct Methods: Direct methods give the exact value of the roots in a finite number
of steps. We assume here that there are no round off errors. Direct methods determine
all the roots at the same time.
2. Indirect or Iterative Methods: Indirect or iterative methods are based on the
concept of successive approximations.
The general procedure is to start with one or more initial approximation to the root and
obtain a sequence of iterates xk which in the limit converges to the actual or true
solution to the root. Indirect or iterative methods determine one or two roots at a time.
The indirect or iterative methods are further divided into two categories: bracketing and
open methods.
The bracketing methods require the limits between which the root lies, whereas the
open methods require the initial estimation of the solution.
Bisection and False position methods are two known examples of the bracketing
methods.
Among the open methods, the Newton-Raphson is most commonly used.
The most popular method for solving a non-linear equation is the Newton-Raphson
method and this method has a high rate of convergence to a solution.
Intermediate value theorem for continuous functions:
If f is a continuous function and f (a) and f (b) have opposite signs, then at least one root
lies in between a and b. If the interval (a, b) is small enough, it is likely to contain a single
root.
The interval [a, b] must contain a zero of a continuous function f if the product
f(a) f(b) < 0.
Geometrically, this means that if f(a) f(b) < 0,
then the curve y = f(x) has to cross the x-axis at some point between a and b.
Bisection method:
We are looking for a root of a function f(x) which we assume is continuous on the
interval [a, b].
We also assume that it has opposite signs at both edges of the interval,
i.e., f(a)f(b) < 0.
We then know that f(x) has at least one zero in [a, b].
Of course f(x) may have more than one zero in the interval.
The bisection method is only going to converge to one of the zeros of f(x).
There will also be no indication as of how many zeros f(x) has in the interval, and no
hints regarding where can we actually hope to find more roots, if indeed there are
additional roots.
The first step is to divide the interval into two equal subintervals, i.e.
c = (a + b)/2.
This generates two subintervals, [a, c] and [c, b], of equal lengths.
We want to keep the subinterval that is guaranteed to contain a root. Of course, in the
rare event where f(c) = 0 we are done.
Otherwise, we check if f(a)f(c) < 0. If yes, we keep the left subinterval [a, c].
If f(a)f(c) > 0, we keep the right subinterval [c, b].
This procedure repeats until the stopping criterion is satisfied: we fix a small parameter
ε > 0 and stop when |f(c)| < ε, where ε is the accuracy tolerance.
Note: How many iterations are needed in order that the interval length is less than ε?
Let L0 = b - a. From the construction of the bisection method we see that after k
iterations the length becomes L_k = L0 / 2^k.
We require L_k ≤ ε.
This implies 2^k ≥ L0/ε, i.e.
k ≥ log2(L0/ε).
We choose k = ⌈log2(L0/ε)⌉,
where ⌈·⌉ denotes the ceiling function.
Example: If b - a = 1 and ε = 10^(-6), then k = ⌈log2 10^6⌉ = 20.
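The procedure above can be sketched as a short routine; the function and parameter names are illustrative, and the stopping test here uses the half-interval length rather than |f(c)|.

```python
def bisect(f, a, b, eps=1e-6, max_iter=100):
    """Bisection: halve [a, b] while keeping the subinterval with a sign change."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2
        if f(c) == 0 or (b - a) / 2 < eps:
            return c
        if f(a) * f(c) < 0:      # root is in the left half
            b = c
        else:                    # root is in the right half
            a = c
    return (a + b) / 2
```

For the example that follows, `bisect(lambda x: x**3 - 9*x + 1, 2, 4)` converges to the root near 2.9428.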
Solved problem:
Example: Solve x^3 - 9x + 1 = 0 for the root between x = 2 and x = 4 by the bisection
method.
Solution:
Given f(x) = x^3 - 9x + 1. Now f(2) = -9 and f(4) = 29, so that f(2) f(4) < 0 and hence a
root lies between 2 and 4.
Set a = 2 and b = 4. Then x0 = (a + b)/2 = 3.
Since f(3) = 1 and f(2) f(3) < 0, a root lies between 2 and 3; hence we set a1 = a = 2 and
b1 = x0 = 3.
Then the next approximation is x1 = (2 + 3)/2 = 2.5.
Now f(2.5) = -5.875.
Since f(2) f(2.5) > 0, the root lies between 2.5 and 3; hence we set a2 = x1 = 2.5 and
b2 = b1 = 3.
Continuing this process until the desired accuracy, the approximation after a few more
steps is 2.9375.
Example: Find a real root of the equation f(x) = x^3 - x - 1 = 0.
Solution:
Since f(1) = -1 is negative and f(2) = 5 is positive, a root lies between 1 and 2, and
therefore we take x0 = 3/2 = 1.5.
Then f(x0) = 0.875 is positive, hence f(1) f(1.5) < 0 and the root lies between 1 and
1.5.
We obtain x1 = (1 + 1.5)/2 = 1.25.
Now f(x1) = -19/64, which is negative, hence f(1.25) f(1.5) < 0 and a root lies
between 1.25 and 1.5.
The procedure is repeated and the successive approximations are
x2 = 1.375, x3 = 1.3125, x4 = 1.34375, etc.
Merits of bisection method
a) The iteration using bisection method always produces a root, since the method
brackets the root between two values.
b) As iterations are conducted, the length of the interval is halved at each step, so
convergence to a root of the equation is guaranteed.
Demerits of bisection method
a) The convergence of the bisection method is slow as it is simply based on
halving the interval.
b) Bisection method cannot be applied over an interval where there is a
discontinuity.
c) Bisection method cannot be applied over an interval where the function takes
always values of the same sign.
d) The method fails to determine complex roots.
Lecturer note -4
Method of false position and introduction to NR method:
Regula Falsi method or Method of False Position
This method is also based on the intermediate value theorem. In this method also, as
in bisection method, we choose two points a and b such that f( a) and f (b) are of
opposite signs i.e f(a). f(b)<0
Then, intermediate value theorem suggests that a zero of f lies in between a and b if f
is a continuous function.
Since f(a). f(b)<0, the curve y=f(x) crosses the x axis only once at the point x=m in
between a and b
Consider the points A(a, f(a)) and B(b,f(b)) on the curve y= f(x). Then the equation of the
chord AB is
y - f(a) = n(x - a), where n = (f(b) - f(a)) / (b - a).
At the point C where the chord AB crosses the x-axis we have y = 0, and the above
equation leads to
x = a - f(a)/n,
where n = (f(b) - f(a)) / (b - a).
This gives the x co-ordinate of the approximate root C.
If the interval [a,b] is sufficiently small, the x co-ordinate of the point c is sufficiently
close to the point x=m which is the exact root.
In otherwords x given by the above equation serves as an approximate value of m
when b-a is sufficiently small.
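The chord-intercept step above can be iterated like bisection; this is a minimal sketch with illustrative names, keeping the endpoint at which f changes sign.

```python
def false_position(f, a, b, eps=1e-6, max_iter=100):
    """Regula falsi: replace one endpoint by the chord's x-intercept."""
    if f(a) * f(b) >= 0:
        raise ValueError("root must be bracketed: f(a) f(b) < 0")
    x = a
    for _ in range(max_iter):
        n = (f(b) - f(a)) / (b - a)   # slope of the chord AB
        x = a - f(a) / n              # x-intercept of the chord
        if abs(f(x)) < eps:
            return x
        if f(a) * f(x) < 0:           # root is in [a, x]
            b = x
        else:                         # root is in [x, b]
            a = x
    return x
```

For the worked example that follows, `false_position(lambda x: x**3 + x - 1, 0.5, 1)` converges toward 0.6823.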
Related problems:
Example:
Find an approximate value of the root of the equation x^3 + x - 1 = 0 near x = 1 using
the method of false position (regula falsi) three times.
Solution: f(x) = x^3 + x - 1, f(1) = 1, f(0.5) = -0.375.
So the root lies between 0.5 and 1.
Let x1 = 0.5 and x2 = 1.
So x = (x1 f(x2) - x2 f(x1)) / (f(x2) - f(x1)) = (0.5 + 0.375)/1.375 ≈ 0.64.
Now f(0.64) = -0.0979 and f(1) = 1.
The root lies between 0.64 and 1. Hence x1 = 0.64, x2 = 1.
So x = [0.64(1) - 1(-0.0979)] / (1 + 0.0979) ≈ 0.672.
Now f(0.672) = -0.0245 and f(1) = 1.
So x1 = 0.672, x2 = 1, and
x = (0.672 + 0.0245)/1.0245 ≈ 0.6798.
Hence the approximate root is x ≈ 0.68.
Note: The bisection and regula falsi methods are always convergent.
Since these methods bracket the root, they are guaranteed to converge.
The main disadvantage is that if it is not possible to bracket the root, the methods
are not applicable.
For example, if f(x) always takes values of the same sign, say always
positive or always negative, then we cannot work with the bisection
method.
Some examples of such functions are
f(x) = x^2, which takes only non-negative values, and
f(x) = -x^2, which takes only non-positive values.
Newton Raphson method:
The Newton-Raphson method, or Newton Method, is a powerful technique for solving
equations numerically. Like so much of the differential calculus, it is based on the simple
idea of linear approximation.
Consider f (x) 0 , where f has continuous derivative f .
Let at x=a, y=f(a)=0 , which means that a is a solution to the equation f (x) 0 .
In order to find the value of a, we start with any arbitrary point x0
Let the tangent to the curve y=f(x) at the point (x0, f(x0)) with slope f′(x0) touches x axis
at x1
f ( x0 )  f ( x1 )
So tan𝛽= f′(x0)=
x0  x1
f ( x0 )
As f(x1)=0 , the above simplifies to x1  x0 
f ( x0 )
f ( x1 )
f ( x1 )
Proceeding likewise we have the final iteration for n+1 th approximation is
f ( xn )
xn 1  xn 
f ( xn )
In the second step , we compute x2  x1 
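The iteration above translates directly into code; the derivative is supplied explicitly and the names are illustrative.

```python
def newton(f, df, x0, eps=1e-10, max_iter=50):
    """Newton-Raphson: x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)
        if abs(x_new - x) < eps:   # successive iterates agree: stop
            return x_new
        x = x_new
    return x
```

For instance, `newton(lambda x: x*x - 2, lambda x: 2*x, 1.0)` converges to √2.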
Lecturer note -5
NR method continued and Fixed point iteration method:
Geometrical interpretation of NR method:
So the Newton-Raphson method is sometimes called the tangent method.
Related problem based on NR method:
Example:
Set up a Newton iteration for computing the square root of a given positive
number. Using the same find the square root of 2 exact to six decimal places.
Solution: Let c be a given positive number and let x be its positive square root, so
that x = √c. Then x^2 = c, so we have
f(x) = x^2 - c = 0,
f'(x) = 2x.
Using Newton's iteration formula we have
x_(n+1) = x_n - (x_n^2 - c) / (2 x_n) = (1/2)(x_n + c/x_n).
Now to find the square root of 2 we put c = 2 in the above formula:
x_(n+1) = (1/2)(x_n + 2/x_n).
Choosing x0 = 1, the successive approximations are
x1 = 1.500000, x2 = 1.416667, x3 = 1.414216, x4 = 1.414214.
Hence the square root of 2 correct to six decimal places is 1.414214.
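The square-root iteration x_(n+1) = (x_n + c/x_n)/2 derived above can be run step by step; `sqrt_newton` is an illustrative name.

```python
def sqrt_newton(c, x0=1.0, steps=5):
    """Newton iteration for sqrt(c); returns the list of iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append((xs[-1] + c / xs[-1]) / 2)
    return xs

approximations = sqrt_newton(2)   # x0 = 1, c = 2 as in the example
```

The iterates match the table above: 1.5, 1.416667, 1.414216, 1.414214, ….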
Example: Apply Newton's method to solve the algebraic equation f(x) = x^3 + x - 1 = 0,
correct to six decimal places, starting with x0 = 1.
Solution: Now f(x) = x^3 + x - 1 and f'(x) = 3x^2 + 1.
Substituting these in Newton's formula we have
x_(n+1) = x_n - (x_n^3 + x_n - 1)/(3 x_n^2 + 1) = (2 x_n^3 + 1)/(3 x_n^2 + 1),
n = 0, 1, 2, …
Starting from x0 = 1.000000, we have x1 = 0.750000, x2 = 0.686047, x3 = 0.682340,
x4 = 0.682328.
So we accept 0.682328 as an approximate solution of x^3 + x - 1 = 0 correct to six
decimal places.
Note: Newton’s formula converges provided the initial approximation x0 is
choosen sufficiently close to the exact root.
Proper choice of initial guess is very important for the success of Newton’s
method.
It is applicable to find the solution of both algebraic and transcendental
equation and can also be used when the roots are complex.
Fixed point iteration method:
Consider the equation f(x) = 0. ……(1)
Transform the equation f(x) = 0 into the form x = ∅(x). ……(2)
Take an arbitrary x0 and then compute a sequence x1, x2, x3, …… recursively from a
relation of the form x_(n+1) = ∅(x_n), n = 0, 1, 2, …… ……(3)
A solution of (2) is called a fixed point of ∅. To a given equation (1) there may
correspond several equations (2), and the behavior, especially as regards speed of
convergence of the iterative sequence x0, x1, ……, may differ accordingly.
Conditions for a suitable iteration function ∅(x):
Let x = δ be a root of f(x) = 0 and let I be an interval containing the point x = δ.
Let ∅(x) be continuous in I, where ∅(x) is defined by the equation x = ∅(x), which is
equivalent to f(x) = 0. Then if |∅'(x)| < 1 for all x in I, the sequence of
approximations x0, x1, …… defined by x_(n+1) = ∅(x_n) converges to the root δ,
provided that the initial approximation x0 is chosen in I.
Related problems:
Example: Solve f(x) = x^2 - 3x + 1 = 0 by the fixed point iteration method.
Solution: Write the given equation as x^2 = 3x - 1, i.e. x = 3 - 1/x, and choose
∅(x) = 3 - 1/x.
Then ∅'(x) = 1/x^2, and |∅'(x)| < 1 on the interval (1, 2).
Hence the iteration formula can be applied.
The iterative formula is
x_(n+1) = 3 - 1/x_n,  n = 0, 1, 2, ……
Starting with x0 = 1 we obtain the successive approximations
x1 = 2.00, x2 = 2.5, x3 = 2.60, ……
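The fixed-point iteration of this example can be sketched as below; `fixed_point` is an illustrative name.

```python
def fixed_point(phi, x0, n_steps):
    """Iterate x_{n+1} = phi(x_n) and return the whole sequence."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(phi(xs[-1]))
    return xs

seq = fixed_point(lambda x: 3 - 1 / x, 1.0, 10)   # phi(x) = 3 - 1/x, x0 = 1
```

The first iterates reproduce the example (2.00, 2.5, 2.60, …), and the sequence approaches the root (3 + √5)/2 ≈ 2.618 of x^2 - 3x + 1 = 0.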
Lecturer note -6
Muller’s method:
Newton’s method uses a local linear approximation to the function f, and the
secant method starts with two initial guesses. In Muller’s method we instead use
3 initial guesses and determine the intersection with the x-axis of a parabola.
Note that this is done by finding the root of an explicit quadratic equation. The
case where the roots are not real is handled as well, though the geometric
interpretation is more complicated. (This is where the analyticity of the function is
important, it makes the value of the function for a complex argument
meaningful).
Muller’s method is based on a quadratic approximation. The steps of Muller’s
method are:
Step-1: Given 3 points x_(k-2), x_(k-1) and x_k, find a quadratic function g(x)
such that g(x_i) = f(x_i) for i = k-2, k-1, k.
Step-2: Solve g(x) = 0 for the root x_(k+1) that lies nearest x_k.
How to get the coefficients of the quadratic function:
Given 3 points (x_i, f(x_i)), i = 0, 1, 2, on the curve y = f(x), we find a quadratic
function of the form g(x) = a(x - x2)^2 + b(x - x2) + c which passes through the
three points, i.e. g satisfies
g(x0) = a(x0 - x2)^2 + b(x0 - x2) + c = f(x0) = f0
g(x1) = a(x1 - x2)^2 + b(x1 - x2) + c = f(x1) = f1
g(x2) = c = f2
Solving this system of equations, with h1 = x1 - x0, h2 = x2 - x1,
δ1 = (f1 - f0)/h1 and δ2 = (f2 - f1)/h2, we have
a = (δ2 - δ1)/(h2 + h1),  b = a·h2 + δ2,  c = f2.
To find x3, i.e. the zero of g, apply the quadratic formula to g. There will be two
roots; the root we are interested in is the one that is closer to x2.
To avoid round-off errors due to subtraction of two nearly equal numbers we use
x3 = x2 - 2c / (b ± √(b^2 - 4ac)),
choosing the sign that agrees with the sign of b, i.e. the one that gives the larger
denominator and hence a result closer to x2:
x3 = x2 - 2c / (b + sgn(b)·√(b^2 - 4ac)).
Once x3 is determined, set x0 = x1, x1 = x2, x2 = x3 and repeat the same process
till we get the root up to the desired accuracy.
Related problem:
Example: Use Muller’s method to solve the equation f(x) = e^x + 1 = 0, taking the
initial approximations x0 = 1, x1 = 0, x2 = -1.
Solution: The quadratic polynomial is g(x) = a(x + 1)^2 + b(x + 1) + c.
Matching g with f at the three points gives
a ≈ 0.5431, b ≈ 0.0890, c = f(-1) = e^(-1) + 1 ≈ 1.3679.
Now x3 is found as the root of g(x) closer to x2, that is
x3 ≈ -1.0820 + 1.5850i.
We use the positive sign in front of the square root in the denominator, matching the
sign of b, in order to make |x3 - x2| smallest.
Of course, in this case, since the term under the square root is negative, the two
candidate roots have the same absolute value, but we choose this one even so, since
that is the way the algorithm is defined.
This raises the issue of how to pick the sign of the square root when b^2 - 4ac is
not real. The guiding principle is always to make the choice that picks the
root of the quadratic that is closest to our most recent estimate. The iteration then
continues with the points x1, x2, x3.
Note: Speed and rate of convergence of numerical methods:
A numerical method is said to have rate of convergence p if we have the relation
ε_(k+1) = c · ε_k^p,
where ε_(k+1) is the error present in the (k+1)-th approximation and ε_k is the error
present in the k-th approximation. The rate of convergence of the regula falsi
method is 1.62, of Muller’s method 1.84, and of the NR method 2.
Lecturer note -7
System of linear equations and LU decomposition:
A system of m linear equations in n unknowns x1, x2, . . . , xn is a set of equations of
the form
a11 x1 + a12 x2 + . . . + a1n x n = b1
a21 x1 + a22 x2 + . . . + a2n x n = b2
..............................
a m1 x1 + a m2 x2 + . . . + a mn x n = bm
where the coefficients ajk and the bj are given numbers. The system is said to be
homogeneous if all the bj are zero; otherwise, it is said to be non-homogeneous.
The system of linear equations is equivalent to the matrix equation (or the single vector
equation)
Ax = b
where the coefficient matrix A = [a_jk] is the m × n matrix and x and b are the column
matrices (vectors) given by:

A = | a11 a12 …… a1n |      x = | x1 |      b = | b1 |
    | a21 a22 …… a2n |          | x2 |          | b2 |
    | ……             |          | …… |          | …… |
    | am1 am2 …… amn |          | xn |          | bm |
A solution of the system is a set of numbers x1, x2, ……, xn which satisfy all the m
equations, and a solution vector of the system is a column matrix whose components
constitute a solution of the system. Solving such a system using methods like Cramer’s
rule is impracticable for large systems. Hence, we use other methods like Gauss
elimination, the matrix method, LU decomposition, etc.
Triangulation Method (LU Decomposition Method):
In linear algebra, LU decomposition (also called LU factorization) factorizes a
matrix as the product of a lower triangular matrix and an upper triangular matrix
Let A be a non-singular square matrix. LU decomposition is a decomposition of the
Form A=LU
where L is a lower triangular matrix and U is an upper triangular matrix. This means that
L has only zeros above the diagonal and U has only zeros below the diagonal
Let us have a system of linear equations in the three variables x1, x2, x3 in the above
form, so A is a matrix of order 3.
To solve the system of equations by LU decomposition, first we decompose A as LU,
where

L = | 1   0   0 |      U = | u11 u12 u13 |
    | l21 1   0 |          | 0   u22 u23 |
    | l31 l32 1 |          | 0   0   u33 |
This gives LUx = b.
Let Ux = y. This implies Ly = b, that is,

| 1   0   0 |   | y1 |   | b1 |
| l21 1   0 | × | y2 | = | b2 |
| l31 l32 1 |   | y3 |   | b3 |

Hence y1 = b1, l21·y1 + y2 = b2, l31·y1 + l32·y2 + y3 = b3.
This gives the y values by forward substitution, which means: substitute the value
of y1 given by the first equation into the second and solve for y2, then use these values
of y1 and y2 in the third equation to get y3.
Then the system of equations Ux = y is

| u11 u12 u13 |   | x1 |   | y1 |
| 0   u22 u23 | × | x2 | = | y2 |
| 0   0   u33 |   | x3 |   | y3 |
It gives the required values of x1, x2, x3 as the solution of the original system of linear
equations by backward substitution.
LU factorization of a matrix:
Let A be a matrix of order 3,

A = | a11 a12 a13 |
    | a21 a22 a23 |
    | a31 a32 a33 |

and let A = LU, where

L = | 1   0   0 |      U = | u11 u12 u13 |
    | l21 1   0 |          | 0   u22 u23 |
    | l31 l32 1 |          | 0   0   u33 |

By simple matrix multiplication and equating the corresponding entries of A,
we get the values of the entries of L as well as U.
Lecturer note -8
LU problem continued and inverse of a matrix:
An LU decomposition of a matrix need not be unique.
It is not necessary to take the diagonal elements of L or of U to be 1.
Related problems:
Example: Solve the following system of equations by LU decomposition.
2x+3y+z=9
x+2y+3z=6
3x+y+2z=8.
Solution: The above system of equations is written as

| 2 3 1 |   | x |   | 9 |
| 1 2 3 | × | y | = | 6 |
| 3 1 2 |   | z |   | 8 |

Let A = LU:

| 2 3 1 |   | 1   0   0 |   | u11 u12 u13 |
| 1 2 3 | = | l21 1   0 | × | 0   u22 u23 |
| 3 1 2 |   | l31 l32 1 |   | 0   0   u33 |
Equating the corresponding terms of A and LU, we obtain
u11 = 2, u12 = 3, u13 = 1,
l21 = a21/u11 = 1/2,  l31 = a31/u11 = 3/2,
u22 = a22 - l21·u12 = 2 - 3/2 = 1/2,  u23 = a23 - l21·u13 = 3 - 1/2 = 5/2,
l32 = (a32 - l31·u12)/u22 = (1 - 9/2)/(1/2) = -7,
u33 = a33 - (l31·u13 + l32·u23) = 2 - (3/2 - 35/2) = 18.
So

| 2 3 1 |   | 1    0   0 |   | 2  3    1   |
| 1 2 3 | = | 1/2  1   0 | × | 0  1/2  5/2 |
| 3 1 2 |   | 3/2  -7  1 |   | 0  0    18  |

and the system LUx = b becomes

| 1    0   0 |   | 2  3    1   |   | x |   | 9 |
| 1/2  1   0 | × | 0  1/2  5/2 | × | y | = | 6 |
| 3/2  -7  1 |   | 0  0    18  |   | z |   | 8 |
Consider

| 2  3    1   |   | x |   | y1 |
| 0  1/2  5/2 | × | y | = | y2 |
| 0  0    18  |   | z |   | y3 |

Then

| 1    0   0 |   | y1 |   | 9 |
| 1/2  1   0 | × | y2 | = | 6 |
| 3/2  -7  1 |   | y3 |   | 8 |

Solving this by forward substitution we get y1 = 9, y2 = 3/2, y3 = 5.
2

Again  0

0

1
9
 x   
1 5    3 
y 
2 2     2 
z
0 18     5 
3
Now, solving the above expression we obtain the values of x, y and z as a solution
of the given system of equations as,
 35 
 
 x   18 
   29 
 y    18 
z  
 
 5 
 18 
To find the inverse of a matrix by using LU decomposition:
As per the decomposition we have A = LU. Keep in mind that the inverse
of an upper triangular matrix is upper triangular, and likewise the inverse of a lower
triangular matrix is lower triangular. Using this we can find L^(-1) and U^(-1).
Then
A^(-1) = U^(-1) L^(-1).
Lecturer note -9
Solution of System of Equations by process of iteration:
The methods discussed in the previous section belong to the direct methods for
solving systems of linear equations; these are methods that yield solutions after an
amount of computations that can be specified in advance.
In this section, we discuss indirect or iterative methods in which we start from an
initial value and obtain better and better approximations from a computational cycle
repeated as often as may be necessary, for achieving a required accuracy, so that the
amount of arithmetic depends upon the accuracy required.
Gauss Seidel iteration method:
Consider a linear system of n linear equations in n unknowns x1, x2, ……, xn of the form
a11 x1 + a12 x2 + …… + a1n xn = b1
a21 x1 + a22 x2 + …… + a2n xn = b2
……
an1 x1 + an2 x2 + …… + ann xn = bn
in which the diagonal elements aii do not vanish.
A sufficient condition for obtaining a solution by this method is diagonal
dominance, i.e.
|aii| > Σ_(j≠i) |aij|,  i = 1, 2, ……, n,
i.e. in each row of A the modulus of the diagonal element exceeds the sum of the moduli
of the off-diagonal elements, and also aii ≠ 0. If a diagonal element is 0, the equations
can often be re-arranged to satisfy this condition.
The above system can be written as
x1 = b1/a11 - (a12/a11) x2 - (a13/a11) x3 - …… - (a1n/a11) xn
x2 = b2/a22 - (a21/a22) x1 - (a23/a22) x3 - …… - (a2n/a22) xn
……
xn = bn/ann - (an1/ann) x1 - (an2/ann) x2 - …… - (a_(n,n-1)/ann) x_(n-1)
Suppose we start with x1^(0), x2^(0), ……, xn^(0) as initial values of the variables
x1, x2, ……, xn.
Step-1: The next approximations are
x1^(1) = b1/a11 - (a12/a11) x2^(0) - (a13/a11) x3^(0) - …… - (a1n/a11) xn^(0)
x2^(1) = b2/a22 - (a21/a22) x1^(1) - (a23/a22) x3^(0) - …… - (a2n/a22) xn^(0)
……
xn^(1) = bn/ann - (an1/ann) x1^(1) - (an2/ann) x2^(1) - …… - (a_(n,n-1)/ann) x_(n-1)^(1)
Note that each new value is used as soon as it is available. The process is repeated
in this manner for as many steps as required.
Related problems:
Example: Using Gauss-Seidel iteration solve the following system of equations in three
steps, starting from (1, 1, 1):
10x + y + z = 6
x + 10y + z = 6
x + y + 10z = 6
Solution: The system is diagonally dominant, and we have
x = 0.6 - 0.1y - 0.1z
y = 0.6 - 0.1x - 0.1z
z = 0.6 - 0.1x - 0.1y
Starting with the initial approximation
x^(0) = 1, y^(0) = 1, z^(0) = 1, the next approximations are:
Step-1: x^(1) = 0.6 - 0.1 y^(0) - 0.1 z^(0) = 0.4
y^(1) = 0.6 - 0.1 x^(1) - 0.1 z^(0) = 0.46
z^(1) = 0.6 - 0.1 x^(1) - 0.1 y^(1) = 0.514
Step-2: Using x^(1) = 0.4, y^(1) = 0.46, z^(1) = 0.514 we have
x^(2) = 0.6 - 0.1 y^(1) - 0.1 z^(1) = 0.5026
y^(2) = 0.6 - 0.1 x^(2) - 0.1 z^(1) = 0.49834
z^(2) = 0.6 - 0.1 x^(2) - 0.1 y^(2) = 0.499906
Step-3 is obtained by the same process from the results of Step-2.
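A Gauss-Seidel sweep for this example can be sketched as below; new values are used as soon as they are computed, and `gauss_seidel` is an illustrative name.

```python
def gauss_seidel(A, b, x0, n_sweeps):
    """Perform n_sweeps Gauss-Seidel sweeps, updating in place."""
    n = len(b)
    x = list(x0)
    for _ in range(n_sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]     # uses the freshest values of x
    return x

A = [[10, 1, 1], [1, 10, 1], [1, 1, 10]]    # diagonally dominant
b = [6, 6, 6]
x = gauss_seidel(A, b, [1.0, 1.0, 1.0], 3)
```

The first sweep reproduces (0.4, 0.46, 0.514), and after three sweeps the iterates are close to the exact solution (0.5, 0.5, 0.5).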
Lecturer note -10
Interpolation and finite differences operators:
Consider a single-valued continuous function y = f(x) defined over [a, b], where
f(x) is known explicitly. It is easy to find the values of y for a given set of values of x in
[a, b]; i.e., it is possible to get information about all the points (x, y) where a ≤ x ≤ b.
But the converse is not so easy. That is, using only the points (x0, y0), (x1, y1), ……,
(xn, yn), where a ≤ xi ≤ b for 0 ≤ i ≤ n,
it is not so easy to find the relation between x and y in the form y = f(x) explicitly.
That is one of the problems we face in numerical differentiation or integration.
Now we have first to find a simpler function, say g(x) such that f (x) and g(x) agree at the
given set of points and accept the value of g(x) as the required value of f (x) at some
point x in between a and b.
Such a process is called interpolation. If g(x) is a polynomial, then the process is called
polynomial interpolation.
When a function f(x) is not given explicitly and only values of f (x) are given at a
set of distinct points called nodes or tabular points, using the interpolated function g(x) to
the function f(x), the required operations intended for f (x) , like determination of roots,
differentiation and integration etc. can be carried out.
The approximating polynomial g(x) can be used to predict the value of f (x) at a nontabular point.
The deviation of g(x) from f(x), that is f(x) - g(x), is called the error of approximation.
Consider a continuous single valued function f (x) defined on an interval [a, b].
Given the values of the function for n + 1 distinct tabular points x0, x1, x2, ..., xn such that
a ≤ x0 < x1 < ... < xn ≤ b.
The problem of polynomial interpolation is to find a polynomial g(x) or pn(x)
of degree n, which fits the given data.
The interpolation polynomial fitted to a given data is unique.
If we are given two points satisfying the function such as (x0,y0), (x1,y1) where
y0=f(x0)and y1=f(x1), it is possible to fit a unique polynomial of degree 1.
If three distinct points are given, a polynomial of degree not greater than two can be
fitted uniquely.
In general, if n+ 1 distinct points are given, a polynomial of degree not greater than n
can be fitted uniquely.
FINITE DIFFERENCES OPERATORS:
For a function y=f(x), it is given that y0, y1,------,yn are the values of the variable y
corresponding to the equidistant arguments x0, x1,x2,-----,xn where x1=x0+h, x2=x0+2h, ---xn=x0+nh
In this case, even though interpolation polynomials can be used for interpolation,
some simpler interpolation formulas can be derived. For this, we have to be familiar
with some finite difference operators and finite differences,
.
Finite differences deal with the changes that take place in the value of a function f(x)
due to finite changes in x.
Finite difference operators include, forward difference operator, backward difference
operator, shift operator, central difference operator and mean operator.
Forward difference operator ( ) :
For the values y0, y1,------,yn of a function y=f(x), for the equidistant values x0, x1,x2,----,xn where x1=x0+h, x2=x0+2h, ----xn=x0+nh.
The forward difference operator with table is defined by
f ( xi )  f ( xi  h)  f ( xi )  f ( xi 1 )  f ( xi )
yi  yi 1  yi
So ∆y0, ∆y1,-------∆yn are known as first order forward differences.
The second order forward difference is given by
∆^2 f(xi) = ∆[f(xi + h) - f(xi)]
          = f(xi + 2h) - 2 f(xi + h) + f(xi)
          = y(i+2) - 2 y(i+1) + yi
In general the forward difference of nth order is given by
∆^n f(xi) = ∆^(n-1) f(xi + h) - ∆^(n-1) f(xi)
Example: Construct the forward difference table for the data
X: -2  0  2  4
Y:  4  9 17 22
Solution: The table is

x     y        ∆y          ∆^2 y        ∆^3 y
-2    4
               ∆y0 = 5
 0    9                    ∆^2 y0 = 3
               ∆y1 = 8                  ∆^3 y0 = -6
 2   17                    ∆^2 y1 = -3
               ∆y2 = 5
 4   22
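The table construction is mechanical, so it is easy to sketch in code; this helper (my own naming, not from the notes) builds each difference row from the previous one.

```python
def forward_differences(y):
    """Return the forward difference table as a list of rows: row 0 is y
    itself, and row k holds the k-th order forward differences."""
    table = [list(y)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

# The table from the example above: y = 4, 9, 17, 22 at x = -2, 0, 2, 4
table = forward_differences([4, 9, 17, 22])
```

Each row is one column of the staggered table: `table[1]` holds ∆y0, ∆y1, ∆y2 and `table[3]` the single third difference.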
Properties of forward difference operator:
(1) Forward difference of a constant function is zero.
(2) ∆(f(x)+g(x))=∆f(x)+ ∆g(x)
(3) ∆(f(x).g(x)) = f(x+h) ∆g(x) + g(x) ∆f(x)
(4) ∆(f(x)/g(x)) = [g(x) ∆f(x) - f(x) ∆g(x)] / [g(x+h) g(x)]
Backward Difference Operator: For the values y0, y1,y2,--------yn of a function y=f(x) for
the equidistant values x0, x1,-------,xn where x1=x0+h, x2=x0+2h, ----xn=x0+nh.
The backward difference operator ∇ is defined on the function f as
∇f(xi) = f(xi) - f(xi - h) = yi - y(i-1)
which is the first backward difference.
The second backward difference is ∇^2 f(xi) = yi - 2 y(i-1) + y(i-2)
Similarly the third backward difference is ∇^3 f(xi) = yi - 3 y(i-1) + 3 y(i-2) - y(i-3)
The following table is the backward difference table
Example: construct the backward difference table for the following data
X: -2 0 2 4
Y: -8 3 1 12
Solution:

x     y        ∇y          ∇^2 y         ∇^3 y
-2   -8
               ∇y1 = 11
 0    3                    ∇^2 y2 = -13
               ∇y2 = -2                  ∇^3 y3 = 26
 2    1                    ∇^2 y3 = 13
               ∇y3 = 11
 4   12
Lecturer note -11
Shift operator, Central operator, average operator:
Shift operator, E
Let y = f (x) be a function of x, and let x takes the consecutive values x, x + h, x + 2h,
etc.
We then define an operator E, called the shift operator having the property
E f(x) = f (x + h)
Thus, when E operates on f (x), the result is the next value of the function. If we apply
the operator twice on f (x), we get
E2f(x)= E [E f (x)] = f (x+ 2h).
Thus, in general, if we apply the shift operator n times on f (x), we arrive at
En f (x) = f (x+ nh) for all real values of n.
The inverse operator E-1 is defined as E-1f(x)= f(x-h)
Average Operator:
The average operator μ is defined as
μ f(x) = (1/2) [ f(x + h/2) + f(x - h/2) ]
Central Differences:
Central difference operator δ for a function is defined as
δ f(x) = f(x + h/2) - f(x - h/2), where h is the interval of differencing.
Let y(1/2) = f(x0 + h/2). Then
δ y(1/2) = δ f(x0 + h/2) = f(x0 + h/2 + h/2) - f(x0 + h/2 - h/2)
         = f(x0 + h) - f(x0) = f(x1) - f(x0) = y1 - y0
Central difference table:
Relation between all operators:
(1) ∆ = E - 1
∆f(x) = f(x + h) - f(x) = E f(x) - f(x) = (E - 1) f(x)
So ∆ = E - 1
(2) ∇ = 1 - E^(-1)
∇f(x) = f(x) - f(x - h) = f(x) - E^(-1) f(x) = (1 - E^(-1)) f(x)
So ∇ = 1 - E^(-1)
1
1
(3)   E 2  E 2
h
2
h
2
 f ( x)  f ( x  )  f ( x  )
1
1
1
1
 E 2 f ( x)  E 2 f ( x)  ( E 2  E 2 ) f ( x)
So the proof follows
(4) 1 + μ^2 δ^2 = (1 + δ^2/2)^2
From the definitions of the operators we have
μδ = (1/2)(E^(1/2) + E^(-1/2))(E^(1/2) - E^(-1/2)) = (1/2)(E - E^(-1))
So 1 + μ^2 δ^2 = 1 + (1/4)(E - E^(-1))^2 = 1 + (1/4)(E^2 - 2 + E^(-2)) = (1/4)(E + E^(-1))^2
Now 1 + δ^2/2 = 1 + (1/2)(E^(1/2) - E^(-1/2))^2 = 1 + (1/2)(E - 2 + E^(-1)) = (1/2)(E + E^(-1))
So (1 + δ^2/2)^2 = (1/4)(E + E^(-1))^2 and the proof follows.
Differences of a Polynomial
Let us consider the polynomial of degree n in the form
f ( x)  a0 x n  a1 x n 1  a2 x n 2      an
Where a0, a1,a2,------,an are constants with a0≠0
h is the interval of difference
Then
 n f ( x)  a0 n(n  1)(n  2)(n  3)    (2)(1)h n
 a0 (n !)hn  const
Since ∆^n f(x) is constant, ∆^(n+1) f(x) is zero.
Hence the (n+1)th and higher order differences of a polynomial of degree n are 0.
Conversely, if the nth differences of a tabulated function are constant and the (n+1)th,
(n+2)th, ... differences all vanish, then the tabulated function represents a polynomial of
degree n.
It should be noted that these results hold good only if the values of x are equally
spaced.
The converse is important in numerical analysis since it enables us to approximate
a function by a polynomial if its differences of some order become nearly constant.
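A quick numerical check of this property, using the binomial expansion ∆^n f(x) = Σ_k (-1)^(n-k) C(n,k) f(x + kh); the test polynomial below is my own example, not from the notes.

```python
from math import comb

def nth_forward_difference(f, x, n, h):
    """n-th forward difference of f at x with spacing h, computed from the
    binomial formula  Δ^n f(x) = sum_k (-1)^(n-k) C(n,k) f(x + k h)."""
    return sum((-1) ** (n - k) * comb(n, k) * f(x + k * h) for k in range(n + 1))

f = lambda x: 2 * x**3 - x + 4        # degree 3, leading coefficient a0 = 2
h = 0.5
d3 = nth_forward_difference(f, 1.0, 3, h)   # a0 * 3! * h^3 = 2 * 6 * 0.125 = 1.5
d4 = nth_forward_difference(f, 1.0, 4, h)   # the (n+1)-th difference vanishes
```

The third difference comes out as the constant a0 · 3! · h^3 regardless of the point x, and the fourth difference is zero, exactly as stated above.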
Lecturer note -12
Different interpolation formulae:
Linear interpolation
In linear interpolation we are given two pivotal values f0 = f(x0) and f1 = f(x1),
and we approximate the curve of f by a chord (straight line) P1 passing through the points
(x0,f0) and (x1,f1). Hence the approximate value of f at the intermediate point x = x0 + rh
is given by the linear interpolation formula, i.e.
f ( x)  P1 ( x)  f 0  r ( f1  f 0 )  f 0  r f 0
x  x0
Where
r
,0  r 1
h
Example: Evaluate ln 9.2, given that ln 9.0 = 2.197 and ln 9.5 = 2.251.
Solution:
Here x0 = 9.0, x1 = 9.5, h = x1 - x0 = 0.5, f0 = ln 9.0 = 2.197, f1 = ln 9.5 = 2.251.
Now to calculate ln 9.2 = f(9.2), take x = 9.2, so that r = (x - x0)/h = 0.4. So
ln 9.2 ≈ f(9.2) ≈ P1(9.2) = f0 + r (f1 - f0) = 2.197 + 0.4 (2.251 - 2.197) = 2.219
Quadratic Interpolation
In quadratic interpolation we are given with three pivotal values f 0=f(x0), f1=f(x1), f2=f(x2)
and we approximate the curve of the function f between x0 and x2 = x0 +2h by the
quadratic parabola which passes through the points (x0,f0), (x1,f1), (x2,f2).
The quadratic interpolation formula becomes
f(x) ≈ P2(x) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0
where r = (x - x0)/h, 0 ≤ r ≤ 2
Newton’s Forward Difference Interpolation Formula
Using Newton’s forward difference interpolation formula we find the n degree
polynomial Pn which approximates the function f(x) in such a way that Pn and f agrees at
n+1 equally spaced x values, so that pn(x0) = f0, pn(x1) = f1, ..., pn(xn) = fn
The Newton's forward difference interpolation formula is
f(x) ≈ Pn(x) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0 + ... + [r(r-1)...(r-n+1)/n!] ∆^n f0
where x = x0 + rh, r = (x - x0)/h, 0 ≤ r ≤ n
Derivation of Newton's forward formula for interpolation:
Let us have n+1 tabular points at which f is defined, so the degree of the interpolating polynomial is ≤ n. Let pn(x) be the polynomial of nth degree which agrees with f at the tabular points, and let the values of x be equidistant.
Let pn(x) = a0 + a1 (x - x0) + a2 (x - x0)(x - x1) + ... + an (x - x0)(x - x1)...(x - x(n-1))
Imposing now the condition that f(x) and pn(x) should agree at the set of tabulated points, we obtain
a0 = f0, a1 = (f1 - f0)/(x1 - x0) = ∆f0/h, ..., an = ∆^n f0/(n! h^n)
Setting x = x0 + rh and substituting for a0, a1, ..., an we obtain the expression.
Note: Newton's forward difference interpolation formula is useful for interpolation near
the beginning of a set of tabular values, and for extrapolating values of y a short
distance backward, that is, to the left of y0. The process of finding the value of y for some
value of x outside the given range is called extrapolation.
Related problem:
Example: Using Newton’s forward difference interpolation formula and the following
table evaluate f(15) .
x     y      ∆f      ∆^2 f    ∆^3 f    ∆^4 f
10    46
             20
20    66             -5
             15                2
30    81             -3                -3
             12               -1
40    93             -4
              8
50   101
Solution: Here x0 = 10, x1 = 20, x = 15, h = 10, r = (x - x0)/h = 0.5.
Now f0 = 46, ∆f0 = 20, ∆^2 f0 = -5, ∆^3 f0 = 2, ∆^4 f0 = -3.
Substituting these values in the Newton's forward difference interpolation formula for n = 4, we obtain
f(x) ≈ P4(x) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0 + [r(r-1)(r-2)/3!] ∆^3 f0 + [r(r-1)(r-2)(r-3)/4!] ∆^4 f0
Putting the value of r we obtain f(15)=56.8672
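The whole procedure (build the leading differences ∆^k f0, then accumulate the terms of the formula) can be sketched as follows; the function name is my own, and the result is checked against the worked example.

```python
def newton_forward(x0, h, ys, x):
    """Newton's forward difference interpolation at x for equally spaced
    data ys given at x0, x0+h, x0+2h, ..."""
    # leading forward differences: lead[k] = Δ^k y0
    row, lead = list(ys), [ys[0]]
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        lead.append(row[0])
    r = (x - x0) / h
    term, total = 1.0, 0.0
    for k, d in enumerate(lead):
        total += term * d          # term is r(r-1)...(r-k+1)/k!
        term *= (r - k) / (k + 1)
    return total

# The worked example: f(15) from the table at x = 10, 20, ..., 50
f15 = newton_forward(10, 10, [46, 66, 81, 93, 101], 15)   # 56.8671875
```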
Example: Using the Newton’s forward difference interpolation formula evaluate f(2.05)
where f(x)=√𝑥, using these values.
x:        2.0     2.1     2.2     2.3     2.4
f(x)=√x:  1.414   1.449   1.483   1.516   1.549

Solution: The forward difference table is

x     y = √x       ∆f           ∆^2 f         ∆^3 f        ∆^4 f
2.0   1.414214
                   0.034924
2.1   1.449138                  -0.000822
                   0.034102                   0.000055
2.2   1.483240                  -0.000767                  -0.000005
                   0.033335                   0.000050
2.3   1.516575                  -0.000717
                   0.032618
2.4   1.549193
Here r = (x - x0)/h = 0.5, with x = 2.05, x0 = 2.0, x1 = 2.1, h = 0.1.
So by substituting the values in Newton's formula we obtain
f(2.05) ≈ P4(2.05) = f0 + r ∆f0 + [r(r-1)/2!] ∆^2 f0 + [r(r-1)(r-2)/3!] ∆^3 f0 + [r(r-1)(r-2)(r-3)/4!] ∆^4 f0
So f(2.05)=1.431783
Example: Find the missing term in the following table:
X: 0 1 2 3 4
Y: 1 3 9 -- 81
Solution: Since four values are given, the given data can be approximated by a third
degree polynomial in x.
Hence ∆^4 f0 = 0. Substituting ∆ = E - 1 we get (E - 1)^4 f0 = 0, which implies
E^4 f0 - 4 E^3 f0 + 6 E^2 f0 - 4 E f0 + f0 = 0.
Since E^r f0 = fr, we obtain f4 - 4 f3 + 6 f2 - 4 f1 + f0 = 0.
Substituting the values of f0, f1, f2, f4 we obtain f3 = 31.
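The same trick can be coded directly: force the highest-order difference of the tabulated values to vanish and solve for the unknown entry. The helper below and its naming are mine.

```python
from math import comb

def missing_term(ys, m):
    """Recover the value at index m, marked None in ys, by forcing the
    highest-order forward difference of the data to vanish:
    Δ^n y0 = sum_j (-1)^(n-j) C(n,j) y_j = 0, with n = len(ys) - 1."""
    n = len(ys) - 1
    coeff = [(-1) ** (n - j) * comb(n, j) for j in range(n + 1)]
    known = sum(c * y for j, (c, y) in enumerate(zip(coeff, ys)) if j != m)
    return -known / coeff[m]

# The worked example: y = 1, 3, 9, ?, 81 gives y3 = 31
y3 = missing_term([1, 3, 9, None, 81], 3)
```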
Lecturer note -13
Newton’s Backward difference interpolating formula:
Newton’s backward difference interpolation formula is
f ( x)  Pn ( x)  f n  rf n 

Where
r (r  1) 2
 fn     
2!
r (r  1)  (r  n  1) n
 fn
n!
x  xn  rh, r 
x  xn
, n  r  0
h
Derivation of Newton’s Backward Formulae for Interpolation:
Given the set of n+1 values i.e (x0,f0), (x1,f1),------ (xn,fn) of x and f, it is required to find
pn(x) a polynomial of the nth degree such that f (x) and pn(x) agree at the tabulated
points.
Let the values of x be equidistant, i.e., let xi = x0 + ih, i = 0, 1, 2, ..., n.
Let pn(x) be the polynomial of nth degree which agrees with f at the tabular points.
Let pn(x) = a0 + a1 (x - xn) + a2 (x - xn)(x - x(n-1)) + ... + an (x - xn)(x - x(n-1))...(x - x1)
Putting x=xn we have pn(xn) = fn, so a0=fn
Similarly Imposing the condition that f (x) and pn(x) should agree at the set of
tabulated points
we obtain (after some simplification) the above formula.
Note: (1) Since ∆^n yi = ∇^n y(i+n), a forward difference table may be
derived from a backward difference table and vice versa. So the result will be the same
whether we approximate by the forward or the backward interpolation formula.
(2) The backward difference interpolation formula is commonly used for
interpolation near the end of a set of tabular values and for extrapolating values of y a
short distance forward that is right from yn.
Related problem:
Example: For the following table of values, estimate f(7.5) using Newton's
backward difference interpolation formula.
x    y = f(x)    ∇f      ∇^2 f    ∇^3 f    ∇^4 f
1       1
                7
2       8               12
               19                 6
3      27               18                 0
               37                 6
4      64               24                 0
               61                 6
5     125               30                 0
               91                 6
6     216               36
              127
7     343               42
              169
8     512
Solution: Since the fourth and higher order differences are 0, the Newton's
backward interpolation formula is
f(xn + rh) = yn + r ∇yn + [r(r+1)/2!] ∇^2 yn + [r(r+1)(r+2)/3!] ∇^3 yn
Now r = (x - xn)/h, so r = 7.5 - 8.0 = -0.5.
We have ∇yn = 169, ∇^2 yn = 42, ∇^3 yn = 6, ∇^4 yn = 0, and
f(7.5) = 512 + (-0.5)(169) + [(-0.5)(0.5)/2!](42) + [(-0.5)(0.5)(1.5)/3!](6)
       = 512 - 84.5 - 5.25 - 0.375 = 421.875

INTERPOLATION - Arbitrarily Spaced x values
In the previous sections we have discussed interpolation when the x-values are
equally spaced.
These interpolation formulae cannot be used when the x-values are not equally
spaced.
In the following sections, we consider formulae that can be used even if the x-values
are not equally spaced.
Newton’s Divided Difference Interpolation Formula
If x0, x1, . . . , xn are arbitrarily spaced (i.e. if the difference between x0 and x1, x1 and x2
etc. may not be equal),
Then the polynomial of degree n through (x0,f0), (x1,f1),---(xn,fn) is given by the
Newton’s divided difference interpolation formula (also known as Newton’s general
interpolation formula) given by
f ( x)  f0  ( x  x0 ) f [ x0 , x1 ]  ( x  x0 )( x  x1 ) f [ x0 , x1 , x2 ]      ( x  x0 )( x  x1 )  ( x  xn1 ) f [ x0 , x1, x2 , , xn ]
With the remainder term is given by ( x  x0 )( x  x1 )  ( x  xn ) f [ x, x0 , x1, x2 , , xn ]
Where
f ( x1 )  f ( x0 )
f [ x , x ]  f [ x0 , x1 ]
and f [ x0 , x1 , x2 ]  1 2
x1  x0
x1  x0
f [ x1 ,   , xn ]  f [ x0 , , xn 1 ]
f [ x0 , x1 ,   xn ] 
xn  x0
f [ x0 , x1 ] 
Note: If x0, x1, ..., xn are equispaced, i.e. when xk = x0 + kh, then f[x0,x1,...,xk] = ∆^k f0/(k! h^k),
and Newton's divided difference interpolation formula takes the form of Newton's
forward difference interpolation formula.
Properties of divided difference:
1.The divided differences are symmetrical about their arguments.
That is f[x0,x1] = (f(x1) - f(x0))/(x1 - x0) = (f(x0) - f(x1))/(x0 - x1) = f[x1,x0]
So the order of the arguments has no importance.
When we are considering the nth divided difference also, we can write
f[x0,x1,...,xn] = f(x0)/[(x0-x1)(x0-x2)...(x0-xn)] + f(x1)/[(x1-x0)(x1-x2)...(x1-xn)] + ... + f(xn)/[(xn-x0)(xn-x1)...(xn-x(n-1))]
From this expression it is clear that, whatever be the order of the arguments,
the expression is the same.
Hence the divided differences are symmetrical about their arguments.
2. Divided difference operator is linear.
For example, consider two polynomials f(x) and g(x).
Let h(x) = a f(x) + b g(x),
where 'a' and 'b' are any two real constants. The first divided difference of h(x)
corresponding to the arguments x0 and x1 is h[x0,x1], and
we find h[x0,x1] = a f[x0,x1] + b g[x0,x1].
3. The nth divided difference of a polynomial of degree n is its leading coefficient.
Now we consider a general polynomial of degree n as g(x) = a0 x^n + a1 x^(n-1) + ... + an.
Since the nth divided difference of x^n is 1 and that of every lower power is 0, linearity of
the divided difference operator gives the nth divided difference of g(x) as a0, which is the
leading coefficient of g(x).
Lecturer note -14
Problems continued and Lagrange's interpolating formula:
Related problem on Divided difference formula:
Example: Use the following data find the Newton’s divided difference
interpolating polynomial.
X:  -1   0   3    6     7
Y:   3  -6  39  822  1611
Solution: The divided difference table is

x     y        1st             2nd                3rd                  4th
-1     3
               f[x0,x1] = -9
 0    -6                      f[x0,x1,x2] = 6
               f[x1,x2] = 15                     f[x0,x1,x2,x3] = 5
 3    39                      f[x1,x2,x3] = 41                        f[x0,x1,x2,x3,x4] = 1
               f[x2,x3] = 261                    f[x1,x2,x3,x4] = 13
 6   822                      f[x2,x3,x4] = 132
               f[x3,x4] = 789
 7  1611

So the required polynomial is
f(x) = 3 + (x+1) f[x0,x1] + (x+1)x f[x0,x1,x2] + (x+1)x(x-3) f[x0,x1,x2,x3] + (x+1)x(x-3)(x-6) f[x0,x1,x2,x3,x4]
     = 3 - 9(x+1) + 6(x+1)x + 5(x+1)x(x-3) + (x+1)x(x-3)(x-6)
     = x^4 - 3x^3 + 5x^2 - 6
Example: Obtain the Newton's divided difference interpolating polynomial satisfied by
(-4, 1245), (-1, 33), (0, 5), (2, 9) and (5, 1335).
Solution:
Newton's divided difference interpolating polynomial is given by
f(x) = f(x0) + (x - x0) f[x0,x1] + ... + (x - x0)(x - x1)...(x - x(n-1)) f[x0,x1,...,xn]
Here the x values are given as -4, -1, 0, 2 and 5. The corresponding f(x) values are
1245, 33, 5, 9 and 1335.
Hence the divided differences are as shown in the following table:

x     y        1st               2nd                3rd                   4th
-4  1245
               f[x0,x1] = -404
-1    33                        f[x0,x1,x2] = 94
               f[x1,x2] = -28                      f[x0,x1,x2,x3] = -14
 0     5                        f[x1,x2,x3] = 10                         f[x0,x1,x2,x3,x4] = 3
               f[x2,x3] = 2                        f[x1,x2,x3,x4] = 13
 2     9                        f[x2,x3,x4] = 88
               f[x3,x4] = 442
 5  1335
Hence the interpolating polynomial is
f(x) = 1245 - 404(x+4) + 94(x+4)(x+1) - 14(x+4)(x+1)x + 3(x+4)(x+1)x(x-2)
     = 3x^4 - 5x^3 + 6x^2 - 14x + 5
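Both examples can be checked with a short routine that builds the divided differences by the standard triangular recursion and evaluates the Newton form; the function names are mine, not from the notes.

```python
def divided_diff_coeffs(xs, ys):
    """Leading divided differences f[x0], f[x0,x1], ..., f[x0,...,xn],
    built column by column: each entry divides a difference of the
    previous column by the spread of the arguments it spans."""
    n, col = len(xs), list(ys)
    coeffs = [col[0]]
    for k in range(1, n):
        col = [(col[i + 1] - col[i]) / (xs[i + k] - xs[i]) for i in range(n - k)]
        coeffs.append(col[0])
    return coeffs

def newton_divided(xs, ys, x):
    """Evaluate Newton's divided difference polynomial at x."""
    total, prod = 0.0, 1.0
    for c, xi in zip(divided_diff_coeffs(xs, ys), xs):
        total += c * prod
        prod *= (x - xi)
    return total

# Second worked example: the interpolant is 3x^4 - 5x^3 + 6x^2 - 14x + 5
xs, ys = [-4, -1, 0, 2, 5], [1245, 33, 5, 9, 1335]
coeffs = divided_diff_coeffs(xs, ys)     # [1245, -404, 94, -14, 3]
p1 = newton_divided(xs, ys, 1)           # 3 - 5 + 6 - 14 + 5 = -5
```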
Derivation of Newton's interpolation formula with divided differences:
Consider two arguments x and x0. The first divided difference between x and x0 is
f[x,x0] = (f(x) - f(x0))/(x - x0), so f(x) = f(x0) + (x - x0) f[x,x0]
Considering x, x0 and x1, we have
f[x,x0,x1] = (f[x0,x1] - f[x,x0])/(x1 - x), so f[x,x0] = f[x0,x1] + (x - x1) f[x,x0,x1]
So f(x) = f(x0) + (x - x0) f[x0,x1] + (x - x0)(x - x1) f[x,x0,x1]
Proceeding in this way we obtain
f(x) = f(x0) + (x - x0) f[x0,x1] + (x - x0)(x - x1) f[x0,x1,x2] + ... + (x - x0)(x - x1)...(x - x(n-1)) f[x0,x1,...,xn] + (x - x0)(x - x1)...(x - xn) f[x,x0,x1,...,xn]
If f(x) is a polynomial of degree n, the last term vanishes, since then f[x,x0,x1,...,xn] = 0.
Lagrangian Interpolation
Another method of interpolation in the case of arbitrarily spaced pivotal values x0, x1, . . .
, xn is Lagrangian interpolation.
This method is based on Lagrange's (n+1) point interpolation formula
f(x) ≈ Ln(x) = Σ (k=0 to n) [lk(x)/lk(xk)] fk
where lk(x) = (x - x0)(x - x1)...(x - x(k-1))(x - x(k+1))...(x - xn); for example
l0(x) = (x - x1)(x - x2)...(x - xn) and ln(x) = (x - x0)(x - x1)...(x - x(n-1)).
Derivation of the formula:
Given the set of (n 1) points, (x0,f0), (x1,f1),---------(xn,fn) of x and f(x)
it is required to fit the unique polynomial pn(x) of maximum degree n, such that f
and pn(x) agree at the given set of points. The values x0,x1,--,xn may not be
equidistant.
Since the interpolating polynomial must use all the ordinates f(x0), f(x1),---f(xn) , it
can be written as a linear combination of these ordinates. That is, we can write the
polynomial as
pn ( x)  l0 ( x) f ( x0 )  l1 ( x) f ( x1 )    ln ( x) f ( xn )
At x=x0 as f(x) and pn(x) coincides we get
f ( x0 )  pn ( x0 )  l0 ( x0 ) f ( x0 )  l1 ( x0 ) f ( x1 )    ln ( x0 ) f ( xn )
This equation is satisfied only when
l0 ( x0 )  1, li ( x0 )  0, i  0
At a general point x=xi we get
f ( xi )  pn ( xi )  l0 ( xi ) f ( x0 )      ln ( xi ) f ( xn )
li ( xi )  1, l j ( xi )  0, i  j
Since li(x)=0 at x=x0,x1,-----,xn so (x-x0)-------(x-xn) are the factors of li(x)
The product of these factors is a polynomial of degree n.
Therefore, we can write li ( x)  c( x  x0 )( x  x1 )    ( x  xn ) where c is constant
Lecturer note -15
Lagrange's interpolating formula derivation and problems:
Now since li(xi) = 1, we get 1 = c (xi - x0)(xi - x1)...(xi - x(i-1))(xi - x(i+1))...(xi - xn)
So c = 1/[(xi - x0)(xi - x1)...(xi - x(i-1))(xi - x(i+1))...(xi - xn)]
Hence
li(x) = [(x - x0)(x - x1)...(x - x(i-1))(x - x(i+1))...(x - xn)] / [(xi - x0)(xi - x1)...(xi - x(i-1))(xi - x(i+1))...(xi - xn)]
Now the polynomial
pn(x) = l0(x) f(x0) + l1(x) f(x1) + ... + ln(x) f(xn)
with li(x) as defined above is called the Lagrange interpolating polynomial, and the li(x) are called the Lagrange fundamental polynomials.
Related problems:
Example: Given f(2) = 9, and f(6) = 17. Find an approximate value for f(5) by the
method of Lagrange’s interpolation.
Solution: For the given two points (2,9) and (6,17), the Lagrangian polynomial of
degree 1 is p(x) = l0(x) f(x0) + l1(x) f(x1), where
l0(x) = (x - x1)/(x0 - x1), l1(x) = (x - x0)/(x1 - x0)
So the required polynomial is
p(x) ≈ f(x) = [(x - 6)/(2 - 6)] 9 + [(x - 2)/(6 - 2)] 17
Hence f(5) = (1/4)(9) + (3/4)(17) = 15
Example: Use Lagrange’s formula, to find the quadratic polynomial that takes the
values
X: 0  1  3
Y: 0  1  0
Solution: For the given three points (0,0) , (1,1) and (3,0), the quadratic
polynomial by Lagrange’s interpolation is
P(x)=l0(x)f(x0)+l1(x)f(x1)+l2(x)f(x2)
where
l0(x) = (x - x1)(x - x2)/[(x0 - x1)(x0 - x2)], l1(x) = (x - x0)(x - x2)/[(x1 - x0)(x1 - x2)], l2(x) = (x - x0)(x - x1)/[(x2 - x0)(x2 - x1)]
We are considering the given x values 0, 1 and 3 as x0, x1, x2. Now f(x0) = f(x2) = 0 and f(x1) = 1.
So the required polynomial is
P(x) = l1(x) f(x1) = [(x - 0)(x - 3)/((1 - 0)(1 - 3))] · 1 = (3x - x^2)/2
Example: Find ln 9.2 with n = 3, using Lagrange's interpolation formula with the given
table:

x:        9.0    9.5    10.0   11.0
y = ln x: 2.197  2.251  2.302  2.397

Solution:
ln 9.2 ≈ f(9.2) ≈ L3(9.2) = Σ (k=0 to 3) [lk(9.2)/lk(xk)] fk
= [(9.2-9.5)(9.2-10.0)(9.2-11.0)/((9.0-9.5)(9.0-10.0)(9.0-11.0))] 2.19722
+ [(9.2-9.0)(9.2-10.0)(9.2-11.0)/((9.5-9.0)(9.5-10.0)(9.5-11.0))] 2.25129
+ [(9.2-9.0)(9.2-9.5)(9.2-11.0)/((10.0-9.0)(10.0-9.5)(10.0-11.0))] 2.30259
+ [(9.2-9.0)(9.2-9.5)(9.2-10.0)/((11.0-9.0)(11.0-9.5)(11.0-10.0))] 2.39790
= 2.21920
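A direct transcription of the formula (my own naming), checked against the ln 9.2 example:

```python
def lagrange(xs, ys, x):
    """Lagrange interpolation p(x) = sum_k y_k prod_{j != k} (x - x_j)/(x_k - x_j);
    the nodes need not be equally spaced."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        lk = 1.0
        for j, xj in enumerate(xs):
            if j != k:
                lk *= (x - xj) / (xk - xj)
        total += yk * lk
    return total

# ln 9.2 from the four tabulated points; the notes obtain 2.21920
approx = lagrange([9.0, 9.5, 10.0, 11.0],
                  [2.19722, 2.25129, 2.30259, 2.39790], 9.2)
```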
Inverse Lagrangian Interpolation Formula
Interchanging x and y in the Lagrangian interpolation formula, we obtain the
inverse Lagrangian interpolation formula given by
x ≈ Ln(y) = Σ (k=0 to n) [lk(y)/lk(yk)] xk
Example: If y(1) = 4, y(3) = 12, y(4) = 19 and y(x) = 7, find x.
Solution: Using the inverse interpolation formula, we have
x ≈ L2(7) = Σ (k=0 to 2) [lk(7)/lk(yk)] xk
where x0 = 1, x1 = 3, x2 = 4, y0 = 4, y1 = 12, y2 = 19 and y = 7. So
x = [(7 - y1)(7 - y2)/((y0 - y1)(y0 - y2))] x0 + [(7 - y0)(7 - y2)/((y1 - y0)(y1 - y2))] x1 + [(7 - y0)(7 - y1)/((y2 - y0)(y2 - y1))] x2
Putting the values we get x = 1.86
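Swapping the roles of x and y gives the code directly (naming mine); for the example it reproduces x ≈ 1.857, which rounds to the 1.86 quoted above.

```python
def inverse_lagrange(xs, ys, y):
    """Inverse interpolation: apply the Lagrange formula with the roles
    of x and y interchanged, treating x as a function of y."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        lk = 1.0
        for j, yj in enumerate(ys):
            if j != k:
                lk *= (y - yj) / (yk - yj)
        total += xk * lk
    return total

# The worked example: y(1) = 4, y(3) = 12, y(4) = 19; find x with y = 7
x_at_7 = inverse_lagrange([1, 3, 4], [4, 12, 19], 7)   # about 1.857
```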
Lecturer note -16
Error in interpolation and numerical integration:
Error or remainder term in interpolation: In this section we would like to provide
estimates on the “error” we make when interpolating data that is taken from sampling
an underlying function f(x). While the interpolant and the function agree with each
other at the interpolation points, there is, in general, no reason to expect them to be
close to each other elsewhere. Nevertheless we can estimate the difference between
them, a difference which we refer to as the interpolation error.
Let f(x) be a function defined in the interval (a,b) and suppose that f^(n+1)(x) exists in (a,b). Then
the error is
f(x) - pn(x) = [(x - x0)(x - x1)...(x - xn)/(n+1)!] f^(n+1)(ξ)
where ξ depends upon x, x0, ..., xn and f, and min(x, x0, ..., xn) < ξ < max(x, x0, ..., xn).
Numerical Integration:
In this chapter we are going to explore various ways for approximating the integral of
a function over a given domain.
Since we can not analytically integrate every function, the need for approximate
integration formulas is obvious. In addition, there might be situations where the given
function can be integrated analytically, and still, an approximation formula may end up
being a more efficient alternative to evaluating the exact expression of the integral.
We want to construct numerical algorithms that can perform definite integrals
of the form ∫(a to b) f(x) dx.
Calculating these definite integrals numerically is called numerical integration,
numerical quadrature, or more simply quadrature.
THE TRAPEZOIDAL RULE
In this method, to evaluate ∫(a to b) f(x) dx, we partition the interval of integration [a, b]
into n subintervals of equal width and replace f by a straight line segment on each subinterval.
The vertical lines from the ends of the segments to the partition points create a
collection of trapezoids that approximate the region between the curve and the x-axis.
We add the areas of the trapezoids, counting area above the x-axis as positive and area
below the axis as negative, and denote the sum by T.
Then T = (1/2)(y0 + y1)h + (1/2)(y1 + y2)h + ... + (1/2)(y(n-1) + yn)h
       = (h/2)[y0 + yn + 2(y1 + y2 + ... + y(n-1))]
where y0 = f(a), y1 = f(x1), ..., y(n-1) = f(x(n-1)), yn = f(b).
SIMPSON'S 1/3 RULE
Simpson's rule for approximating the integral ∫(a to b) f(x) dx is
based on approximating f with quadratic polynomials instead of linear polynomials. We
approximate the graph with parabolic arcs instead of line segments .
The integral of the quadratic polynomial y = Ax^2 + Bx + C from x = -h to x = h is
∫(-h to h) (Ax^2 + Bx + C) dx = (h/3)(y0 + 4y1 + y2)
where y0, y1, y2 are the values of the polynomial at x = -h, 0, h.
Simpson's rule follows from partitioning [a, b] into an even number of subintervals of
equal length h, applying this formula to successive interval pairs, and adding the results.
Algorithm: Simpson’s 1/3 Rule
To approximate the integral we use
S = (h/3)(y0 + 4y1 + 2y2 + 4y3 + ... + 2y(n-2) + 4y(n-1) + yn)
The y's are the values of f at the partition points x0 = a, x1 = a + h, x2 = a + 2h, ..., xn = b.
In particular we have
S = (h/3)(s0 + 4s1 + 2s2), where s0 = y0 + yn, s1 = y1 + y3 + ... + y(n-1), s2 = y2 + y4 + ... + y(n-2)
where h = (b - a)/n and n, the number of subintervals of the partition, is even.
Derivation of Trapezoidal and Simpson’s 1/3 rules of integration from Lagrangian
Interpolation
Integrating the formula in Lagrangian interpolation, we obtain
∫(a to b) f(x) dx ≈ ∫(a to b) Ln(x) dx = Σ (k=0 to n) [fk/lk(xk)] ∫(a to b) lk(x) dx
For n=1 we have only one interval [x0, x1] such that a = x0 and b = x1 and then the
above integration formula gives trapezoidal rule.
For n = 2, we have two subintervals [x0, x1] and [x1, x2] of equal width h such that a = x0 and b = x2, and then the above integration formula becomes
∫(a to b) f(x) dx = ∫(x0 to x2) f(x) dx ≈ (h/3)(f0 + 4f1 + f2)
and this is the Simpson's 1/3 rule of integration.
For n = 3 the above integration formula becomes
∫(a to b) f(x) dx = ∫(x0 to x3) f(x) dx ≈ (3h/8)(f0 + 3f1 + 3f2 + f3)
and is known as Simpson's 3/8 rule of integration.
Lecturer note -17
Theory continued and related problems on numerical integration:
Note: Using Lagrange's interpolation we know Pn(x) = Σ (i=0 to n) li(x) f(xi). So we can approximate
∫(a to b) f(x) dx ≈ ∫(a to b) pn(x) dx = Σ (i=0 to n) f(xi) ∫(a to b) li(x) dx = Σ (i=0 to n) Ai f(xi)
which is called a Newton-Cotes formula.
Note: General integration formulas
We recall that a weight function is a continuous, non-negative function with a positive
mass. We assume that such a weight function w(x) is given and would like to write a
quadrature of the form
∫(a to b) f(x) w(x) dx ≈ Σ (i=0 to n) Ai f(xi)
Such quadratures are called general (weighted) quadratures.
Such quadratures are called general (weighted) quadratures.
Note: Composite Integration Rules
In a composite quadrature, we divide the interval into subintervals and apply an
integration rule to each subinterval.
Note:
Throughout this section we assumed that all functions we are interested in
integrating are actually integrable in the domain of interest. We also assumed that they
are bounded and that they are defined at every point, so that whenever we need to
evaluate a function at a point, we can do it. We will go on and use these assumptions
throughout the chapter.
Note: Simpson's one third rule requires an even number of subintervals, that is, the
number of subintervals should be a multiple of 2. Similarly, Simpson's three eighth rule is
applied when the number of subintervals is a multiple of three.
Note: The degree of precision of a rule is the maximum degree of polynomial which the
rule integrates exactly.
The degree of precision of the trapezoidal rule is 1 and that of Simpson's one third rule is 3.
Gauss quadrature has the maximum degree of precision for a given number of points.
Related problem:
Example: Use the trapezoidal rule with n = 4 to estimate ∫(1 to 2) x^2 dx.
Compare the estimate with the exact value of the integral.
Compare the estimate with the exact value of the integral
Solution: To find the trapezoidal approximation, we divide the interval of integration into
four subintervals of equal length and list the values of y=x2 at the endpoints and
partition points.
j    xj     yj = xj^2
0    1.00   1.0000
1    1.25   1.5625
2    1.50   2.2500
3    1.75   3.0625
4    2.00   4.0000
With n = 4 and h = (b - a)/n = 0.25, the approximate value of the integral is
T = (h/2)[y0 + y4 + 2(y1 + y2 + y3)] = (1/8)[1 + 4 + 2(6.875)] = 2.34375
The exact value of the integral is ∫(1 to 2) x^2 dx = [x^3/3] from 1 to 2 = 7/3 ≈ 2.33333
2
The approximation is a slight overestimate. Each trapezoid contains slightly more than
the corresponding strip under the curve.
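The rule is a few lines of code; the sketch below (naming mine) reproduces T = 2.34375 for this example.

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals:
    T = (h/2) [y0 + yn + 2 (y1 + ... + y_{n-1})]."""
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    return h / 2 * (ys[0] + ys[-1] + 2 * sum(ys[1:-1]))

# The worked example: x^2 on [1, 2] with n = 4; the exact value is 7/3
T = trapezoid(lambda x: x * x, 1.0, 2.0, 4)   # 2.34375
```

As the notes observe, T slightly overestimates the exact value because each chord lies above this convex curve.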
Example: Using the trapezoidal rule evaluate the integral ∫(0 to 1) dx/(x^2 + 6x + 10) with four
subintervals.
Solution: For four subintervals the trapezoidal rule is
∫(a to b) f(x) dx ≈ (h/2)[y0 + 2y1 + 2y2 + 2y3 + y4]
The range of integration [0,1] is divided into four equal subintervals of width h = 0.25 by the
points 0.0, 0.25, 0.50, 0.75 and 1.
Considering them as the x values, the corresponding values of the integrand, denoted by
y0, y1, y2, y3, y4, are 0.10, 0.08649, 0.07547, 0.06639 and 0.05882.
Hence ∫(0 to 1) dx/(x^2 + 6x + 10) ≈ (0.25/2)[0.10 + 2(0.08649) + 2(0.07547) + 2(0.06639) + 0.05882] = 0.07694
Example: Find an approximate value of log_e 5 by calculating ∫(0 to 5) dx/(4x + 5) by Simpson's
1/3 rule of integration.
Solution: We note that
∫(0 to 5) dx/(4x + 5) = [(1/4) log(4x + 5)] from 0 to 5 = (1/4)[log 25 - log 5] = (1/4) log 5
Now to calculate the value of ∫(0 to 5) dx/(4x + 5) by Simpson's rule of integration, divide the interval
[0, 5] into n = 10 equal subintervals, each of length h = (b - a)/n = 0.5.
Since x0 = 0, y0 = 0.2; x1 = 0.5, y1 = 1/7; similarly we find the values at the other points.
Hence
∫(0 to 5) dx/(4x + 5) ≈ S = (0.5/3)[(y0 + y10) + 4(y1 + y3 + y5 + y7 + y9) + 2(y2 + y4 + y6 + y8)]
= (0.5/3)[0.24 + 4(0.3963) + 2(0.2944)] = 0.4023
and log_e 5 ≈ 4(0.4023) = 1.6092.
Example: Find ∫(0 to 10) dx/(1 + x^2) using Simpson's one third rule by taking 10 subintervals.
Solution: By Simpson's one third rule we have
∫(a to b) f(x) dx ≈ (h/3)[y0 + 4(y1 + y3 + ...) + 2(y2 + y4 + ...) + yn]
Let the range [0,10] be subdivided into 10 equal intervals of width h = 1 by the x values
0, 1, 2, ..., 10. The corresponding y values of the function f(x) = 1/(1 + x^2) are listed below:

x:  0    1    2    3    4       5       6       7     8       9       10
y:  1    0.5  0.2  0.1  0.0588  0.0385  0.0270  0.02  0.0154  0.0122  0.0099

Hence
∫(0 to 10) dx/(1 + x^2) ≈ (1/3)[1 + 4(0.5 + 0.1 + 0.0385 + 0.02 + 0.0122) + 2(0.2 + 0.0588 + 0.027 + 0.0154) + 0.0099]
= (1/3)(4.2951) = 1.4317
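A composite Simpson's 1/3 routine (naming mine), checked against this example; note the guard for an even number of subintervals, which the earlier note requires.

```python
def simpson(f, a, b, n):
    """Composite Simpson's 1/3 rule; n must be even.
    S = (h/3) [y0 + 4 (y1 + y3 + ...) + 2 (y2 + y4 + ...) + yn]."""
    if n % 2:
        raise ValueError("Simpson's 1/3 rule needs an even number of subintervals")
    h = (b - a) / n
    ys = [f(a + i * h) for i in range(n + 1)]
    return h / 3 * (ys[0] + ys[-1] + 4 * sum(ys[1:-1:2]) + 2 * sum(ys[2:-1:2]))

# The worked example: 1/(1 + x^2) on [0, 10] with 10 subintervals
S = simpson(lambda x: 1 / (1 + x * x), 0.0, 10.0, 10)   # about 1.4317
```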
Example: Evaluate ∫(0 to 6) dx/(3 + x^2) using Simpson's three eighth rule.
Solution: Let the range of integration [0,6] be divided into six equal parts of width h = 1,
using the x values 0, 1, 2, 3, 4, 5 and 6. The corresponding y values of the integrand are:

x:  0       1     2       3       4       5       6
y:  0.3333  0.25  0.1429  0.0833  0.0526  0.0357  0.0256

∫(0 to 6) dx/(3 + x^2) ≈ (3h/8)[y0 + 3(y1 + y2 + y4 + y5) + 2y3 + y6] = (3/8)(1.9691) = 0.7384
Lecturer note -18
Gauss quadrature:
So far, all the quadratures we encountered were of the form ∫(a to b) f(x) dx ≈ Σ (i=0 to n) Ai f(xi). An
approximation of this form was shown to be exact for polynomials of degree ≤ n for an
appropriate choice of the quadrature coefficients Ai. In all cases, the quadrature points
x0, x1, ..., xn were given up front. In other words, given a set of nodes x0, x1, ..., xn, the
coefficients Ai, i = 0, 1, 2, ..., n, were determined such that the approximation was
exact in the respective set of polynomials.
We are now interested in investigating the possibility of writing more accurate
quadratures without increasing the total number of quadrature points. This will be
possible if we allow for the freedom of choosing the quadrature points.
The quadrature problem becomes now a problem of choosing the quadrature points in
addition to determining the corresponding coefficients in a way that the quadrature is
exact for polynomials of a maximal degree. Quadratures that are obtained that way are
called Gaussian quadratures.
Gaussian integral formula and Gauss-Legendre two-point formula:
This formula is based on unequally spaced nodes.
Suppose we have to integrate ∫ₐᵇ f(x)dx.
Change the limits of the integral from a, b to −1, 1 (in order to use the orthogonality of the Legendre polynomials) by the transformation
x = ((b − a)/2)u + (a + b)/2
So we are looking for a quadrature of the form
∫₋₁¹ f(x)dx ≈ A0 f(x0) + A1 f(x1)
A straightforward computation will amount to making this quadrature exact for polynomials of degree ≤ 3. The linearity of the quadrature means that it is sufficient to make the quadrature exact for 1, x, x², x³. Hence we write the system of equations
∫₋₁¹ xⁱ dx = A0 x0ⁱ + A1 x1ⁱ,  i = 0, 1, 2, 3
From this we can write
A0 + A1 = 2,  A0x0 + A1x1 = 0,  A0x0² + A1x1² = 2/3,  A0x0³ + A1x1³ = 0
Solving, we get A0 = A1 = 1 and x0 = −x1 = −1/√3, so that the desired quadrature is
∫₋₁¹ f(x)dx ≈ f(−1/√3) + f(1/√3)
Similarly, for the Gauss-Legendre three-point rule the coefficients are
A0 = A2 = 5/9 ≈ 0.5556, A1 = 8/9 ≈ 0.8889
x0 = −0.7746, x1 = 0, x2 = 0.7746 (i.e. xᵢ = 0, ±√(3/5))
Related problem:
Example: Evaluate ∫₀¹ dx/(1 + x) using the Gauss-Legendre 2-point and 3-point rules.
Solution: Here a = 0, b = 1, so the transformation x = ((b − a)/2)u + (a + b)/2 gives x = (u + 1)/2 and dx = du/2.
When x = 0, u = −1; when x = 1, u = 1. Also 1/(1 + x) = 2/(u + 3), so
∫₀¹ dx/(1 + x) = ∫₋₁¹ du/(u + 3)
By the 2-point formula, with f(u) = 1/(u + 3),
∫₋₁¹ du/(u + 3) ≈ A0 f(−1/√3) + A1 f(1/√3) = 1/(3 − 1/√3) + 1/(3 + 1/√3) ≈ 0.6923
Now by the 3-point formula we have
∫₋₁¹ du/(u + 3) ≈ 0.5556 f(−0.7746) + 0.8889 f(0) + 0.5556 f(0.7746) ≈ 0.6931
(The exact value is loge 2 ≈ 0.6931.)
Quadrature error. If the trapezoidal rule is
∫ₐᵇ f(x)dx ≈ ((b − a)/2)[f(a) + f(b)]
then the interpolation error is
E = (f″(ξ)/2) ∫ₐᵇ (x − a)(x − b)dx = −(f″(ξ)/12)(b − a)³,  ξ ∈ (a, b)
Surprisingly, Simpson's quadrature is exact for polynomials of degree ≤ 3 and not only for polynomials of degree ≤ 2. Let h = (b − a)/2.
This means that the quadrature error for Simpson's rule is
F(a + 2h) − (h/3)[f(a) + 4f(a + h) + f(a + 2h)] = −(1/90)h⁵ f⁽⁴⁾(a) + ...
where F(a + 2h) = ∫ₐ^(a+2h) f(x)dx.
Hence the error is
E = −(1/90)((b − a)/2)⁵ f⁽⁴⁾(ξ),  ξ ∈ [a, b]
Since the fourth derivative of any polynomial of degree ≤ 3 is identically zero, the quadrature error formula implies that Simpson's quadrature is exact for polynomials of degree ≤ 3.
Hence the error is E  
Lecturer note -19
Numerical solution of Ordinary differential equation:
There are differential equations that cannot be solved using the standard methods
even though they possess solutions. In such situations, we apply numerical methods for
obtaining approximate solutions, where the accuracy is sufficient. These methods yield
the solution in one of the following forms:
(i) Single-step method: A series for y in terms of powers of x, from which the value
of y at a particular value of x can be obtained by direct substitution.
(ii) Multi-step method: In multi step methods, the solution at any point x is obtained
using the solution at a number of previous points.
Taylor's, Picard's, Euler's and Modified Euler's methods come under the single-step
method of solving an ordinary differential equation.
The need for finding the solution of the initial value problems occur frequently in
Engineering and Physics. There are some first order differential equations that cannot
be solved using the standard methods. In such situations we apply numerical methods.
These methods yield the solution in one of two forms:
(iii) A series for y in terms of powers of x, from which the value of y can be obtained by direct substitution.
(iv) A set of tabulated values of x and y.
The methods of Taylor and Picard yield solutions of form (iii), whereas those of Euler, Runge-Kutta, etc., yield form (iv).
EULER METHOD:
Consider the initial value problem of first order y′=f(x,y), y(x0)=y0----------------(1)
Starting from the given x0, with the step size h chosen suitably small, we let x0, x1, x2, ... be equally spaced x values (called mesh points) with spacing h.
i.e. x1 = x0 + h, x2 = x1 + h, ... Also denote y(x0) = y0, y(x1) = y1, ...
By separating variables, the differential equation in (1) becomes dy = f(x,y)dx ----------(2)
Integrating (2) from x0 to x1 with respect to x (while y changes from y0 to y1), we get
∫ from y0 to y1 of dy = ∫ from x0 to x1 of f(x,y)dx
This gives
y1 = y0 + ∫ from x0 to x1 of f(x,y)dx
Assuming that f(x,y) ≈ f(x0,y0) in the range x0 < x < x1, we have
y1 = y0 + (x1 − x0) f(x0,y0) = y0 + h f(x0,y0)
Proceeding in this way, we obtain the general formula
yn+1= yn+hf(xn,yn)
The above is called the Euler method or Euler-Cauchy method.
Related problems:
Example: Use Euler's method with h = 0.1 to solve the initial value problem
dy/dx = x² + y², with y(0) = 0, in the range 0 ≤ x ≤ 0.5.
Solution: Here f(x,y) =x2+y2, x0=0, y0=0, h=0.1
Hence x1=x0+h=0.1, x2=x1+h=0.2, x3=x2+h=0.3, x4=x3+h=0.4, x5=x4+h=0.5
We know the iterative formula for Euler's method is yn+1 = yn + h f(xn, yn).
So yn+1 = yn + h(xn² + yn²), giving
y1 = y0 + 0.1(x0² + y0²) = 0
y2 = y1 + 0.1(x1² + y1²) = 0.001
y3 = y2 + 0.1(x2² + y2²) = 0.005
y4 = y3 + 0.1(x3² + y3²) = 0.014
Continuing this way we get y5, i.e. y(0.5).
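Euler's recurrence is a one-liner in code. This sketch (the function name `euler` is ours) finishes the table and reports y(0.5):

```python
def euler(f, x0, y0, h, n):
    """Advance y' = f(x, y) from (x0, y0) by n Euler steps of size h."""
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)   # y_{n+1} = y_n + h f(x_n, y_n)
        x += h
    return y

y5 = euler(lambda x, y: x**2 + y**2, 0.0, 0.0, 0.1, 5)
print(round(y5, 4))  # y(0.5), approximately 0.0300
```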
Example: Using Euler's method, solve the equation dy/dx = 2xy + 1, y(0) = 0, with h = 0.02, for x = 0.1.
Solution: Here f(x,y) =2xy+1, x0=0, y0=0, h=0.02
Hence x1=x0+h=0.02, x2=x1+h=0.04, x3=x2+h=0.06, x4=x3+h=0.08, x5=x4+h=0.1
We know the iterative formula for Euler's method is yn+1 = yn + h f(xn, yn).
So yn+1 = yn + h(2xnyn + 1), giving (rounded to two decimal places)
y1 = y0 + 0.02(2x0y0 + 1) = 0.02
y2 = y1 + 0.02(2x1y1 + 1) = 0.04
y3 = y2 + 0.02(2x2y2 + 1) = 0.06
y4 = y3 + 0.02(2x3y3 + 1) = 0.08
y5 = y(0.1) = y4 + 0.02(2x4y4 + 1) = 0.10
Modified Euler Method: The modified Euler method is given by the iteration formula
y1^(n+1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(n))]
where y1^(n) is the nth approximation to y1. The iteration can be started by choosing y1^(0) from Euler's formula: y1^(0) = y0 + h f(x0, y0).
Lecturer note -20
Improved Euler method continued and Related problems: As per theory
of previous section we conclude that in each step we modify or improve the
approximation by Euler’s method by the process recommended to minimize the
error in numerical computation.
Example: Using the modified Euler method, determine the value of y when x = 0.1 given that dy/dx = x² + y, y(0) = 1; take h = 0.05.
Solution: Here f(x,y) = x² + y, x0 = 0, y0 = 1, h = 0.05.
The predicted value is y1^(0) = y0 + h f(x0, y0) = 1 + 0.05 = 1.05.
The corrected value is
y1^(1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(0))]
= 1 + 0.025[f(0, 1) + f(0.05, 1.05)]
= 1 + 0.025[1 + ((0.05)² + 1.05)]
= 1.0513
Hence we take y1 = 1.0513, which is correct to four decimal places.
For the next step the formula takes the form
y2^(n+1) = y1 + (h/2)[f(x1, y1) + f(x2, y2^(n))]
where we first evaluate y2^(0) using the Euler formula:
y2^(0) = y1 + h f(x1, y1) = 1.0513 + 0.05[(0.05)² + 1.0513] = 1.1040
y2^(1) = y1 + (h/2)[f(x1, y1) + f(x2, y2^(0))]
= 1.0513 + 0.025[((0.05)² + 1.0513) + ((0.1)² + 1.1040)]
= 1.1055
Hence we take y2 = 1.1055. So the value of y when x = 0.1 is 1.1055, correct to four decimal places.
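The predictor-corrector steps above can be scripted. In this sketch (names ours), each step predicts with Euler and then applies one trapezoidal correction, using h = 0.05, the step actually used in the computation:

```python
def modified_euler(f, x0, y0, h, n, iters=1):
    """Modified Euler: Euler predictor, then `iters` trapezoidal corrections per step."""
    x, y = x0, y0
    for _ in range(n):
        x_next = x + h
        y_next = y + h * f(x, y)  # predictor (Euler)
        for _ in range(iters):
            y_next = y + h / 2 * (f(x, y) + f(x_next, y_next))  # corrector
        x, y = x_next, y_next
    return y

y_01 = modified_euler(lambda x, y: x**2 + y, 0.0, 1.0, 0.05, 2)
print(round(y_01, 4))  # y(0.1), compare 1.1055
```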
Example: Using the modified Euler method, determine the value of y when x = 0.2 given that dy/dx = x + √y, y(0) = 1; take h = 0.2.
Solution: Here f(x,y) = x + √y, x0 = 0, y0 = 1, h = 0.2.
The predicted value is y1^(0) = y0 + h f(x0, y0) = 1 + 0.2 = 1.2.
The corrected value is
y1^(1) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(0))]
= 1 + 0.1[1 + (0.2 + √1.2)] = 1.2295
Iterating once more, we find
y1^(2) = y0 + (h/2)[f(x0, y0) + f(x1, y1^(1))]
= 1 + 0.1[1 + (0.2 + √1.2295)] = 1.2309
Hence we take y(0.2) ≈ y1 = 1.2309.
Note: The Taylor series method has desirable features, particularly in its ability to keep
the errors small, but that it also has the strong disadvantage of requiring the evaluation
of higher derivatives of the function f(x,y) .
We observed that the Euler method could be improved by computing the function f(x,y)
at a predicted point at the far end of the step in x
The Runge-Kutta approach is to aim for the desirable features of the Taylor series method, but with the requirement for the evaluation of higher-order derivatives replaced by the requirement to evaluate f(x,y) at some points within the step xi to xi+1.
RUNGE KUTTA METHODS
We use the fact that the Runge-Kutta method of rth order agrees with the Taylor series solution up to terms of h^r.
Second Order Runge-Kutta Method
Computationally, most efficient methods in terms of accuracy were developed by two
German mathematicians, Carl Runge and Wilhelm Kutta. These methods are well
known as Runge-Kutta methods (R-K methods). In this and the coming section we
consider second and fourth order R-K methods.
There are several second order Runge-Kutta formulas and we consider one among
them.
Working Method (Second Order Runge-Kutta Method)
Given the initial value problem. Suppose x0, x1, x2,----be equally spaced x values with
space length h.
xn1  xn  h
Alogrithm:
kn  hf ( xn , yn ), ln  hf ( xn 1 , yn  kn )
1
yn1  yn  (kn  ln )
2
Lecturer note -21
Runge kutta method continued and Related problems:
Note: Modified Euler method is a special case of second order Runge-Kutta method
Fourth Order Runge-Kutta method:
The Runge-Kutta method of fourth order (also known as classical Runge-Kutta
method) gives greater accuracy and is most widely used for finding the approximate
solution of first order ordinary differential equations.
The method is well suited for computers.
The method is shown in the following algorithm.
Algorithm (The Runge-Kutta method)
Given the initial value problem. Suppose x0, x1, x2,----be equally spaced x values with
space length h.
Also denote y(x0) = y0, y(x1) = y1, ... For n = 0, 1, ... until termination do:
xn+1 = xn + h
An = h f(xn, yn)
Bn = h f(xn + h/2, yn + An/2)
Cn = h f(xn + h/2, yn + Bn/2)
Dn = h f(xn + h, yn + Cn)
yn+1 = yn + (1/6)(An + 2Bn + 2Cn + Dn)
Related problem:
Example: Use the Runge-Kutta method with h = 0.1 to find y(0.2) given dy/dx = x² + y² with y(0) = 0.
Solution: Here f(x,y) = x² + y², x0 = 0, y0 = 0, h = 0.1.
Hence x1 = x0 + h = 0.1 and x2 = x1 + h = 0.2.
Now, as per the process, we have
An = 0.1(xn² + yn²)
Bn = 0.1[(xn + 0.05)² + (yn + An/2)²]
Cn = 0.1[(xn + 0.05)² + (yn + Bn/2)²]
Dn = 0.1[xn+1² + (yn + Cn)²]
yn+1 = yn + (1/6)(An + 2Bn + 2Cn + Dn)
For x1 = x0 + 0.1 = 0.1, we have
A0 = 0.1(0) = 0
B0 = 0.1[(0.05)² + 0] = 0.00025
C0 = 0.1[(0.05)² + (0.000125)²] = 0.00025
D0 = 0.1[(0.1)² + (0.00025)²] = 0.001
y1 = y0 + (1/6)(A0 + 2B0 + 2C0 + D0) = 0.00033
i.e. y(0.1) = 0.00033
Similarly,
A1 = 0.1(x1² + y1²) = 0.1[(0.1)² + (0.00033)²] = 0.001
B1 = 0.1[(x1 + 0.05)² + (y1 + A1/2)²] = 0.00225
C1 = 0.1[(x1 + 0.05)² + (y1 + B1/2)²] = 0.00225
D1 = 0.1[x2² + (y1 + C1)²] = 0.004
y2 = y(0.2) = y1 + (1/6)(A1 + 2B1 + 2C1 + D1) = 0.002663
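The classical fourth-order scheme is compact in code. The sketch below (name `rk4` is ours) reproduces the two steps of the example:

```python
def rk4(f, x0, y0, h, n):
    """Classical fourth-order Runge-Kutta for y' = f(x, y)."""
    x, y = x0, y0
    for _ in range(n):
        A = h * f(x, y)
        B = h * f(x + h / 2, y + A / 2)
        C = h * f(x + h / 2, y + B / 2)
        D = h * f(x + h, y + C)
        y += (A + 2 * B + 2 * C + D) / 6
        x += h
    return y

y_02 = rk4(lambda x, y: x**2 + y**2, 0.0, 0.0, 0.1, 2)
print(round(y_02, 6))  # y(0.2), compare 0.002663
```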
Lecturer note -22
Probability :
Probability theory is the branch of mathematics that studies the possible
outcomes of given events together with the outcomes' relative likelihoods
and distributions.
In common usage, the word "probability" is used to mean the chance that a particular event (or set of events) will occur, expressed on a linear scale from 0 (impossibility) to 1 (certainty), or equivalently as a percentage between 0 and 100%.
The analysis of data (possibly generated by probability models) is called statistics.
Probability is a way of summarizing the uncertainty of statements or events.
It gives a numerical measure for the degree of certainty (or degree of uncertainty) of
the occurrence of an event.
Another way to define probability is the ratio of the number of favorable
outcomes to the total number of all possible outcomes.
This is true if the outcomes are assumed to be equally likely.
The collection of all possible outcomes is called the sample space.
Example: When we flip a coin then sample space is
S={H,T},
Where H denotes that the coin lands ”Heads up” and T denotes that the coin lands
”Tails up”.
For a ”fair coin ” we expect H and T to have the same ”chance ” of
occurring, i.e., if we flip the coin many times then about 50 % of the
outcomes will be H.
We say that the probability of H to occur is 0.5 (or 50 %) .
The probability of T to occur is then also 0.5.
If there are n total possible outcomes in a sample space S, and m of those
are favorable for an event A, then probability of event A is given as
P(A) = m/n = (number of favorable outcomes)/(total number of possible outcomes) = n(A)/n(S)
Example: Find the probability of getting a 3 or 5 while throwing a die.
Solution. Sample space S = {1,2,3,4,5,6} and event A = {3,5}.
We have n(A) = 2 and n(S) = 6.
So, P(A) = n(A)/n(S) = 2/6 = 0.3333.
Experiments and random events. In probability theory, random experiment
means a repeatable process that yields a result or an observation.
Tossing a coin, rolling a die, extracting a ball from a box are random experiments.
When tossing a coin, we get one of the following elementary results:(heads); (tails):
Event: Any subset of the sample space is known as event
A random event is an event that either happens or fails to happen as a
result of an experiment.
When tossing a coin, the event (heads) may happen or may fail to happen, so this is a
random event.
A random experiment is the process of observing the outcome of a chance event.
Complement
The complement of event A is the set of all outcomes in a sample that are
not included in the event A. The complement of event A is denoted by A′
If the probability that an event occurs is p, then the probability that the event does not occur is q = 1 − p; i.e. the probability of the complement of an event = 1 − the probability of the event, so P(A′) = 1 − P(A).
Intersections of Events
The event A ∩ 𝐵 is the intersection of the events A and B and consists of
outcomes that are contained within both events A and B. The probability of
this event, is the probability that both events A and B occur.
Mutually Exclusive Events
Two events are said to be mutually exclusive if A ∩ 𝐵 = ∅ ; (i.e. they have
empty intersection) so that they have no outcomes in common.
Unions of Events
The event A∪ 𝐵 is the union of events A and B and consists of the
outcomes that are contained within at least one of the events A and B.
Types of Probability
There are three ways to define probability, namely classical, empirical and
subjective probability.
Classical probability
Classical or theoretical probability is used when each outcome in a sample
space is equally likely to occur. Roll a die and observe that P(A) = P(rolling a 3) = 1/6.
Empirical probability
Empirical (or statistical) probability is based on observed data. The empirical
probability of an event A is the relative frequency of event A
Subjective Probability: Subjective probabilities result from intuition, educated
guesses, and estimates. For example: given a patient's health and
extent of injuries a doctor may feel that the patient has a 90% chance of a
full recovery.
Laws of Probability
As we have seen in the previous section, the probabilities are not always
based on the assumption of equal outcomes.
Axioms of Probability
For an experiment with a sample space S ={e1,e2,---,en} we can assign probabilities
P(e1), P(e2),---P(en) provided that
(1) 0 ≤ P(ei) ≤ 1
(2) P(S) = Σᵢ₌₁ⁿ P(ei) = 1
(3) If a set (event) A consists of outcomes {e1, e2, ..., ek}, then P(A) = Σᵢ₌₁ᵏ P(ei)
Complement Rule
For any event A, we have P(A′) = 1 − P(A)
Addition Law
If A and B are two different events then
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Example: The probability that John passes a Math exam is 4/5 and that he passes a Chemistry exam is 5/6. If the probability that he passes both exams is 3/4, find the probability that he will pass at least one exam.
Solution: Let M = John passes the Math exam, and C = John passes the Chemistry exam.
P(John passes at least one exam) = P(M ∪ C) = P(M) + P(C) − P(M ∩ C) = 4/5 + 5/6 − 3/4 = 53/60
Note: If two events A and B are mutually exclusive, then
P(A ∪ B) = P(A) + P(B):
This follows immediately since A and B are mutually exclusive, P(A ∩ B) = 0.
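The addition-law arithmetic can be checked with exact fractions (the variable names are ours):

```python
from fractions import Fraction

p_m = Fraction(4, 5)      # P(M): passes Math
p_c = Fraction(5, 6)      # P(C): passes Chemistry
p_both = Fraction(3, 4)   # P(M and C)

# Addition law: P(M or C) = P(M) + P(C) - P(M and C)
p_at_least_one = p_m + p_c - p_both
print(p_at_least_one)  # 53/60
```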
Conditional probability and independence:
Conditional probability is the probability of an event occurring given the knowledge that
another event has occurred
The conditional probability of event A occurring, given that event B has occurred is
denoted by P(A|B) and is read that probability of A given B.
The conditional probability of event A given B is
P(A|B) = P(A ∩ B)/P(B),  P(B) ≠ 0
Lecturer note -23
Conditional Probability and related problems :
Note: In the case when all the outcomes are equally likely, it is sometimes easier to find conditional probabilities directly, without applying the above formula. If we already know that B has happened, we need only consider outcomes in B, thus reducing our sample space to B. Then
P(A|B) = (number of outcomes in A ∩ B)/(number of outcomes in B)
Example: Let A = {a family has two boys} and B = {a family of two has at least one boy}. Find P(A|B).
Solution: The event B contains the following outcomes: (B,B), (B,G) and (G,B). Only one of these is in A. Thus, P(A|B) = 1/3.
However, if I know that the family has two children, and I see one of
the children and it's a boy, then the probability suddenly changes to 1/2.
There is a difference in the language and this changes the conditional
probability
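The two-child computation can be reproduced by brute-force enumeration of the reduced sample space (names ours):

```python
from itertools import product

families = list(product("BG", repeat=2))          # sample space: BB, BG, GB, GG
b = [fam for fam in families if "B" in fam]       # B: at least one boy (3 outcomes)
a_and_b = [fam for fam in b if fam == ("B", "B")] # A within B: exactly the family BB

p_a_given_b = len(a_and_b) / len(b)
print(p_a_given_b)  # 1/3
```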
Multiplication rule for probabilities
Using the concept of conditional probability, we have the multiplication rule
P(A ∩ B) = P(A|B) P(B)
Statistical independence of events:
The events A and B are called (statistically) independent if P( A  B)  P( A) P( B)
Another way to express independence is to say that the knowledge of B occurring
does not change our assessment of P(A).
This means that P(A|B) =P(A). (The probability that a person is female given that he or
she was born in March is just the same as the probability that the person is female.)
Example: For a coin tossed twice, denote H1 the event that we got Heads on the first
toss, and H2 is the Heads on the second. Clearly, P(H1) = P(H2) = 1/2.
Then, counting the outcomes, P(H1 ∩ H2) = 1/4 = P(H1)P(H2); therefore H1 and H2 are independent events. This agrees with our intuition that the result of the first toss should not affect the chances for H2 to occur.
Example: Three bits (0 or 1 digits) are transmitted over a noisy channel, so each will be flipped independently with probability 0.1. What is the probability that at least one bit is flipped?
Solution: Using the complement rule, P(at least one) = 1 − P(none).
If we denote by Fk the event that the kth bit is flipped, then P(no bits are flipped) = P(F1′ ∩ F2′ ∩ F3′) = (1 − 0.1)³ due to independence. Then
P(at least one) = 1 − (0.9)³ = 0.271
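The complement-rule computation in two lines (names ours):

```python
# P(at least one of 3 independent bits is flipped), flip probability 0.1 each
p_flip = 0.1
p_none = (1 - p_flip) ** 3      # independence: multiply the "not flipped" probabilities
p_at_least_one = 1 - p_none
print(p_at_least_one)  # 0.271
```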
Note: If an object is selected and then replaced before the next object is selected,
this is known as sampling with replacement. Otherwise, it is called sampling without
replacement.
Rolling a die is equivalent to sampling with replacement, whereas dealing
a deck of cards to players is sampling without replacement.
Example:
If we randomly pick two television sets in succession from a
shipment of 240 television sets of which 15 are defective, what is the probability
that they will be both defective?
Solution: Let A denote the event that the first television picked was defective, and let B denote the event that the second television picked was defective. Then A ∩ B denotes the event that both televisions picked were defective. Using conditional probability (and assuming that we are sampling without replacement), we calculate
P(A ∩ B) = P(A) P(B|A) = (15/240)(14/239) = 7/1912
Example:
A box of fuses contains 20 fuses, of which 5 are defective. If
3 of the fuses are selected at random and removed from the box in succession
without replacement, what is the probability that all three fuses are defective?
Solution: Let A be the event that the first fuse selected is defective, B the event that the second fuse selected is defective, and C the event that the third fuse selected is defective. The probability that all three fuses selected are defective is P(A ∩ B ∩ C). Hence
P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B) = (5/20)(4/19)(3/18) = 1/114
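The chain of conditional probabilities for sampling without replacement can be checked exactly (names ours):

```python
from fractions import Fraction

# P(A) P(B|A) P(C|A and B) for 3 draws without replacement from 20 fuses, 5 defective
p_all_defective = Fraction(5, 20) * Fraction(4, 19) * Fraction(3, 18)
print(p_all_defective)  # 1/114
```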
Note: The concept of independence is fundamental. In fact, it is this
concept that justifies the mathematical development of probability as a separate
discipline from measure theory. “independence of events
is not a purely mathematical concept.” It can, however, be made plausible that it should
be interpreted by the rule of multiplication of probabilities and this leads to the
mathematical definition of independence.
Example: Flip a coin and then independently cast a die. What is the
probability of observing heads on the coin and a 2 or 3 on the die?
Solution: Let A denote the event of observing a head on the coin and let B be the event of observing a 2 or 3 on the die. Then
P(A ∩ B) = P(A) P(B) = (1/2)(2/6) = 1/6
Example: Two possible mutually exclusive events are always dependent (that is, not independent).
Solution: Suppose not. Then
P(A ∩ B) = P(A) P(B)
P(∅) = P(A) P(B)
0 = P(A) P(B)
Hence we get either P(A) = 0 or P(B) = 0. This is a contradiction to the fact that A and B are possible events. This completes the result.
Example:
Two possible independent events are not mutually exclusive.
Solution: Let A and B be two independent events and suppose A and B are
mutually exclusive. Then P(A) P(B) = P(A ∩B)
= P(∅)
= 0.
Therefore, we get either P(A) = 0 or P(B) = 0.
This is a contradiction to the fact that A and B are possible events.
The possible events A and B exclusive implies A and B are not independent;
and A and B independent implies A and B are not exclusive.
Example: If A and B are independent events, then A′ and B are independent. Similarly, A and B′ are independent.
Solution:
We know that A and B are independent, that is
P(A ∩ B) = P(A) P(B)
and we want to show that A′ and B are independent, that is
P(A′ ∩ B) = P(A′) P(B).
Since
P(A′ ∩ B) = P(A′|B) P(B)
= [1 − P(A|B)] P(B)
= P(B) − P(A|B)P(B)
= P(B) − P(A ∩ B)
= P(B) − P(A) P(B)
= P(B) [1 − P(A)]
= P(B)P(A′),
the events A′ and B are independent. Similarly, it can be shown that A and B′ are independent.
Note: If the events {Bi}, i = 1, ..., m constitute a partition of the sample space S and P(Bi) ≠ 0 for i = 1, 2, ..., m, then for any event A in S we have
P(A) = Σᵢ₌₁ᵐ P(Bi) P(A|Bi)
Lecturer note -24
Random variable and Probability distributions:
Random variables: A variable whose numerical value is determined by the
outcome(or result) of a random experiment is called a random variable or chance
variable.
In probability theory and statistics it would be extremely useful to be able to work
with symbols representing “the numeric outcome that the chance experiment
will provide when carried out”.
Such symbols are random variables; random variables are most frequently denoted by
capital letters (such as X, Y , Z).
Note:
Consider a random experiment whose sample space is S. A random variable X is a function from the sample space S into the set of real numbers R such that for each interval I in R, the set {s ∈ S | X(s) ∈ I} is an event in S.
Example: Consider the random experiment of tossing 3 coins. Define a random variable for this experiment.
Solution: Since we toss 3 coins. So the sample space S is given by
S={HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
Let X denote the number of heads observed. Then X=0 if the outcome is TTT
Similarly X=1 provided the outcome is HTT or THT or TTH
So X is the random variable whose values are determined by the outcomes of random
experiment of tossing three coins, and it is a function with domain S and range {0,1,2,3}
Hence X(TTT)=0, X(HTH)=2 etc
Note: Given a random experiment, there can be many random variables. This is due to the fact that given two (finite) sets A and B, the number of distinct functions from A to B is |B|^|A|. Here |A| means the cardinality of the set A.
Note: A random variable is neither random nor variable, it is simply a function.
The values it takes on are both random and variable.
Note: The set {x ∈ R| x = X(s), s ∈ S} is called the space of the
random variable X.
There are three types of random variables: discrete, continuous, and
mixed. However, in most applications we encounter either discrete or continuous
random variable.
Discrete Random variable: If the space of random variable X is countable, then X is
called a discrete random variable. This means that , in practice, we may consider a list
of possible outcomes x1,x2,-------xn even if (n→ ∞) for any discrete random variable X
Distribution Functions of Discrete Random Variables
Every random variable is characterized through its probability density function.
Let RX be the space of the random variable X.
The function f : RX→ 𝑅 defined by f(x) = P(X = x) is called the probability density
function (pdf) of X.
Example:
A fair coin is tossed 3 times. Let the random variable X
denote the number of heads in 3 tosses of the coin. Find the sample space,
the space of the random variable, and the probability density function of X.
Solution: The sample space S of this experiment consists of all binary sequences
of length 3, that is S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
The space of this random variable is given by RX = {0, 1, 2, 3}
Therefore, the probability density function of X is given by
f(0) = P(X = 0) = 1/8
f(1) = P(X = 1) = 3/8
f(2) = P(X = 2) = 3/8
f(3) = P(X = 3) = 1/8
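These pdf values can be confirmed by enumerating the 8 equally likely outcomes (names ours):

```python
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))              # the 8 outcomes HHH, HHT, ...
heads = Counter(seq.count("H") for seq in outcomes)   # how many outcomes give x heads

pdf = {x: heads[x] / 8 for x in range(4)}
print(pdf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```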
Note: If X is a discrete random variable with space RX and probability
density function f(x), then
(a) f(x) ≥ 0 for all x in RX, and (b) Σ over x in RX of f(x) = 1
Example:
If the probability of a random variable X with space RX = {1, 2, 3, ..., 12} is
given by f(x) = k (2x − 1), then, what is the value of the constant k?
Solution:
1 = Σₓ₌₁¹² k(2x − 1) = k[2 Σₓ₌₁¹² x − 12] = k[12·13 − 12] = 144k
So k = 1/144.
Note:
The cumulative distribution function F(x) of a random variable X is defined as
F(x) = P(X ≤ x) for all real numbers x.
Note: If X is a random variable with the space RX, then F(x) = Σ over t ≤ x of f(t), for x ∈ RX.
Example: If the probability density function of the random variable X is given by f(x) = (2x − 1)/144 for x = 1, 2, 3, ..., 12, find the cumulative distribution function F(x).
Solution: The space of the random variable X is given by RX = {1, 2, 3, ..., 12}.
F(1) = f(1) = 1/144
F(2) = f(1) + f(2) = 1/144 + 3/144 = 4/144
F(3) = f(1) + f(2) + f(3) = 1/144 + 3/144 + 5/144 = 9/144
Similarly, F(12) = f(1) + f(2) + ... + f(12) = 1
t 12
Note:
Let X be a random variable with cumulative distribution
function F(x). Then the cumulative distribution function satisfies the followings:
(a) F(−∞) = 0,
(b) F(∞) = 1, and
(c) F(x) is an increasing function, that is if x < y, then F(x) ≤ F(y) for
all reals x, y.
Example: Find the probability density function of the random variable X whose cumulative distribution function is
F(x) = 0 for x < −1;  0.25 for −1 ≤ x < 1;  0.50 for 1 ≤ x < 3;  0.75 for 3 ≤ x < 5;  1 for x ≥ 5.
Also find P(X ≤ 3), P(X = 3), and P(X < 3).
Solution: The space of this random variable is given by
RX = {−1, 1, 3, 5}.
The probability density function of X is given by
f(−1) = 0.25
f(1) = 0.50 − 0.25 = 0.25
f(3) = 0.75 − 0.50 = 0.25
f(5) = 1.00 − 0.75 = 0.25.
The probability P(X ≤ 3) can be computed by using the definition of F.
Hence
P(X ≤ 3) = F(3) = 0.75.
The probability P(X = 3) can be computed from
P(X = 3) = F(3) − F(1) = 0.75 − 0.50 = 0.25.
Finally, we get P(X < 3) from
P(X < 3) = P(X ≤ 1) = F(1) = 0.5.
Lecturer note -25
Continuous random variable and Probability distributions:
All of the random variables discussed previously were discrete, meaning they can take only a finite (or, at most, countable) number of values. However,
many of the random variables seen in practice have more than a countable
collection of possible values. For example, the proportions of impurities in
ore samples may run from 0.10 to 0.80. Such random variables can take any
value in an interval of real numbers. Since the random variables of this type
have a continuum of possible values, they are called continuous random
variables.
Distribution Functions of Continuous Random Variables
A random variable X is said to be continuous if its space is either an
interval or a union of intervals.
The function f(x) is a probability density function (PDF) for the
continuous random variable X, defined over the set of real numbers R, if
(a) f(x) ≥ 0 for all x,
(b) ∫ from −∞ to ∞ of f(x)dx = 1, and
(c) P(a ≤ X ≤ b) = ∫ₐᵇ f(x)dx
Example: Is the real-valued function f : R → R defined by
f(x) = 2x for 0 < x < 1, and f(x) = 0 otherwise,
a probability density function for some random variable X?
Solution: We have to show that f is nonnegative and that the area under f(x) is unity. Since f vanishes outside the interval (0, 1) and f(x) = 2x ≥ 0 there, f is nonnegative. Next, we calculate
∫ from −∞ to ∞ of f(x)dx = ∫₀¹ 2x dx = [x²]₀¹ = 1
Thus f is a probability density function.
Example: For what value of the constant c is the real-valued function f : R → R given by
f(x) = c for a ≤ x ≤ b, and f(x) = 0 otherwise,
where a, b are real constants, a probability density function for a random variable X?
Solution: Since f is a pdf, c must be nonnegative. Further, since the area under f is unity, we get
∫ from −∞ to ∞ of f(x)dx = ∫ₐᵇ c dx = c[x]ₐᵇ = c(b − a) = 1
So c = 1/(b − a).
Note:
Let f(x) be the probability density function of a continuous random variable X. The cumulative distribution function F(x) of X is defined as
F(x) = P(X ≤ x) = ∫ from −∞ to x of f(t)dt
The cumulative distribution function F(x) represents the area under the
probability density function f(x) on the interval (−∞, x)
Note: If F(x) is the cumulative distribution function of a continuous random variable X, the probability density function f(x) of X is the derivative of F(x); that is, by the Fundamental Theorem of Calculus,
(d/dx) F(x) = f(x)
This tells us that if the random variable is continuous, then we can find the pdf from the cdf by taking the derivative of the cdf. Recall that for a discrete random variable, the pdf at a point in the space of the random variable can be obtained from the cdf by taking the difference between the cdf at the point and the cdf immediately below the point.
Example: What is the probability density function of the random variable whose cdf is
F(x) = 1/(1 + e^(−x)),  −∞ < x < ∞?
Solution: The pdf of the random variable is given by
f(x) = (d/dx) F(x) = (d/dx) [1/(1 + e^(−x))] = e^(−x)/(1 + e^(−x))²
Note: Let X be a continuous random variable whose cdf is F(x).
Then followings are true:
(a) P(X < x) = F(x),
(b) P(X > x) = 1 − F(x),
(c) P(X = x) = 0 , and
(d) P(a < X < b) = F(b) − F(a).
Note: We will say that the random variable X is symmetric with respect to the
point c if the following conditions are satisfied:
i) if c + a is a value of the random variable X, then c − a is also a value of the random variable X;
ii) P(X = c + a) = P(X = c − a).
Condition (ii) can be rewritten as P(X − c = a) = P(c − X = a), which shows that "X is symmetric with respect to the point c" means that X − c and c − X have the same distribution.
Expected values of Random Variables:
One of the most important things we'd like to know about a random variable
is: what value does it take on average? What is the average price of a
computer? What is the average value of a number that rolls on a die?
Expected value (mean)
The mean or expected value of a discrete random variable X with probability mass function p(x) is given by
E(X) = Σₓ x p(x)
We will sometimes use the notation E(X)=𝜇
Note: Expected value of a function
If X is a discrete random variable with probability mass function p(x) and if g(x) is a real-valued function of x, then
E(g(X)) = Σₓ g(x) p(x)
Note: Variance
The variance of a random variable X with expected value μ is given by
V(X) = σ² = E(X − μ)² = E(X²) − μ²
where E(X²) = Σₓ x² p(x).
The variance is the average (or expected) value of the squared deviation from the mean.
If we use V(X) = E(X − μ)² as the definition, we can see that
V(X) = E(X − μ)² = E(X² − 2μX + μ²) = E(X²) − 2μE(X) + μ² = E(X²) − μ²
Lecturer note -26
Mean, variance continued and Probability distributions:
Note: Standard deviation
The standard deviation of a random variable X is the square root of the variance, and is given by
σ = √σ² = √(E(X − μ)²)
The mean describes the center of the probability distribution, while standard
deviation describes the spread. Larger values of 𝜎 signify a distribution
with larger variation. This will be undesirable in some situations, e.g. industrial
process control, where we would like the manufactured items to have
identical characteristics.
Note: For any random variable X and constants a and b, we have
E(aX + b) = aE(X) + b,  V(aX + b) = a²V(X) = a²σ²
And for several random variables X1, X2, ..., Xk we have
E(X1 + X2 + ... + Xk) = E(X1) + E(X2) + ... + E(Xk)
Example: The number of fire emergencies at a rural county in a week, has the
following distribution
x:        0     1     2     3     4
P(X = x): 0.52  0.28  0.14  0.04  0.02
Find E(X), V (X) and 𝜎
Solution: From Definition , we see that
E(X) = 0(0.52) + 1(0.28) + 2(0.14) + 3(0.04) + 4(0.02) = 0.76 = 𝜇
and from the definition of E(X²), we get
E(X²) = 0²(0.52) + 1²(0.28) + 2²(0.14) + 3²(0.04) + 4²(0.02) = 1.52
Hence, from the definition, we get V(X) = E(X²) − μ² = 1.52 − (0.76)² = 0.9424
Now, from the definition, the standard deviation is σ = √0.9424 ≈ 0.971
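The arithmetic in this example is easy to check with a short script; a minimal sketch computing E(X), E(X²), V(X) and σ directly from the definitions:

```python
# Check of the fire-emergency example above, computed directly from
# the definitions E(X) = sum x*p(x) and V(X) = E(X^2) - mu^2.

pmf = {0: 0.52, 1: 0.28, 2: 0.14, 3: 0.04, 4: 0.02}

mean = sum(x * p for x, p in pmf.items())       # E(X)
ex2 = sum(x**2 * p for x, p in pmf.items())     # E(X^2)
var = ex2 - mean**2                             # V(X)
sd = var ** 0.5                                 # sigma

print(mean, ex2, var, sd)
```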
Example: Let X be a random variable having the probability mass function given in the above Example. Calculate the mean and variance of g(X) = 4X + 3.
Solution: In the above Example, we found E(X) = μ = 0.76 and V(X) = 0.9424. Hence
E(g(X)) = 4E(X) + 3 = 4(0.76) + 3 = 3.04 + 3 = 6.04
and V(g(X)) = 4² V(X) = 16(0.9424) = 15.0784
Note: Moments of Random Variables
The nth moment about the origin of a random variable X, denoted by E(X^n), is defined to be
E(X^n) = Σ_x x^n p(x) in the discrete case (with the corresponding integral in the continuous case).
If n = 1, then E(X) is called the first moment about the origin. If n = 2, then E(X²) is
called the second moment of X about the origin. In general, these moments may or may
not exist for a given random variable. If for a random variable, a particular moment does
not exist, then we say that the random variable does not have that moment.
Note: Let X be a random variable with space RX and probability density function f(x). The mean μX of the random variable X is defined as
μX = E(X) = ∫_{RX} x f(x) dx
if the right hand side exists.
The mean of a random variable is a composite of its values weighted by the
corresponding probabilities. The mean is a measure of central tendency
Example: If the probability density function of the random variable X is as given, then what is the expected value of X?
Solution: The expected value of X is E(X) = ∫ x f(x) dx, evaluated over the support of f.
Example: Let X have the density function
For what value of k is the variance of X equal to 2
Solution: Computing E(X) and E(X²) from the given density, the variance is
var(X) = E(X²) − (E(X))² = k²/18
Setting k²/18 = 2 gives k² = 36, so k = 6.
Bernoulli Distribution:
A Bernoulli trial is a random experiment in which there are precisely two
possible outcomes, which we conveniently call ‘failure’ (F) and ‘success’ (S).
We can define a random variable from the sample space {S, F} into the set
of real numbers as follows:
X(F) = 0 X(S) = 1.
Note: The random variable X is called the Bernoulli random variable if its probability density function is of the form
f(x) = p^x (1 − p)^(1−x),  x = 0, 1
where p is the probability of success.
Note: If X is a Bernoulli random variable with parameter p, then the mean and variance are respectively given by
μX = μ = p,  σX² = σ² = p(1 − p)
The mean of the Bernoulli random variable is
μX = Σ_{x=0}^{1} x f(x) = Σ_{x=0}^{1} x p^x (1 − p)^(1−x) = p
Similarly, the variance of X is given by
σX² = Σ_{x=0}^{1} (x − μX)² f(x) = Σ_{x=0}^{1} (x − p)² p^x (1 − p)^(1−x) = p(1 − p)
Lecturer note -27
Binomial and Poisson distributions:
Binomial Distribution
Consider a fixed number n of mutually independent Bernoulli trails. Suppose
these trials have same probability of success, say p. A random variable
X is called a binomial random variable if it represents the total number of
successes in n independent Bernoulli trials.
Now we determine the probability density function of a binomial random
variable. Recall that the probability density function of X is defined as
f(x) = P(X = x).
Thus, to find the probability density function of X we have to find the probability
of x successes in n independent trails.
If we have x successes in n trails, then the probability of each n-tuple
with x successes and n − x failures is px (1 − p)n-x.
However, there are C(n, x) tuples with x successes and n − x failures in n trials, where C(n, x) denotes the binomial coefficient. Hence
P(X = x) = C(n, x) p^x (1 − p)^(n−x)
Therefore, the probability density function of X is
f(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x),  x = 0, 1, ..., n
Note:
The Bernoulli trials are formally defined by the following properties:
a) The result of each trial is either a success or a failure
b) The probability of success p is constant from trial to trial.
c) The trials are independent
d) The random variable X is defined to be the number of successes in n
repeated trials
The mean and variance of the binomial distribution are
E(X) = μ = np and V(X) = σ² = npq, where q = 1 − p.
Example: On a five-question multiple-choice test there are five possible
answers, of which one is correct. If a student guesses randomly and independently,
what is the probability that she is correct only on questions 1 and 4?
Solution: Here the probability of success is p = 1/5 , and thus 1 − p = 4/5
.
Therefore, the probability that she is correct on questions 1 and 4 is
P(correct on questions 1 and 4) = p²(1 − p)³ = 0.02048
Example: On a five-question multiple-choice test there are five possible
answers, of which one is correct. If a student guesses randomly and independently,
what is the probability that she is correct only on two questions?
Solution: Here the probability of success is p = 1/5, and thus 1 − p = 4/5. There are C(5, 2) different ways she can be correct on two questions. Therefore, the probability that she is correct on two questions is
P(correct on two questions) = C(5, 2) p²(1 − p)³ = 0.2048
Example: What is the probability of rolling two sixes and three non sixes
in 5 independent casts of a fair die?
Solution: Let the random variable X denote the number of sixes in 5 independent casts of a fair die. Then X is a binomial random variable with probability of success p and n = 5. The probability of getting a six is p = 1/6. Hence
f(2) = P(X = 2) = C(5, 2) (1/6)² (5/6)³ ≈ 0.160751
Example:
What is the probability of rolling at most two sixes in 5
independent casts of a fair die?
Solution: Let the random variable X denote number of sixes in 5 independent
casts of a fair die. Then X is a binomial random variable with probability
of success p and n = 5. The probability of getting a six is p = 1/6
. Hence, the probability of rolling at most two sixes is
P(X ≤ 2) = F(2) = f(0) + f(1) + f(2)
= C(5, 0)(1/6)^0(5/6)^5 + C(5, 1)(1/6)^1(5/6)^4 + C(5, 2)(1/6)^2(5/6)^3 ≈ 0.9645
Poisson Distribution:
It is often useful to define a random variable that counts the number of events
that occur within certain specified boundaries.
For example, the average number of telephone calls received by customer service
within a certain time limit.
The Poisson distribution is often appropriate to model such situations.
A random variable X is said to have a Poisson distribution if its probability density function is given by
f(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...
where λ > 0 is a parameter.
Mean and variance of the Poisson RV
For a Poisson RV with parameter λ,
E(X) = Σ_{x≥0} x e^(−λ) λ^x / x! = λ
and similarly V(X) = E(X²) − (E(X))² = λ, so that E(X) = V(X) = λ.
Example: A random variable X has a Poisson distribution with a
mean of 3. What is the probability that X is bounded by 1 and 3, that is,
P(1 ≤ X ≤ 3)?
Solution: Here μX = 3 = λ, so
f(x) = e^(−λ) λ^x / x! = e^(−3) 3^x / x!,  x = 0, 1, 2, ...
Hence
P(1 ≤ X ≤ 3) = f(1) + f(2) + f(3) = 3e^(−3) + (9/2)e^(−3) + (9/2)e^(−3) = 12e^(−3) ≈ 0.5974
Note: Poisson distribution is the limiting form of binomial distribution when the number
of trials n becomes sufficiently large and the probability p of success in a trial is very
small.
Example: The number of traffic accidents per week in a small city
has a Poisson distribution with mean equal to 3. What is the probability of
exactly 2 accidents occur in 2 weeks?
Solution: The mean number of traffic accidents per week is 3. Thus, the mean number of accidents in two weeks is λ = (3)(2) = 6. Since
f(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...
we get f(2) = (6²/2!) e^(−6) = 18 e^(−6) ≈ 0.0446
Example:
During a laboratory experiment, the average number of radioactive particles
passing through a counter in one millisecond is 4. What is the probability
that 6 particles enter the counter in a given millisecond? What is the
probability of at least 6 particles?
Solution: Using the Poisson distribution with x = 6 and μ = 4, we get
P(X = 6) = F(6) − F(5) = e^(−4) 4^6/6! ≈ 0.1042,
and P(at least 6 particles) = P(X ≥ 6) = 1 − F(5) ≈ 0.2149.
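The Poisson examples above (λ = 3, λ = 6 and λ = 4) can be checked numerically; a sketch:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# P(1 <= X <= 3) for lam = 3 (should equal 12*e^-3):
p_1_to_3 = sum(poisson_pmf(x, 3) for x in range(1, 4))

# P(X = 6) and P(X >= 6) for lam = 4 (radioactive-particle example):
p_six = poisson_pmf(6, 4)
p_at_least_six = 1 - sum(poisson_pmf(x, 4) for x in range(6))

print(p_1_to_3, p_six, p_at_least_six)
```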
Lecturer note -28
Poisson distribution continued and Hypergeometric distribution:
Poisson approximation for Binomial
Poisson distribution was originally derived as a limit of Binomial when n → ∞
while p = 𝜇/n, with fixed 𝜇. We can use this fact to estimate Binomial
probabilities for large n and small p.
Example: At a certain industrial facility, accidents occur infrequently. It is known that
the probability of an accident on any given day is 0.005 and the accidents
are independent of each other. For a given period of 400 days, what is the
probability that
(a) there will be an accident on only one day?
(b) there are at most two days with an accident?
Solution: Let X be a binomial random variable with n = 400 and p = 0.005.
Thus 𝜇 = np = (400)(0.005) = 2.
Using the Poisson approximation with λ = 2, we have
(a) P(X = 1) ≈ f(1) = e^(−2) 2¹/1! ≈ 0.271
(b) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) ≈ f(0) + f(1) + f(2) = 0.1353 + 0.2707 + 0.2707 = 0.6767
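The quality of the approximation can be checked by computing the exact Binomial answer alongside it; a sketch:

```python
from math import comb, exp, factorial

n, p = 400, 0.005
lam = n * p  # mu = np = 2

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    return exp(-lam) * lam**x / factorial(x)

exact = sum(binom_pmf(x) for x in range(3))     # exact Binomial P(X <= 2)
approx = sum(poisson_pmf(x) for x in range(3))  # Poisson approximation

print(exact, approx)
```

For n this large and p this small the two values agree to several decimal places.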
Hypergeometric distribution:
Consider the Hypergeometric experiment, that is, one that possesses the
following two properties:
a) A random sample of size n is selected without replacement from N
items.
b) Of the N items overall, k may be classified as successes and N- k are
classified as failures.
We will be interested, as before, in the number of successes X, but now
the probability of success is not constant
For a hypergeometric random variable X, the number of successes in a random sample of size n selected from N items of which k are labeled success and N − k labeled failure, the probability density function is
f(x) = C(k, x) C(N − k, n − x) / C(N, n)
Note: The mean and variance of a hypergeometric distribution are
μ = n(k/N)  and  σ² = n(k/N)(1 − k/N)((N − n)/(N − 1))
Example: Lots of 40 components each are called unacceptable if they contain as many
as 3 defectives or more. The procedure for sampling the lot is to select 5 components at
random and to reject the lot if a defective is found. What is the probability that exactly 1
defective is found in the sample if there are 3 defectives in the entire lot?
Solution: Using the above distribution with n = 5,N = 40, k = 3 and x = 1,
we can find the probability of obtaining one defective to be
f(1; 40, 5, 3) = C(3, 1) C(37, 4) / C(40, 5) ≈ 0.3011
Example: A shipment of 20 tape recorders contains 5 that are defective. If 10 of them
are randomly chosen for inspection, what is the probability that 2 of the 10 will be
defective?
Solution: Substituting x = 2, n = 10, k = 5, and N = 20 into the formula, we get
f(2) = P(X = 2) = C(5, 2) C(15, 8) / C(20, 10) ≈ 0.348
Example: Suppose there are 3 defective items in a lot of 50 items. A
sample of size 10 is taken at random and without replacement. Let X denote
the number of defective items in the sample. What is the probability that
the sample contains at most one defective item?
Solution: Clearly, X ~ HYP(3, 47, 10). Hence the probability that the sample contains at most one defective item is
P(X ≤ 1) = P(X = 0) + P(X = 1) = C(3, 0)C(47, 10)/C(50, 10) + C(3, 1)C(47, 9)/C(50, 10) ≈ 0.504 + 0.400 = 0.904
Example: A radio supply house has 200 transistor radios, of which
3 are improperly soldered and 197 are properly soldered. The supply house
randomly draws 4 radios without replacement and sends them to a customer.
What is the probability that the supply house sends 2 improperly soldered
radios to its customer?
Solution: The probability that the supply house sends 2 improperly soldered radios to its customer is
P(X = 2) = C(3, 2) C(197, 2) / C(200, 4) ≈ 0.000895
Example: A random sample of 5 students is drawn without replacement
from among 300 seniors, and each of these 5 seniors is asked if she/he
has tried a certain drug. Suppose 50% of the seniors actually have tried the
drug. What is the probability that two of the students interviewed have tried
the drug?
Solution: Let X denote the number of students interviewed who have tried the drug. Hence the probability that two of the students interviewed have tried the drug is
P(X = 2) = C(150, 2) C(150, 3) / C(300, 5) ≈ 0.3146
Lecturer note -29
Continuous probability distributions , Uniform and normal distribution:
All of the random variables discussed previously were discrete, meaning they
can take only a finite (or, at most, countable) number of values.
However, many of the random variables seen in practice have more than a countable
collection of possible values.
For example, the proportions of impurities in ore samples may run from 0.10 to 0.80.
Such random variables can take any value in an interval of real numbers.
Since the random variables of this type have a continuum of possible values, they are
called continuous random variables.
The function f(x) is a probability density function (PDF) for the continuous random variable X, defined over the set of real numbers R, if
(a) f(x) ≥ 0 for all x, and
(b) ∫_{−∞}^{∞} f(x) dx = 1.
Note: The cumulative distribution function (CDF) F(x) of a continuous random variable X, with density function f(x), is
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
We have the following two results: P(a < X < b) = F(b) − F(a), and f(x) = dF(x)/dx wherever the derivative exists.
Uniform distribution:
One of the simplest continuous distributions is the continuous uniform distribution.
This distribution is characterized by a density function that is flat
and thus the probability is uniform in a definite interval, say [a, b].
The density function of the continuous uniform random variable X on the interval [a, b] is
f(x) = 1/(b − a), a ≤ x ≤ b, and f(x) = 0 otherwise.
The cumulative distribution function for the uniform distribution is
F(x) = ∫_{a}^{x} 1/(b − a) dt = (x − a)/(b − a),  a ≤ x ≤ b
Mean and variance of the Uniform distribution:
Mean = E(X) = ∫_{a}^{b} x f(x) dx = ∫_{a}^{b} x/(b − a) dx = (a + b)/2
Variance = Var(X) = E(X²) − (E(X))² = (b² + ab + a²)/3 − (a + b)²/4 = (b − a)²/12
where E(X²) = ∫_{a}^{b} x² f(x) dx = ∫_{a}^{b} x²/(b − a) dx = (b² + ab + a²)/3
Example: If X has a uniform distribution on the interval from 0 to 10, then what is P(X + 10/X ≥ 7)?
Solution: The probability density function of X is f(x) = 1/10 for 0 ≤ x ≤ 10. Hence
P(X + 10/X ≥ 7) = P(X² + 10 ≥ 7X) = P(X ≤ 2 or X ≥ 5)
= 1 − P(2 < X < 5) = 1 − ∫_{2}^{5} (1/10) dx = 7/10
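Since the event reduces to {X ≤ 2 or X ≥ 5}, the answer 7/10 can also be checked numerically by integrating the density over the event with a simple midpoint rule; a sketch:

```python
# Numerical check of the uniform example: X ~ Uniform(0, 10),
# P(X + 10/X >= 7) should equal 7/10.

N = 100_000        # number of subintervals of [0, 10]
dx = 10 / N
prob = 0.0
for i in range(N):
    x = (i + 0.5) * dx        # midpoint of the subinterval
    if x + 10 / x >= 7:       # the event {X + 10/X >= 7}
        prob += dx / 10       # f(x) dx with f(x) = 1/10
print(prob)
```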
Normal distribution:
The most widely used of all the continuous probability distributions is the
normal distribution (also known as Gaussian). It serves as a popular model
for measurement errors, particle displacements under Brownian motion, stock market fluctuations, human intelligence and many other things. It is also used as an approximation for Binomial (for large n) and Gamma (for large α) distributions.
The normal density follows the well-known symmetric bell-shaped curve.
The curve is centered at the mean value 𝜇 and its spread is, of course, measured
by the standard deviation 𝜎. These two parameters, 𝜇 and 𝜎 2 completely
determine the shape and center of the normal density function.
Note:
A random variable X is said to have a normal distribution if its probability density function is given by
f(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²),  −∞ < x < ∞
It will be denoted as X ~ N(μ, σ²)
Note: The normal random variable Z with 𝜇= 0 and 𝜎 = 1 is said to have the
standard normal distribution.
Direct integration would show that E(Z) = 0 and V(Z) = 1:
Usefulness of Z
We are able to transform the observations of any normal random
variable X to a new set of observations of a standard normal random variable Z.
This can be done by means of the transformation Z = (X − μ)/σ
1 x 2
 (
)
1
Example: Is the real valued function defined by f ( x) 
e 2  ,   x  
 2
a probability density function of some random variable X?
Solution: To answer this question, we must check that f is nonnegative
and it integrates to 1. The nonnegative part is trivial since the exponential
function is always positive. Hence, using a property of the gamma function, one can show that f integrates to 1 on R.
Note:
A normal random variable is said to be standard normal, if
its mean is zero and variance is one. We denote a standard normal random
variable X by X ~ N(0, 1).
The probability density function of the standard normal distribution is the following:
f(x) = (1/√(2π)) e^(−x²/2),  −∞ < x < ∞
Example: If X ~ N(0, 1), what is the probability of the random
variable X less than or equal to −1.72?
Solution: P(X ≤ −1.72) = 1 − P(X ≤ 1.72) = 1 − 0.9573 = 0.0427
Lecturer note -30
Normal distribution continued and joint distribution:
Example: If Z ~ N(0, 1), what is the value of the constant c such
that P(| z | c)  0.95
0.95  P(| z | c)  P(c  z  c)
Solution:
 P ( z  c )  P ( z  c )  2 P ( z  c )  1
So P(z≤c)=0.975
Note: If X ~ N(μ, σ²), then the random variable Z = (X − μ)/σ ~ N(0, 1).
We will show that Z is standard normal by finding the probability
density function of Z. We compute the probability density of Z by cumulative
distribution function method.
Example: If X ~ N(3, 16), then what is P(4 ≤ X ≤ 8)?
43 x 3 83
P(4  x  8)  P(


)  P( z  1.25)  P( z  0.25)
Solution:
4
4
4
 0.8944  0.5987  0.2957
Example: If X ~ N(25, 36), then what is the value of the constant c
such that P(|x-25|≤c)=0.9544
Solution:
0.9544 = P(|X − 25| ≤ c) = P(−c/6 ≤ (X − 25)/6 ≤ c/6)
= P(Z ≤ c/6) − P(Z ≤ −c/6) = 2P(Z ≤ c/6) − 1
So P(Z ≤ c/6) = 0.9772, which gives c/6 = 2, that is, c = 12.
Example: Given a random variable X having a normal distribution with 𝜇 = 300 and
𝜎 = 50. Find the probability that X is greater than 362.
Solution: To find P(X > 362), we need to evaluate the area under the normal
curve to the right of x = 362. This can be done by transforming x = 362 to
the corresponding Z-value. We get
z = (x − μ)/σ = (362 − 300)/50 = 1.24
Hence P(X > 362) = P(Z > 1.24) = P(Z < −1.24) = 0.1075
Example: A diameter X of a shaft produced has a normal distribution with parameters
μ = 1.005, σ = 0.01. The shaft will meet specifications if its diameter is between 0.98 and 1.02 cm. What percent of shafts will not meet specifications?
Solution:
1 − P(0.98 < X < 1.02) = 1 − P((0.98 − 1.005)/0.01 < Z < (1.02 − 1.005)/0.01)
= 1 − P(−2.5 < Z < 1.5) = 1 − (0.4938 + 0.4332) = 0.0730
So about 7.3% of shafts will not meet specifications.
Note: The famous 68% - 95% rule
For a Normal population, 68% of all values lie in the interval [𝜇 − 𝜎, 𝜇 + 𝜎],
and 95% lie in [𝜇 − 2𝜎, 𝜇 + 2𝜎].
In addition, 99.7% of the population lies in [𝜇 − 3𝜎, 𝜇 + 3𝜎].
Example: Let X = monthly sick leave time have normal distribution with parameters
𝜇 = 200 hours and 𝜎 = 20 hours.
a) What percentage of months will have sick leave below 150 hours?
b) What amount of time x0 should be budgeted for sick leave so that the
budget will not be exceeded with 80% probability?
Solution: (a) P(X < 150) = P(Z < −2.5) = 0.5 − 0.4938 = 0.0062
(b) P(X < x0) = P(Z < z0) = 0.8, which leaves a table area for z0 of 0.3.
Thus, z0 = 0.84 and hence x0 = 200 + 20(0.84) = 216.8 hours
Normal approximation to Binomial:
As another example of using the Normal distribution, consider the Normal
approximation to Binomial distribution. This will be also used when discussing
sample proportions.
Note: If X is a Binomial random variable with mean μ = np and variance σ² = npq, then the random variables
Zn = (X − np)/√(npq)
approach the standard Normal as n gets large.
We already know one Binomial approximation (by Poisson). It mostly applies when the Binomial distribution in question has a skewed shape, that is, when p is close to 0 or 1. When the shape of the Binomial distribution is close to symmetric, the Normal approximation will work better. Practically, we will require that both np and n(1 − p) ≥ 5.
Lecturer note -31
Joint Probability distribution:
Example: The probability that a patient recovers from a rare blood disease is 0.4. If
100 people are known to have contracted this disease, what is the probability
that at most 30 survive?
Solution: Let the binomial variable X represent the number of patients that
survive. Since n = 100 and p = 0.4, we have
𝜇 = np = (100)(0.4) = 40
And 𝜎 2 = npq = (100)(0.4)(0.6) = 24;
To obtain the desired probability, we compute the z-value for x = 30.5. Thus,
Z = (x − μ)/σ = (30.5 − 40)/√24 ≈ −1.94
and the probability of fewer than 30 of the 100 patients surviving is P(X < 30) ≈ P(Z < −1.94) = 0.5 − 0.4738 = 0.0262.
Example: Suppose X is Binomial with parameters n = 15, and p = 0.4,
we are interested in the probability that X assumes a value from 7 to 9 inclusive, that is,
P(7 ≤X ≤ 9):
Solution: The exact probability is given by
P(7 ≤ X ≤ 9) = Σ_{x=7}^{9} bin(x; 15, 0.4) = 0.1771 + 0.1181 + 0.0612 = 0.3564
For the Normal approximation we find the area between x1 = 6.5 and x2 = 9.5 using the z-values
z1 = (x1 − np)/√(npq) = (6.5 − 6)/1.897 = 0.26,  z2 = (9.5 − 6)/1.897 = 1.85
Adding or removing 0.5 is called the continuity correction. It arises when we try to approximate a distribution with integer values (here, Binomial) through the use of a continuous distribution (here, Normal). The sum over the discrete set 7 ≤ X ≤ 9 is approximated by the integral of the continuous density from 6.5 to 9.5.
P(7 ≤ X ≤ 9) ≈ P(0.26 < Z < 1.85) = 0.4678 − 0.1026 = 0.3652
Therefore, the normal approximation provides a value that agrees very closely
with the exact value of 0.3564. The degree of accuracy depends on both n
and p. The approximation is very good when n is large and if p is not too
near 0 or 1.
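The comparison in this example can be reproduced directly; a sketch computing the exact Binomial probability and the continuity-corrected Normal approximation:

```python
from math import comb, erf, sqrt

n, p = 15, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact Binomial probability P(7 <= X <= 9)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(7, 10))

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Normal approximation with continuity correction: area from 6.5 to 9.5
approx = phi((9.5 - mu) / sigma) - phi((6.5 - mu) / sigma)

print(exact, approx)
```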
Distribution of several random variables:
There are many random experiments that involve more than one random
variable. For example, an educator may study the joint behavior of grades
and time devoted to study; a physician may study the joint behavior of blood
pressure and weight. Similarly an economist may study the joint behavior of
business volume and profit. In fact, most real problems we come across will
have more than one underlying random variable of interest.
Bivariate Discrete Random Variables
A discrete bivariate random variable (X, Y ) is an ordered pair of discrete random
variables.
If X and Y are two discrete random variables, the probability that X equals
x while Y equals y is described by p(x, y) = P(X = x, Y = y). That is, the
function p(x, y) describes the probability behavior of the pair ( X, Y)
Note: A real valued function f of two variables is a joint probability
density function of a pair of discrete random variables X and Y (with range
spaces RX and RY , respectively) if and only if
(a) f ( x, y )  0, ( x, y )  RX  Ry
 f ( x, y )  1
yR
X
Y
Bivariate Continuous Random Variables:
The joint probability density function of the random variables
X and Y is an integrable function f(x, y) such that
(a) f ( x, y )  0 forall ( x, y)  R 2
(b)

xR
 
(b) 

f ( x, y )dxdy  1
 
Example: For what value of the constant k is the function
f(x, y) = kxy,  x = 1, 2, 3;  y = 1, 2, 3
a joint probability density function of some random variables X and Y?
Solution: 1 = Σ_{x=1}^{3} Σ_{y=1}^{3} f(x, y) = Σ_{x=1}^{3} Σ_{y=1}^{3} kxy = k(1 + 2 + 3 + 2 + 4 + 6 + 3 + 6 + 9) = 36k
So k = 1/36
Example: Let the joint density function of X and Y be
f(x, y) = kxy²,  0 < x < y < 1 (and 0 otherwise).
What is the value of the constant k?
Solution:
1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{0}^{1} ∫_{0}^{y} k x y² dx dy = k/10
So k = 10
Note:
If we know the joint probability density function f of the random variables X and Y, then we can compute the probability of an event A from
P(A) = ∫∫_A f(x, y) dx dy
Example: Let the joint density of the continuous random variables X and Y be as given. What is the probability of the event X ≤ Y?
Solution: Let A = (X ≤ Y). We want to find P(A) = ∫∫_A f(x, y) dx dy.
Note: Let (X, Y) be a continuous bivariate random variable, and let f(x, y) be the joint probability density function of X and Y. The function
f1(x) = ∫_{−∞}^{∞} f(x, y) dy
is called the marginal probability density function of X. Similarly, the function
f2(y) = ∫_{−∞}^{∞} f(x, y) dx
is called the marginal probability density function of Y.
Example: If the joint density function for X and Y is given by
then what is the marginal density function of X, for 0 < x < 1?
Solution: The domain of the f consists of the region bounded by the curve
x = y2 and the vertical line x = 1.
Lecturer note -32
Joint Probability distribution continued and related problems:
Note: Let X and Y be the continuous random variables with
joint probability density function f(x, y). The joint cumulative distribution
function F(x, y) of X and Y is defined as
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(u, v) du dv
From the fundamental theorem of calculus, we again obtain f(x, y) = ∂²F/∂x∂y
Example: If the joint cumulative distribution function of X and Y is given by
F(x, y) = (1/5)(2x³y + 3x²y²),  0 ≤ x, y ≤ 1
then what is the joint density of X and Y?
Solution: f(x, y) = ∂²/∂x∂y [(1/5)(2x³y + 3x²y²)] = (6/5)(x² + 2xy)
Hence, the joint density of X and Y is given by
f(x, y) = (6/5)(x² + 2xy),  0 ≤ x, y ≤ 1
Covariance of Bivariate Random Variables
First, we define the notion of product moment of two random variables
and then using this product moment, we give the definition of covariance
between two random variables.
Let X and Y be any two random variables with joint density function f(x, y). The product moment of X and Y, denoted by E(XY), is defined as
E(XY) = Σ_{x∈RX} Σ_{y∈RY} xy f(x, y) in the discrete case, and E(XY) = ∫∫ xy f(x, y) dx dy in the continuous case.
Here, RX and RY represent the range spaces of X and Y respectively.
Note:
Let X and Y be any two random variables with joint density
function f(x, y). The covariance between X and Y , denoted by Cov(X, Y )
(or 𝜎 XY), is defined as
Cov(X, Y ) = E( (X – μX) (Y – μY ) ),
where μX and μY are mean of X and Y , respectively
The covariance helps us assess the relationship between two variables.
Positive covariance means positive association between X and Y meaning
that, as X increases, Y also tends to increase. Negative covariance means
negative association.
Note: The mean μX is given by
μX = E(X) = ∫_{−∞}^{∞} x f1(x) dx = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy
Similarly, the mean μY is given by
μY = E(Y) = ∫_{−∞}^{∞} y f2(y) dy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy
Note:
Let X and Y be any two random variables. Then
Cov(X, Y ) = E(XY ) − E(X)E(Y ).
Cov(X, Y ) = E((X – μX) (Y – μY ))
= E(XY – μX Y – μY X + μX μY )= E(XY ) – μX E(Y ) – μY E(X) + μX μY
= E(XY ) – μX μY – μY μX + μX μY
= E(XY ) – μX μY
= E(XY ) − E(X)E(Y ).
Example: Let X and Y be discrete random variables with joint density
f(x, y) = (x + 2y)/18,  x = 1, 2;  y = 1, 2.
What is the covariance σXY between X and Y?
Solution: The marginal of X is
f1(x) = Σ_{y=1}^{2} (x + 2y)/18 = (1/18)(2x + 6)
Hence the expected value of X is E(X) = Σ_{x=1}^{2} x f1(x) = 28/18.
Similarly, the marginal of Y is
f2(y) = Σ_{x=1}^{2} (x + 2y)/18 = (1/18)(3 + 4y)
Hence the expected value of Y is E(Y) = Σ_{y=1}^{2} y f2(y) = 29/18.
Further, the product moment of X and Y is given by
E(XY) = Σ_{x=1}^{2} Σ_{y=1}^{2} xy f(x, y) = f(1, 1) + 2f(1, 2) + 2f(2, 1) + 4f(2, 2) = 45/18
Hence, the covariance between X and Y is given by
Cov(X, Y) = E(XY) − E(X)E(Y) = 45/18 − (28/18)(29/18) ≈ −0.00617
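The covariance example above can be checked by brute force over the four points of the joint density; a sketch:

```python
# Check of the covariance example: f(x, y) = (x + 2y)/18, x, y in {1, 2}.
pairs = [(x, y) for x in (1, 2) for y in (1, 2)]
f = {(x, y): (x + 2 * y) / 18 for x, y in pairs}

ex = sum(x * f[(x, y)] for x, y in pairs)        # E(X) = 28/18
ey = sum(y * f[(x, y)] for x, y in pairs)        # E(Y) = 29/18
exy = sum(x * y * f[(x, y)] for x, y in pairs)   # E(XY) = 45/18
cov = exy - ex * ey                              # Cov(X, Y)
print(ex, ey, exy, cov)
```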
Note:
For an arbitrary random variable, the product moment and
covariance may or may not exist. Further, note that unlike variance, the
covariance between two random variables may be negative.
Note:
If X and Y are independent random variables, then
E(XY ) = E(X)E(Y ).
Recall that X and Y are independent if and only if
f(x, y) = f1(x) f2(y).
Let us assume that X and Y are continuous. Therefore
E(XY) = ∫∫ xy f(x, y) dx dy = ∫∫ xy f1(x) f2(y) dx dy = (∫ x f1(x) dx)(∫ y f2(y) dy) = E(X)E(Y)
If X and Y are discrete, then replace the integrals by appropriate sums to prove the same result.
Note: If X and Y are independent random variables, then the
covariance between X and Y is always zero, that is
Cov(X, Y ) = 0.
Of course, if covariance is 0, then so is the correlation coefficient. Such
random variables are called uncorrelated. The inverse of this Note is
not true, meaning that zero covariance does not necessarily imply
independence.
Variance of sums
If X and Y are random variables and U = aX + bY + c, then
V(U) = V(aX + bY + c) = a² V(X) + b² V(Y) + 2ab Cov(X, Y)
If X and Y are independent, then V(U) = V(aX + bY) = a² V(X) + b² V(Y)
Example: If X and Y are random variables with variances V(X) = 2, V(Y) = 4, and covariance Cov(X, Y) = −2, find the variance of the random variable Z = 3X − 4Y + 8.
Solution: V(Z) = V(3X − 4Y + 8) = 9V(X) + 16V(Y) − 24Cov(X, Y)
so V(Z) = (9)(2) + (16)(4) − 24(−2) = 130.
Lecturer note -33
Mathematical statistics:
What is Statistics?
Statistics and probabilities are two strongly connected, but still distinct fields of
mathematics. It
is said that "probabilities are the vehicle of statistics". This is true, meaning that if it
weren't for the probabilistic laws, statistics wouldn't be possible.
To illustrate the difference between probabilities and statistics, let us consider two
boxes: a probabilistic and a statistical one.
For the probabilistic box we know that it contains 5 white, 5 black and 5 red balls; the probabilistic problem is: if we take a ball, what is the chance that it is white? For a statistical box we do not know the combination of balls in the box.
The probability sets the question of the chance that something (an event) happens
when we know the probabilities (we know the population).
Statistics asks us to make a sample, to analyze it and then make a prediction
concerning the population, based on the information provided by the sample.
Basics:
The population is a collection (a set) of individuals, objects or numerical
data obtained by measurements, whose properties need to be analyzed.
Note: The population is the complete collection of individuals, objects or numerical
data obtained by measurements and which are of interest (to the one collecting the
sample).
In statistics the population concept is fundamental. The population has to be carefully
defined and is considered completely defined only if the member list is specified.
The set of the Mathematics and Informatics' students is a well defined population.
Usually if we hear the word population, we think of a set of people. In statistics, the
population can be a set of animals, of manufactured objects or of numerical data
obtained through measurements.
For example, the set of the "heights" of the students of the Faculty
of Mathematics and Informatics is a population.
Note: The sample is a subset of a population.
A sample is made of individuals, objects or measured data selected from a
Population
Note:
A response variable (or simply variable) is a characteristic (usually a
numerical one) which is of interest for each element (individual) of a population.
Example: The age of a student, his grade point average, his hair color and his height are response variables for the population of students from the Faculty of Mathematics and Informatics.
Note:
A simple random sample (SRS) is a sample for which each object in the
population has the same probability to be picked as any other object, and is picked
independently of any other object.
Sample mean and variance:
The easiest and most popular summary for a data set is its mean X̄. The mean is a measure of location for the data set. We often need also a measure of spread. One such measure is the sample standard deviation.
Sample variance and standard deviation:
The sample variance is denoted by S² and equals
S² = Σ_{i=1}^{n} (Xi − X̄)² / (n − 1) = (Σ_{i=1}^{n} Xi² − n X̄²) / (n − 1)
Sample standard deviation S is the square root of S².
A little algebra shows that the two expressions in the above formula are equivalent. The denominator in the formula is n − 1, which is called the degrees of freedom. A simple explanation is that the calculation starts with n numbers and is then constrained by finding X̄, thus n − 1 degrees of freedom are left. Note that if n = 1 then the calculation of the sample variance is not possible.
Example: The heights of the last 8 US presidents are (in cm): 185, 182, 188, 188, 185,
177, 182, 193. Find the mean and standard deviation of these heights.
Solution: The average height is X̄ = 185. To make the calculations more
compact, subtract 180 from each number, as this does not affect the standard
deviation: 5, 2, 8, 8, 5, −3, 2, 13, with mean 5. Then Σ Xᵢ² = 364, and we get
S² = (364 − 8 × 25)/7 ≈ 23.43 and S ≈ 4.84.
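The calculation above can be sketched in a few lines of Python, a minimal illustration of the sample mean and variance formulas of this section using the same data:

```python
# Sample mean and sample variance, as defined in this section,
# applied to the presidents' heights example above.
heights = [185, 182, 188, 188, 185, 177, 182, 193]

n = len(heights)
mean = sum(heights) / n                               # X-bar
s2 = sum((x - mean) ** 2 for x in heights) / (n - 1)  # S^2 with n-1 df
s = s2 ** 0.5                                         # sample std dev

print(mean)          # 185.0
print(round(s2, 2))  # 23.43
print(round(s, 2))   # 4.84
```

Note that dividing by n − 1 rather than n is exactly the degrees-of-freedom point made above.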
Statistical inference:
In previous sections we emphasized properties of the sample mean. In this
section we discuss the problem of estimating population parameters in
general. A point estimate of a population parameter 𝜃 is a single value
𝜃̂ of a statistic. For example, the value X̄ is a point estimate of the population parameter
𝜇.
One of the basic problems is how to find an estimator of a population
parameter 𝜃. The numerical value of this statistic is called
an estimate of 𝜃, and the estimator of the parameter 𝜃 is denoted by 𝜃̂.
Maximum Likelihood Method: The maximum likelihood method was first used by
Sir Ronald Fisher in 1922 (see Fisher (1922)) for finding an estimator of an unknown
parameter. However, the method originated in the works of Gauss and Bernoulli.
Lecturer note -34
Maximum likelihood method:
Let X1,X2, ...,Xn be a random sample from a population
X with probability density function f(x; 𝜃), where 𝜃 is an unknown parameter.
The likelihood function, L(𝜃), is the joint distribution of the sample, that is,
L(𝜃) = ∏ᵢ₌₁ⁿ f(xᵢ; 𝜃)
This definition says that the likelihood function of a random sample
X1,X2, ...,Xn is the joint density of the random variables X1,X2, ...,Xn.
The 𝜃 that maximizes the likelihood function L(𝜃) is called the maximum
likelihood estimator of 𝜃, and it is denoted by 𝜃̂.
Example:
If X1,X2, ...,Xn is a random sample from a distribution
with density function as follows. What is the maximum likelihood estimate of the
parameter 𝜃.
Solution: The likelihood function of the sample is given by
L(𝜃) = ∏ᵢ₌₁ⁿ f(xᵢ; 𝜃)
Example: What is the basic principle of maximum likelihood estimation?
Solution: To choose a value of the parameter for which the observed data
have as high a probability or density as possible. In other words a maximum
likelihood estimate is a parameter value under which the sample data have
the highest probability.
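This principle can be sketched numerically. The following is a minimal illustration assuming a Bernoulli(p) model with made-up data (not the unspecified density of the example above): the likelihood is evaluated on a grid of candidate values of p, and the maximizer turns out to be the sample proportion, as theory predicts.

```python
# Hedged illustration of the maximum likelihood principle, assuming
# a Bernoulli(p) model; the sample below is made up for demonstration.
data = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 successes out of 8 trials

def likelihood(p, xs):
    # L(p) = product of p^x * (1-p)^(1-x) over the sample
    L = 1.0
    for x in xs:
        L *= p ** x * (1 - p) ** (1 - x)
    return L

# Search a fine grid of candidate p values for the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: likelihood(p, data))

print(p_hat)  # 0.75, the sample proportion 6/8
```

In practice the maximizer is found analytically by differentiating log L(𝜃), but the grid search makes the "pick the parameter under which the data are most probable" idea concrete.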
The simple binomial model:
Lecturer note -35
Confidence interval:
The confidence interval (CI) or interval estimate is an interval within
which we would expect to find the “true" value of the parameter.
Interval estimates, say for the population mean, are often desirable because
the point estimate X̄ varies from sample to sample. Instead of a single
estimate for the mean, a confidence interval provides a lower and an upper
bound for the mean.
The interval estimate provides a measure of uncertainty in our estimate of the true
mean 𝜇. The narrower the interval, the more precise is our estimate.
Confidence limits are evaluated in terms of a confidence level.
Although the choice of confidence level is somewhat arbitrary, in practice 90%, 95%,
and 99% intervals are often used, with 95% being the most commonly used.
Interval Estimators and Confidence Intervals for Parameters:
The interval estimation problem can be stated as follows: Given a random
sample X1, X2, ..., Xn and a probability value 1 − 𝛼, find a pair of statistics
L = L(X1, X2, ..., Xn) and U = U(X1, X2, ..., Xn) with L ≤ U such that the
probability of 𝜃 being in the random interval [L, U] is 1 − 𝛼.
That is
P (L ≤ 𝜽 ≤ U) = 1 − 𝜶.
The random variable L is called the lower confidence limit and U is called the
upper confidence limit. The number (1−𝛼) is called the confidence coefficient
or degree of confidence.
The interval [l, u] will be denoted as an interval estimate of 𝜃 whereas the
random interval [L,U] will denote the interval estimator of 𝜃. Notice that
the interval estimator of 𝜃 is the random interval [L, U]. Next, we define the
100(1 − 𝛼)% confidence interval for the unknown parameter 𝜃.
CI for the mean:
If X̄ is the mean of a random sample of size n from a normal population
with known variance 𝜎², an approximate (1 − 𝛼)100% confidence interval for
𝜇 is given by
X̄ − C ≤ 𝜇 ≤ X̄ + C,  where C = z_{𝛼/2} 𝜎/√n
and z_{𝛼/2} is the z-value leaving an area of 𝛼/2 to the right.
Note: a 95% confidence interval does not mean that there is a 95%
probability that the interval contains the true mean. The interval computed from a given
sample either contains the true mean or it does not. Instead, the level of confidence is
associated with the method of calculating the interval. For example, for a 95%
confidence interval, if many samples are collected and a confidence interval is
computed for each, in the long run about 95% of these intervals would contain the true
mean.
Note: Using the central limit theorem we have
Z = (X̄ − 𝜇)/(𝜎/√n)
So
P(−z_{𝛼/2} ≤ (X̄ − 𝜇)/(𝜎/√n) ≤ z_{𝛼/2}) = 1 − 𝛼
Note: If 𝜎 is unknown, it can replaced by S, the sample standard deviation,
with no serious loss in accuracy for the large sample case. Later, we will
discuss what happens for small samples.
Example: The drying times, in hours, of a certain brand of latex paint are
3.4 2.5 4.8 2.9 3.6 2.8 3.3 5.6
3.7 2.8 4.4 4.0 5.2 3.0 4.8
Compute the 95% confidence interval for the mean drying time. Assume that
𝜎 = 1.
Solution: We compute X̄ = 3.79 and z_{0.025} = 1.96
(𝛼 = 0.05, upper-tail probability = 0.025, table area = 0.5 − 0.025 = 0.475).
Then the 95% C.I. for the mean is 3.79 ± (1.96)(1)/√15, i.e. (3.28, 4.29).
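As a sketch, the interval in this example can be computed directly, assuming (as the example does) that 𝜎 = 1 is known:

```python
# z-based confidence interval from this section, applied to the
# drying-time data above (sigma = 1 assumed known).
times = [3.4, 2.5, 4.8, 2.9, 3.6, 2.8, 3.3, 5.6,
         3.7, 2.8, 4.4, 4.0, 5.2, 3.0, 4.8]
sigma = 1.0
z = 1.96                          # z_{alpha/2} for a 95% interval

n = len(times)
mean = sum(times) / n
margin = z * sigma / n ** 0.5     # C = z_{alpha/2} * sigma / sqrt(n)

print(round(mean, 2))                                   # 3.79
print(round(mean - margin, 2), round(mean + margin, 2))  # 3.28 4.29
```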
Example: An important property of plastic clays is the amount of shrinkage on drying.
For a certain type of plastic clay, 45 test specimens showed an average shrinkage
percentage of 18.4 and a standard deviation of 1.2. Estimate the "true"
average shrinkage 𝜇 for clays of this type with a 95% confidence interval.
Solution: For these data, a point estimate of 𝜇 is X̄ = 18.4. The sample
standard deviation is S = 1.2. Since n is fairly large, we can replace 𝜎 by S.
Hence, the 95% confidence interval for 𝜇 is
18.4 − 1.96 × 1.2/√45 ≤ 𝜇 ≤ 18.4 + 1.96 × 1.2/√45, i.e. (18.05, 18.75).
Thus we are 95% confident that the true mean lies between 18.05 and 18.75.
Example: The average zinc concentration recovered from a sample of zinc
measurements at 36 different locations in a river is found to be 2.6 milligrams per liter.
Find the 95% and 99% confidence intervals for the mean zinc concentration
𝜇. Assume that the population standard deviation is 0.3.
Solution: The point estimate of 𝜇 is X̄ = 2.6. For 95% confidence, z_{0.025} = 1.96.
Hence, the 95% confidence interval is
2.6 − 1.96 × 0.3/√36 ≤ 𝜇 ≤ 2.6 + 1.96 × 0.3/√36, i.e. (2.50, 2.70).
Similarly, the 99% confidence interval is
2.6 − 2.575 × 0.3/√36 ≤ 𝜇 ≤ 2.6 + 2.575 × 0.3/√36, i.e. (2.47, 2.73).
Lecturer note -36
Confidence interval continued and testing of hypothesis:
Sample size calculations:
In practice, another problem often arises: how much data should be collected
to determine an unknown parameter with a given accuracy? Let m be
the desired size of the margin of error; for a given confidence level 100(1 − 𝛼)% we have
m = z_{𝛼/2} 𝜎/√n
since the CI has the structure X̄ ± m, where m is called the margin of error.
The question is: what sample size n achieves this goal?
Assuming that some estimate of 𝜎 is available and solving for n, we have
n = (z_{𝛼/2} 𝜎/m)²
Example: We would like to estimate the pH of a certain type of soil to within 0.1,
with 99% confidence. From past experience, we know that the soils of this
type usually have pH in the 5 to 7 range. Find the sample size necessary to
achieve our goal.
Solution: Let us take the reported 5 to 7 range as the ±2𝜎 range. This
way, a crude estimate of 𝜎 is (7 − 5)/4 = 0.5. For 99% confidence, we
find the upper tail area 𝛼/2 = (1 − 0.99)/2 = 0.005, thus z_{0.005} = 2.576, and
n = (2.576 × 0.5/0.1)² ≈ 166
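The sample-size formula above can be sketched as follows, using the numbers of the pH example (rounding up, since n must be an integer at least as large as the computed value):

```python
# Sample-size formula n = (z * sigma / m)^2 from this section,
# applied to the pH example above.
import math

z = 2.576      # z_{alpha/2} for 99% confidence
sigma = 0.5    # crude estimate from the +/- 2 sigma range
m = 0.1        # desired margin of error

n = math.ceil((z * sigma / m) ** 2)   # always round up
print(n)  # 166
```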
Example: In sampling from a nonnormal distribution with a variance
of 25, how large must the sample size be so that the length of a 95% confidence
interval for the mean is 1.96?
Solution: The confidence interval when the sample is taken from a population
with variance 25 is X̄ − C ≤ 𝜇 ≤ X̄ + C, where C = z_{𝛼/2} 𝜎/√n.
Thus the length of the confidence interval is
l = 2 z_{𝛼/2} 𝜎/√n = 2(1.96)(5)/√n = 1.96  ⟹  √n = 10  ⟹  n = 100
So far, we have discussed the construction of a confidence interval
for the population mean when the variance is known. It is
very unlikely that one will know the variance without knowing the population
mean, so what we have treated so far in this section is not very
realistic. We now treat the case of constructing the confidence interval for the population
mean when the population variance is also unknown.
Note: Suppose X1, X2, ..., Xn is a random sample from a normal population X
with mean 𝜇 and variance 𝜎² > 0. Let the sample mean and sample variance
be X̄ and S² respectively. Then the 100(1 − 𝛼)% confidence interval for 𝜇 when the
population X is normal with unknown variance 𝜎² is given by
[X̄ − (S/√n) t_{𝛼/2}(n − 1), X̄ + (S/√n) t_{𝛼/2}(n − 1)]
where t_{𝛼/2}(n − 1) is the critical value of the t distribution with (n − 1) degrees of freedom.
Example:
A random sample of 9 observations from a normal population yields the observed
statistics x̄ = 5 and (1/8) Σᵢ₌₁⁹ (xᵢ − x̄)² = 36. What is the 95% confidence interval for 𝜇?
Solution: Since n = 9, x̄ = 5, S² = 36 and 1 − 𝛼 = 0.95,
the 95% confidence interval for 𝜇 is given by
[5 − (6/√9) t_{0.025}(8), 5 + (6/√9) t_{0.025}(8)] = [5 − 2(2.306), 5 + 2(2.306)] = [0.388, 9.612]
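The t-based interval above can be sketched directly; the critical value t_{0.025}(8) = 2.306 is taken from the t table quoted in the text rather than computed:

```python
# t-based confidence interval from this section, using the example
# above (n = 9, x-bar = 5, S^2 = 36).
n = 9
xbar = 5.0
s = 36 ** 0.5          # S = 6
t_crit = 2.306         # t_{alpha/2}(n - 1) for 95% confidence

margin = t_crit * s / n ** 0.5
lower, upper = xbar - margin, xbar + margin
print(round(lower, 3), round(upper, 3))  # 0.388 9.612
```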
Confidence Interval for Population Variance
Let X1, X2, ..., Xn be a random sample from a normal population X
with known mean 𝜇 and unknown variance 𝜎². We would like to construct
a 100(1 − 𝛼)% confidence interval for the variance 𝜎², that is, we would like
to find estimates L and U such that P(L ≤ 𝜎² ≤ U) = 1 − 𝛼.
Since Σᵢ₌₁ⁿ (Xᵢ − 𝜇)²/𝜎² has a chi-square distribution with n degrees of freedom,
the (1 − 𝛼)% confidence interval for 𝜎² when the mean is known is
[Σᵢ₌₁ⁿ (Xᵢ − 𝜇)²/χ²_{𝛼/2}(n), Σᵢ₌₁ⁿ (Xᵢ − 𝜇)²/χ²_{1−𝛼/2}(n)]
Example: A random sample of 9 observations from a normal population.
Solution:
Lecturer note -37
Testing of hypothesis:
Statistical hypotheses
A Statistical hypothesis is an assertion or conjecture concerning one or more
populations.
The goal of a statistical hypothesis test is to make a decision about an
unknown parameter (or parameters). This decision is usually expressed in
terms of rejecting or accepting a certain value of parameter or parameters.
Some common situations to consider:
Is the coin fair? That is, we would like to test if p = 1/2 where
p = P(Heads).
Is the new drug more effective than the old one? In this case, we would
like to compare two parameters, e.g. the average effectiveness of the
old drug versus the new one.
In making the decision, we will compare the statement (say, p = 1/2) with
the available data and will reject the claim p = 1/2 if it contradicts the data.
In the subsequent sections we will learn how to set up and test the hypotheses
in various situations.
Null and alternative hypotheses
A statement like p = 1/2 is called the Null hypothesis (denoted by H0). It
expresses the idea that the parameter (or a function of parameters) is equal
to some fixed value. For the coin example, it's
H0 : p = 1/2
and for the drug example it's
H0 : 𝜇1 = 𝜇2
where 𝜇1 is the mean effectiveness of the old drug and 𝜇2 that of the
new one. The alternative hypothesis (denoted by HA) seeks to disprove the
null.
For example, we may consider two-sided alternatives
HA : p ≠ 1/2 or, in the drug case, HA : 𝜇1 ≠ 𝜇2
Hypothesis tests of a population mean
A null hypothesis H0 for the population mean 𝜇 is a statement that designates
the value 𝜇0 for the population mean to be tested. It is associated with
an alternative hypothesis HA, which is a statement incompatible with the
null.
A two-sided (or two-tailed) hypothesis setup is
H0 : 𝜇 = 𝜇0 versus HA : 𝜇 ≠ 𝜇0 for a specified value of 𝜇0 and a one-sided (or one-tailed)
hypothesis setup is either
H0 : 𝜇 = 𝜇0 versus HA : 𝜇 > 𝜇0 (right-tailed test)
or
H0 : 𝜇 = 𝜇0 versus HA : 𝜇 < 𝜇0 (left-tailed test)
Error in testing Hypothesis
There are two sources of error in hypothesis testing. We can commit either an error of
type I or an error of type II. The first arises if we reject the null hypothesis even though it is
true, whereas the second refers to the event of failing to reject a false null.
So we have
A type I error: occurs when H0 is true and H0 is rejected
A type II error: occurs when H0 is false but we fail to reject it
The probability of a type I error is called 𝛼 and that of a type II error, 𝛽.
The level of significance is the probability 𝛼 of committing a type I error, that is, of rejecting H0
when it is true. Usually 𝛼 is given from the beginning, and it determines the critical
region.
For example, if 𝛼 = 0.033, then from P(x ≥ 5) = 0.0327 we have that the critical
region is x = 5, 6, 7, 8, 9, 10.
Note: The critical region is the set of values W for which P(X ∈ W) ≤ 𝛼; observing a
value in W leads us to reject the null hypothesis H0.
The critical value is the boundary value of the critical region.
Hypothesis test:
A classical approach
In this section we present hypothesis testing for assertions regarding the mean 𝜇 of
a population. To simplify the presentation, we first suppose that the standard deviation
𝜎 of the population is known.
The following three examples refer to different formulations of the null hypothesis H0
and of the alternative hypothesis HA
Example: An ecologist claims that Timișoara has an air pollution problem.
Specifically, he claims that the mean level of carbon monoxide in downtown air is
higher than 4.9/10⁶, the normal mean value.
Solution: To formulate the hypotheses H0 and HA, we have to identify the population,
the population parameter in question and the value to which it is being compared.
In this case, the population can be the set of the downtown Timișoara inhabitants.
The variable X is the carbon monoxide concentration, whose values x vary according to
the location, and the population parameter is the mean value 𝜇 of this variable.
The ecologist makes an assertion concerning
the value of 𝜇. This value can be: 𝜇 < 4.9/10⁶, or 𝜇 > 4.9/10⁶, or 𝜇 = 4.9/10⁶.
The ecologist claims that 𝜇 > 4.9/10⁶. To formulate H0 and HA, we recall that:
1) generally, H0 states that the mean 𝜇 (the parameter in question) has a specified value;
2) the inference regarding the mean 𝜇 of the population is based on the mean of a
sample, and sample means are approximately normally distributed (according to
the central limit theorem);
3) a normal distribution is completely determined if the mean value and the standard
deviation of the distribution are known.
All this suggests that 𝜇 = 4.9/10⁶ should be the null hypothesis and 𝜇 > 4.9/10⁶ the
alternative hypothesis.
Recall that once the null hypothesis is stated, we proceed with the hypothesis test
assuming that H0 is true. So H0: 𝜇 = 4.9/10⁶.
If we admit that the statement 𝜇 = 4.9/10⁶ or 𝜇 < 4.9/10⁶ is the null hypothesis H0,
then:
H0: 𝜇 ≤ 4.9/10⁶
HA: 𝜇 > 4.9/10⁶
The equal sign must always be present in the null hypothesis.
Steps of a Hypothesis Test
The value of 𝛼 is called the level of significance, and it represents the risk (probability) of
rejecting H0 when it is actually true. We cannot determine whether H0 is true or
false; we can only decide whether to reject it or to accept it.
The probability with which we reject a true hypothesis is 𝛼, but we do not know the
probability with which we make a wrong decision. A type I error and a decision error are
two different things.
Lecturer note -38
Testing of hypothesis Continued and Chi square test:
Example: It has been claimed that the mean weight of women students at a college is
𝜇 = 54.4 kg, with standard deviation 𝜎 = 5.4 kg. The sports professor does not
believe this statement. To test the claim, he takes a random sample of size 100 among
the women students and finds the mean X̄ = 53.75 kg. Is this sufficient evidence to
reject the statement at the significance level 𝛼 = 0.05?
Solution: The test statistic is
z = (X̄ − 𝜇0)/(𝜎/√n) = (53.75 − 54.4)/(5.4/√100) = −0.65/0.54 ≈ −1.20
Since |−1.20| < z_{0.025} = 1.96, we fail to reject H0: the sample does not provide
sufficient evidence to reject the claim at the 0.05 level.
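The two-sided z test for this example can be sketched as follows, using the numbers stated in the problem:

```python
# Two-sided z test for the weight example above
# (mu0 = 54.4, sigma = 5.4, n = 100, x-bar = 53.75, alpha = 0.05).
mu0 = 54.4
sigma = 5.4
n = 100
xbar = 53.75
z_crit = 1.96                     # z_{alpha/2} for alpha = 0.05

z = (xbar - mu0) / (sigma / n ** 0.5)
reject = abs(z) > z_crit

print(round(z, 2))  # -1.2
print(reject)       # False: not enough evidence to reject H0
```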
Statistical inference about the population mean when the
standard deviation is not known
This section deals with inferences about the mean 𝜇 when the standard deviation
𝜎 is unknown.
If the sample size is sufficiently large (generally speaking, samples of size greater than n =
30 are considered sufficiently large), the sample standard deviation s is a good estimate
of the standard deviation of the population, and we can substitute 𝜎 with s in the
procedure already discussed.
If the population investigated is approximately normal and n < 30, we base our
procedure on the Student's t distribution.
The Student's t distribution (or simply, the t distribution) is the distribution of the t
statistic, which is defined as
t = (x̄ − 𝜇)/(s/√n)
The degrees of freedom, df, is a parameter that is difficult to define. It is an index used to
identify the correct distribution to be used. In our considerations df = n − 1, where n is
the sample size. The critical value of t that we should use, either in the
estimation of the confidence interval or in the hypothesis test, is obtained from the
t table. In order to obtain this value we need to know:
1) df, the degrees of freedom;
2) the area 𝛼 under the distribution curve situated to the right of the critical
value.
We denote this value t(df; 𝛼).
Example: Let us return to the example concerning air pollution and the ecologist's claim
that the level of carbon monoxide in the air is higher than 4.9/10⁶. Does a sample
of 25 readings with mean X̄ = 5.1/10⁶ and s = 2.1/10⁶ present sufficient evidence
to sustain the statement? We use the level of significance 𝛼 = 0.05.
Solution: The test statistic is
t = (X̄ − 𝜇0)/(s/√n) = (5.1 − 4.9)/(2.1/√25) = 0.2/0.42 ≈ 0.476
With df = 24, the critical value is t(24; 0.05) = 1.711. Since 0.476 < 1.711, we fail to
reject H0: the sample does not provide sufficient evidence to support the claim.
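The one-sided t test for this example can be sketched as follows; the units of 10⁻⁶ are dropped because they cancel in the t statistic, and the critical value t(24; 0.05) = 1.711 is taken from the t table:

```python
# Right-tailed t test for the air-pollution example above.
mu0 = 4.9
xbar = 5.1
s = 2.1
n = 25
t_crit = 1.711                 # t(24; 0.05) from the t table

t = (xbar - mu0) / (s / n ** 0.5)
reject = t > t_crit            # right-tailed test

print(round(t, 3))  # 0.476
print(reject)       # False
```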
Example:
Solution:
Lecturer note -39
Chi square distribution and chi square test for goodness of fit:
Since the normal population is very important in statistics, the sampling
distributions associated with the normal population are also very important. One of the
most important of these is the chi-square distribution.
Chi-square distribution
A continuous random variable X is said to have a chi-square
distribution with r degrees of freedom if its probability density function
is of the form
f(x) = x^{r/2 − 1} e^{−x/2} / (Γ(r/2) 2^{r/2}) for x > 0, and f(x) = 0 otherwise.
Example:
If X ~ χ²(7), then what are the values of the constants a and
b such that P(a < X < b) = 0.95?
Solution: Since 0.95 = P(a < X < b) = P(X < b) − P(X < a),
we get P(X < b) = 0.95 + P(X < a).
We choose a = 1.690, so that P(X < 1.690) = 0.025.
From this, we get P(X < b) = 0.95 + 0.025 = 0.975.
Thus, from the chi-square table, we get b = 16.01.
GOODNESS OF FIT TESTS:
In point estimation, interval estimation or hypothesis testing we always
started with a random sample X1, X2, ..., Xn of size n from a known distribution.
Goodness of fit tests are performed to validate the experimenter's opinion
about the distribution of the population from which the sample is drawn.
The most commonly known and most frequently used goodness of fit tests
are the Kolmogorov-Smirnov (KS) test and the Pearson chi-square (χ²) test.
A goodness-of-fit test asks how well the observed counts Xi fit a given
distribution.
Chi-square goodness-of-fit test
This is a test for the fit of the sample proportions to given numbers. Suppose
that we have observations that can be classified into each of k groups (categorical
data). We would like to test
Example:
We reject H0 and claim that the earthquake frequency does change during
the week.
Example: A die was rolled 30 times with the results shown below:
Number of spots:  1  2  3  4  5  6
Frequency (xᵢ):   1  4  9  9  2  5
If a chi-square goodness of fit test is used to test the hypothesis that the die
is fair at significance level 𝛼 = 0.05, what is the value of the chi-square
statistic and what decision is reached?
Solution: In this problem, the null hypothesis is
H0: p1 = p2 = · · · = p6 = 1/6
The alternative hypothesis is that not all pi's are equal to 1/6. The test is
based on 30 trials, so n = 30 and each expected count is npᵢ = 5. The test statistic is
χ² = Σᵢ₌₁⁶ (xᵢ − 5)²/5 = (16 + 1 + 16 + 16 + 9 + 0)/5 = 11.6
Since 11.6 > χ²_{0.05}(5) = 11.07, we reject H0 and conclude that the die is not fair.
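The Pearson chi-square statistic of this test can be sketched as follows, using the die frequencies above:

```python
# Pearson chi-square goodness-of-fit statistic for the fair-die example.
observed = [1, 4, 9, 9, 2, 5]
n = sum(observed)                    # 30 rolls
expected = [n / 6] * 6               # 5 per face under H0: fair die

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 1))                # 11.6

chi2_crit = 11.07                    # chi-square table, 5 df, alpha = 0.05
print(chi2 > chi2_crit)              # True: reject H0
```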
Example: It is hypothesized that an experiment results in outcomes
K, L, M and N with probabilities 1/5, 3/10, 1/10 and 2/5 respectively. Forty
independent repetitions of the experiment gave the following results:
Outcome:    K   L  M   N
Frequency: 11  14  5  10
If a chi-square goodness of fit test is used at significance level 𝛼 = 0.01, what is
the value of the chi-square statistic and the decision reached?
Solution: The expected counts are 40 × (1/5, 3/10, 1/10, 2/5) = (8, 12, 4, 16), so
χ² = (11 − 8)²/8 + (14 − 12)²/12 + (5 − 4)²/4 + (10 − 16)²/16 ≈ 3.958
Since 3.958 < χ²_{0.01}(3) = 11.35, we accept H0.
Lecturer note -40
Correlation:
In statistics, problems of the following type occur: for the same population we
have two sets of data corresponding to two distinct variables, and the question arises
whether there is a relationship between those two variables. If the answer is yes, what
is that relationship? How are these variables correlated? The relationships discussed
here are not necessarily of the cause-and-effect type. They are mathematical relationships
which predict the behavior of one variable from knowledge of the second variable.
Here we have some examples:
The students spend their time at the university, learning or taking exams. The question
arises whether the more they study, the higher grades they will have.
The problems from the example require the analysis of the correlation between two
variables.
When for a population we have two sets of data corresponding to two distinct variables,
we form the pairs (x; y), where x is the value of the first variable and y is the value of the
second one.
For example, x is the height and y is the weight.
An ordered pair (x; y) is called bivariate data.
Traditionally, the variable X (having the values x) is called input variable (independent
variable), and the variable Y (having the values y) is called output variable (dependent
variable).
The input variable X is the one measured or controlled to predict the variable Y .
In problems that deal with the analysis of the correlation between two variables, the
sample data are presented as a scatter diagram.
Note: A scatter diagram is the graphical representation of the pairs of data in
an orthogonal coordinate system. The values x of the input variable X are represented
on the X axis, and the values y of the output variable Y are represented on the y axis.
Note: The primary purpose of the correlation analysis is to establish a relationship
between the two variables.
Note: If for the increasing values x of the input variable X there is no definite
displacement of the values y of the variable Y , we then say that there is no correlation
or no relationship between X and Y .
Note: If for the increasing values x of the input variable X there is a definite
displacement of the values y of the variable Y , we then say that there is a correlation.
We have a positive correlation if y tends to increase, and we have a negative correlation
if y tends to decrease while x increases.
Note: If the pairs (x; y) tend to follow a line, we say that we have a linear
correlation. If all the pairs (x; y) are on a line (that is not horizontal nor vertical) we
say that we have a perfect linear correlation.
Coefficient of correlation
The coefficient of linear correlation r measures the strength of the
linear correlation between the two variables. It reflects the consistency of the effect that
a change in one variable has on the other.
Note: The value of the coefficient of linear correlation r allows us to formulate an
answer to the question: is there a linear correlation between the two considered
variables? The coefficient of linear correlation r has a value between -1 and +1. The
value r = +1 signifies a perfect positive correlation, and the value r = -1 signifies a
perfect negative correlation.
Note:
The coefficient of linear correlation r for a sample is, by definition,
r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²)
Note: Equivalently, the correlation coefficient can be computed as
r = (n Σxy − Σx Σy) / √((n Σx² − (Σx)²)(n Σy² − (Σy)²))
or as r = Sxy/(Sx Sy), where Sxy is the sample covariance and Sx, Sy are the sample
standard deviations.
Example: Determine the linear correlation coefficient r for a random sample of size 10 if
the data table is
Solution:
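A hedged sketch of the computational formula for r; the (x, y) pairs below are made up for illustration, since the data table of the example above is not reproduced in these notes:

```python
# Sample correlation coefficient via the computational formula
# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
# The data are made up for demonstration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)
sxy = sum(x * y for x, y in zip(xs, ys))

r = (n * sxy - sx * sy) / ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
print(round(r, 3))  # 0.775: a fairly strong positive linear correlation
```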
Lecturer note -41
Regression and fitting of straight lines:
If the value of the linear correlation coefficient r indicates a high linear correlation, there
arises the problem of the establishment of an exact numerical relationship. This exact
relationship is obtained by linear regression.
Generally, the statistician looks for an equation to describe the relationship between the
two variables. The chosen equation is the best fitting of the scatter diagram. The
equations found are called prediction equations. Here are some examples of such
equations:
y = b0 + b1 x  (linear)
y = a + b x + c x²  (quadratic)
The linear regression establishes the mean linear dependency of y in
terms of x.
Next, we shall describe how to establish the best linear dependency for a set of data
(x;y).
If a straight-line relationship seems appropriate, the best-fitting straight line is found by
using the method of least squares.
Suppose that ŷ = b0 + b1 x is the best linear relationship. The least squares method
requires that b0 and b1 are such that Σ(y − ŷ)² is minimum.
From Fermat's theorem we have that the minimum value of the function
F(b0, b1) = Σ(y − b0 − b1 x)²
is obtained for
b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²  and  b0 = ȳ − b1 x̄
where b1 is the slope and b0 is the y intercept.
To determine the slope b1 we can also use the equivalent formula
b1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
Example:
Solution:
Example: For a sample of 10 individuals let us consider following set of data
Find the regression line
Solution:
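The least-squares fit described above can be sketched as follows; the data points are made up for illustration, since the example's data table is not reproduced in these notes:

```python
# Least-squares line y = b0 + b1*x using the slope/intercept formulas
# from this section.  The data are made up for demonstration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1*xbar
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

print(round(b1, 2), round(b0, 2))  # slope ~1.99, intercept ~0.05
```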