Statistics for Analytical Chemistry

Reading: lots to revise and learn
- Chapter 3
- Chapter 4
- Chapter 5-1 and 5-2
- Chapter 5-3 will be necessary background for the AA lab
- Chapter 5-4 we will use later

Data Analysis
- Most data are quantitative, derived from measurements
- We never really know the error
- With more measurements you get a better idea of what it might be
- Don't spend a lot of time on an answer
  - where only 20% accuracy is required
  - or where sampling error is big
- although you don't want to make the error worse

Significant Figure Convention
- The final answer should contain only the figures that are certain, plus the first uncertain figure
- e.g. 45.2%
  - error less than 1%, or we would write only 45%
  - error larger than 0.05%, or we would write 45.23%

Remember
- Leading zeros are not significant
- Trailing zeros after the decimal point are significant
- 0.06037: 4 significant figures
- 0.060370: 5 significant figures
- 1200: ???? (ambiguous as written)
- $12 \times 10^2$: 2 significant figures

Rounding Off
- Round a 5 to the nearest even number: 4.55 goes to 4.6
- Carry an extra figure all through calculations, BUT NOT 6 EXTRA
- Just round off at the end

Adding
- The absolute uncertainty of the answer must not exceed that of the most uncertain number
- Simple rule: decimal places in answer = decimal places in the number with the fewest places

    12.2
     0.365
     1.04
    ------
    13.605   goes to 13.6

When errors are known
- $R \pm r = (A \pm a) + (B \pm b) + (C \pm c)$, where $r^2 = a^2 + b^2 + c^2$
- Example: calculate the error in the MW of FeS from the following atomic weights:
  - Fe: 55.847 ± 0.004
  - S: 32.064 ± 0.003
  - $r = (0.004^2 + 0.003^2)^{1/2} = 0.005$
  - MW = 87.911 ± 0.005

Multiplication and Division
- Simplest rule: sig figs in answer = smallest number of sig figs in any value used
- This can lead to problems, particularly if the first digit of the number is 9
  - 1.07400 × 0.993 = 1.07
  - 1.07400 × 1.002 = 1.076
- The error in 0.993 is ~1/1000, therefore 4 significant figures are justified in the first answer (1.066)

Multiplication and Division
- The relative uncertainty of the answer must fall between 0.2 and 2.0 times the largest relative uncertainty in the data used in the calculation.
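The FeS example under "When errors are known" can be checked with a few lines of Python. This is only a sketch of the rule $r^2 = a^2 + b^2 + \dots$ for sums; the function name `propagate_sum` is made up for illustration and is not from the course materials.

```python
import math

def propagate_sum(terms):
    """Add values that carry known absolute uncertainties.

    terms: list of (value, absolute_uncertainty) pairs.
    Returns (R, r): the sum of the values and the square root of the
    summed squared uncertainties.
    """
    total = sum(value for value, _ in terms)
    r = math.sqrt(sum(unc ** 2 for _, unc in terms))
    return total, r

# Atomic weights of Fe and S with their uncertainties, as given in the notes.
mw, err = propagate_sum([(55.847, 0.004), (32.064, 0.003)])
print(f"MW(FeS) = {mw:.3f} +/- {err:.3f}")  # MW(FeS) = 87.911 +/- 0.005
```

The same function works for subtraction if the value is entered with a negative sign, since only the squared uncertainties enter r.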
Unless otherwise specified, the absolute uncertainty in an experimental measurement is taken to be ± the last digit.

Multiplication and Division
- With known errors, add the squares of the relative uncertainties
- $r/R = [(a/A)^2 + (b/B)^2 + (c/C)^2]^{1/2}$

Logs
- Only figures in the mantissa (after the decimal point) are significant figures
- Use as many places in the mantissa as there are significant figures in the corresponding number
- pH = 2.45 has 2 sig figs

Definitions
- Arithmetic mean (average), $\bar{x}$
- Median: the middle value
  - for N = even number, use the average of the central pair

Accuracy
- Deviation from the true answer
- Difficult to know
- Best way is to use reference standards
  - National Bureau of Standards
  - Traceable standards

Precision
- Describes the reproducibility of results
- Is what is used to calculate the confidence limit
- Can use deviation from the mean, or relative deviation
  - 0.1/5 × 1000 = 20 ppt (parts per thousand)
  - 0.1/5 × 100% = 2%

Precision of Analytical Methods
- Absolute standard deviation: s or sd
- Relative standard deviation (RSD)
- Standard deviation of the mean: $s_m = s/N^{1/2}$
- Coefficient of variation (CV): $(s/\bar{x}) \times 100\%$
- Variance: $s^2$

Standard Curve
[Figure: standard curve, readout versus [Ca] (mg/L), with linear fit y = 1.9311x + 1.1127, R² = 0.9888]
- Not necessarily linear
- Linear is mathematically easier to deal with

Correlation coefficients
- Show how good a fit you have
- R or $R^2$
- For perfect correlation, R = 1, $R^2 = 1$
- $R^2 = \dfrac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}$

LINEST
- Calculates the slope and intercept
- Calculates the uncertainty in the slope and the intercept
- Calculates $R^2$
- Calculates the s.d. of the population of y values
- See pp. 68-72, Harris.
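For readers without a spreadsheet to hand, the quantities that LINEST reports can be sketched in plain Python using the standard unweighted least-squares formulas. This is an illustration, not a reimplementation of the spreadsheet function: the name `linear_fit` and the calibration data are hypothetical, and the returned dictionary keys are chosen here for readability.

```python
import math

def linear_fit(x, y):
    """Unweighted least-squares line y = m*x + b with LINEST-style statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    m = sxy / sxx                    # slope
    b = y_bar - m * x_bar            # intercept
    r2 = sxy ** 2 / (sxx * syy)      # same R^2 formula as in the notes

    # Standard deviation of y about the line (n - 2 degrees of freedom).
    residual_ss = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    s_y = math.sqrt(residual_ss / (n - 2))
    s_m = s_y / math.sqrt(sxx)                                   # s.d. of slope
    s_b = s_y * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))  # s.d. of intercept
    return {"slope": m, "intercept": b, "r2": r2,
            "s_y": s_y, "s_slope": s_m, "s_intercept": s_b}

# Hypothetical calibration data: concentration (mg/L) and instrument readout.
conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
readout = [1.0, 3.2, 4.9, 7.1, 8.8, 10.9]
fit = linear_fit(conc, readout)
print(f"slope     = {fit['slope']:.3f} +/- {fit['s_slope']:.3f}")
print(f"intercept = {fit['intercept']:.3f} +/- {fit['s_intercept']:.3f}")
print(f"R^2       = {fit['r2']:.4f}")
```

The uncertainties printed next to the slope and intercept show which decimal place is the first uncertain one, which is where each value should be rounded.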
Use the LINEST uncertainties to determine the number of sig figs for the slope and intercept.

Dealing with Random Errors

Indeterminate Error
- Repeating a coarse measurement gives the same result
- e.g. weighing a 50 g object to the nearest gram: the only error would be determinate, such as a fault in the balance
- If the same object is weighed to several decimal places, you get random errors

How many eggs in a dozen?
- How wide is your desk?
- Will everyone get the same answer?
- What does this depend on?

With a few measurements, the mean won't reflect the true mean as well as it would if you took a lot of measurements.

Random errors
- With many measurements, more will be close to the mean
- Various little errors add in different ways
- Some cancel; sometimes they will all be one way
- A plot of frequency versus value gives a bell curve, also called a Gaussian curve or normal error curve
- Errors in a chemical analysis will fit this curve

Equation for Gaussian Curve
- $y = \dfrac{e^{-(x_i - \mu)^2 / 2\sigma^2}}{\sigma \sqrt{2\pi}}$
- Let $z = (x_i - \mu)/\sigma$
- Then $y = \dfrac{e^{-z^2/2}}{\sigma \sqrt{2\pi}}$
- If z is the abscissa (x axis), the same curve is always obtained, as z expresses the deviation from the mean in units of standard deviation

Statistics
- Statistics apply to an infinite number of results
- Often we only do an analysis 2 or 3 times and want to use the results to estimate the mean and the precision

[Figure: Gaussian curve with areas 68.3% within $\pm 1\sigma$, 95.4% within $\pm 2\sigma$, 99.7% within $\pm 3\sigma$]

Standard deviation
- 68.3% of the area is within $\pm 1\sigma$ of the mean
- 95.4% of the area is within $\pm 2\sigma$ of the mean
- 99.7% of the area is within $\pm 3\sigma$ of the mean
- For any analysis, the chances are 95.4 in 100 that the error is within $\pm 2\sigma$
- Can say the answer is within $\mu \pm 2\sigma$ with 95.4% confidence

For a large data set
- Get a good estimate of the mean, $\mu$
- $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- Know this formula, but use a calculator
- $\sigma^2$ = variance
- Useful because variances are additive

Small set of data
- The average ($\bar{x}$) is not necessarily equal to $\mu$
- An extra uncertainty
- The standard deviation calculated will differ for each small set of data used
- It will be smaller than the value calculated over the larger set
- Could call that a negative bias
- $s = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$
- For $\sigma$ use N in the denominator
- For s use N-1 in the denominator (we have one less degree of freedom, since we don't know $\mu$)
- At the end, round s to 2 sig figs, or fewer if there are not enough sig figs in the data

Confidence Interval
- We are doing an analysis to find the true mean $\mu$; it is unknown
- What we measure is $\bar{x}$, but it may not be the same as $\mu$
- Set a confidence limit, e.g. 4.5 ± 0.3 g
- The mean of the measurements was 4.5 g
- The true mean is in the interval 4.2-4.8 g with some specified degree of confidence

Confidence limit
- A measure of the reliability (Re)
- The reliability of a mean ($\bar{x}$) increases as more measurements are taken
- $Re = k(n)^{1/2}$
- Reliability increases with the square root of the number of measurements
- You quickly reach a condition of limiting return

Reliability
- Would you want a car that is 95% reliable?
- How often would that break down?

Confidence Interval
- For 100% confidence, you need a huge interval
- Often use 95%
- The confidence level chosen can change with the reason for the analysis

Confidence Interval when s ~ $\sigma$
- $\mu = x_i \pm 1.96\sigma$ for 95% confidence
- $z = (x_i - \mu)/\sigma = 1.96$
- Appropriate z values are given as a table
- This applies to a single measurement
- The confidence limit shrinks by a factor of $N^{1/2}$ as more measurements are taken: $\mu = \bar{x} \pm z\sigma/N^{1/2}$

Confidence Interval
- In the lab this year I will make you go home before you can get enough data for s to equal $\sigma$
- Therefore we will have to do a different kind of calculation to estimate the precision

Student's t-test
The Student's t-test was formulated by W. Gosset in the early 1900s. His employer (a brewery) had regulations concerning trade secrets that prevented him from publishing his discovery, but in light of the importance of the t distribution, Gosset was allowed to publish under the pseudonym "Student".
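Before moving on to t, the difference between the N and N-1 denominators, and the z-based interval that applies when $\sigma$ is known, can be compared numerically. The sketch below uses Python's standard `statistics` module; the replicate masses and the long-run $\sigma$ of 0.15 g are hypothetical values chosen for illustration, and the helper names `spread` and `z_interval` are not from the notes.

```python
import math
import statistics

def spread(values):
    """Return (mean, sd with N in the denominator, sd with N-1 in the denominator)."""
    mean = statistics.mean(values)
    sigma_n = statistics.pstdev(values)   # population form: divides by N
    s = statistics.stdev(values)          # sample form: divides by N - 1
    return mean, sigma_n, s

def z_interval(mean, sigma, n, z=1.96):
    """Interval mean +/- z*sigma/sqrt(n); only valid when sigma is already known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical replicate masses in grams (not data from the notes).
masses = [4.3, 4.6, 4.5, 4.7, 4.4]
mean, sigma_n, s = spread(masses)
print(f"mean = {mean:.2f} g, sd (N) = {sigma_n:.3f} g, sd (N-1) = {s:.3f} g")

# Assumed long-run sigma for the method (hypothetical value).
low, high = z_interval(mean, sigma=0.15, n=len(masses))
print(f"95% interval for the mean: {low:.2f} to {high:.2f} g")
```

For a small data set the N-1 form is always the larger of the two, which is the correction for the negative bias mentioned under "Small set of data".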
The t-test is typically used to compare the means of two populations.

t-test
- $t = \dfrac{x_i - \mu}{s}$
- t depends on
  - the desired confidence limit
  - the degrees of freedom (N-1)

Values of t for Various Degrees of Freedom

    Degrees of     Probability
    Freedom        80%     90%     95%     99.9%
    1              3.08    6.31    12.7    637
    2              1.89    2.92    4.30    31.6
    3              1.64    2.35    3.18    12.9
    4              1.53    2.13    2.78    8.60
    5              1.48    2.02    2.57    6.86
    6              1.44    1.94    2.45    5.96
    7              1.42    1.90    2.36    5.40
    8              1.40    1.86    2.31    5.04
    ∞              1.29    1.64    1.96    3.29

For practical purposes
- Assume $\sigma = s$ if you have made 20 measurements
- Sometimes $\sigma$ can be evaluated for a particular technique rather than for each sample
- It is usually too time consuming to do 20 replicate measurements on each sample

Confidence interval
- $\mu = \bar{x} \pm \dfrac{ts}{\sqrt{N}}$

Example
- Cal Culator obtained the following results for replicate determinations of calcium in limestone
- 14.35%, 14.41%, 14.40%, 14.32%, 14.37%; each is an $x_i$
- Calculate the confidence interval

Answer
- Average = 14.37%
- s = 0.037%
- Choose a 95% confidence limit
- Degrees of freedom = N-1 = 5-1 = 4
- From the t-table, t = 2.78
- 14.37% ± $ts/N^{1/2}$
- 14.37% ± 2.78 × 0.037% / $5^{1/2}$
- 14.37 ± 0.05%

Significant figures
- I say: use two or fewer significant figures in a confidence limit. Then use the same number of decimal places in both (guided by the CL)
- When fewer than two sig figs in the CL?
- When using two would require you to have more decimal places than were in the actual data.

Pooled standard deviation
- $s_p = \sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1) + \cdots}{N - n_s}}$
- N = total number of measurements, $n_s$ = number of groups of samples
- For two groups: $s_p = \sqrt{\dfrac{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + \sum_{i=1}^{n_2} (x_i - \bar{x}_2)^2}{n_1 + n_2 - 2}}$

Comparison of Means
- We analyze several samples and want to know if they are the same or different
- For each sample we take several measurements and obtain a mean
- If $|\bar{x}_1 - \bar{x}_2| < t \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, there is no significant difference

Comparing two means
- Compare $|\bar{x}_1 - \bar{x}_2|$ to $\Delta\bar{x} = t \sqrt{\left(\dfrac{s_1}{\sqrt{n_1}}\right)^2 + \left(\dfrac{s_2}{\sqrt{n_2}}\right)^2}$
- If s is a pooled sd: $t_{calc} = \dfrac{|\bar{x}_1 - \bar{x}_2|}{s} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$
- If $t_{calc} < t_{table}$, then the difference is not significant at the chosen CL

Example
- Two barrels of wine were analyzed for their alcohol content to determine whether or not they were from different sources:
  - 12.61% (6 analyses)
  - 12.53% (4 analyses)
- Pooled standard deviation = 0.07%
- $t_{calc} = \dfrac{12.61\% - 12.53\%}{0.07\%} \sqrt{\dfrac{6 \times 4}{6 + 4}} = 1.77$
- Degrees of freedom = 6 + 4 - 2 = 8
- t at 95% CL for 8 degrees of freedom = 2.31
- $t_{calc} < t_{table}$, therefore the difference is not significant at the 95% CL: the two samples cannot be distinguished at the 95% CL

Rejection of data: Q Test
- $Q_{exp} = \dfrac{|\text{questionable value} - \text{nearest numerical value}|}{\text{range}}$
- Look up the table of $Q_{critical}$
- If $Q_{exp} < Q_{critical}$, keep the point
- If more observations are taken, it is easier to determine if a point is an outlier

Calibration Sensitivity
- The slope of the calibration curve at the concentration of interest
- Doesn't take precision into account

Analytical Sensitivity
- Slope/s.d. = m/s
- where s = standard deviation of the signal
- Analytical sensitivity is independent of gain, but can vary with the concentration, as s can depend on concentration

Limit of detection
- The minimum concentration detectable at a known confidence level
- Is the concentration corresponding to the lowest usable reading (LUR)
- LUR = average blank + k × s.d.(blank)
- k determines the confidence level
- We use k = 3 for a 95% C.L.

Do not confuse LOD and LUR (Harris, page 103)
- LUR corresponds to the signal detection limit
- LOD corresponds to the concentration detection limit
- When doing this in lab WE CHEAT
- We should have 20 measurements of the blank and we never do because of time constraints. To publish a result or for a paying client, we would need 20.
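The LUR and LOD definitions above can be turned into a short calculation. This is a sketch only: the blank readouts are hypothetical (and only six, not the 20 the notes ask for), the helper names are invented here, and the conversion from reading to concentration assumes a linear calibration, readout = slope × concentration + intercept, using the fit from the standard curve shown earlier in the notes.

```python
import statistics

def lowest_usable_reading(blank_readings, k=3):
    """LUR = average blank + k * s.d. of the blank (signal detection limit)."""
    mean_blank = statistics.mean(blank_readings)
    s_blank = statistics.stdev(blank_readings)
    return mean_blank + k * s_blank

def concentration_from_reading(reading, slope, intercept):
    """Invert a linear calibration: readout = slope * concentration + intercept."""
    return (reading - intercept) / slope

# Hypothetical blank readouts (illustration only).
blanks = [1.10, 1.25, 1.00, 1.15, 1.35, 1.05]
lur = lowest_usable_reading(blanks, k=3)

# Slope and intercept taken from the standard curve in the notes.
lod = concentration_from_reading(lur, slope=1.9311, intercept=1.1127)
print(f"LUR = {lur:.2f} (readout units)")
print(f"LOD = {lod:.2f} mg/L")
```

Keeping the two steps as separate functions mirrors the distinction in the notes: the LUR is a signal, and the LOD is the concentration that signal corresponds to.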
[Figure: calibration curve (readout axis) with linear fit y = 1.9311x + 1.1127, R² = 0.9888]
- Ideally, the average blank = b (the intercept)
- However, if b > average blank, then recalculate the LUR using LUR = b + k × s.d.(blank)
- Usually say LUR = b + 3 sd
- LOD = 5.2 mg/L (k = 3)
- Note the 2 significant figures

Quality Assurance
- Begins with sampling
- Calibration check
- Run standards every few samples
- Reference standards are of known concentration. Do you get the right answer?
- Include in the Table of Results
- SOPs are very important

SOP (Standard operating procedure)
- A set of written instructions that documents a routine or repetitive activity followed by employees in an organization
- The development and use of SOPs is an integral part of a successful quality system
- Provides the information needed to perform a job properly and consistently in order to achieve pre-determined specifications and quality
- http://people.stfx.ca/tsmithpa/Chem361/

Numerical Criteria for Selecting Analytical Methods
- Precision
- Bias
- Sensitivity
- Detection limit
- Concentration range
- Selectivity

Other characteristics to be considered
- Speed
- Ease and convenience
- Skill required of operator
- Cost and availability of equipment
- Per-sample cost

    Criterion             Figure of Merit
    Precision             Absolute sd, relative sd, coefficient of variation, variance
    Bias                  Absolute systematic error, relative systematic error
    Sensitivity           Calibration sensitivity, analytical sensitivity
    Limit of detection    Av. blank + 3 sd blank
    Concentration range   LOQ (limit of quantitation) to LOL (limit of linearity)
    Selectivity           Coefficient of selectivity
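As a closing check, the two worked t examples in these notes (calcium in limestone and the two wine barrels) and the Q-test ratio can be reproduced in Python. This is a sketch under stated assumptions: the function names are invented for illustration, the 95% t values are copied from the t table in the notes, and no $Q_{critical}$ table is given in the notes, so the Q calculation stops at $Q_{exp}$.

```python
import math
import statistics

# Two-sided 95% t values from the table in these notes, keyed by degrees of freedom.
T_95 = {1: 12.7, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57, 6: 2.45, 7: 2.36, 8: 2.31}

def t_interval(values):
    """Return (mean, s, half_width) for mean +/- t*s/sqrt(N) at the 95% level."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)              # N - 1 in the denominator
    half_width = T_95[n - 1] * s / math.sqrt(n)
    return mean, s, half_width

def t_calc_pooled(mean1, n1, mean2, n2, s_pooled):
    """t statistic for two means that share a pooled standard deviation."""
    return abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))

def q_exp(values, suspect):
    """Gap between the suspect value and its nearest neighbour, divided by the range."""
    others = list(values)
    others.remove(suspect)
    nearest = min(others, key=lambda v: abs(v - suspect))
    return abs(suspect - nearest) / (max(values) - min(values))

# Calcium in limestone (% Ca), data from the example in the notes.
calcium = [14.35, 14.41, 14.40, 14.32, 14.37]
mean, s, hw = t_interval(calcium)
print(f"Ca = {mean:.2f} +/- {hw:.2f} %  (s = {s:.3f} %)")  # Ca = 14.37 +/- 0.05 %  (s = 0.037 %)

# Wine barrels: 12.61% (6 analyses) vs 12.53% (4 analyses), pooled sd 0.07%.
t_wine = t_calc_pooled(12.61, 6, 12.53, 4, 0.07)
significant = t_wine > T_95[6 + 4 - 2]
print(f"t_calc = {t_wine:.2f}, significant at 95%: {significant}")  # t_calc = 1.77, significant at 95%: False

# Q_exp for the lowest calcium result, treated here as the questionable value.
print(f"Q_exp = {q_exp(calcium, 14.32):.2f}")
```

Storing the t values in a dictionary keeps the sketch free of third-party dependencies; for degrees of freedom outside the table a statistics library would be needed.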