Introduction to biostatistics

Lecture plan
1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics:
   - Confidence intervals
   - Hypotheses testing

DEFINITIONS
STATISTICS can mean 2 things:
- the numbers we get when we measure and count things (data)
- a collection of procedures for describing and analysing data.
BIOSTATISTICS: the application of statistics in the natural sciences to the analysis of biomedical problems.

Why do we need statistics?

Basic parts of statistics:
- Descriptive
- Inferential

Terminology
- Population
- Sample
- Variables

Variable types
- Categorical (qualitative)
- Numerical (quantitative)
- Combined

Categorical data
- Nominal
  - 2 categories
  - >2 categories
- Ordinal

Numerical data
- Continuous
- Discrete

Description of categorical data
- Arranging data
- Frequencies, tables
- Visualization (graphical presentation)

Frequencies and contingency tables
Of those who were unsatisfied, 4 were males and 6 were females.

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)      4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)

Graphical presentation
[Figure: two charts, "Sex structure in Lithuania, 1993" and "Sex structure in Lithuania, 1991"; categories: men, women]

Graphical presentation
[Figure: "Sex structure in Lithuania"; axis from 44% to 54%; categories: men, women; years: 1993, 1996]

Graphical presentation
[Figure: "Sex structure in Lithuania"; axis from 0% to 120%; categories: women, men; years: 1993, 1996]

Graphical presentation
[Figure: shares (0% to 100%) of antibiotic groups by country. Groups: J01A Tetracyclines; J01C Penicillins; J01D Other β-lactam antibiotics; J01E Sulfonamides and trimethoprim; J01F Macrolides, lincosamides, streptogramins; J01M Quinolones; J01X Other. Countries: Lithuania, Slovenia, Slovakia, Russia, Norway, Ireland, France, Finland, Sweden, Denmark, Croatia]

Graphical presentation
Other:
- Maps
- Chernoff faces
- Star plots, etc.
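The column percentages in the satisfaction table above can be recomputed from the counts. The following is a minimal Python sketch, not part of the original lecture; the function name `column_percentages` and the dictionary layout are illustrative assumptions.

```python
# Counts from the "Frequencies and contingency tables" slide.
counts = {
    "Satisfied": {"Males": 14, "Females": 26},
    "Unsatisfied": {"Males": 4, "Females": 6},
}


def column_percentages(table):
    """Return each cell as a percentage of its column total.

    The name and the dict-of-dicts layout are assumptions made
    for illustration only.
    """
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / totals[c] for c in columns}
        for label, row in table.items()
    }


percentages = column_percentages(counts)
for label, row in percentages.items():
    # Two decimals are printed so that exact ties such as 18.75 stay visible.
    print(label, {c: f"{v:.2f}%" for c, v in row.items()})
```

Note that 26/32 is exactly 81.25% and 6/32 is exactly 18.75%; the slide shows them rounded to 81.3% and 18.7%, so that each column adds up to 100%.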
Description of numerical data
- Arranging data
- Frequencies (relative and cumulative), graphical presentation
- Measures of central tendency and variability
- Assessing normality

Grouping
- Sorting data
- Groups (5 to 17 groups) according to the researcher's criteria
- Used to assess the distribution and for graphical presentation in Excel

Frequencies, their comparison and calculation
197 students were asked about the amount of money (litas) they had in cash at the moment.

Number of litas   Frequency          Cumulative frequency
                  n       %          n               %
1                 1       0.5        1               0.5
2                 2       1.0        1+2=3           1.5
3                 4       2.0        3+4=7           3.6
4                 8       4.1        7+8=15          7.6
5                 15      7.6        15+15=30        15.2
6                 24      12.2       30+24=54        27.4
7                 29      14.7       54+29=83        42.1
8                 31      15.7       83+31=114       57.9
9                 29      14.7       114+29=143      72.6
10                24      12.2       143+24=167      84.8
11                15      7.6        167+15=182      92.4
12                8       4.1        182+8=190       96.4
13                4       2.0        190+4=194       98.5
14                2       1.0        194+2=196       99.5
15                1       0.5        196+1=197       100.0
Total             197     100.0

Graphical presentation of frequencies

Normal distributions
- Most observations lie around the center
- Fewer observations lie above and below the central values, in approximately equal proportions
- Most often a Gaussian distribution

Not normal distributions
- More observations in one part.

Asymmetrical distribution

How would you describe/present your respondents if the data are numeric?
2 groups of measures:
1. Central tendency (central value, average)
2. Variability

MEASURES OF CENTRAL TENDENCY
- Means/averages (arithmetic, geometric, harmonic, etc.)
- Mode
- Median
- Quartiles

MEASURES OF CENTRAL TENDENCY
Arithmetic mean (X̄, μ)

MEASURES OF CENTRAL TENDENCY
Median (Me): the middle value, or 50th percentile (the value of the observation that divides the sorted data into two almost equal parts).
It is found at position $(n + 1)/2$ in the sorted data:
- When n is odd: the median is the middle observation
- When n is even: the median is the average of the values of the two middle observations

MEASURES OF CENTRAL TENDENCY
Mode (Mo): the most common value
- There can be more than one mode

MEASURES OF CENTRAL TENDENCY
Quartiles (Q1, Q2, Q3, Q4): the sample is divided into 4 equal parts, with 25% of the observations in each of them.

Is a measure of central tendency enough to describe respondents?

MEASURES OF VARIABILITY
- Min and max
- Range
- Standard deviation (SD): the square root of the variance
- Variance: $V = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
- Interquartile range (IQRT): Q3 − Q1, or 75th percentile − 25th percentile

What measures are to be used for sample description?
If the distribution is NORMAL:
- Mean
- Variance (or standard deviation)
If the distribution is NOT NORMAL:
- Median
- IQRT or min/max
These measures are also used with numeric ordinal data.

X̄, Mo, Me
Mean ≈ Median ≈ Mode; SD and the empirical rule

EMPIRICAL RULE
Percentage of observations within 1, 2 and 2.5 SD of the mean if the distribution is normal

Example
X̄ = 8, SD = 2.5
[Figure: distribution with −2SD, X̄ and +2SD marked]

Normality assessment: Summary
- Graphical
- Comparison of measures of central tendency; empirical rule (mean and standard deviation)
- Skewness and excess kurtosis (both = 0 if Gaussian)
- Kolmogorov-Smirnov test

Boxplot
[Figure: boxplot with labels: 75th percentile, mean (*), median, 25th percentile, outliers]

Boxplot example
[Figure: boxplot of the variable mokinio_ses1_mean; axis from 14.00 to 26.00]

Central limit theorem

Inferential statistics
- Confidence intervals
- Hypotheses testing

Confidence intervals
The interval in which the "true" value most likely lies.

The variability of samples and their measures
[Figure: a population with parameters μ, σ, p0 and four samples with estimates (X̄1, SD1, p1), (X̄2, SD2, p2), (X̄3, SD3, p3), (X̄4, SD4, p4)]

The variability of samples and confidence intervals
[Figure: sample confidence intervals compared with the true values μ, p0]

Confidence interval
Statistical definition: if the study were carried out 100 times, giving 100 results and 100 CIs, the "true" value would lie inside the interval in about 95 of the 100 cases and outside it in about 5.
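The statistical definition above can be checked by simulation. The sketch below is an illustration, not the lecture's own material: it assumes a normally distributed population with mean 8 and SD 2.5 (the values from the example slide), a hypothetical sample size of 50, and the common large-sample interval mean ± 1.96 × SE with SE = SD/√n. The function name `coverage` and all default parameters are assumptions.

```python
import math
import random
import statistics


def coverage(n_studies=1000, n=50, mu=8.0, sigma=2.5, seed=1):
    """Fraction of simulated studies whose 95% CI contains mu.

    All defaults are illustrative assumptions, not values from a real study.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_studies):
        # One simulated "study": n observations from the assumed population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # stdev divides by n - 1
        if mean - 1.96 * se <= mu <= mean + 1.96 * se:
            hits += 1
    return hits / n_studies


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

With these settings the share should come out close to 0.95. It tends to be slightly lower, because the multiplier 1.96 comes from the normal distribution rather than the t distribution.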
Confidence intervals (general, most common calculation)
95% CI: X̄ ± 1.96 SE, giving (X̄min; X̄max)
Note: for a normal distribution, when n is large
95% CI: p ± 1.96 SE, giving (pmin; pmax)
Note: when p and 1 − p > 5/n

Standard error (SE)
- Numeric data (X̄)
- Categorical data (p)

Width of the confidence interval depends on:
a) sample size;
b) confidence level (usually 95%, but any level can be chosen);
c) dispersion.

Hypotheses testing
H0: μ1 = μ2; p1 = p2 (RR = 1, OR = 1, difference = 0)
HA: μ1 ≠ μ2; p1 ≠ p2 (two sided, one sided)

Hypotheses testing
- Significance level α (by convention 0.05).
- Test for the P value (t-test, χ², etc.).
- The P value is the probability of obtaining a difference (association) at least as large as the one observed if the null hypothesis is true.
OR
- The P value is the probability of obtaining such a difference (association) due to chance alone, when the null hypothesis is true.

Statistical conventions
- If P < 0.05, we say that the results cannot be explained by chance alone; therefore we reject H0 and accept HA.
- If P ≥ 0.05, we say that the difference found can be due to chance alone; therefore we do not reject H0.

Tests
The choice of test depends on:
- Study design
- Variable type
- Distribution
- Number of groups, etc.
Tests (probability distributions):
- z test
- t test (one sample, two independent samples, paired)
- χ² (+ trend)
- F test
- Fisher exact test
- Mann-Whitney
- Wilcoxon
- and others.

Inferential statistics: Summary
- The P value tells whether there is a statistically significant difference (association).
- The CI gives the interval in which the true value can lie.

Inferential statistics: Summary
- Neither the P value nor the CI accounts for other explanations of the result (bias and confounding).
- Neither the P value nor the CI tells anything about the biological, clinical or public health meaning of the results.
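The interval formulas and the link between a CI and a P value can be sketched in Python. This is an illustration under stated assumptions, not the lecture's own code: it uses the standard large-sample standard errors SE = SD/√n for a mean and SE = √(p(1 − p)/n) for a proportion, which the slides name but do not spell out. The function names, the sample size n = 100 and the null value 7.5 are hypothetical; the proportion 40/50 and the values mean = 8, SD = 2.5 come from the earlier slides. A z test is used for simplicity; with a small sample and an estimated SD a t test would be the usual choice.

```python
import math


def ci_mean(mean, sd, n, z=1.96):
    """Large-sample 95% CI for a mean: mean ± z * SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def ci_proportion(p, n, z=1.96):
    """Large-sample 95% CI for a proportion: p ± z * sqrt(p(1-p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def z_test_p_value(mean, sd, n, mu0):
    """Two-sided P value of a one-sample z test of H0: mu = mu0."""
    z = (mean - mu0) / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Proportion satisfied from the contingency table: 40 of 50.
low_p, high_p = ci_proportion(40 / 50, 50)
print(f"95% CI for p: {low_p:.3f} to {high_p:.3f}")

# Mean 8 and SD 2.5 from the example slide; n = 100 is hypothetical.
low_m, high_m = ci_mean(8.0, 2.5, 100)
print(f"95% CI for the mean: {low_m:.2f} to {high_m:.2f}")

# Hypothetical null value 7.5, chosen to lie just outside the interval.
p_value = z_test_p_value(8.0, 2.5, 100, 7.5)
print(f"P value: {p_value:.4f}")
```

Here the null value 7.5 falls outside the interval for the mean and the P value is below 0.05, so both views lead to rejecting H0 at α = 0.05.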