Descriptive Stats: Types of Data (Scales of Measurement)
Introduction to biostatistics
Lecture plan
1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics:
   - Confidence intervals
   - Hypothesis testing
DEFINITIONS
STATISTICS can mean 2 things:
- the numbers we get when we measure and count things (data);
- a collection of procedures for describing and analysing data.
BIOSTATISTICS: the application of statistics in the natural sciences, when biomedical problems are analysed.
Why do we need statistics?

????
Basic parts of statistics:
- Descriptive
- Inferential
Terminology
- Population
- Sample
- Variables
Variable types
- Categorical (qualitative)
- Numerical (quantitative)
- Combined
Categorical data
- Nominal
  - 2 categories
  - >2 categories
- Ordinal
Numerical data
- Continuous
- Discrete
Description of categorical data
- Arranging data
- Frequencies, tables
- Visualization (graphical presentation)
Frequencies and contingency tables
Of those who were unsatisfied, 4 were males and 6 were females.

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)      4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)

(Percentages are column percentages.)
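The column percentages in the table are each cell count divided by its column total. A minimal Python sketch of that calculation, using the counts from the table (the dictionary layout and the function name `column_percentages` are illustrative choices, not part of the lecture):

```python
# Counts from the satisfaction table: (satisfied, unsatisfied) for each column.
counts = {
    "Total": (40, 10),
    "Males": (14, 4),
    "Females": (26, 6),
}

def column_percentages(column):
    """Return each cell as a percentage of its column total (unrounded)."""
    total = sum(column)
    return [100 * n / total for n in column]

for name, column in counts.items():
    satisfied, unsatisfied = column_percentages(column)
    print(f"{name}: satisfied {satisfied:.2f}%, unsatisfied {unsatisfied:.2f}%")
```

For females the exact values are 81.25% and 18.75%; the slide shows them rounded to 81.3% and 18.7% so that the column still sums to 100%.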
Graphical presentation
[Figure: two pie charts, "Sex structure in Lithuania, 1993" and "Sex structure in Lithuania, 1991"; segments: males, females]
Graphical presentation
[Figure: bar chart "Sex structure in Lithuania", 1993 and 1996; bars for males and females; y axis from 44% to 54%]
Graphical presentation
[Figure: 100% stacked bar chart "Sex structure in Lithuania", 1993 and 1996; segments: males, females]
Graphical presentation
[Figure: 100% stacked bar chart showing the share of antibacterial classes by country (Lithuania, Slovenia, Slovakia, Russia, Norway, Ireland, France, Finland, Sweden, Denmark, Croatia); classes: J01A Tetracyclines, J01C Penicillins, J01D Other beta-lactam antibacterials, J01E Sulfonamides and trimethoprim, J01F Macrolides, lincosamides and streptogramins, J01M Quinolones, J01X Other]
Graphical presentation
Other:
- Maps
- Chernoff faces
- Star plots, etc.
Description of numerical data
- Arranging data
- Frequencies (relative and cumulative), graphical presentation
- Measures of central tendency and variance
- Assessing normality
Grouping
- Sort the data.
- Form groups (5-17 groups) according to the researcher's criteria.
- Purpose: to assess the distribution and to prepare a graphical presentation (e.g. in Excel).
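Grouping can be sketched in Python as below. The lecture leaves the grouping criteria to the researcher; this sketch assumes equal-width classes, which is only one common choice, and the function name `group_into_classes` is hypothetical.

```python
def group_into_classes(values, k):
    """Sort the data, split its range into k equal-width classes, count each class.

    Returns a list of (lower, upper, count) tuples. The last class also includes
    its upper bound so that the maximum value is counted.
    """
    data = sorted(values)
    lo, hi = data[0], data[-1]
    if hi == lo:
        raise ValueError("need at least two distinct values to form classes")
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # Class index for x; the maximum would fall at index k, so clamp it.
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]

values = [2, 3, 3, 5, 7, 8, 8, 9, 12, 14]
for lower, upper, count in group_into_classes(values, k=4):
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

With real data, k would usually be picked in the 5-17 range mentioned above.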
Frequencies, their comparison and calculation
197 students were asked about the amount of money (litas) they had in cash at the moment.

Litas   Frequency       Cumulative frequency
        n      %        n              %
1       1      0.5      1              0.5
2       2      1.0      1+2=3          1.5
3       4      2.0      3+4=7          3.6
4       8      4.1      7+8=15         7.6
5       15     7.6      15+15=30       15.2
6       24     12.2     30+24=54       27.4
7       29     14.7     54+29=83       42.1
8       31     15.7     83+31=114      57.9
9       29     14.7     114+29=143     72.6
10      24     12.2     143+24=167     84.8
11      15     7.6      167+15=182     92.4
12      8      4.1      182+8=190      96.4
13      4      2.0      190+4=194      98.5
14      2      1.0      194+2=196      99.5
15      1      0.5      196+1=197      100.0
Total   197    100.0
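The table can be rebuilt from the raw frequencies: the relative frequency is n / 197, and the cumulative frequency adds each row's n to the running total. A minimal Python sketch (the function name `frequency_table` is illustrative):

```python
litas = list(range(1, 16))
freq = [1, 2, 4, 8, 15, 24, 29, 31, 29, 24, 15, 8, 4, 2, 1]

def frequency_table(values, counts):
    """Rows of (value, n, %, cumulative n, cumulative %)."""
    total = sum(counts)
    rows = []
    cumulative = 0
    for value, n in zip(values, counts):
        cumulative += n                      # running total, e.g. 1+2=3, 3+4=7, ...
        rows.append((value, n, 100 * n / total, cumulative, 100 * cumulative / total))
    return rows

for value, n, pct, cum, cum_pct in frequency_table(litas, freq):
    print(f"{value:>2} {n:>3} {pct:5.1f} {cum:>4} {cum_pct:6.1f}")
```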
Graphical presentation of frequencies
Normal distributions
- Most observations lie around the centre.
- Fewer observations lie above and below the central value, in approximately equal proportions.
- The most common example is the Gaussian distribution.
Not normal distributions
- More observations lie in one part of the range.
Asymmetrical distribution
How would you describe/present your respondents if the data are numeric?
2 groups of measures:
1. Central tendency (central value, average)
2. Variance
MEASURES OF CENTRAL TENDENCY
- Means/averages (arithmetic, geometric, harmonic, etc.)
- Mode
- Median
- Quartiles
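Python's standard `statistics` module (3.8 or later) covers the three named means. The data below are made up for illustration; for positive data the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean.

```python
import statistics

data = [2, 4, 4, 5, 8, 16]  # hypothetical positive measurements

arithmetic = statistics.mean(data)           # sum of values / n
geometric = statistics.geometric_mean(data)  # n-th root of the product of values
harmonic = statistics.harmonic_mean(data)    # n / sum of reciprocals

print(arithmetic, round(geometric, 2), round(harmonic, 2))
```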
MEASURES OF CENTRAL TENDENCY
Arithmetic mean (X̄ for a sample, μ for a population)
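Written out in the usual notation (assumed here: x₁, …, xₙ are the n sample values and N is the population size), the two means are:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```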
MEASURES OF CENTRAL TENDENCY
Median (Me): the middle value, or 50th percentile (the value that divides the sorted data into two equal halves).
Its position in the sorted data is $\frac{n+1}{2}$:
- when n is odd, the median is the middle observation;
- when n is even, the median is the average of the two middle observations.
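The position rule translates directly into code. A minimal sketch (the function name `median` is illustrative; Python's `statistics.median` gives the same results):

```python
def median(values):
    """Median via the (n + 1) / 2 position rule on sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # 0-based index mid is the 1-based position (n + 1) / 2.
        return data[mid]
    # n even: (n + 1) / 2 falls between positions n/2 and n/2 + 1, so average them.
    return (data[mid - 1] + data[mid]) / 2

print(median([7, 1, 3]))     # odd n
print(median([4, 1, 3, 2]))  # even n
```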
MEASURES OF CENTRAL TENDENCY
Mode (Mo): the most frequent value.
There can be more than one mode.
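Because there can be several modes, Python's `statistics.multimode` (3.8 or later) returns all of them. The data here are hypothetical and chosen to be bimodal:

```python
import statistics

data = [3, 5, 5, 7, 9, 9, 10]  # 5 and 9 each occur twice

modes = statistics.multimode(data)  # all most-frequent values, in order of first appearance
print(modes)  # [5, 9]
```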
MEASURES OF CENTRAL TENDENCY
Quartiles: the sorted sample is divided into 4 equal parts, each containing 25% of the observations; the cut points are Q1 (25th percentile), Q2 (the median) and Q3 (75th percentile).
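Different programs compute quartiles with slightly different interpolation rules, so small samples can give different Q1 and Q3. Python's `statistics.quantiles` (3.8 or later) exposes two of these rules. The data below are hypothetical:

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# Q1, Q2 (median), Q3 under two interpolation methods.
inclusive = statistics.quantiles(data, n=4, method="inclusive")
exclusive = statistics.quantiles(data, n=4, method="exclusive")  # the default method

print(inclusive)
print(exclusive)
```

Both methods agree on the median (7), but Q1 and Q3 differ. It is worth stating which rule was used when reporting quartiles.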
Is a measure of central tendency alone enough to describe the respondents?
MEASURES OF VARIANCE
- Minimum and maximum
- Range: max − min
- Standard deviation (SD): the square root of the variance, $SD = \sqrt{V}$
- Variance: $V = \frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n-1}$
- Interquartile range: IQR = Q3 − Q1 (75th minus 25th percentile)
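The measures above can be computed for a small hypothetical sample as follows. The variance is written out to match the formula with the n − 1 denominator. The IQR uses the "inclusive" quartile rule, which is an assumed choice because the lecture does not name one.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical observations

minimum, maximum = min(data), max(data)
value_range = maximum - minimum
mean = statistics.mean(data)
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)  # V with n - 1
sd = math.sqrt(variance)
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(minimum, maximum, value_range, variance, sd, iqr)
```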
What measures should be used to describe a sample?
If the distribution is NORMAL:
- Mean
- Variance (or standard deviation)
If the distribution is NOT NORMAL:
- Median
- IQR or min/max
The median and IQR (or min/max) are also used with ordinal data coded as numbers.
X̄, Mo, Me
In a normal distribution, Mean ≈ Median ≈ Mode.
SD and the empirical rule
EMPIRICAL RULE
Percentage of observations lying within 1, 2 and 2.5 SD of the mean if the distribution is normal.
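For a normal distribution, the share of observations within k SD of the mean is erf(k / √2). A short sketch using only Python's `math` module gives roughly 68%, 95% and 99% for k = 1, 2 and 2.5. The often-quoted 99% corresponds more precisely to about 2.58 SD.

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 2.5):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```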
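Example
X̄ = 8, SD = 2.5
X̄ − 2SD = 8 − 5 = 3 and X̄ + 2SD = 8 + 5 = 13, so if the distribution is normal, about 95% of the observations lie between 3 and 13.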
Normality assessment
Summary
- Graphical methods
- Comparison of measures of central tendency; the empirical rule (mean and standard deviation)
- Skewness and (excess) kurtosis (both equal 0 for a Gaussian distribution)
- Kolmogorov-Smirnov test
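Skewness and excess kurtosis can be computed from the central moments of the data. This sketch uses the simple moment-based (population) formulas. Statistical packages usually report bias-corrected versions, which differ slightly for small samples. The function name is illustrative.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2 (both 0 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3             # subtract 3 so a Gaussian gives 0
    return skewness, excess_kurtosis

print(skewness_and_excess_kurtosis([1, 1, 1, 2, 2, 3, 10]))  # right-skewed data
```

Applied to the symmetric cash data of the 197 students, the skewness is exactly 0 and the excess kurtosis is close to 0.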
Boxplot
[Figure: boxplot diagram labelled with the 75th percentile, mean (*), median, 25th percentile and outliers]
Boxplot example
[Figure: boxplot of the variable mokinio_ses1_mean; y axis from 14.00 to 26.00]
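The quantities drawn in a boxplot can be computed directly. The lecture does not say how outliers are defined, so this sketch assumes Tukey's common convention: points more than 1.5 × IQR beyond the quartiles are outliers, and the whiskers end at the most extreme non-outlying values. Quartiles use the "inclusive" rule, which is also an assumption. The function name and data are hypothetical.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Boxplot quantities under an assumed Tukey rule (fences at k * IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(values),  # shown as * on the slide's boxplot
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }

print(boxplot_summary([14, 15, 16, 17, 18, 19, 20, 21, 30]))
```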
Central limit theorem
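The central limit theorem says that the means of repeated samples of size n are approximately normally distributed around the population mean μ, with SD σ/√n, even when the population itself is not normal. A small simulation sketch illustrates this. It assumes a skewed exponential population with μ = σ = 1, chosen only for illustration, and a fixed random seed.

```python
import random
import statistics

def sample_means(n, repeats, seed=1):
    """Means of `repeats` samples of size n from an exponential population (mean 1, SD 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]

means = sample_means(n=50, repeats=2000)
print(statistics.fmean(means))  # close to the population mean, 1
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.14
```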
Inferential statistics
- Confidence intervals
- Hypothesis testing
Confidence intervals
An interval in which the "true" (population) value most likely lies.
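The variation of samples and their measures
[Figure: several samples drawn from one population with parameters μ, σ, p0; each sample gives its own estimates (X̄1, SD1, p1), (X̄2, SD2, p2), (X̄3, SD3, p3), (X̄4, SD4, p4)]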
The variation of samples and confidence intervals
[Figure: confidence intervals from different samples shown relative to the population values μ and p0]
Confidence interval
Statistical definition:
If the study were carried out 100 times, giving 100 results and 100 95% CIs, about 95 of the 100 intervals would contain the "true" value, and about 5 would not.
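This repeated-study definition can be checked by simulation. The sketch below assumes a normal population with a known true mean, chosen only for illustration. It runs many "studies", computes mean ± 1.96 SE for each, and counts how often the interval contains the true mean. Because the SD is estimated from each sample, the coverage comes out close to, but not exactly, 95%.

```python
import math
import random
import statistics

def ci_coverage(true_mean=10.0, sd=2.0, n=100, studies=1000, seed=7):
    """Fraction of simulated studies whose 95% CI (mean ± 1.96 SE) contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / studies

print(ci_coverage())  # roughly 0.95
```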
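Confidence intervals
(general, most common calculation)
95% CI: $\bar{X} \pm 1.96\,SE$, giving the limits $\bar{X}_{min}$; $\bar{X}_{max}$
Note: for a normal distribution, when n is large.
95% CI: $p \pm 1.96\,SE$, giving the limits $p_{min}$; $p_{max}$
Note: when both p and 1 − p are greater than 5/n.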
Standard error (SE)
Numeric data ($\bar{X}$): $SE = \frac{SD}{\sqrt{n}}$
Categorical data (p): $SE = \sqrt{\frac{p(1-p)}{n}}$
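Combining the SE formulas with the 95% CI formulas gives a short sketch. The proportion example uses the 40 satisfied respondents out of 50 from the contingency table (p = 0.8, and both p and 1 − p exceed 5/50). The mean example reuses X̄ = 8 and SD = 2.5 from the empirical-rule slide, with a hypothetical n = 100 because the slide gives no sample size. The function names are illustrative.

```python
import math

def ci_mean(mean, sd, n, z=1.96):
    """95% CI for a mean: mean ± z * SD / sqrt(n) (large-sample normal approximation)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

def ci_proportion(successes, n, z=1.96):
    """95% CI for a proportion: p ± z * sqrt(p(1 - p) / n); assumes p and 1 - p > 5/n."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(ci_proportion(40, 50))  # satisfied respondents
print(ci_mean(8, 2.5, 100))   # hypothetical n = 100
```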
Width of the confidence interval depends on:
a) sample size;
b) confidence level (usually 95%, but any level can be used);
c) dispersion of the data.
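All three factors appear in the width of a normal-approximation CI for a mean, which is 2 · z · SD / √n. A sketch using `statistics.NormalDist` (Python 3.8 or later) to get z for any confidence level:

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level=0.95):
    """Width of a normal-approximation CI for a mean: 2 * z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return 2 * z * sd / math.sqrt(n)

for n in (25, 100, 400):
    print(n, round(ci_width(sd=2.5, n=n), 2))  # quadrupling n halves the width
print(round(ci_width(sd=2.5, n=100, level=0.99), 2))  # higher level, wider interval
```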
Hypothesis testing
H0: μ1 = μ2; p1 = p2 (RR = 1, OR = 1, difference = 0)
HA: μ1 ≠ μ2; p1 ≠ p2 (two-sided), or a difference in one stated direction, e.g. μ1 > μ2 (one-sided)
Hypothesis testing
Significance level α (conventionally 0.05).
A test (t-test, χ², etc.) gives the P value.
The P value is the probability of obtaining a difference (association) at least as large as the one observed if the null hypothesis is true.
Equivalently, the P value is the probability that such a difference (association) arises by chance alone when the null hypothesis is true.
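As an illustration, a two-sided z test for two proportions can compare satisfied males (14 of 18) with satisfied females (26 of 32) from the contingency table. The sketch uses the pooled proportion under H0. One caveat: the expected number of unsatisfied males is 18 × 10/50 = 3.6, which is below 5, so the normal approximation is rough here and the Fisher exact test would usually be preferred.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2, using the pooled proportion for the SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = two_proportion_z_test(14, 18, 26, 32)
alpha = 0.05
decision = "reject H0" if p < alpha else "do not reject H0"
print(round(z, 3), round(p, 3), decision)
```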
Statistical agreements
- If P < 0.05, we say that the results cannot be explained by chance alone; therefore we reject H0 and accept HA.
- If P ≥ 0.05, we say that the difference found can be due to chance alone; therefore we do not reject H0.
Tests
The choice of test depends on:
- study design,
- variable type,
- distribution,
- number of groups, etc.
Tests (probability distributions):
- z test
- t test (one-sample, two independent samples, paired)
- χ² (+ test for trend)
- F test
- Fisher exact test
- Mann-Whitney
- Wilcoxon, and others.
Inferential statistics
Summary
- The P value tells whether there is a statistically significant difference (association).
- The CI gives the interval in which the true value is likely to lie.
Inferential statistics
Summary
- Neither the P value nor the CI rules out other explanations of the result (bias and confounding).
- Neither the P value nor the CI says anything about the biological, clinical or public health meaning of the results.