Download QA_session

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript

Først indløbne spørgsmål

Derefter er ordet frit

Own laptops in the exam: Rules do not say
anything about this, but specify printed aids
and PC – so expect this is not permitted.
Prepare for this.




Rules for the use of PCs at exams:
http://www1.itu.dk/sw118342.asp#516_9214
2
Rules for DEDA exam specially:
http://www1.itu.dk/graphics/ITUlibrary/Intranet/Uddannelse/Eksamen/PCregler/Regler%20for%20brug%20af%20PC%
20ved%20eksamen%20i%20Experimental%2
0Design%20and%20Analysis.pdf

Multiple choice sektionen:

Der kan være flere svarmuligheder til
spørgsmålene.

Der kan være flere svarmuligheder på 0+ af
spørgsmålene

Q26 i eksamen: Gennemgang

26A) Work out the formula (algorithm) for the
linear regression between the two variables.

Løsning: enten beregne manuelt eller sæt
data ind i SPSS
Using SPSS to obtain scatterplot with regression line:
Analyze > Regression > Curve Estimation...
Model Summary and Parameter Estimates
Dependent Variable: memory score
Equation
Linear
R Square
.736
Model Summary
F
df1
16.741
1
df2
6
The independent variable is number of vitamin treatments .
Sig.
.006
Parameter Estimates
Cons tant
b1
17.167
8.333
”constant" is the
intercept with yaxis, "b1" is the
slope”



Alle linear regressionformler har udseendet:
Y = a + bx --- dvs: (med tallene beregnet i SPSS)
Y = 0.602 + 1.322x

26B) Using your regression line formula, predict how many
hours per week a person installing 50 games would spend
playing them.

Løsning: Sæt tallene ind i formlen. Hvis vi bruger
ovenstående:

Y = 0.602 + 1.322*50 Y = 66.702 timer
70
y = 1.3229x + 0.6025
R² = 0.9869
60
50
40
Series1
Linear (Series1)
30
20
10
0
0
10
20
30
40
50

26C) Is the formula for the regression line the same if
we put “number of games installed” on the Y-axis and
“hours per week played” on the X-axis?

Nej – husk at regression af X on Y ikke er
det samme som regression af Y on X – der
vil være en lille smule forskel

To predict Y from X requires a line that minimizes
the deviations of the predicted Y's from actual Y's.

To predict X from Y requires a line that minimizes
the deviations of the predicted X's from actual X's a different task (although somewhat similar)!

Solution: To calculate regression of X on Y, swap
the column labels (so that the "X" values are now
the "Y" values, and vice versa); and re-do the
calculations.
 So X is now test results, Y is now stress score
Regression lines for predicting Y from X, and vice versa:
Y on X: predicts stress
score, given knowledge of
test score
X on Y: predicts test
score, given knowledge
of stress score
120
100
80
test
score
(Y)
60
40
20
0
0
10
20
30
stress score (X)
40
50
n.b.: intercept = 55

What is the difference between ordinal and interval data??
What kind of data are ratings?

Ordinal data fortæller os ikke noget om distancen imellem
to målinger. Vi ved at ”1” er før ”2” men ikke hvor meget
afstanden faktisk er.

I interval data er afstanden mellem målepunkterne
konstant, men der er ikke et sandt nulpunkt

Ratings – ratings er vist bare et udtryk. Kig på hvilke
karakteristika måleenheden der bliver brugt har. F.eks.
”ratings on a scale from 0-50” – jamen så er det
intervaldata.







Values are measureable
Measuring size of variables is important for
comparing results between studies/projects
Different measures provide different quality
of data:
Nominal (categorical) data
Non-parametric
Ordinal data
Interval data
Parametric
Ratio data

Nominal data (categorical, frequency data)

When numbers are used as names

No relationship between the size of the number and
what is being measured

Two things with same number are equivalent

Two things with different numbers are different


E.g. Numbers on the shirts of soccer players
Nominal data are only used for frequencies
 How many times ”3” occurs in a sample
 How often player 3 scores compared to player 1

Ordinal data

Provides information about the ordering of the data

Does not tell us about the relative differences
between values


For example: The order of people who
complete a race – from the winner to the last
to cross the finish line.
Typical scale for questionnaire data
 Interval data
 When measurements are made
on a scale with equal intervals
between points on the scale, but
the scale has no true zero point.

Examples:

Celsius temperature scale: 100 is water's boiling point; 0 is an
arbitrary zero-point (when water freezes), not a true absence
of temperature.

Equal intervals represent equal amounts, but ratio
statements are meaningless - e.g., 60 deg C is not twice as
hot as 30 deg!
-4
-3
-2
-1
0
1
1
2
3
4
5
6
2
7
3
4
8
9

Ratio data

When measurements are made on a scale with
equal intervals between points on the scale, and the
scale has a true zero point.

e.g. height, weight, time, distance.

Measurements of relevance include: Reaction times,
numbers correct answered, error scores in usability
tests.

Q20B – is it a correlational study or between-groups design?

Korrelation er en analysemetode. Mange typer
eksperimentelle designs kan give ophav til en
korrelationsanalyse.
Eksperimentelle design er mere grundlæggende.
I dette tilfælde er der tale om et between-groups
(independent measures) eksperiment design – 3 grupper
der måles hver for sig
Bemærk: Ikke nødvendigvis et ”true” eksperiment – står
ikke noget i opgaven om random allocation af participants




Q21B – skal vi regne SD ud i hånden?? Hvilken
SD regner SPSS ud? Den for populationen eller
den for samplet?

SD for sample: 8.644507; SD for
population: 8.869077 – hvad siger SPSS?

Note: Excel bruger SD for population (N-1)

Q21C – er det +/- 1 SD? (Vel ikke +/- en halv
SD?)

+/- 1 SD – når man siger ”within x SD of the
mean” betyder det + eller –

Fordi data er normalfordelte ville vi
forvente at 68% af de 20 scores lå indenfor
en SD i begge retninger
Relationship between the normal curve and the
standard deviation:
frequency
All normal curves share this property: the SD cuts off a
constant proportion of the distribution of scores:-
68%
95%
99.7%
-3
-2
-1
mean
+1
+2
+3
Number of standard deviations either side of mean

About 68% of scores will fall in the range
of the mean plus and minus 1 SD;

95% in the range of the mean +/- 2 SD's;

99.7% in the range of the mean +/- 3 SD's.

Q22C – skal 998 tillægges datasættet, nu hvor
det ikke kan udelades?

Fejl i opgaven: 998 findes ikke i talrækken.

Ideen var at observere hvordan mean,
median, mode og SD ændrer sig forskelligt
når scores ændrer sig, dvs. hvorvidt mean,
median eller mode er mest ”sårbare”.
Der vil ikke være fejl i eksamenssættet.


Q25 – hvorfor er der 6 values fra hver by, når du
skriver at der burde være noget andet?
&%¤&¤ ... Host host ... Nja der burde så stå ”six” i
opgaveteksten, ikke ”seven” – det er en fejl.
 Hvis det her skulle opstå, så brug ALTID de rå
data. Løs opgaven med de data I bliver givet.
 Som sagt: Eksamenssættet er grundigt tjekket.

Hvornår bruger vi z-
scores?
Going beyond the data: Z-scores
Using z-scores, we can represent a given score in terms of
how different it is from the mean of the group of scores.
SD = 2
μ = 63
Xi = 64
How to calculate z-score:
zX 
Xi  

64  63 1

  0.50 - SD from the mean
2
2
We can do the same thing to calculate the relationship of a
sample mean to the population mean:
μ = 63
64 
X
(1) we obtain a particular sample mean;
(2) we can represent this in terms of how different it is from the
mean of its parent population.

zX 
X 
64  63 1

  2.00

2
2
4
N
16

We use z-scores whenever we want to
evaluate how far from a sample mean a
score is
 E.g. to evaluate if a score is an outlier
 Hvis f.eks. en score er -1 SD fra mean, så ved vi at
den falder indenfor de 68% hyppigste scores
(grundet normalfordelingen i dataene) – dvs. ikke
statistisk signifikant ved p<0.05

Or, conversely, how far away from the
population mean our sample mean is
Hvordan finder man
ud af om scores i et
datasæt er
normalfordelt?



Parametric statistics work on the mean -> All
data must be interval or ratio level data
Parametric tests also make assumptions
about the variance between groups or
conditions
So we must BOTH have parametric measure AND
scores must adhere to the requirements of
parametric statistics

For independent-measures (between groups),
we assume that variance in one condition is the
same as the other: Homogeneity of variance
 The spread of scores in each sample should be roughly
similar
 Tested using Levene´s test (we do this in SPSS – often i
gives you Levene´s test when running e.g. T-test)

For repeated-measures (within subjects), we
operate with the sphericity assumption,
 Tested using Mauchly´s test
SPSS output (independent measures t-test)
t is calculated by dividing difference in means with
standard error: 4.58/0.84359
Sig. is < than .05, so there is a
significant difference between
alcohol/no alcohol on performance
Row 1 left show result of Levene´s test – tests the hypothesis that variance in the two
samples is equal. If Levene´s test is significant at p<0.05 the assumption of
homogenity of variance in the samples has been violated (this is annoying).
If not, we assume equal variance (use row 1)

We also assume our data come from a population with a
normal distribution

We can test how much a distribution is similar to the normal
distribution using the Kolmogorov-Smirnov test (the vodka
test) and the Shapiro-Wilk tests
 The tests compare the set of scores in the sample to a normally
distributed set of scores with the same mean and standard deviation
 If the test is non-significant (p>0.05) the distribution of the sample is
NOT significantly different from a normal distribution (i.e. it is
normal) [OPPOSITE OF LEVENE´S TEST!]
 If p<0.05, the distribution of the sample is significantly different from
normal (e.g. positively or negatively skewed).

We can run Kolmogorov-Smirnov and Shapiro-Wilk tests in
SPSS

The most important is the Kolmogorov-Smirnov Test (K-Stest)

SPSS produces an output that includes the test statistic itself
(D), the degrees of freedom (df) (= the sample size) and the
significance value of the test (sig.).

If the significance of the K-S-test is less than .05, the
distribution deviates significantly from the normal
 Brush-up on standard deviation
from the mean
 SD from the mean is just how
many SD´s your score/result
deviates from the mean value
of the sample/population