Stat 401 Lab Activity 1
Wednesday, October 12, 2005
Part I. Data displays continued.
Copy and paste the activity data (pie.data.txt and 3d.data.txt) into the first four columns
of Minitab.
1) Pie Chart:
Graph> Pie Chart>click on “Chart raw data”>enter pie.data>OK
Type “maybe” in rows 20-24 of the pie.data column in Minitab and try the above
sequence again.
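With “Chart raw data”, Minitab counts how often each distinct value appears in the column and draws one slice per value. The minimal Python sketch below reproduces that tally. It uses a small hypothetical column of 19 yes/no answers (the real pie.data values are not reproduced here) and assumes rows 20-24 are empty, so typing “maybe” there adds five new entries. This shows why a third slice appears and the other slices shrink.

```python
from collections import Counter

def slice_shares(values):
    """Return each category's share of the total, as a pie chart would draw it."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

# Hypothetical stand-in for the pie.data column: 19 rows of yes/no answers.
pie_data = ["yes"] * 12 + ["no"] * 7
print(slice_shares(pie_data))

# Mimic typing "maybe" into rows 20-24: five new entries after row 19.
with_maybe = pie_data + ["maybe"] * 5
print(slice_shares(with_maybe))
```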
2) Scatter Plot:
Graph>Scatterplot>“Simple”> OK > enter “Y” for Y and “X1” for X > OK.
3) 3D Scatter Plot:
Graph>3D Scatterplot>“Simple”> OK > enter “Y” for Z, “X1” for Y, and “X2” for
X variable > OK
Note that you can rotate the graph about each of the axes to achieve a better visual
understanding of the relation between the variables. Play with some rotations.
Alternatively, you can try
Graph>3D Scatterplot>“Simple”> OK > enter “Y” for Z, “X1” for Y, and “X2” for
X variable > Data View > check “Project lines”> OK > OK
Part II. The Binomial Random Variable.
In this activity we will learn how to generate random numbers from the binomial
distribution. This will give us intuition into the kind of variability that we can expect the
outcome of a binomial experiment to display. Recall that a binomial random variable
equals the number of ‘ones’ in a series of n trials, each of which can result in ‘one’ or
‘zero’.
1. First we will generate a sample of size 50 from the binomial distribution that
corresponds to n=10 trials with probability of ‘one’ equal to p=0.5. (This simulates
the situation where each of 50 people tosses a coin 10 times and records the
number of heads in his/her tossing experiment.)
Calc>Random Data>Binomial> Enter “50” into Generate; enter “C11” into
Store in column(s); enter “10” into Number of trials; enter “0.5” into Probability
success> OK
Graph>Histogram>Simple>OK>enter “C11” into Graph variables>OK
If we did not know that the data were generated with probability of ‘one’ equal to
p=0.5, the histogram would give us an empirical estimate of the probability of,
e.g., 5 ‘ones’ in a series of 10 trials.
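The same experiment can be sketched in Python with only the standard library. This is an illustration, not part of the lab. The helper name `binomial_draw` and the seed are our own choices. Each draw counts the ‘ones’ in 10 independent trials with success probability 0.5, mirroring one person's 10 coin tosses. The fraction of the 50 draws equal to 5 is the empirical estimate of P(X = 5).

```python
import random

def binomial_draw(n, p, rng):
    """Count the 'ones' in n independent trials, each a 'one' with probability p."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(401)  # fixed seed so the sample is reproducible
sample = [binomial_draw(10, 0.5, rng) for _ in range(50)]  # plays the role of C11

# Empirical estimate of P(X = 5): the fraction of the 50 draws equal to 5.
estimate = sample.count(5) / len(sample)
print("empirical P(X = 5):", estimate)
```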
2. To get an idea of how variable the ‘empirical estimates’ can be, we can run
five sets of this experiment. This simulates the situation where each of 50 people
tosses a coin 10 times, records the number of heads, and then repeats the whole
thing, for a total of five times.
Calc>Random Data>Binomial> Enter “50” into Generate; enter “C21-C25” into
Store in column(s); enter “10” into Number of trials; enter “0.5” into Probability
success> OK
Repeat the histogram sequence of commands for each of the columns C21-C25 to get
a feeling for the variability of the empirical estimates. You can also combine the different
histograms into one graph, though it may be difficult to discern the different groups if
you combine many histograms:
Graph>Histogram>With outline and groups>OK>enter “C21-C25” into Graph
variables>OK
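A minimal Python sketch of the five replicate columns follows. The seed and helper name are illustrative, and the column labels only echo C21-C25. Each replicate gives its own empirical estimate of P(X = 5). The spread between the largest and smallest estimate is one rough way to see the variability that the five histograms display.

```python
import random

def binomial_draw(n, p, rng):
    """Count the 'ones' in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(2005)  # fixed seed for reproducibility
# Five replicate samples of size 50, analogous to columns C21-C25.
replicates = {f"C{21 + i}": [binomial_draw(10, 0.5, rng) for _ in range(50)]
              for i in range(5)}

# The empirical estimate of P(X = 5) changes from one replicate to the next.
estimates = {col: data.count(5) / len(data) for col, data in replicates.items()}
for col, est in estimates.items():
    print(col, round(est, 2))
print("spread of the estimates:", max(estimates.values()) - min(estimates.values()))
```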
3. Now we will compare the histograms, obtained from samples of size 50, with the
true probabilities. The true probabilities can be obtained either exactly (using
Minitab commands) or approximately, as the relative frequencies in a very large
sample.
We begin by generating the exact probabilities:
Enter the numbers 0-10 in rows 1-11 of column C16. (Enter “0” in the first row, enter
“1” in the second row, highlight both numbers, move the cursor to the bottom right
corner of the second row, left click and drag it down to row 11, and release.) The
following sequence of commands will enter, in column C17, the probabilities with
which a binomial random variable X ~ Bin(10, 0.5) takes the value in the
corresponding row of C16.
Calc>Probability Distribution>Binomial> click on “Probability”, enter “10” in
Number of trials, enter “0.5” in Probability of success, enter “C16” in Input
column, enter “C17” in Optional storage > OK.
To plot the probabilities as a ‘probability dot histogram’ do the following:
Graph>Scatterplot> select “simple”> OK>Enter “C17” for Y and “C16” for
X>OK
The resulting plot is a symmetric “probability dot histogram”.
We will do the relative frequency approximation of the probabilities in the next lab
activity.
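To check the numbers in C17 outside Minitab, a minimal Python sketch can compute the same Bin(10, 0.5) probabilities from the formula P(X = k) = C(10, k) 0.5^k 0.5^(10-k). The names `c16` and `c17` only echo the Minitab columns.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

c16 = list(range(11))                          # the values 0-10
c17 = [binomial_pmf(k, 10, 0.5) for k in c16]  # their exact probabilities

for k, prob in zip(c16, c17):
    print(k, round(prob, 4))
```

The printed probabilities are symmetric about 5, for example P(X = 5) = 252/1024 ≈ 0.2461, and P(X = k) = P(X = 10 - k).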
Part III. For homework #5
Repeat part II of this activity, using probability of success p=0.2, and turn in all
output with the already assigned homework problems.
4. Next we compare binomial random variables X with different p values,
such as p = 0.1, 0.3, 0.6, 0.8, and 0.9.
Generate one set of binomial data with n = 10 for each of p = 0.1, 0.3, 0.6, 0.8,
and 0.9, each of size 50.
Combined Histogram for 5 different p values.
Minitab>Calc>Random Data>Binomial> Generate “50” rows of data; store in
columns: “C10”; number of trials “10”; probability success: “0.1”> OK. Repeat this 4
more times for the other p values, storing the results in C11-C14.
Graph>Histogram>With fit and groups, OK>Continue>Graph variables: “C10-C14”,
OK> Continue>Continue>
Histogram of C10, C11, C12, C13, C14 (with normal fits; x-axis: Data, y-axis: Frequency)
Variable   Mean    StDev    N
C10        0.86    0.8332   50
C11        2.96    1.678    50
C12        6.04    1.511    50
C13        7.9     1.374    50
C14        8.84    0.9337   50
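A minimal Python sketch of step 4 follows, under the same illustrative assumptions as before (our own seed and helper name; column labels that only echo C10-C14). Alongside each sample mean and standard deviation, it prints the theoretical values np and sqrt(np(1-p)) for a Bin(10, p) variable. That comparison is our addition, not a lab step. It helps explain why the sample means in the legend sit near 1, 3, 6, 8, and 9.

```python
import random
from statistics import mean, stdev

def binomial_draw(n, p, rng):
    """Count the 'ones' in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(10)  # fixed seed for reproducibility
p_values = [0.1, 0.3, 0.6, 0.8, 0.9]
# One sample of size 50 per p value, labelled like columns C10-C14.
columns = {f"C{10 + i}": (p, [binomial_draw(10, p, rng) for _ in range(50)])
           for i, p in enumerate(p_values)}

for col, (p, data) in columns.items():
    theory_mean = 10 * p
    theory_sd = (10 * p * (1 - p)) ** 0.5
    print(f"{col}: p={p}  sample mean={mean(data):.2f} (theory {theory_mean:.1f})  "
          f"sample sd={stdev(data):.3f} (theory {theory_sd:.3f})")
```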
5. Compare sample probabilities with true probabilities for sample sizes
1,000, 5,000, and 10,000.
Follow these steps:
Minitab>Calc>Random Data>Binomial> Generate “1,000” rows of data; store in
columns: “C16”; number of trials “10”; probability success: “0.5”> OK
Repeat with 5,000 and 10,000 rows, and compare the relative frequencies of the values
0-10 with the exact probabilities.
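The comparison in step 5 can be sketched in Python as follows. The seed and helper names are our own. For each sample size it computes the relative frequency of each value 0-10 and reports the largest absolute gap from the exact Bin(10, 0.5) probability. That gap typically shrinks as the sample grows, although a single random run need not show a perfectly monotone pattern.

```python
import random
from math import comb

def binomial_draw(n, p, rng):
    """Count the 'ones' in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

rng = random.Random(5)  # fixed seed for reproducibility
exact = [binomial_pmf(k, 10, 0.5) for k in range(11)]

max_errors = {}
for size in (1_000, 5_000, 10_000):
    sample = [binomial_draw(10, 0.5, rng) for _ in range(size)]
    rel_freq = [sample.count(k) / size for k in range(11)]
    # Largest gap between a relative frequency and the exact probability.
    max_errors[size] = max(abs(f - e) for f, e in zip(rel_freq, exact))
    print(f"sample size {size:>6}: largest |relative frequency - exact| = {max_errors[size]:.4f}")
```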
6. In Part II (step 3) we saw that the pmf of a binomial with n=10 and p=0.5 is
symmetric. Now we plot the pmf of a binomial with n=10 and p=0.3. Repeating the
steps of Part II, step 3 (do it), we see that it is skewed to the right.
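One way to put a number on “skewed” is the skewness E[(X - μ)^3] / σ^3. A minimal Python sketch computes it directly from the pmf values. It also compares the result with the known closed form (1 - 2p) / sqrt(np(1 - p)) for a binomial. The skewness measure is our addition; the lab only asks you to look at the plot. A value of 0 means symmetric, and a positive value means a longer right tail.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pmf_skewness(pmf):
    """Skewness of a distribution on 0, 1, 2, ... given as a list of probabilities."""
    mu = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mu) ** 2 * q for k, q in enumerate(pmf))
    third = sum((k - mu) ** 3 * q for k, q in enumerate(pmf))
    return third / var**1.5

pmf_03 = [binomial_pmf(k, 10, 0.3) for k in range(11)]
pmf_05 = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print("mode for p=0.3:", max(range(11), key=lambda k: pmf_03[k]))
print("skewness, p=0.5:", round(pmf_skewness(pmf_05), 4))
print("skewness, p=0.3:", round(pmf_skewness(pmf_03), 4))
# Closed form for comparison: (1 - 2p) / sqrt(n p (1 - p))
print("closed form, p=0.3:", round((1 - 2 * 0.3) / sqrt(10 * 0.3 * 0.7), 4))
```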
7. Now we will demonstrate the Central Limit Theorem, which says that the sum of
many independent r.v.’s has an approximately normal, and hence approximately
symmetric, distribution even if the individual r.v.’s have a skewed distribution. We will
sum 30 independent binomial r.v.’s, each with n=10 and p=0.3. The distribution of the
sum is binomial with n=300 and p=0.3. Repeat the steps of Part II, step 3 (entering the
values 0-300 in the input column), to see that the distribution of the sum is nearly symmetric.
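The claim that the sum of 30 independent Bin(10, 0.3) variables is Bin(300, 0.3) can be checked numerically. The pmf of a sum of independent variables is the convolution of their pmfs. The minimal Python sketch below convolves the Bin(10, 0.3) pmf with itself 30 times, compares the result with Bin(300, 0.3) computed directly, and reports the skewness before and after summing. The helper names are our own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def convolve(a, b):
    """pmf of X + Y for independent X, Y with pmfs a and b on 0, 1, 2, ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

def skewness(pmf):
    """Skewness E[(X - mu)^3] / sigma^3 of a pmf on 0, 1, 2, ..."""
    mu = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mu) ** 2 * q for k, q in enumerate(pmf))
    return sum((k - mu) ** 3 * q for k, q in enumerate(pmf)) / var**1.5

single = [binomial_pmf(k, 10, 0.3) for k in range(11)]
total = [1.0]  # pmf of an empty sum, which is always 0
for _ in range(30):
    total = convolve(total, single)  # add one more Bin(10, 0.3) term

direct = [binomial_pmf(k, 300, 0.3) for k in range(301)]
print("max difference from Bin(300, 0.3):", max(abs(x - y) for x, y in zip(total, direct)))
print("skewness of one term:", round(skewness(single), 4))
print("skewness of the sum: ", round(skewness(total), 4))
```

The skewness drops from about 0.28 for a single term to about 0.05 for the sum, which matches the nearly symmetric plot.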
8. Now we will demonstrate the Central Limit Theorem with the Poisson
distribution. First plot the Poisson pmf with λ=3:
In a free column (e.g., column c8), generate the numbers 0, 1, 2, …, 80. Then
Minitab>Calc>Probability Distribution>Poisson> select “Probability”, for Mean set
3, for input column “c8”, for Optional storage “c9”, then OK. Then do a scatter plot
with Y=c9 and X=c8.
Next sum n=10 independent Poisson r.v.’s, each with λ=3. The sum is also a
Poisson r.v., with λ=30. Repeat the above steps to plot the pmf of this Poisson distribution.
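A minimal Python sketch can mirror columns c8 and c9 and check the statement about the sum. It computes the Poisson(3) pmf on 0-80 and convolves it with itself 10 times, keeping only the values 0-80. This truncation is exact for those values, because any combination that sums to 80 or less uses terms that are each 80 or less. It then compares the result with the Poisson(30) pmf. The skewness 1/sqrt(λ) falls from about 0.58 to about 0.18, which is the move toward symmetry that the plots show. Helper names are our own.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def convolve(a, b, size):
    """First `size` terms of the pmf of X + Y for independent X, Y."""
    out = [0.0] * size
    for i, pa in enumerate(a[:size]):
        for j, pb in enumerate(b[: size - i]):
            out[i + j] += pa * pb
    return out

def skewness(pmf):
    """Skewness E[(X - mu)^3] / sigma^3 of a pmf on 0, 1, 2, ..."""
    mu = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mu) ** 2 * q for k, q in enumerate(pmf))
    return sum((k - mu) ** 3 * q for k, q in enumerate(pmf)) / var**1.5

c8 = list(range(81))                  # 0, 1, 2, ..., 80
c9 = [poisson_pmf(k, 3) for k in c8]  # Poisson(3) probabilities

# Sum of 10 independent Poisson(3) variables, restricted to values 0-80.
total = [1.0] + [0.0] * 80
for _ in range(10):
    total = convolve(total, c9, 81)

direct = [poisson_pmf(k, 30) for k in c8]
print("max difference from Poisson(30):", max(abs(x - y) for x, y in zip(total, direct)))
print("skewness, lambda=3: ", round(skewness(c9), 4))
print("skewness, lambda=30:", round(skewness(direct), 4))
```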
Histogram
Constructing a Histogram from continuous data: Equal Class Widths.
Download data from:
http://www.stat.psu.edu/~hma/stat401fall04/Minitab/Dataset/CH01/
Minitab>Graph>Histogram>Simple> select “C1” for Graph variables. Then, click OK.
Move the cursor to the graph, right-click, and select “Edit Bars”>Binning>for Type
of Interval, select “Midpoint”, for Interval Definition, select “Midpoint/cutpoint
positions”, and set “0:6000/1000”. Click OK. (This places the bar midpoints at
0, 1000, 2000, …, 6000, so each interval has width 1000.)
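The binning arithmetic can be sketched in Python: with equally spaced midpoints and width w, each bar covers the interval from (midpoint - w/2) to (midpoint + w/2). The data values below are hypothetical placeholders, since the real data set is at the URL above. The half-open [low, high) convention for values on a boundary is our assumption and may differ from Minitab's rule.

```python
def midpoint_bins(midpoints, width):
    """Turn equally spaced bar midpoints into (low, high) interval endpoints."""
    return [(m - width / 2, m + width / 2) for m in midpoints]

def bin_counts(data, bins):
    """Count data values in each [low, high) interval (assumed boundary rule)."""
    return [sum(low <= x < high for x in data) for low, high in bins]

midpoints = list(range(0, 6001, 1000))  # 0:6000/1000 -> 0, 1000, ..., 6000
bins = midpoint_bins(midpoints, 1000)

# Hypothetical values standing in for column C1.
data = [120, 480, 950, 1320, 1800, 2100, 2650, 3300, 4100, 5900]
for (low, high), count in zip(bins, bin_counts(data, bins)):
    print(f"[{low:7.1f}, {high:7.1f}): {count}")
```

With midpoints starting at 0, the first bar runs from -500 to 500.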
“Midpoint/cutpoint positions”: Choose this to specify the exact positions you want, then
specify the value of each cutpoint or midpoint. Specify the values in order, from
smallest to largest. Midpoints require equally spaced intervals; cutpoints do not. You can
specify a column that contains all the cutpoint or midpoint values, or use shorthand
notation (for example, 10:40/5 means positions from 10 to 40 in steps of 5).
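The shorthand and the spacing rule can be illustrated with a small Python sketch. The functions `expand_shorthand` and `equally_spaced` are hypothetical helpers, not Minitab features. The sketch assumes that the stop value is included and that the step divides the range evenly. It expands a “start:stop/step” string and checks whether a list of positions is equally spaced, which midpoints must be and cutpoints need not be.

```python
def expand_shorthand(spec):
    """Expand 'start:stop/step' (e.g. '10:40/5') into the listed positions.

    Assumes the stop value is included and the step divides the range evenly.
    """
    range_part, step_part = spec.split("/")
    start, stop = (float(v) for v in range_part.split(":"))
    step = float(step_part)
    count = round((stop - start) / step)
    return [start + i * step for i in range(count + 1)]

def equally_spaced(positions, tol=1e-9):
    """True if consecutive positions all differ by the same amount."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return all(abs(g - gaps[0]) <= tol for g in gaps)

print(expand_shorthand("10:40/5"))
print(expand_shorthand("0:6000/1000"))
print(equally_spaced([0, 1000, 2500, 6000]))  # acceptable as cutpoints, not as midpoints
```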