Stat 401 Lab Activity 1
Wednesday, October 12, 2005

Part I. Data displays continued.

Copy and paste the activity data (pie.data.txt and 3d.data.txt) into the first four columns of Minitab.

1) Pie Chart:
Graph>Pie Chart>click on “Chart raw data”>enter pie.data>OK
Type “maybe” in rows 20-24 of the pie.data column in Minitab and try the above sequence again.

2) Scatter Plot:
Graph>Scatterplot>“Simple”>OK>enter “Y” for Y and “X1” for X>OK

3) 3D Scatter Plot:
Graph>3D Scatterplot>“Simple”>OK>enter “Y” for Z, “X1” for Y, and “X2” for X variable>OK
Note that you can rotate the graph about each of the axes to get a better visual understanding of the relation between the variables. Play with some rotations. Alternatively, you can try
Graph>3D Scatterplot>“Simple”>OK>enter “Y” for Z, “X1” for Y, and “X2” for X variable>Data View>check “Project lines”>OK>OK

Part II. The Binomial Random Variable.

In this activity we will learn how to generate random numbers from the binomial distribution. This will give us intuition about the kind of variability that we can expect the outcome of a binomial experiment to display. Recall that a binomial random variable equals the number of ‘ones’ in a series of n trials, each of which can result in ‘one’ or ‘zero’.

1. First we will generate a sample of size 50 from the binomial distribution with n=10 trials and probability of ‘one’ equal to p=0.5. (This simulates the situation where each of 50 people tosses a coin 10 times and records the number of heads in his/her tossing experiment.)
Calc>Random Data>Binomial>enter “50” into Generate; enter “C11” into Store in column(s); enter “10” into Number of trials; enter “0.5” into Probability success>OK
Graph>Histogram>Simple>OK>enter “C11” into Graph variables>OK
If we did not know that the data were generated with probability of ‘one’ equal to p=0.5, the histogram would give us an empirical estimate of the probability of, e.g., 5 ‘ones’ in a series of 10 trials.

2. To get an idea of how variable the ‘empirical estimates’ can be, we can consider five sets of this experiment. This simulates the situation where each of 50 people tosses a coin 10 times, records the number of heads in his/her experiment, and then repeats the whole thing for a total of five times.
Calc>Random Data>Binomial>enter “50” into Generate; enter “C21-C25” into Store in column(s); enter “10” into Number of trials; enter “0.5” into Probability success>OK
Repeat the histogram sequence of commands for each of the columns C21-C25 to get a feeling for the variability of the empirical estimates. You can also combine the different histograms into one graph, though it may be difficult to discern the different groups if you combine many histograms:
Graph>Histogram>With outline and groups>OK>enter “C21-C25” into Graph variables>OK

3. Now we will compare the histograms, obtained from samples of size 50, with the true probabilities. The true probabilities can be obtained either exactly (using Minitab commands) or approximately, as the relative frequencies of a very large sample. We begin by generating the exact probabilities.
Enter the numbers 0-10 in rows 1-11 of column C16. (Enter “0” in the first row, enter “1” in the second row, highlight both numbers, move the cursor to the bottom right corner of the second row, left-click and drag it down to row 11, and release.)
The following sequence of commands will enter, in column C17, the probabilities with which a binomial random variable X ~ Bin(10, 0.5) takes the value in the corresponding row of C16.
Calc>Probability Distribution>Binomial>click on “Probability”; enter “10” in Number of trials; enter “0.5” in Probability of success; enter “C16” in Input column; enter “C17” in Optional storage>OK
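The probabilities that Minitab stores in C17 can also be checked by hand from the binomial formula P(X = k) = C(n, k) p^k (1-p)^(n-k). The following is a minimal Python sketch of that check, not part of the Minitab procedure; the function name binomial_pmf and the variable names are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ Bin(n, p), using the binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5

# values plays the role of column C16 (the numbers 0-10);
# probs plays the role of column C17 (the corresponding probabilities).
values = list(range(n + 1))
probs = [binomial_pmf(k, n, p) for k in values]

for k, prob in zip(values, probs):
    print(f"{k:2d}  {prob:.6f}")
```

For n=10 and p=0.5 the probability of 5 ‘ones’ is 252/1024, about 0.246, and the list of probabilities reads the same forwards and backwards, which is the symmetry the lab refers to.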
To plot the probabilities as a ‘probability dot histogram’ do the following:
Graph>Scatterplot>select “Simple”>OK>enter “C17” for Y and “C16” for X>OK
This is a symmetric “probability dot histogram”. We will do the relative frequency approximation of the probabilities in the next lab activity.

Part III. For homework #5

Repeat Part II of this activity, using probability of success p=0.2, and turn in all output with the already assigned homework problems.

4. Next we compare binomial random variables X with different values of p, namely p = 0.1, 0.3, 0.6, 0.8, and 0.9. Generate one set of binomial data of size 50 with n = 10 for each of these p values, then draw a combined histogram for the 5 different p values.
Minitab>Calc>Random Data>Binomial>Generate “50” rows of data; store in columns: “C10”; number of trials: “10”; probability success: “0.1”>OK
Repeat this 4 times for the other p values, storing the results in C11 through C14.
Graph>Histogram>With fit and groups>OK>Continue>Graph variables: “C10-C14”>OK>Continue>Continue

[Figure: Histogram of C10, C11, C12, C13, C14 with normal fit; horizontal axis: Data, vertical axis: Frequency.]

Variable   Mean   StDev    N
C10        0.86   0.8332   50
C11        2.96   1.678    50
C12        6.04   1.511    50
C13        7.9    1.374    50
C14        8.84   0.9337   50

5. Compare sample probabilities with true probabilities for different sample sizes: 1,000, 5,000, and 10,000. Follow the steps
Minitab>Calc>Random Data>Binomial>Generate “1,000” rows of data; store in columns: “C16”; number of trials: “10”; probability success: “0.5”>OK

6. In Part II (step 3) we saw that the pmf of a binomial with n=10 and p=0.5 is symmetric. Now we plot the pmf of a binomial with n=10 and p=0.3. Repeating the steps of Part II, step 3 (do it), we see that it is skewed.

7. Now we will demonstrate the Central Limit Theorem, which says that the sum of many independent r.v.’s has an approximately normal, and hence approximately symmetric, distribution even if the individual r.v.’s have a skewed distribution. We will sum 30 independent binomial r.v.’s, each with n=10 and p=0.3. The distribution of the sum is binomial with n=300 and p=0.3. Repeat the steps of Part II, step 3, to see that the distribution of the sum is approximately symmetric.

8. Now we will demonstrate the Central Limit Theorem with the Poisson distribution. First plot the Poisson pmf with λ=3: in a free column (e.g. column c8) generate the numbers 0,1,2,…,80. Then
Minitab>Calc>Probability Distribution>Poisson>select “Probability”; for Mean set 3; for Input column “c8”; for Optional storage “c9”>OK
Then do a scatter plot with Y=c9 and X=c8. Next sum n=10 independent Poisson r.v.’s, each with λ=3. The sum will also be a Poisson r.v., with λ=30. Repeat the above steps to plot the pmf of this Poisson.

Histogram

Constructing a Histogram for continuous data: Equal Class Widths.

Download data from: http://www.stat.psu.edu/~hma/stat401fall04/Minitab/Dataset/CH01/
Minitab>Graph>Histogram>Simple>select “C1” for Graph Variable. Then click OK. Move the cursor to the graph and right-click, select “Edit Bars”>Binning>for Type of Interval, select “Midpoint”; for Interval Definition, select “Midpoint/cutpoint positions”, and set “0:6000/1000”. Click OK. (This means that we want the data to range from 0 to 6000, with an interval length of 1000.)

“Midpoint/cutpoint positions”: Choose this option to specify the exact positions you want, then specify the values for each cutpoint or midpoint. Specify arguments in order, from smallest to largest. Midpoints require equally spaced intervals; cutpoints do not. You can specify a column that contains all the cutpoint or midpoint values, or use shorthand notation (for example, 10:40/5 means intervals from 10 to 40 by 5).
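To make the shorthand notation concrete, the sketch below expands a string of the form start:stop/step into the list of positions it stands for. It is an illustration in Python, not Minitab's own parser; the function name expand_shorthand is hypothetical, and the sketch assumes that both endpoints are included whenever the step divides the range evenly.

```python
from fractions import Fraction


def expand_shorthand(spec):
    """Expand 'start:stop/step' into the positions it denotes.

    Assumption: endpoints are inclusive, so '10:40/5' gives 10, 15, ..., 40.
    Fractions are used so that decimal steps such as 0.1 are handled exactly.
    """
    range_part, step_text = spec.split("/")
    start_text, stop_text = range_part.split(":")
    start = Fraction(start_text)
    stop = Fraction(stop_text)
    step = Fraction(step_text)
    if step <= 0 or stop < start:
        raise ValueError("need start <= stop and a positive step")
    count = (stop - start) // step  # number of whole steps that fit
    return [float(start + i * step) for i in range(count + 1)]


# The setting used in this activity: positions from 0 to 6000 by 1000.
positions = expand_shorthand("0:6000/1000")
print(positions)
```

With “0:6000/1000” this gives seven positions, 0, 1000, ..., 6000; whether they act as midpoints or cutpoints depends on the Type of Interval chosen in the dialog.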