Qualitative Variables in a Regression Model using Dummy Variables

   Differences in Two Population Means
   Differences Among More Than Two Means
   Mixtures of Quantitative and Qualitative Independent Variables

1. Differences in Two Population Means

1.1 Ways to express the values of two means
   o Two means can be expressed as two values, or as one mean and the difference between the means.
   o Example: You are measuring the starting income of two majors. The average starting salary of major 1 is $25 per hour and the average starting salary of major 2 is $15 per hour. This could also be expressed as: the average for major 1 is $25, and when you go from major 1 to major 2 the mean is reduced by $10.

1.2 Using regression to model these values
   o The intercept is the average value of y when X = 0, and the slope is the change in the mean of y when X increases by 1.
   o In the example above: the intercept is the average starting salary of major 1, and the slope is the change in the average starting salary when you go to major 2. The following population model can be used to express this:
        Mean of Y = \(\beta_0 + \beta_1 X\) = $25 - $10X, where X = 0 for major 1 and X = 1 for major 2

1.3 Interpretation of the coefficients
   o \(\beta_0 + \beta_1 = \mu_2\), which is the average value of y for the second population
     \(\beta_0 = \mu_1\), which is the average value of y for the first population
     \(\beta_1 = \mu_2 - \mu_1\), the mean of the second population minus the mean of the first
   o Example:
        $25 - $10 = $15 is the average starting salary for major 2
        $25 is the average starting salary for major 1
        -$10 is the average starting salary of the second major minus the average starting salary of the first major

1.4 Estimates
   o Least squares equation: \(\hat{y} = b_0 + b_1 X\), where X = 0 or 1
        \(b_0\) is the estimated mean of the first population (the first sample mean)
        \(b_1\) is the estimated difference in means (second sample mean minus the first)
   o Example: Suppose you wanted to compare the average time spent by males and females watching a particular cable channel.
     You found the following least squares line: \(\hat{y} = 6 + 2X\), where X = 0 for males and X = 1 for females.
     The sample intercept is the __________________________________________________
     while the sample slope is the _________________________________________________
     __________________________________________________

1.5 Inferences
   o Requires the same assumptions and has the same degrees of freedom as simple linear regression.
   o The test and confidence interval for the slope are identical in value and meaning to the test and confidence interval for the difference in means (independent samples case) found in an earlier chapter.

2. More than two means

2.1 Number of differences
   o Two means require one mean and one difference; i.e., one dummy variable
   o Three means require one mean and two differences; i.e., two dummy variables
   o k means require one mean and k - 1 differences; i.e., k - 1 dummy variables
   o Example: The average days sick per month is 4 for type 1 workers, 6 for type 2 workers, and 1 for type 3 workers. This can also be expressed as: the mean for the first type of worker is 4; when you go from the first worker population to the second, the mean increases by 2; and when you go from the first to the third, the mean decreases by 3.

2.2 Modeling using dummy variables

2.2.1 Notation and interpretation for three means
   o The intercept is the average value of y when X1 = 0 and X2 = 0, the first slope is the change in the mean of y when X1 increases by 1, and the second slope is the change in the mean of y when X2 increases by 1.
        \(E(y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2\)
        \(\beta_0 + \beta_1 = \mu_2\), which is the average value of y for the second population
        \(\beta_0 + \beta_2 = \mu_3\), which is the average value of y for the third population
        \(\beta_0 = \mu_1\), which is the average value of y for the first population
        \(\beta_1 = \mu_2 - \mu_1\), the mean of the second population minus the mean of the first
        \(\beta_2 = \mu_3 - \mu_1\), the mean of the third population minus the mean of the first
   o In the example above:
     The following population model can be used to express this:
        Mean days absent = \(\beta_0 + \beta_1 X_1 + \beta_2 X_2 = 4 + 2X_1 - 3X_2\)
        where X1 = 1 indicates the second type of worker and 0 if not
              X2 = 1 indicates the third type of worker and 0 if not
        Mean for worker 1 is ____      (neither worker 2 nor 3)
        Mean for worker 2 is _______   (worker 2 and not 3)
        Mean for worker 3 is _______   (worker 3 and not 2)
        Differences in means =

2.2.2 Notation and interpretation for k means
   o The intercept is the average value of y when all k - 1 dummy variables are zero, and the slope of the ith (i = 1, 2, ..., k - 1) dummy variable is the change in the mean of y when Xi increases by 1.
        \(E(y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{k-1} X_{k-1}\)
        \(\beta_0 + \beta_i = \mu_{i+1}\), which is the average value of y for the (i+1)th population
        \(\beta_0 = \mu_1\), which is the average value of y for the first population
        \(\beta_i = \mu_{i+1} - \mu_1\), the mean of the (i+1)th population minus the mean of the first

2.3 Inferences
   o Requires the same assumptions and uses the same degrees of freedom as a regression model with k - 1 variables.
   o The F test for regression tests the null hypothesis that all the slope coefficients are zero. Here, if all the slope coefficients are zero, then all the means are equal.
   o A t-test or a confidence interval for \(\beta_i\) makes inferences about the difference between the mean of the level coded by Xi and the mean of the first level.
   o Example: Suppose you wanted to compare the average time spent by adult males, adult females, and children watching a particular cable channel. From a sample of 30, you found the following least squares line:
        \(\hat{y} = 4.7 + 1.5X_1 + 4X_2\)
        where X1 = 1 if male and 0 otherwise, and X2 = 1 if child and 0 otherwise
     The slope estimate of 1.5 could be interpreted as:
     The slope estimate of 4 could be interpreted as:
     Additionally, using multiple regression to test all the coefficients, you found an F test value of 4.5. Complete the following hypothesis test:
        H0:
        H1:
        Test Statistic:
        Rejection Region:
        Conclusion:

3.
Mixtures of Quantitative and Qualitative Variables

   Consider the following: Y = time spent watching a cable channel, X1 = total time spent on all channels during the same time period, and a qualitative variable, category (adult males, adult females, and children). Examine the following model:
        \(E(y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3\)
        where X2 = 1 if male and 0 otherwise
              X3 = 1 if child and 0 otherwise
   What is the equation for males?
   What is the equation for children?
   What is the equation for females?
   What is the interpretation of \(\beta_1\)?
   How would you test for the effect of category?

4. Examples from Bureau of Labor Statistics:
   Pricing of College Textbooks http://www.bls.gov/cpi/cpictb.htm
   Pricing of Microwave ovens http://www.bls.gov/cpi/cpimwo.htm
   Creating Occupational Pay relatives http://www.bls.gov/news.release/ncspay.tn.htm
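
As a numerical check on the three-means model of section 2.2.1, the following Python sketch fits a regression with an intercept and two dummy variables to a small made-up data set whose group means are 4, 6, and 1 days sick. The data values and the helper `ols` are hypothetical, written only for this illustration. The fit solves the normal equations directly instead of calling a statistics library.

```python
def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical samples (days sick per month), chosen so that the sample
# means are 4, 6, and 1 for worker types 1, 2, and 3.
samples = {1: [3, 4, 5], 2: [5, 6, 7], 3: [0, 1, 2]}

X, y = [], []
for worker_type, values in samples.items():
    x1 = 1 if worker_type == 2 else 0  # dummy for the second type of worker
    x2 = 1 if worker_type == 3 else 0  # dummy for the third type of worker
    for value in values:
        X.append([1, x1, x2])  # leading 1 is the intercept column
        y.append(value)

b = ols(X, y)
print([round(v, 6) for v in b])  # [4.0, 2.0, -3.0]

# Recover each group's estimated mean from the coefficients.
group_means = {1: b[0], 2: b[0] + b[1], 3: b[0] + b[2]}
print({k: round(v, 6) for k, v in group_means.items()})
```

The fitted intercept equals the first group's sample mean, and each slope equals that group's sample mean minus the first group's, which matches the interpretation of the coefficients given in 2.2.1.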