Hypothesis Testing - LearnViaWeb.com

Associate Professor Arthur Dryver, PhD
School of Business Administration, NIDA
Email: dryver@gmail.com
URL: www.LearnViaWeb.com

Know your respondents

Taking a sample from the population
Population example: 10 million customers
Sample example: 80 respondents

Are the sample results important?
• The sample is important for what it says about the population.
• A company with 10 million customers will survive if 80 customers are unhappy and ultimately leave out of disappointment with the product.
• A company will not survive if those 80 unhappy customers are representative of the 10 million and most of the 10 million are considering leaving.
• How can we tell?

Do you have enough data?
[Figure: gender breakout in a sample survey]

Do you have enough data?
[Figure: age breakout in a sample survey]
Different ages, probably different concerns.
Don't draw conclusions from only a few respondents. Do you have enough data to answer your questions?

Does your data represent your population?
[Figure: gender breakout in a sample survey]
Different genders, possibly different concerns. If so, this survey can be misleading if your population is 50% male and 50% female: the responses are biased toward females.

Does your data represent your population?
[Figure: age breakout in a sample survey]
Different ages, probably different concerns.

Know your respondents
• Does your data represent your population of interest?
• If the survey is about VIP customers, does it represent your typical VIP customer?
• Also, think about the respondents.
• What is their incentive to respond?
• Don't draw broad conclusions from an unrepresentative sample.
• Make statements that take your data into account.

Probability
• Flipping a coin: probability of heads = 1/2
• Probability of 2 heads out of 2 flips = 1/4
• Rolling a die with 6 sides: probability of a 4 = 1/6
• What if we don't know anything? How can we calculate probabilities?
• How much do people spend on phone apps per month?
• Are they satisfied with your service?
• etc.
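The coin and die probabilities on the Probability slide can be checked by listing every equally likely outcome and counting the favorable ones. The following is a minimal sketch; the helper name `probability` is an illustrative assumption and does not come from the slides.

```python
from fractions import Fraction
from itertools import product

def probability(outcomes, event):
    """Probability of `event` when every outcome in `outcomes` is equally likely."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

# One fair coin flip: P(heads) = 1/2
p_heads = probability("HT", lambda o: o == "H")

# Two fair coin flips: P(2 heads out of 2) = 1/4
two_flips = product("HT", repeat=2)
p_two_heads = probability(two_flips, lambda o: o == ("H", "H"))

# One roll of a fair six-sided die: P(4) = 1/6
p_four = probability(range(1, 7), lambda o: o == 4)

print(p_heads, p_two_heads, p_four)  # 1/2 1/4 1/6
```

Counting outcomes like this only works when all outcomes are known and equally likely. For questions such as monthly spending on phone apps, that is not the case, so the probabilities have to come from sample data instead.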
Central Limit Theorem
[Figures: density of the sample mean (x versus density) for Normal, Gamma, Uniform, and Beta populations, one slide each for sample sizes 1, 3, 10, 20, 30, and 300]

Central Limit Theorem (CLT)
The central limit theorem: let $X_1, X_2, \ldots, X_n$ be a random sample of i.i.d. random variables from any distribution with finite mean $\mu$ and finite variance $\sigma^2$. Then the limiting distribution of the standardized sample mean is standard normal:

$$\lim_{n \to \infty} \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \sim N(0, 1),$$

where $\bar{X}_n$ is the average of the $n$ sampled observations.

For a large sample size ($n > 30$), with $s$ the sample standard deviation, approximately

$$\frac{\bar{x} - \mu}{s / \sqrt{n}} \sim t_{n-1}, \qquad \lim_{n \to \infty} t_{n-1} \sim N(0, 1).$$

The importance of the central limit theorem
• Generally we cannot calculate probabilities for a single observation because we don't know enough about its distribution.
• We can calculate probabilities for the sample average, under some assumptions, because we know its probability distribution.
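The behavior shown in the CLT figures can be reproduced with a small simulation. This is a sketch under stated assumptions: the Gamma(shape=2, scale=1) population, the 2,000 repetitions, the seed, and the helper names `sample_means` and `skewness` are all illustrative choices, not taken from the slides. As n grows, the standard deviation of the sample mean should track sigma/sqrt(n) and the skewness should shrink toward 0, the value for a normal distribution.

```python
import random
import statistics

def sample_means(draw, n, reps):
    """Return `reps` sample means, each averaging `n` independent draws."""
    return [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(0)                      # fixed seed for repeatable runs
draw = lambda: rng.gammavariate(2.0, 1.0)   # Gamma(shape=2, scale=1): skewed
mu, sigma = 2.0, 2.0 ** 0.5                 # its true mean and standard deviation

results = {}
for n in (1, 3, 10, 30, 300):
    means = sample_means(draw, n, reps=2000)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(f"n={n:>3}  mean={results[n][0]:.3f}  sd={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={sigma / n ** 0.5:.3f}  skewness={results[n][2]:.2f}")
```

Swapping `draw` for `rng.uniform(0, 1)`, `rng.betavariate(a, b)`, or `rng.gauss(0, 1)` (with `mu` and `sigma` updated to match) gives the other populations named in the figures.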
Hypothesis Testing
• Why?
• Why not just look at the means and percentages?
Men spend more than women. Done, or are we?

Hypothesis Testing
• What if this came from 10 men and 10 women?
• What if it came from 100 of each?
How can we take what we've learned from the survey and say something about our 10 million customers?
• This is the benefit of hypothesis testing: going from a sample to the population.
Yes, 300 > 250: men spend more than women in the sample. Now, can we make a definite statement about the population of 15 million customers?

Taking a sample from the population
How can we examine 200 respondents and talk about 15 million customers?
Population example: 15 million customers
Sample example: 200 respondents

Hypothesis testing
• In general, in hypothesis testing we wish to test a theory, a belief, or simply something of interest.
• We want to test whether a quantity concerning the population, called a parameter, is not equal to, greater than, or less than some value (the alternative hypothesis).
• Null hypothesis: =, <=, or >=.
• Often in hypothesis testing one wants to compare two groups/samples with each other.
• E.g. comparing the population average salary of men with the population average salary of women.

Examples
• Average income in Bangkok is greater than 30,000 Baht/month:
   - $H_0: \mu \le 30{,}000$ and $H_A: \mu > 30{,}000$.
• Average income in Bangkok of men is greater than that of women:
   - $H_0: \mu_{men} \le \mu_{women}$ and $H_A: \mu_{men} > \mu_{women}$.
• Percent of women in Hong Kong is less than 50%:
   - $H_0: \pi \ge 50\%$ and $H_A: \pi < 50\%$.

P-Value
• The p-value is a probability.
• The null hypothesis is used to calculate the probability.
• The p-value is the probability of observing the test statistic, or something more extreme, given that the null hypothesis is true.

Steps Within Hypothesis Testing: P-value Approach
1. Determine the population of interest. Example: true customers.
2. Determine the null hypothesis and the alternative hypothesis.
3. Decide on the appropriate level of significance, alpha.
4. Determine the sample size and sampling design to use.
   1. The tests in this chapter are appropriate when the data come from a simple random sample.
   2. Most statistical tests are not appropriate when the data come from a convenience sample or another type of non-probability sample.
      • What is done and what should be done are often not the same.
5. Determine the appropriate test statistic given the data and sampling design.
6. Collect the data and calculate the appropriate test statistic.
7. Calculate the p-value for the null and alternative hypothesis combination.
8. Decide whether to reject or fail to reject the null hypothesis by comparing the p-value to alpha.

Conclusions from a Hypothesis Test
• When a hypothesis test is performed, the result is either "fail to reject the null hypothesis" or "reject the null hypothesis".
• Do not say "accept" the null hypothesis. There is a huge difference between not having enough evidence to disprove something and proving it.

Conclusions from a Hypothesis Test
• Null hypothesis: men spend <= women on phone apps.
• Alternative hypothesis: men spend > women on phone apps.
• Rejecting the null hypothesis means we are confident that men spend more than women.
• Failing to reject means we are not confident; we need more evidence.
Yes, 300 > 250, but this is only the sample. Can we be certain about the population?

Types of Error
                 Fail to reject H0               Reject H0
H0 is true       P(No error) = $1 - \alpha$      P(Type I error) = $\alpha$
H0 is false      P(Type II error) = $\beta$      P(No error) = $1 - \beta$

Hypothesis Tests: Formulas
• $H_0: \mu = \mu_0$

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s}$$

• $H_0: \pi = \pi_0$

$$z = \frac{\hat{p} - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}$$

• $H_0: \mu_1 = \mu_2$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

A larger n, with everything else the same, leads to a larger test statistic in absolute value, which increases the probability of rejecting the null hypothesis.

Statistical Significance Versus Practical Significance
Don't make big marketing decisions over small differences, for example a 1 Baht difference.
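The effect of sample size on the two-sample statistic can be sketched with summary statistics alone. The standard deviation of 100 and the sample sizes below are hypothetical values chosen for illustration; the slides give only the sample means. The function name `two_sample_test` is also an assumption. The p-value uses the N(0,1) approximation, which is rough for 10 per group (a t distribution would give a somewhat larger p-value).

```python
from math import sqrt
from statistics import NormalDist

def two_sample_test(mean1, sd1, n1, mean2, sd2, n2):
    """Test H0: mu1 <= mu2 against HA: mu1 > mu2 from summary statistics.

    Returns the test statistic and a one-sided p-value. The p-value uses the
    N(0,1) approximation, which is reasonable only for large samples.
    """
    t = (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    p_value = 1.0 - NormalDist().cdf(t)
    return t, p_value

SD = 100.0  # hypothetical standard deviation of monthly spending

t_10, p_10 = two_sample_test(300, SD, 10, 250, SD, 10)          # 10 men, 10 women
t_100, p_100 = two_sample_test(300, SD, 100, 250, SD, 100)      # 100 of each
t_big, p_big = two_sample_test(251, SD, 10**6, 250, SD, 10**6)  # 1 Baht gap

alpha = 0.05
cases = [("300 vs 250, n=10", t_10, p_10),
         ("300 vs 250, n=100", t_100, p_100),
         ("251 vs 250, n=1,000,000", t_big, p_big)]
for label, t, p in cases:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{label}: t={t:.2f}, p={p:.4f}, {decision}")
```

Under these assumed values, the same 50-unit gap fails to reach significance with 10 per group but does with 100 per group, and a 1 Baht gap becomes statistically significant once the samples are very large.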
Yes, 251 > 250: men spend more than women in the sample. With enough data this will be statistically significant, but it is not practically significant.

What to do when
First, what is the number of variables and the type of variable(s)?
• One variable:
   - categorical
   - continuous
• Two variables:
   - categorical and categorical
   - categorical and continuous
   - continuous and continuous
• Multiple variables:
   - continuous dependent
   - categorical dependent
• Exception: time series data, use time series techniques

One Variable
• Categorical
   - Two categories: binomial test or Z approximation for test of proportions
   - More than two categories: chi-square test
• Continuous: t-test

Two Variables
• Categorical and categorical (both nominal data)
   - Two categories for each variable
      - If one- or two-sided test: two-sample Z-test of proportions
      - If two-sided test: chi-square test
   - More than two categories for at least one of the variables: chi-square test
• Categorical (nominal) and categorical (ordinal data, e.g. strongly disagree to strongly agree, 1 to 5, from a single question, not an average of, say, 5 questions)
   - Two categories: consider non-parametric statistical tests, e.g. Mann-Whitney and Wilcoxon tests
   - Multiple categories: consider non-parametric statistical tests

Two Variables
• Categorical and continuous
   - Two categories: two independent sample t-test, or paired t-test (if the data are paired)
   - More than two categories: ANOVA
• Continuous and continuous: correlation and simple linear regression

Multiple Variables
• Multiple variables, continuous dependent: general linear model
• Multiple variables, categorical dependent
   - Two categories: logistic regression
   - More than two categories: discriminant analysis

Time Series
• Time series data: use time series techniques

Practice Practice Practice
• www.LearnViaWeb.com
• www.LearnViaWeb.com/teachstat
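As one worked case from the "What to do when" guide (one categorical variable with two categories), the Z approximation for a test of proportions can be sketched as follows. It uses the Hong Kong example hypotheses, H0: pi >= 50% against HA: pi < 50%. The survey counts (460 women among 1,000 respondents) are hypothetical, and the function name `proportion_z_test` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, pi0):
    """Z approximation for H0: pi >= pi0 against HA: pi < pi0 (lower-tailed)."""
    p_hat = successes / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = NormalDist().cdf(z)  # probability of a z this small or smaller
    return z, p_value

# Hypothetical survey: 460 women among 1,000 respondents.
z, p_value = proportion_z_test(successes=460, n=1000, pi0=0.5)

alpha = 0.05
print(f"z={z:.2f}, p-value={p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The approximation assumes a simple random sample and a sample large enough for the normal approximation to hold; for small samples the binomial test listed in the guide is the safer choice.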
