Week 1 Lecture

BA 275
Winter 2007
Exploring Data
Exploring Data: Summary and Outline
Qualitative Data (Categorical Data)
e.g. gender, college major, etc.
Graphical Methods
• Pie charts
• Bar graphs
Numerical Methods (Descriptive Statistics)
• Frequency tables

Quantitative Data (Numerical Data)
e.g. age, income, SAT scores, etc.
Graphical Methods
Display one variable:
• Histograms
• Stem-and-Leaf Displays
• Dot plots
Display two variables:
• Scatter plots
Display one variable over time:
• Time series plots
Numerical Methods (Descriptive Statistics)
Measures of Location:

Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ (the "simple average")

Median:
• Arrange the observations in ascending order.
• If n is odd, median = the middle number.
• If n is even, median = the simple average of the middle two observations.

Mode:
• The measurement that occurs most frequently in the data set. It might not be unique, or not even exist.
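As a quick illustration of the three measures of location, here is a minimal Python sketch. The `ages` sample and the helper name `median_by_hand` are made up for this example and are not part of the handout; the standard-library `statistics` module is used only to cross-check the hand computation.

```python
from statistics import multimode

def median_by_hand(values):
    """Median via the rule above: sort, then take the middle value
    (n odd) or the average of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [23, 25, 25, 31, 40, 52]  # hypothetical sample, n = 6 (even)

sample_mean = sum(ages) / len(ages)   # (1/n) * sum of X_i
sample_median = median_by_hand(ages)  # average of 25 and 31
sample_modes = multimode(ages)        # list of all most-frequent values

print("mean:", sample_mean)
print("median:", sample_median)
print("mode(s):", sample_modes)
```

`multimode` returns a list because the mode need not be unique. When no value repeats, it returns every value, which corresponds to the case where the mode is not meaningful.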
Measures of Spread/Variability:

Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

Standard deviation: $s = \sqrt{s^2}$
Range = the largest – the smallest
Interquartile range = IQR = Q3 – Q1
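The measures of spread can be computed directly from their definitions. The sketch below uses a made-up sample (`data`) and assumes the default quartile method of Python's `statistics.quantiles`; quartile conventions vary between textbooks and software, so Q1 and Q3 (and hence the IQR) may differ slightly from values found by hand.

```python
from math import sqrt
from statistics import quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8
n = len(data)
x_bar = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Standard deviation: square root of the variance.
s = sqrt(s2)

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Quartiles via the library's default ("exclusive") method; other
# conventions exist and can give slightly different cut points.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(f"variance = {s2:.3f}, standard deviation = {s:.3f}")
print(f"range = {data_range}, IQR = {iqr}")
```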
Measures of Relative Standing:

Percentiles:
• The pth percentile is a number such that p% of the n observations fall below it and (100 − p)% fall above it.

Quartiles:
• Q1 = QL = the lower quartile = 25th percentile
• Q2 = Median = 50th percentile
• Q3 = QU = the upper quartile = 75th percentile

Z-scores:
$Z = \frac{\text{obs} - \text{mean}}{\text{std}} = \frac{X - \bar{X}}{s}$
• Z-scores tell you how far the observation is above or below the mean (the center of a data set), measured in standard deviations.
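To make the z-score formula concrete, the following sketch standardizes each observation in a made-up sample (`data` is hypothetical, not from the handout). It relies on `statistics.stdev`, which uses the n − 1 divisor and so matches the definition of s.

```python
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

x_bar = mean(data)
s = stdev(data)  # sample standard deviation (divides by n - 1)

# z = (observation - mean) / standard deviation
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"x = {x}: z = {z:+.2f} ({position} the mean)")
```

A negative z-score marks an observation below the mean and a positive one marks an observation above it. By construction, the z-scores of a sample average to 0 and have standard deviation 1.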
Hsieh, P-H
Boxplot (Box-and-Whisker Plot)
[Figure: two box-and-whisker plots, one of Age (axis from 30 to 70) and one of Salary (axis from 2 to 7, ×10000).]

The Empirical Rule: if the observations come from a mound-shaped and symmetric distribution, then:
1. Approximately 68% of the observations will fall within 1 standard deviation of the mean.
2. Approximately 95% of the observations will fall within 2 standard deviations of the mean.
3. Approximately 99.7% of the observations will fall within 3 standard deviations of the mean.
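[Figure: bell-shaped curve with the horizontal axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$, with the corresponding population values $\mu - 3\sigma$ through $\mu + 3\sigma$ and z-scores −3 through 3. Areas under the curve, from left to right: 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%. The central bands cover 68%, 95%, and 99.7% of the observations.]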
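One way to see where the 68%, 95%, and 99.7% figures come from is to compare them with a normal distribution, the standard example of a mound-shaped, symmetric distribution. The sketch below is illustrative only: the population mean of 50, the standard deviation of 10, the sample size, and the seed are all arbitrary choices, not values from the handout.

```python
from statistics import NormalDist, mean, stdev

# Exact proportions within k standard deviations for a normal distribution.
standard_normal = NormalDist()
theoretical = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

# Simulated mound-shaped sample (parameters and seed are arbitrary).
sample = NormalDist(mu=50, sigma=10).samples(10_000, seed=275)
x_bar = mean(sample)
s = stdev(sample)

# Fraction of observations within k sample standard deviations of the mean.
observed = {
    k: sum(abs(x - x_bar) <= k * s for x in sample) / len(sample)
    for k in (1, 2, 3)
}

for k in (1, 2, 3):
    print(f"within {k} sd: normal = {theoretical[k]:.4f}, "
          f"sample = {observed[k]:.4f}")
```

For data that are skewed or have several modes, the observed fractions can be far from these values, which is why the rule is stated only for mound-shaped, symmetric distributions.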

Questions to ask when describing and summarizing data:
1. Where is the approximate center of the distribution?
2. Are the observations close to one another, or are they widely dispersed?
3. Is the distribution unimodal, bimodal, or multimodal? If there is more than one mode, where are the peaks, and where are the valleys?
4. Is the distribution symmetric? If not, is it skewed? If symmetric, is it bell-shaped?