NEAS Time Series Project
Fall 2011
Introduction
The United States of America has the third-largest population on the planet, behind China and India.
Recent and expected population growth has been a major component of political discussions on
immigration, Social Security, unemployment, and the national health care system. According to the
United States Census Bureau, the state of Texas has 8.04% of the country's population and is the
fastest-growing of the fifty states, at 2.1% relative to the 2010 census population. Even as its
population has grown significantly, the state's major cities have shown some of the lowest recession
impacts. This naturally leads the nation's population experts to project that the state of Texas will
continue to grow over the next decade.
In this project, I set out to fit a time series model to the historical Texas population in order to project
the state's population through the end of the decade and compare the result with this year's projections.
I will also extend the process to the total population of the United States in order to compare
Texas' projected population with the projected national growth.
Data
The data used for this project can be found on the website provided by the Texas State Library and
Archives Commission (https://www.tsl.state.tx.us/ref/abouttx/census.html). The population data on the
website comes from the United States Census Bureau in Washington D.C. for the years of 1900-2011.
The mid-year population figures are based on the census counts performed by the Census Bureau, with
projected values for the years between the nationwide censuses.
On the graph of the Texas population, some year ranges show where historical events had a significant
influence on the state's population. Obvious events include World War I combined with the Spanish
Influenza, World War II, and the Korean War. The Vietnam War is not as easy to see; in later decades the
visible changes are driven more by the populace's health and the state's immigration trends. The large,
sudden population increase in the 1980s can be attributed to the economic problems in Mexico and Latin
America, which led directly to a sudden increase in Hispanic immigration into the United States. The
late 1990s saw another sudden jump in population, which has been attributed to an increase in
Asian/Polynesian immigration into the metropolitan areas of Dallas, Houston, and Austin, along with
continued Hispanic immigration.
Compared with Texas, the graph of the total United States population shows fewer obvious events. It
does show the influence of the First and Second World Wars, but other national conflicts are not as
visible in the trend. Most of the country's population history is linear, with very few peaks or valleys
along the graph. The steep population increase at the end of the 20th century could reflect increased
immigration in response to the economic strength of the United States in the 1990s. Another national
trend visible in the United States data is that population growth is slowing as the population ages and
fertility rates drop, as they have over the past decades. Compared with the national graph, the steeper
slope of the Texas graph indicates the state's above-average population growth. As health care
technology improves, mortality rates decrease, and birth rates decrease, the future population graph
would show a flatter slope as the nation's population stabilizes.
Analysis
To begin, the sample autocorrelation function will be used to determine the best time series model to
fit the data. The sample autocorrelation function was calculated from lag 1 to lag 111 (for the years
1900-2011), and the equation used was:


$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$
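The formula above can be sketched in a few lines of Python. This is only an illustration of the calculation, not the spreadsheet used in the project; the function name `sample_autocorrelation` and the toy series are assumptions made for the example.

```python
def sample_autocorrelation(series, max_lag):
    """Return [r_1, ..., r_max_lag] for a list of observations."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: sum of squared deviations over all n observations.
    denom = sum((y - mean) ** 2 for y in series)
    acf = []
    for k in range(1, max_lag + 1):
        # Numerator: products of deviations k periods apart (t = k+1, ..., n).
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical toy series (not the Texas data), used only to show the call.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = sample_autocorrelation(toy, 2)
print(acf)
```

With the 112 annual observations for 1900-2011, calling the function with `max_lag=111` would cover the same range of lags as the project.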
The series of autocorrelations can be found in column E of the ‘Autocorrelation Texas’ tab. The
corresponding correlogram was produced and can be seen here:
The correlogram shows damped sine-wave behavior, which suggests that a second-order autoregressive
process, AR(2), is appropriate for the time series model. The equation for an AR(2)
model is:
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \theta_0 + e_t$$
The population of Texas was then regressed on its one-year and two-year lags using the Excel
Regression Analysis Tool. The method of regression can be found on the “Regression Texas” worksheet.
The results of the AR(2) regression are as follows:
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.99988283 |
| R Square | 0.999765674 |
| Adjusted R Square | 0.999761294 |
| Standard Error | 97747.50782 |
| Observations | 110 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 4.36188E+15 | 2.18094E+15 | 228261.2002 | 6.1005E-195 |
| Residual | 107 | 1.02234E+12 | 9554575284 | | |
| Total | 109 | 4.3629E+15 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | -523.3522409 | 18552.40402 | -0.028209403 | 0.977547685 | -37301.32944 |
| X Variable 1 | 1.323029978 | 0.092169404 | 14.35432938 | 1.07634E-26 | 1.140314885 |
| X Variable 2 | -0.309554 | 0.093949858 | -3.294885221 | 0.001335835 | -0.495798637 |
The regression produced the following model:
$$Y_t = 1.323\,Y_{t-1} - 0.310\,Y_{t-2} - 523.352 + e_t$$
The high value of the R² statistic (99.98%) indicates that the model is a good fit to the population data.
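The lagged regression described above was run in Excel; the same idea can be sketched in plain Python as ordinary least squares on the two lags. The helper names `solve_linear_system` and `fit_ar2` and the demo series are assumptions for illustration, and the demo uses a made-up noise-free series rather than the Texas data. Solving the normal equations directly is the simplest approach but can be numerically fragile for large, highly collinear values such as populations, where a library least-squares routine would be the safer choice.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar2(series):
    """OLS fit of Y_t = theta0 + phi1*Y_{t-1} + phi2*Y_{t-2}.

    Returns (theta0, phi1, phi2). A series of length n gives n - 2 rows.
    """
    rows = [[1.0, series[t - 1], series[t - 2]] for t in range(2, len(series))]
    targets = [series[t] for t in range(2, len(series))]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


# Hypothetical noise-free series generated from known coefficients,
# so the fit should recover theta0 = 2.0, phi1 = 1.0, phi2 = -0.9.
demo = [10.0, 0.0]
for _ in range(30):
    demo.append(2.0 + 1.0 * demo[-1] - 0.9 * demo[-2])
theta0, phi1, phi2 = fit_ar2(demo)
print(theta0, phi1, phi2)
```

Note that 112 annual observations leave 110 usable rows once two lags are taken, which matches the observation count in the regression output.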
The resulting residual plot is:
The residual plot suggests that the model fits the actual population very well, as the majority of
the plotted points lie near the axis. In percentage terms, the residuals are small relative to the
population data, with the majority falling within 3% of the census population:
The year ranges where the residual percentages have the greatest absolute values tend to coincide with
the major historical events noted earlier. This makes sense, as the national conflicts and sudden
immigration led to the biggest fluctuations on the population chart, and the model cannot anticipate
such events; it only reacts to them afterward. Therefore, it can be concluded that the model is a good
fit for the population data. The resulting model produces the following population charts:
Since the population is consistently increasing, the time series is non-stationary. In order to
achieve a stationary series, the regression process will be repeated on the first differences, that is,
the annual change in population. This series is produced by subtracting the previous year's population
from each year's population:
$$W_t = Y_t - Y_{t-1}$$
If the differenced series appears stationary, then the population would follow an autoregressive
integrated moving average (ARIMA) model. The chart of the first differences appears as:
As expected, the sudden dips of the population chart due to historical events can be seen on the chart of
first differences. The World War conflicts have negative differences and the immigration events have
sudden population jumps. The correlogram of the first differences appears as:
The correlogram of the first differences does have a slight sine-wave shape; however, the
graph of the first differences still does not appear to converge toward a stationary average. For this
reason, we will conclude that the population does not follow an ARIMA model. This is supported by the
low R² statistic produced when the first differences are run through the Regression Tool:
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.722978813 |
| R Square | 0.522698364 |
| Adjusted R Square | 0.513859444 |
| Standard Error | 109594.3497 |
| Observations | 111 |
In a last-ditch effort to find a stationary model, the first differences of the population logarithms
will be examined. This is described by the equation:
$$W_t = \ln(Y_t) - \ln(Y_{t-1})$$
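Both transformations used in this section, the plain first difference and the difference of logarithms, are simple to compute. The sketch below uses hypothetical helper names (`first_differences`, `log_differences`) and only the 2009-2011 Texas figures from the census data as a small sample.

```python
import math


def first_differences(series):
    """W_t = Y_t - Y_{t-1} for each consecutive pair of observations."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def log_differences(series):
    """W_t = ln(Y_t) - ln(Y_{t-1}), approximately the proportional growth."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


# Texas population for 2009, 2010, and 2011.
texas_2009_2011 = [24_782_302, 25_145_561, 25_674_681]
diffs = first_differences(texas_2009_2011)
log_diffs = log_differences(texas_2009_2011)
print(diffs)      # [363259, 529120]
print(log_diffs)
```

The log differences are close to the year-over-year percentage growth, which is why they are often tried when the raw differences still trend upward with the level of the series.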
The log-differenced series produces the following graph and correlogram:
The results seem overly complicated for a projection model, so we will finish the project's conclusion
using the autoregressive AR(2) model from earlier:
$$Y_t = 1.323\,Y_{t-1} - 0.310\,Y_{t-2} - 523.352 + e_t$$
Conclusion
The model will now be used to forecast the Texas population over the next ten years. This is done by
applying the AR(2) equation to the two most recent years of population data and then feeding each
projection back into the equation. The resulting population projections are:
| Year | Actual Texas Population | Projected Texas Population | Projected Growth Rate |
|---|---|---|---|
| 2000 | 20,851,820 | | |
| 2001 | 21,334,855 | | |
| 2002 | 21,723,220 | 21,771,365 | |
| 2003 | 22,103,374 | 22,135,658 | 1.673% |
| 2004 | 22,490,022 | 22,518,393 | 1.729% |
| 2005 | 22,928,508 | 22,912,262 | 1.749% |
| 2006 | 23,507,783 | 23,372,704 | 2.010% |
| 2007 | 23,904,380 | 24,003,367 | 2.698% |
| 2008 | 24,326,974 | 24,348,760 | 1.439% |
| 2009 | 24,782,302 | 24,785,096 | 1.792% |
| 2010 | 25,145,561 | 25,256,693 | 1.903% |
| 2011 | 25,674,681 | 25,596,347 | 1.345% |
| 2012 | | 26,183,940 | 2.296% |
| 2013 | | 26,693,914 | 1.948% |
| 2014 | | 27,210,982 | 1.937% |
| 2015 | | 27,737,214 | 1.934% |
| 2016 | | 28,273,374 | 1.933% |
| 2017 | | 28,819,832 | 1.933% |
| 2018 | | 29,376,842 | 1.933% |
| 2019 | | 29,944,626 | 1.933% |
| 2020 | | 30,523,395 | 1.933% |
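The recursion behind the 2012-2020 projections can be sketched as follows. The coefficients are taken from the regression output above and the starting values are the 2010 and 2011 census figures; the function name `forecast_ar2` is an assumption for illustration, and the error term is set to zero for forecasting.

```python
def forecast_ar2(y_prev2, y_prev1, theta0, phi1, phi2, steps):
    """Iterate Y_t = phi1*Y_{t-1} + phi2*Y_{t-2} + theta0 forward (e_t = 0)."""
    history = [y_prev2, y_prev1]
    for _ in range(steps):
        history.append(phi1 * history[-1] + phi2 * history[-2] + theta0)
    return history[2:]


# Coefficients as displayed in the Excel regression output.
THETA0, PHI1, PHI2 = -523.3522409, 1.323029978, -0.309554

# Actual Texas population for 2010 and 2011.
texas_2010, texas_2011 = 25_145_561, 25_674_681

projection = forecast_ar2(texas_2010, texas_2011, THETA0, PHI1, PHI2, steps=9)
for year, value in zip(range(2012, 2021), projection):
    print(year, round(value))
```

The values should land very close to the projected column for 2012 onward; small differences are possible because the coefficients are used here only to the precision displayed in the output.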
Thus, the model projects that the population of Texas could grow to over 30 million by the year
2020. To finish the project's objective, the population of the United States was also run through the
regression process, again using an AR(2) model in order to remain consistent. Running the United States
population through the Regression Tool produced the following model:
$$Y_t = 0.939\,Y_{t-1} + 0.044\,Y_{t-2} + 5{,}855{,}398 + e_t$$
This model is very similar to the Texas population model, with a very high R² statistic and strong growth
expected for the future. With the exception of the first year, the projected model fits the actual
census data very well, with a slight lag on the historical events. The model projection over the next
decade looks like:
| Year | Actual US Population | Projected US Population | Projected Growth Rate |
|---|---|---|---|
| 2000 | 281,421,906 | | |
| 2001 | 285,102,075 | | |
| 2002 | 287,941,220 | 285,973,354 | |
| 2003 | 290,788,976 | 288,801,450 | 0.989% |
| 2004 | 293,655,404 | 291,600,659 | 0.969% |
| 2005 | 296,507,061 | 294,417,783 | 0.966% |
| 2006 | 299,398,484 | 297,221,855 | 0.952% |
| 2007 | 301,621,157 | 300,062,623 | 0.956% |
| 2008 | 304,059,724 | 302,277,101 | 0.738% |
| 2009 | 307,006,550 | 304,664,930 | 0.790% |
| 2010 | 308,745,538 | 307,539,568 | 0.944% |
| 2011 | 311,591,917 | 309,302,241 | 0.573% |
| 2012 | | 312,051,792 | 0.889% |
| 2013 | | 312,608,805 | 0.179% |
| 2014 | | 313,152,125 | 0.174% |
| 2015 | | 313,686,858 | 0.171% |
| 2016 | | 314,212,924 | 0.168% |
| 2017 | | 314,730,473 | 0.165% |
| 2018 | | 315,239,642 | 0.162% |
| 2019 | | 315,740,567 | 0.159% |
| 2020 | | 316,233,381 | 0.156% |
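One way to see why the two sets of projections behave so differently is to look at the roots of the AR(2) characteristic equation, λ² − φ1·λ − φ2 = 0. An AR(2) process is stationary only when both roots lie inside the unit circle. The sketch below is a side check rather than part of the original analysis; the function name `ar2_root_moduli` is an assumption, and the US coefficients are used only as rounded to three decimals in the text, so that result is approximate.

```python
import cmath


def ar2_root_moduli(phi1, phi2):
    """Moduli of the roots of lambda^2 - phi1*lambda - phi2 = 0, largest first."""
    disc = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    roots = [(phi1 + disc) / 2.0, (phi1 - disc) / 2.0]
    return sorted((abs(r) for r in roots), reverse=True)


texas = ar2_root_moduli(1.323029978, -0.309554)  # coefficients from the Excel output
us = ar2_root_moduli(0.939, 0.044)               # rounded coefficients from the text
print(texas)
print(us)
```

The dominant Texas root is slightly above 1 (about 1.019), so the projections settle into steady growth of roughly 1.9% per year, in line with the last rows of the Texas table. The dominant US root, from the rounded coefficients, is just below 1, so the projected US growth rate shrinks each year, as the last rows of the US table show.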
Based on the analysis, the AR(2) model works best for modeling the population of the state of Texas
for the years 1900-2011. The model allows us to make forecasts of future population growth.
Compared with the similar model of the United States population, growth is projected to remain
higher for the state of Texas than the national average. The projections indicate that, by the time the
2020 Census is finished, about 9.652% of the nation's population could be living within Texas.
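The share quoted above follows directly from the two 2020 projections in the tables, as this short check shows (the variable names are illustrative):

```python
texas_2020 = 30_523_395   # projected Texas population for 2020
us_2020 = 316_233_381     # projected US population for 2020

share = texas_2020 / us_2020
print(f"{share:.3%}")  # 9.652%
```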