xxxxxx xxxxx
NEAS Time Series Project
Fall 2011

Introduction

The United States of America has the third-largest population on the planet, behind China and India. Recent and expected population growth has been a major component of political discussions on immigration, Social Security, unemployment, and the national health care system. According to the United States Census Bureau, the state of Texas has 8.04% of the country's population and is the fastest growing of the fifty states, at 2.1% of the 2010 population census. Even though the state's population has been growing significantly, its major cities have shown some of the lowest recession impacts. This naturally leads the nation's population experts to project that Texas will continue to grow over the next decade.

In this project, I fit the historical Texas population to a time series model in order to project the state's population, both to compare with the current year's projections and to extend the forecast through the end of the decade. I also apply the same process to the total population of the United States in order to compare the projected Texas population with the projected national population growth.

Data

The data used for this project can be found on the website provided by the Texas State Library and Archives Commission (https://www.tsl.state.tx.us/ref/abouttx/census.html). The population data on the website comes from the United States Census Bureau in Washington, D.C., for the years 1900-2011. The mid-year population numbers are based on census counts performed by the Census Bureau and are projected for the years between the nationwide censuses.

The graph of the Texas population shows several ranges of years where historical events had a significant influence on the state's population. Obvious events include World War I combined with the Spanish Influenza, World War II, and the Korean War.
The Vietnam War is not as easy to see, and the later historical effects are determined more by the health of the populace and the state's immigration trends. The large, sudden population increase in the 1980s can be attributed to the economic problems in Mexico and Latin America, which led directly to a sudden increase in Hispanic immigration into the United States. The late 1990s show another sudden jump in population that has been attributed to an increase in Asian and Polynesian immigration into the metropolitan areas of Dallas, Houston, and Austin, along with continued Hispanic immigration.

The graph of the total United States population shows fewer obvious events. It does show the influence of the First and Second World Wars, but other national conflicts are not as visible in the trend. Most of the country's population curve is close to linear, with very few peaks or valleys. The steep population increase at the end of the 20th century could be due to increased immigration in response to the economic strength of the United States in the 1990s. The national graph also shows that population growth is slowing down as the population ages and fertility rates have dropped over the past decades. By comparison, the steeper slope of the Texas population graph indicates the state's above-average population growth. As health care technology improves, mortality rates decrease, and birth rates decrease, the future population graph would show a flat slope as the nation's population stabilizes.

Analysis

To begin, the sample autocorrelation function will be used to determine the best time series model to fit the data.
The sample autocorrelation function was calculated from lag 1 to lag 111 (for the years 1900-2011) using the equation:

$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$

The series of autocorrelations can be found on the 'Autocorrelation Texas' tab under the 'E' column. The corresponding correlogram was produced and can be seen here:

[Figure: correlogram of the Texas population]

The correlogram shows a damped sine wave pattern, which implies that a second-order autoregressive process, AR(2), is appropriate for the time series model. The equation for an AR(2) model is:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \theta_0 + e_t$$

The population of Texas was then regressed against its one-year and two-year lags using the Excel Regression Analysis Tool. The method of regression can be found on the "Regression Texas" worksheet. In the output below, X Variable 1 is the one-year lag and X Variable 2 is the two-year lag. The results of the AR(2) regression are:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.99988283
R Square             0.999765674
Adjusted R Square    0.999761294
Standard Error       97747.50782
Observations         110

ANOVA
             df    SS            MS            F             Significance F
Regression   2     4.36188E+15   2.18094E+15   228261.2002   6.1005E-195
Residual     107   1.02234E+12   9554575284
Total        109   4.3629E+15

               Coefficients    Standard Error   t Stat         P-value       Lower 95%
Intercept      -523.3522409    18552.40402      -0.028209403   0.977547685   -37301.32944
X Variable 1   1.323029978     0.092169404      14.35432938    1.07634E-26   1.140314885
X Variable 2   -0.309554       0.093949858      -3.294885221   0.001335835   -0.495798637

The regression produced the following model:

$$Y_t = 1.323\,Y_{t-1} - 0.310\,Y_{t-2} - 523.352 + e_t$$

The high value of the R² statistic (99.98%) indicates that the model is a good fit to the population data. The resulting residual plot is:

[Figure: residual plot of the AR(2) model for Texas]

The residual plot suggests that the model fits the actual population very well, as the majority of the plotted points lie near the axis.
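The sample autocorrelation formula and the lagged regression described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `np.linalg.lstsq` stands in for the Excel Regression Analysis Tool, the function names `sample_acf` and `fit_ar2` are hypothetical, and the series is synthetic (simulated with made-up coefficients), not the census data.

```python
import numpy as np


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag, following the r_k formula above."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()                # Y_t - Ybar
    denom = np.sum(dev ** 2)          # sum over t = 1..n of (Y_t - Ybar)^2
    return np.array([np.sum(dev[k:] * dev[:-k]) / denom
                     for k in range(1, max_lag + 1)])


def fit_ar2(y):
    """Regress Y_t on Y_{t-1}, Y_{t-2} and a constant by least squares.

    Returns (phi1, phi2, theta0, r_squared).
    """
    y = np.asarray(y, dtype=float)
    target = y[2:]                    # Y_t, starting at the third observation
    design = np.column_stack([y[1:-1], y[:-2], np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return coef[0], coef[1], coef[2], r_squared


# Synthetic AR(2) series with assumed coefficients (NOT the census data).
rng = np.random.default_rng(0)
true_phi1, true_phi2, true_theta0 = 0.5, 0.3, 10.0
series = [50.0, 50.0]
for _ in range(5000):
    series.append(true_phi1 * series[-1] + true_phi2 * series[-2]
                  + true_theta0 + rng.normal())

phi1, phi2, theta0, r2 = fit_ar2(series)
acf = sample_acf(series, max_lag=10)
print(f"phi1={phi1:.3f} phi2={phi2:.3f} theta0={theta0:.3f} R^2={r2:.3f}")
```

With n observations the regression uses n - 2 rows, which is consistent with the 110 observations reported for the 112 years of data.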
In percentage terms, the residuals of the AR(2) model are small relative to the population data, as the majority of the residuals are within 3% of the population census:

[Figure: residuals as a percentage of the Texas population]

The year ranges where the residual percentages have the greatest absolute values tend to coincide with the major historical events noted earlier. This makes sense, as the national conflicts and sudden immigration led to the biggest fluctuations on the population chart, and the model cannot anticipate such events; it only reacts to them. Therefore, it can be concluded that the model is a good fit for the population data. The resulting model produces the following population chart:

[Figure: actual and modeled Texas population]

Since the population is consistently increasing, the time series is non-stationary. In order to achieve a stationary model, the regression process will be repeated with the first differences, that is, the annual change in population. This series is produced by subtracting the previous year's population from each year's population:

$$W_t = Y_t - Y_{t-1}$$

If this new series appears stationary, then the population would follow an autoregressive integrated moving average (ARIMA) model. The chart of the first differences appears as:

[Figure: first differences of the Texas population]

As expected, the sudden dips in the population chart due to historical events can be seen on the chart of first differences. The World War conflicts have negative differences and the immigration events have sudden population jumps. The correlogram of the first differences appears as:

[Figure: correlogram of the first differences]

The correlogram of the first differences does have a slight sine wave feature; however, the graph of the first differences still does not appear to converge toward a stationary average. For this reason, we will conclude that the population does not follow an ARIMA model.
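The differencing step W_t = Y_t - Y_{t-1} can be illustrated with a short Python sketch. It uses only the 2000-2011 Texas values that appear in the projection table of this report, not the full 1900-2011 series, so it shows the mechanics rather than reproducing the charts.

```python
import numpy as np

# Mid-year Texas population, 2000-2011, copied from the projection table in this report.
texas = np.array([20_851_820, 21_334_855, 21_723_220, 22_103_374,
                  22_490_022, 22_928_508, 23_507_783, 23_904_380,
                  24_326_974, 24_782_302, 25_145_561, 25_674_681], dtype=float)

# First differences: W_t = Y_t - Y_{t-1}; one observation is lost at the start.
w = np.diff(texas)

for year, change in zip(range(2001, 2012), w):
    print(year, f"{change:,.0f}")
```

Even in this short window the annual changes vary noticeably from year to year, which is the kind of behavior the first-difference chart is meant to expose.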
The conclusion that the first differences do not follow a stationary model is supported by the low R² statistic obtained when the first differences are run through the Regression Tool:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.722978813
R Square             0.522698364
Adjusted R Square    0.513859444
Standard Error       109594.3497
Observations         111

In a last-ditch effort to find a stationary model, the first differences of the population logarithms will be examined. This is described by the equation:

$$W_t = \ln(Y_t) - \ln(Y_{t-1})$$

This transformation produces the following graph and correlogram:

[Figure: first differences of the log population and their correlogram]

The results seem overly complicated for a projection model, so we will finish the project's conclusion using the autoregressive AR(2) model from earlier:

$$Y_t = 1.323\,Y_{t-1} - 0.310\,Y_{t-2} - 523.352 + e_t$$

Conclusion

The model will now be used to forecast the Texas population over the next ten years. This is done by applying the AR(2) equation recursively, starting from the two most recent years of population data. For the years with census data, each projection is computed from the two preceding actual values; from 2013 onward, it is computed from previously projected values. The projected growth rate is the change from the previous year's projected population. The resulting population projections are:

Year   Actual Texas Population   Projected Texas Population   Projected Growth Rate
2000   20,851,820                -                            -
2001   21,334,855                -                            -
2002   21,723,220                21,771,365                   -
2003   22,103,374                22,135,658                   1.673%
2004   22,490,022                22,518,393                   1.729%
2005   22,928,508                22,912,262                   1.749%
2006   23,507,783                23,372,704                   2.010%
2007   23,904,380                24,003,367                   2.698%
2008   24,326,974                24,348,760                   1.439%
2009   24,782,302                24,785,096                   1.792%
2010   25,145,561                25,256,693                   1.903%
2011   25,674,681                25,596,347                   1.345%
2012   -                         26,183,940                   2.296%
2013   -                         26,693,914                   1.948%
2014   -                         27,210,982                   1.937%
2015   -                         27,737,214                   1.934%
2016   -                         28,273,374                   1.933%
2017   -                         28,819,832                   1.933%
2018   -                         29,376,842                   1.933%
2019   -                         29,944,626                   1.933%
2020   -                         30,523,395                   1.933%

Thus, the model projects that the population of Texas could grow to over 30 million by the year 2020.

To finish the project's objective, the population of the United States was also run through the regression process, again using an AR(2) model in order to remain consistent.
The following model was produced by running the United States population through the Regression Tool:

$$Y_t = 0.939\,Y_{t-1} + 0.044\,Y_{t-2} + 5{,}855{,}398 + e_t$$

This model is very similar to the Texas population model, with a very high R² statistic and strong growth expected for the future. With the exception of the first year, the projected model fits the actual census data very well, with a slight lag on the historical events. The model projection over the next decade is:

Year   Actual US Population   Projected US Population   Projected Growth Rate
2000   281,421,906            -                         -
2001   285,102,075            -                         -
2002   287,941,220            285,973,354               -
2003   290,788,976            288,801,450               0.989%
2004   293,655,404            291,600,659               0.969%
2005   296,507,061            294,417,783               0.966%
2006   299,398,484            297,221,855               0.952%
2007   301,621,157            300,062,623               0.956%
2008   304,059,724            302,277,101               0.738%
2009   307,006,550            304,664,930               0.790%
2010   308,745,538            307,539,568               0.944%
2011   311,591,917            309,302,241               0.573%
2012   -                      312,051,792               0.889%
2013   -                      312,608,805               0.179%
2014   -                      313,152,125               0.174%
2015   -                      313,686,858               0.171%
2016   -                      314,212,924               0.168%
2017   -                      314,730,473               0.165%
2018   -                      315,239,642               0.162%
2019   -                      315,740,567               0.159%
2020   -                      316,233,381               0.156%

Based on this analysis, the AR(2) model works best for modeling the population of the state of Texas over the years 1900-2011. The model allows us to make forecasts of future population growth. When compared to the similar model of the United States population, growth is projected to remain higher for the state of Texas than for the nation as a whole. The projections forecast that, by the time the 2020 Census is finished, about 9.652% of the nation's population could be living in Texas.
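The recursive projection and the final share calculation can be checked with a short Python sketch. It assumes the full-precision Texas coefficients from the regression output, the actual 2010 and 2011 Texas populations, and the reported 2020 US projection as inputs; the helper name `forecast_ar2` is hypothetical. The US projection is taken from the table rather than recomputed, because the US coefficients are only given rounded to three decimals.

```python
def forecast_ar2(y_prev2, y_prev1, steps, phi1, phi2, theta0):
    """Iterate Y_t = phi1*Y_{t-1} + phi2*Y_{t-2} + theta0, with e_t set to 0."""
    path = []
    for _ in range(steps):
        y_next = phi1 * y_prev1 + phi2 * y_prev2 + theta0
        path.append(y_next)
        y_prev2, y_prev1 = y_prev1, y_next   # slide the two-year window forward
    return path


# Texas coefficients at full precision, from the regression output in this report.
PHI1, PHI2, THETA0 = 1.323029978, -0.309554, -523.3522409

# Start from the actual Texas populations for 2010 and 2011; project 2012..2020.
texas = forecast_ar2(25_145_561, 25_674_681, steps=9,
                     phi1=PHI1, phi2=PHI2, theta0=THETA0)

# Year-over-year growth of the projected values (2013..2020).
growth = [b / a - 1.0 for a, b in zip(texas, texas[1:])]

# Reported 2020 US projection, taken from the table above (not recomputed).
US_2020_PROJECTED = 316_233_381
share_2020 = texas[-1] / US_2020_PROJECTED

for year, value in zip(range(2012, 2021), texas):
    print(year, f"{value:,.0f}")
print(f"Texas share of US population in 2020: {share_2020:.3%}")
```

The first projected value comes out close to the 26,183,940 shown for 2012 in the Texas table, and the 2020 ratio is close to the 9.652% quoted above.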