August 29, 2017

Prediction is very difficult, especially about the future.

Niels Bohr, Danish Physicist

Time Series Forecasting

The time-series plot is the most frequently used form of graphic design. With one dimension marching along to the regular rhythm of seconds, minutes, hours, days, weeks, months, years, or millennia, the natural ordering of the time scale gives this design a strength and efficiency of interpretation found in no other graphic arrangement.

Tufte (1983, p. 28)

Time Series Forecasting

Figure: Tenth-century time series plot of the inclinations of the planetary orbits. Source: Tufte (1983, p. 28)

Introduce FRED and highlight a few series, in particular:

Forecasts are Expectations

Christoffer Koch and Julieta Yung (2017), "Impact of Macroeconomic Announcements Changed After the Zero Lower Bound," Dallas Fed Economic Letter, Vol. 12, No. 8.

Forecasting, Planning, and Goals

• Forecasting

… is about predicting the future as accurately as possible, given all of the information available, including historical data and knowledge of any future events that might impact the forecasts.

• Goals

… are what you would like to have happen. Goals should be linked to forecasts and plans, but this does not always occur. Too often, goals are set without any plan for how to achieve them, and no forecasts for whether they are realistic.

• Planning

… is a response to forecasts and goals. Planning involves determining the appropriate actions that are required to make your forecasts match your goals.

Steps in Forecasting

1. Problem definition.

2. Gathering information.

3. Preliminary (exploratory) analysis.

4. Choosing and fitting models.

5. Using and evaluating a forecasting model.

Numerical Data Summaries

Univariate Statistics

• sample mean

• median

• interquartile range

• standard deviation

Bivariate Statistics

• correlation coefficient

Univariate Statistics

Mean

$\bar{x} = \frac{1}{N}\sum_{i=1}^N x_{i} = (x_{1} + x_{2} + x_3 + \cdots + x_{N})/N$

Standard Deviation

$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_{i} - \bar{x})^2}.$

Univariate Statistics

Cars example - some data on 2009 model cars, each of which has an automatic transmission, four cylinders and an engine size under 2 liters.

library(fpp)  # assumption: the fuel data set is taken from the fpp package
subset(fuel, Litres<2)[, c(1,3,5,6,8)]
##                         Model Litres City Highway Carbon
## 20             Chevrolet Aveo    1.6   25      34    6.6
## 21           Chevrolet Aveo 5    1.6   25      34    6.6
## 19                Honda Civic    1.8   25      36    6.3
## 2          Honda Civic Hybrid    1.3   40      45    4.4
## 11                  Honda Fit    1.5   27      33    6.1
## 9                   Honda Fit    1.5   28      35    5.9
## 13             Hyundai Accent    1.6   26      35    6.3
## 14                    Kia Rio    1.6   26      35    6.1
## 12               Nissan Versa    1.8   27      33    6.3
## 31               Nissan Versa    1.8   24      32    6.8
## 22            Pontiac G3 Wave    1.6   25      34    6.6
## 23          Pontiac G3 Wave 5    1.6   25      34    6.6
## 18               Pontiac Vibe    1.8   26      31    6.6
## 33 Saturn Astra 2DR Hatchback    1.8   24      30    6.8
## 34 Saturn Astra 4DR Hatchback    1.8   24      30    6.8
## 17                   Scion xD    1.8   26      32    6.6
## 10             Toyota Corolla    1.8   27      35    6.1
## 26              Toyota Matrix    1.8   25      31    6.6
## 1                Toyota Prius    1.5   48      45    4.0
## 8                Toyota Yaris    1.5   29      35    5.9

Univariate Statistics

In this example, $N=20$ and $x_i$ denotes the carbon footprint of vehicle $i$. Then the average carbon footprint is

\begin{align} \bar{x} & = \frac{1}{20}\sum_{i=1}^{20} x_{i} \\ &= (x_{1} + x_{2} + x_3 + \dots + x_{20})/20 \\ &= (4.0 + 4.4 + 5.9 + \dots + 6.8 + 6.8 + 6.8)/20 \\ &= 124/20 = 6.2 \text{ tons CO}_{2}. \end{align}

The median, on the other hand, is the middle observation when the data are placed in order. In this case, there are 20 observations, so the median is the average of the 10th and 11th observations in the sorted data. That is,

$\text{median} = (6.3+6.6)/2 = 6.45.$
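The mean and median above can be checked in base R. This is a minimal sketch that retypes the 20 Carbon values from the table instead of loading `fuel`; the names `carbon`, `x_bar`, and `med` are introduced here for illustration only.

```r
# Carbon footprint (tons CO2) of the 20 cars, retyped from the table above
carbon <- c(6.6, 6.6, 6.3, 4.4, 6.1, 5.9, 6.3, 6.1, 6.3, 6.8,
            6.6, 6.6, 6.6, 6.8, 6.8, 6.6, 6.1, 6.6, 4.0, 5.9)

N <- length(carbon)

# Sample mean from the definition: sum of the observations divided by N
x_bar <- sum(carbon) / N

# Median for even N: average of the two middle values of the sorted data
sorted <- sort(carbon)
med <- (sorted[N / 2] + sorted[N / 2 + 1]) / 2

print(c(mean = x_bar, median = med))
```

The built-in `mean(carbon)` and `median(carbon)` should return the same two values.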

Univariate Statistics

Cars example - consider only the carbon footprint (the 8th variable)

Interquartile range - simply the difference between the 75th and 25th percentiles

$\text{IQR} = (6.6 - 6.1) = 0.5.$

fuel2 <- fuel[fuel$Litres<2,]
summary(fuel2[,"Carbon"]) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.00    6.10    6.45    6.20    6.60    6.80
sd(fuel2[,"Carbon"])
## [1] 0.7440996
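As a cross-check on the output above, the standard deviation and interquartile range can be computed from their definitions. This sketch assumes the Carbon values retyped from the table (the vector `carbon` is an illustrative name) and uses R's default quantile rule (type 7), which is the rule `summary()` applies.

```r
# Carbon footprint (tons CO2) of the 20 cars, retyped from the table
carbon <- c(6.6, 6.6, 6.3, 4.4, 6.1, 5.9, 6.3, 6.1, 6.3, 6.8,
            6.6, 6.6, 6.6, 6.8, 6.8, 6.6, 6.1, 6.6, 4.0, 5.9)

N <- length(carbon)

# Sample standard deviation from the definition (divisor N - 1)
s <- sqrt(sum((carbon - mean(carbon))^2) / (N - 1))

# 25th and 75th percentiles with R's default quantile rule (type 7)
q <- quantile(carbon, probs = c(0.25, 0.75))

# Interquartile range: 75th percentile minus 25th percentile
iqr <- unname(q[2] - q[1])

print(c(sd = s, IQR = iqr))
```

Other quantile rules (the `type` argument of `quantile()`) can give slightly different quartiles on small samples, so the IQR depends on the convention used.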

Bivariate Statistics

Correlation Coefficient

$r = \frac{\sum (x_{i} - \bar{x})(y_{i}-\bar{y})}{\sqrt{\sum(x_{i}-\bar{x})^2}\sqrt{\sum(y_{i}-\bar{y})^2}}.$

cor(fuel2[,"Carbon"], fuel2[,"City"])
## [1] -0.9688341
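The correlation formula can be applied term by term to confirm the value returned by `cor()`. This is a sketch that retypes the Carbon and City columns from the table, keeping the rows paired in table order; the vector names `carbon` and `city` are illustrative.

```r
# Carbon footprint (tons CO2) and city fuel economy, retyped from the table
# (same row order in both vectors, so the pairs stay aligned)
carbon <- c(6.6, 6.6, 6.3, 4.4, 6.1, 5.9, 6.3, 6.1, 6.3, 6.8,
            6.6, 6.6, 6.6, 6.8, 6.8, 6.6, 6.1, 6.6, 4.0, 5.9)
city   <- c(25, 25, 25, 40, 27, 28, 26, 26, 27, 24,
            25, 25, 26, 24, 24, 26, 27, 25, 48, 29)

# Deviations from the sample means
dx <- carbon - mean(carbon)
dy <- city - mean(city)

# Correlation coefficient from the definition
r <- sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))

print(r)
```

The strongly negative value reflects that cars with better city fuel economy have smaller carbon footprints in this sample.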

Bivariate Statistics

Autocorrelation

$r_{k} = \frac{\sum\limits_{t=k+1}^T (y_{t}-\bar{y})(y_{t-k}-\bar{y})}{\sum\limits_{t=1}^T (y_{t}-\bar{y})^2}$
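To make the formula concrete, the sketch below computes $r_k$ directly and compares it with base R's `acf()`. The series `y` is a hypothetical repeating pattern made up for illustration (it is not the beer data), and `autocorr` is an illustrative helper, not a library function.

```r
# Hypothetical quarterly-style series: the pattern 10, 12, 14, 20 repeated 5 times
y <- rep(c(10, 12, 14, 20), times = 5)

# Lag-k autocorrelation from the definition:
# numerator sums over t = k+1, ..., T; denominator sums over t = 1, ..., T
autocorr <- function(y, k) {
  n <- length(y)
  d <- y - mean(y)
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

r_manual <- sapply(1:8, function(k) autocorr(y, k))

# stats::acf() reports lag 0 first, so drop that entry before comparing
r_builtin <- as.numeric(acf(y, lag.max = 8, plot = FALSE)$acf)[-1]

print(round(r_manual, 3))
print(isTRUE(all.equal(r_manual, r_builtin)))
```

Because the pattern repeats every four observations, the autocorrelation is largest at lags 4 and 8, which is the kind of seasonal signature an autocorrelation plot is meant to reveal.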

Bivariate Statistics – Autocorrelation

beer2 <- window(ausbeer, start=1992, end=2006-.1)
lag.plot(beer2, lags=9, do.lines=FALSE)

Autocorrelation Function

Acf(beer2, las = 1, lwd = 2, main = "Autocorrelation Function")