Purpose: To learn how to apply simple forecast evaluation metrics such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE). To apply the simple forecasting techniques naive forecast, seasonally naive forecast, and mean forecast to a forecasting problem and evaluate their performance. This exercise is due Tuesday, September 12.

Recall that the forecast error is given by $$e_{i}=y_{i}-\hat{y}_{i}$$. During our in-class forecasting exercise all of you produced forecasts $$\hat{y}_i$$ of the unemployment rate (from the Household Survey) and the additions to nonfarm payrolls (from the Establishment Survey) that was released in the August 2017 Employment Report produced by the Bureau of Labor Statistics (BLS) on September 1, 2017. The actual unemployment rate for August 2017 turned out to be 4.4 percent. Here’s a list of your forecasts $$\{\hat{y}_i\}_{i=1,\cdots,24}$$for the unemployment rate. Each forecast is indexed by $$i$$:

Index Unemployment Forecast $$\hat{y}_i$$ Nonfarm Payroll Forecast $$\hat{y}_i$$
1 3.0 160,000
2 5.7 208,000
3 5.2 190,000
4 4.8 215,000
5 4.4 202,000
6 4.1 200,000
7 4.3 224,000
8 4.5 147,000
9 4.4 195,000
10 4.3 148,000
11 4.2 146,830
12 4.2 150,000
13 4.0 147,000
14 5.3 200,000
15 4.2 220,000
16 4.4 180,000
17 4.2 237,500
18 4.2 176,000
19 4.4 250,000
20 4.8 146,850
21 4.7 190,000
22 4.3 182,000
23 4.3 220,000
24 4.1 220,000

Implement all the questions in R. In addition to submitting your homework via CANVAS, submit individual R files to my SMU e-mail address prior to the deadline.

# Preliminaries

rm(list =ls())
forecasts.unrate <- c(3.0,5.7,5.2,4.8,4.4,4.1,4.3,4.5,4.4,4.3,4.2,4.2,4.0,5.3,4.2,4.4,4.2,4.2,4.4,4.8,4.7,4.3,4.3,4.1)
forecasts.nonfar <- c(160000,208000,190000,215000,202000,200000,224000,147000,195000,148000,146830,150000,147000,200000,
220000,180000,237500,176000,250000,146850,190000,182000,220000,220000)
number.forecasts <- length(forecasts.unrate)
realization.unrate <- rep(4.4,n=number.forecasts)
realization.nonfar <- rep(156000,n=number.forecasts)

df.forecasts <- data.frame(1:number.forecasts,
forecasts.unrate,realization.unrate,
forecasts.nonfar,realization.nonfar)

head(df.forecasts)
##   X1.number.forecasts forecasts.unrate realization.unrate forecasts.nonfar
## 1                   1              3.0                4.4           160000
## 2                   2              5.7                4.4           208000
## 3                   3              5.2                4.4           190000
## 4                   4              4.8                4.4           215000
## 5                   5              4.4                4.4           202000
## 6                   6              4.1                4.4           200000
##   realization.nonfar
## 1             156000
## 2             156000
## 3             156000
## 4             156000
## 5             156000
## 6             156000
tail(df.forecasts)
##    X1.number.forecasts forecasts.unrate realization.unrate
## 19                  19              4.4                4.4
## 20                  20              4.8                4.4
## 21                  21              4.7                4.4
## 22                  22              4.3                4.4
## 23                  23              4.3                4.4
## 24                  24              4.1                4.4
##    forecasts.nonfar realization.nonfar
## 19           250000             156000
## 20           146850             156000
## 21           190000             156000
## 22           182000             156000
## 23           220000             156000
## 24           220000             156000
names(df.forecasts)
## [1] "X1.number.forecasts" "forecasts.unrate"    "realization.unrate"
## [4] "forecasts.nonfar"    "realization.nonfar"
names(df.forecasts) <- c("index","f.un","r.un","f.nfp","r.nfp")
names(df.forecasts)
## [1] "index" "f.un"  "r.un"  "f.nfp" "r.nfp"
head(df.forecasts)
##   index f.un r.un  f.nfp  r.nfp
## 1     1  3.0  4.4 160000 156000
## 2     2  5.7  4.4 208000 156000
## 3     3  5.2  4.4 190000 156000
## 4     4  4.8  4.4 215000 156000
## 5     5  4.4  4.4 202000 156000
## 6     6  4.1  4.4 200000 156000
tail(df.forecasts,n=3)
##    index f.un r.un  f.nfp  r.nfp
## 22    22  4.3  4.4 182000 156000
## 23    23  4.3  4.4 220000 156000
## 24    24  4.1  4.4 220000 156000
summary(df.forecasts)
##      index            f.un            r.un         f.nfp
##  Min.   : 1.00   Min.   :3.000   Min.   :4.4   Min.   :146830
##  1st Qu.: 6.75   1st Qu.:4.200   1st Qu.:4.4   1st Qu.:157500
##  Median :12.50   Median :4.300   Median :4.4   Median :192500
##  Mean   :12.50   Mean   :4.417   Mean   :4.4   Mean   :189799
##  3rd Qu.:18.25   3rd Qu.:4.550   3rd Qu.:4.4   3rd Qu.:216250
##  Max.   :24.00   Max.   :5.700   Max.   :4.4   Max.   :250000
##      r.nfp
##  Min.   :156000
##  1st Qu.:156000
##  Median :156000
##  Mean   :156000
##  3rd Qu.:156000
##  Max.   :156000

# Question 1

What is the mean of your unemployment forecasts? (0.25 points)

In the answers to the questions I will give you a variety of ways to answer the questions in R in order to familiarize you with the R language syntax. Whenever you wonder what a certain command such as mean() does, type ?mean() in the R console and press ‘Enter’.

mean(forecasts.unrate)
## [1] 4.416667
mean(df.forecasts$f.un) ## [1] 4.416667 print(mean(forecasts.unrate), digits = 2) ## [1] 4.4 print(mean(df.forecasts$f.un), digits = 2)
## [1] 4.4
print(paste("The mean of our unemployment forecasts is: ",mean(forecasts.unrate),sep=""))
## [1] "The mean of our unemployment forecasts is: 4.41666666666667"

# Question 2

What is the median of your nonfarm payroll forecasts? (0.25 points)

median(forecasts.nonfar)
## [1] 192500
median(df.forecasts$f.nfp) ## [1] 192500 print(format(median(forecasts.nonfar),big.mark=",")) ## [1] "192,500" print(format(median(df.forecasts$f.nfp),big.mark=","))
## [1] "192,500"
print(paste("The median of our nonfarm payroll forecasts is: ",median(forecasts.nonfar),sep=""))
## [1] "The median of our nonfarm payroll forecasts is: 192500"

# Question 3

What is the interquartile range of the nonfarm payroll forecast? (0.25 points)

IQR(forecasts.nonfar)
## [1] 58750
IQR(df.forecasts$f.nfp) ## [1] 58750 print(format(IQR(forecasts.nonfar),big.mark=",")) ## [1] "58,750" print(format(IQR(df.forecasts$f.nfp),big.mark=","))
## [1] "58,750"
quantile(forecasts.nonfar, probs = seq(0,1,0.25))[4]-quantile(forecasts.nonfar, probs = seq(0,1,0.25))[2]
##   75%
## 58750
print(paste("The interquartile range is of the nonfarm payroll forecasts is: ",IQR(forecasts.nonfar),sep=""))
## [1] "The interquartile range is of the nonfarm payroll forecasts is: 58750"

# Question 4

What is the interquintile range of the nonfarm payroll forecast? (0.25 points)

f.quintiles <- quantile(forecasts.nonfar, probs = seq(0,1,0.2))
interquintile.range <- f.quintiles[5] - f.quintiles[2]

# or directly
interquintile.range <- quantile(forecasts.nonfar, probs = seq(0,1,0.2))[5]-quantile(forecasts.nonfar, probs = seq(0,1,0.2))[2]

print(paste("The interquintile range is: ",interquintile.range,sep=""))
## [1] "The interquintile range is: 70800"

# Question 5

What is the interdecile range of the nonfarm payroll forecast? (0.5 points)

quantile(forecasts.nonfar, probs = seq(0,1,0.1))
##     0%    10%    20%    30%    40%    50%    60%    70%    80%    90%
## 146830 147000 149200 174400 183600 192500 200000 208700 220000 222800
##   100%
## 250000
interdecile.range <- quantile(forecasts.nonfar, probs = seq(0,1,0.1))[10]-quantile(forecasts.nonfar, probs = seq(0,1,0.1))[2]
interdecile.range
##   90%
## 75800
print(paste("The interdecile range is: ",interdecile.range,sep=""))
## [1] "The interdecile range is: 75800"

# Question 6

What is the maximum of the unemployment forecast? (0.25 points)

max(forecasts.unrate)
## [1] 5.7
print(paste("The maximum of the unemployment forecast is: ",max(forecasts.unrate),sep=""))
## [1] "The maximum of the unemployment forecast is: 5.7"

# Question 7

What is the minimum of the nonfarm payroll forecast? (0.25 points)

min(forecasts.nonfar)
## [1] 146830
print(paste("The minimum of the nonfarm forecast is: ",min(forecasts.nonfar),sep=""))
## [1] "The minimum of the nonfarm forecast is: 146830"

# Question 8

What is the mean of the nonfarm payroll forecasts that are stricly above the median? (1.0 points)

mean_p50 <- round(mean(forecasts.nonfar[forecasts.nonfar>median(forecasts.nonfar)]), digits =2)

print(paste("The mean of the nonfarm payroll forecasts that are stricly above the median is: ",mean_p50,sep=""))
## [1] "The mean of the nonfarm payroll forecasts that are stricly above the median is: 215958.33"

# Question 9

What is the median of the unemployment forecasts that are strictly below the mean? (1.0 points)

median_mean <- round(median(forecasts.unrate[forecasts.unrate<mean(forecasts.unrate)]), digits =2)

print(paste("The median of the unemployment forecasts that are stricly below the mean is: ",median_mean,".",sep=""))
## [1] "The median of the unemployment forecasts that are stricly below the mean is: 4.2."

# Question 10

What is the MAE for the set of unemployment forecasts? (1.0 points)

Recall that the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) are given by \begin{align*} \text{Mean Absolute Error: MAE} & = \text{mean}(|e_{i}|),\\ \text{Root Mean Squared Error: RMSE} & = \sqrt{\text{mean}(e_{i}^2)}. \end{align*}

abs.e.i <- abs(realization.unrate - forecasts.unrate)
mae.un <- mean(abs.e.i)

# or more concisely:
mean(abs(realization.unrate - forecasts.unrate))
## [1] 0.3333333
print(paste("The MAE for the set of unemployment forecasts is ",mae.un,".",sep=""))
## [1] "The MAE for the set of unemployment forecasts is 0.333333333333333."

# Question 11

What is the RMSE for the set of unemployment forecasts? (1.0 points)

e.i <- realization.unrate - forecasts.unrate
rmse.un <- round(sqrt(mean(e.i*e.i)), digits = 4)

print(paste("The RMSE for the set of unemployment forecasts is ",rmse.un,".",sep=""))
## [1] "The RMSE for the set of unemployment forecasts is 0.5041."

# Question 12

What is the MAE for the set of nonfarm payroll forecasts? (1.0 points)

abs.e.i <- abs(realization.nonfar - forecasts.nonfar)
mae.nfp <- mean(abs.e.i)

# or more concisely:
mean(abs(realization.nonfar - forecasts.nonfar))
## [1] 37992.5
print(paste("The MAE for the set of nonfarm payroll forecasts is ",mae.nfp,".",sep=""))
## [1] "The MAE for the set of nonfarm payroll forecasts is 37992.5."

# Question 13

What is the RMSE for the set of nonfarm payroll forecasts? (1.0 points)

The Mean Absolute Percentage Error is

$\text{Mean Absolute Percentage Error: MAPE} = \text{mean}(|p_{i}|).$ where $p_{i} = 100 \cdot \frac{e_{i}}{y_{i}}$

e.i <- realization.nonfar - forecasts.nonfar
rmse.nfp <- round(sqrt(mean(e.i*e.i)), digits = 2)

print(paste("The RMSE for the set of nonfarm payroll forecasts is ",rmse.nfp,".",sep=""))
## [1] "The RMSE for the set of nonfarm payroll forecasts is 45868.5."

# Question 14

What is the MAPE for your unemployment forecasts? (1.0 points)

e.i <- realization.unrate - forecasts.unrate
p.i <- 100*(e.i/realization.unrate)

mape <- mean(abs(p.i))

print(paste("The MAPE for the set of unemployment forecasts is ",mape,".",sep=""))
## [1] "The MAPE for the set of unemployment forecasts is 7.57575757575758."

# Question 15

What is the MAPE for your nonfarm payroll forecasts? (1.0 points)

e.i <- realization.nonfar - forecasts.nonfar
p.i <- 100*(e.i/realization.nonfar)

mape <- mean(abs(p.i))

print(paste("The MAPE for the set of nonfarm payroll forecasts is ",mape,".",sep=""))
## [1] "The MAPE for the set of nonfarm payroll forecasts is 24.3541666666667."

# Optional

What is a more useful measure of forecast error when comparing the ECO5375 students’ performance on the forecasting exercises on unemployment and nonfarm payroll forecasts?

MAE and RMSE are less useful than MAPE because the forecasted variables are at very different scales (percent unemployment rate versus thousands of nonfarm payroll additions.)