20 Final Synthesis

In Chapter 14, we replicated Fama-MacBeth (1973) to get a risk premium of .352% with a t-statistic of 1.17. But to be able to reject the null at the 5% level, we need a t-statistic of 1.96 or larger; at the 10% level, it’s 1.645 or larger. In this classwork, we’ll return to this replication. This time, we’ll consider our methods a little more carefully in light of the Econometrics we’ve been learning. Will we be able to show that riskier stocks have higher returns, like theory suggests?

To review, the Fama-MacBeth procedure works in 3 steps:

Estimate each stock’s CAPM beta by filtering for one stock and running the regression ret - RF ~ mkt_rf. Add betas to the data.
Estimate risk premiums for each month: filter for each time period and run the regression ret - RF ~ beta.
Average the monthly risk premiums to get a single estimate $\bar{\lambda}$. Compute the t-statistic as: $t = \frac{\bar{\lambda}}{S_\lambda / \sqrt{T}}$, where $S_\lambda$ is the standard deviation of the monthly estimates and $T$ is the number of months.

library(tidyverse)
library(frenchdata)

stock_panel <- read_csv("stock_panel.csv")

french <- download_french_data('Fama/French 3 Factors')$subsets$data[[1]] %>%
  rename(mkt_rf = `Mkt-RF`) %>%
  mutate(
    date = ym(date),
    across(c(mkt_rf, SMB, HML, RF), ~ .x / 100)
  )

joined_data <- stock_panel %>%
  mutate(date = floor_date(date, unit = "month")) %>%
  left_join(french, join_by(date)) %>%
  select(permno, comnam, date, ret, mkt_rf, RF)

Question 1: Naive Fama-MacBeth Procedure

Finish the code below to do the Fama-MacBeth procedure as we’ve been practicing it. Here we use Kenneth French’s market returns and risk-free rate, so the results will be a little different from Chapter 14’s results.

# Step 1: estimate betas for stocks with all 300 months of data
betas <- map(
  .x = joined_data %>% count(permno) %>% filter(n == 300) %>% pull(permno),
  .f = function(p) {
    joined_data %>% 
      filter(permno == ____) %>%
      lm(ret - RF ~ ____, data = .) %>%
      broom::tidy() %>%
      slice(____) %>%
      select(beta = ____) %>%
      mutate(permno = p)
  }
) %>%
  bind_rows()

# Add betas to our data set:
joined_data <- joined_data %>%
  inner_join(betas, join_by(permno))

# Step 2: estimate monthly risk premiums
risk_premiums <- map(
  .x = joined_data %>% pull(date) %>% unique(),
  .f = function(t) {
    joined_data %>% 
      filter(date == ____) %>%
      lm(ret - RF ~ ____, data = .) %>%
      broom::tidy() %>%
      slice(____) %>%
      select(risk_premium = ____)
  }
) %>%
  bind_rows()

risk_premiums %>%
  summarize(
    lambda = mean(risk_premium),
    t_statistic = mean(risk_premium) / (sd(risk_premium) / sqrt(n()))
  )

We estimate the monthly risk premium to be ____%, with a t-statistic of ____.

Question 2: Splitting the Sample

One issue with our naive approach is that the betas are measured with error. So when we use an imperfect beta measurement to estimate monthly risk premiums, the monthly risk premiums will be biased toward zero. We’ll talk about how to minimize this measurement error issue in Question 4.

Another related issue is that we use the same data to estimate both beta and risk premiums. If a stock had a lucky run so that it had unusually good returns, it will end up with an inflated beta estimate, and those good returns will make it look like beta drives high returns: this is omitted variable bias, where the omitted variable is “luck.” We can break that link by splitting the data in two: use the first half to estimate betas and the second half to estimate monthly risk premiums. After accounting for this upward omitted variable bias, we should see that the estimated risk premium falls.

# reset joined_data
joined_data <- stock_panel %>%
  mutate(date = floor_date(date, unit = "month")) %>%
  left_join(french, join_by(date)) %>%
  select(permno, comnam, date, ret, mkt_rf, RF)

# Step 1: estimate betas using the first half only (first 150 months)
betas <- map(
  .x = joined_data %>% count(permno) %>% filter(n == 300) %>% pull(permno),
  .f = function(p) {
    joined_data %>% 
      filter(permno == ____, date <= ymd(20000101) %m+% months(150)) %>%
      lm(ret - RF ~ ____, data = .) %>%
      broom::tidy() %>%
      slice(____) %>%
      select(beta = ____) %>%
      mutate(permno = p)
  }
) %>%
  bind_rows()

# Add betas to our data set:
joined_data <- joined_data %>%
  inner_join(betas, join_by(permno))

# Step 2: estimate monthly risk premiums using the second half only (months 151-300)
risk_premiums <- map(
  .x = joined_data %>% filter(date > ymd(20000101) %m+% months(150)) %>% pull(date) %>% unique(),
  .f = function(t) {
    joined_data %>% 
      filter(date == ____) %>%
      lm(ret - RF ~ ____, data = .) %>%
      broom::tidy() %>%
      slice(____) %>%
      select(risk_premium = ____)
  }
) %>%
  bind_rows()

risk_premiums %>%
  summarize(
    lambda = mean(risk_premium),
    t_statistic = mean(risk_premium) / (sd(risk_premium) / sqrt(n()))
  )

We estimate the monthly risk premium to be ____%, with a t-statistic of ____.

Question 3: Survivorship Bias

As we saw in Chapter 19, when we filter on only companies that have a return for every month in the data, we’re selecting on survivors, which may behave differently than the average stock. Still, we need enough data per company to estimate beta with some precision. Instead of filtering on n == 300, try n >= 200. When we include stocks that fell out of our data, will that bring our risk premium estimate up or down? It’s unpredictable and depends on whether those companies are low or high beta companies.

# reset joined_data
joined_data <- stock_panel %>%
  mutate(date = floor_date(date, unit = "month")) %>%
  left_join(french, join_by(date)) %>%
  select(permno, comnam, date, ret, mkt_rf, RF)

betas <- map(
  .x = joined_data %>% count(permno) %>% filter(n >= 200) %>% pull(permno),
  .f = function(p) {
    joined_data %>% 
      filter(permno == ____, date <= ymd(20000101) %m+% months(150)) %>%
      lm(ret - RF ~ ____, data = .) %>%
      broom::tidy() %>%
      slice(____) %>%
      select(beta = ____) %>%
      mutate(permno = p)
  }
) %>%
  bind_rows()

# Add betas to our data set
joined_data <- joined_data %>%
  inner_join(betas, join_by(permno))

risk_premiums <- map(
  .x = joined_data %>% filter(date > ymd(20000101) %m+% months(150)) %>% pull(date) %>% unique(),
  .f = function(t) {
    joined_data %>% 
      filter(date == ____) %>%
      lm(ret - RF ~ ____, data = .) %>%
      broom::tidy() %>%
      slice(____) %>%
      select(risk_premium = ____)
  }
) %>%
  bind_rows()

risk_premiums %>%
  summarize(
    lambda = mean(risk_premium),
    t_statistic = mean(risk_premium) / (sd(risk_premium) / sqrt(n()))
  )

We estimate the monthly risk premium to be ____%, with a t-statistic of ____.

Question 4: Forming Portfolios

In their original paper, Fama and MacBeth were very concerned with the measurement-error-in-betas problem. If each stock’s betas are measured with a lot of error, that will wash out the risk premium estimate.

The workaround: instead of using individual stock returns to estimate the risk premium, they grouped stocks into portfolios by their beta values. Then they used those portfolio returns to estimate the risk premium. Here are the steps:

Use the first 100 months to estimate rough betas for each stock. Sort stocks by betas, and create 20 portfolios, going from riskiest to least risky.
Use the second 100 months to re-estimate each portfolio’s beta using the portfolio’s returns. Idiosyncratic noise gets averaged out.
Use the final 100 months with the portfolio betas and the period 3 portfolio returns to estimate the risk premium.

This procedure helps estimate beta more reliably, reducing the attenuation bias from measurement error. We should see that the estimate for the risk premium increases and becomes more statistically significant.

Let’s dive in!

# reset joined_data
joined_data <- stock_panel %>%
  mutate(date = floor_date(date, unit = "month")) %>%
  left_join(french, join_by(date)) %>%
  select(permno, comnam, date, ret, mkt_rf, RF)

start_date <- joined_data %>% pull(date) %>% min()
cutoff1  <- start_date %m+% months(100)
cutoff2  <- start_date %m+% months(200)

# Step 1: Use the first 100 months to estimate rough betas for each stock. 
# Sort stocks by betas, and create 20 portfolios, going from riskiest to 
# least risky.
betas <- map(
  .x = joined_data %>% count(permno) %>% filter(n == 300) %>% pull(permno),
  .f = function(p) {
    joined_data %>% 
      filter(permno == ____, date <= cutoff1) %>%
      lm(ret - RF ~ ____, data = .) %>%
      broom::tidy() %>%
      slice(____) %>%
      select(beta = ____) %>%
      mutate(permno = p)
  }
) %>%
  bind_rows()

# Assigning a portfolio to each stock by its beta value, creating 20 portfolios in total:
betas <- betas %>%
  mutate(portfolio = ntile(beta, 20))

# Adding betas to our data:
joined_data <- joined_data %>%
  inner_join(betas %>% select(permno, portfolio), join_by(permno))

# Step 2: Use the second 100 months to re-estimate each portfolio's 
# beta using the portfolio's returns. Idiosyncratic noise gets averaged out.

portfolios <- map(
  .x = 1:20, # there are 20 portfolios
  .f = function(pf) {
    joined_data %>%
      filter(portfolio == pf, date > cutoff1, date <= cutoff2) %>%
      group_by(date, RF, mkt_rf) %>%
      summarize(ret = mean(ret), .groups = "drop") %>% # Average monthly return for the portfolio
      lm(ret - RF ~ mkt_rf, data = .) %>%
      broom::tidy() %>%
      slice(2) %>%
      select(portfolio_beta = estimate) %>%
      mutate(portfolio = pf)
  }
) %>%
  bind_rows()

# Add the portfolio betas to our data:
joined_data <- joined_data %>%
  inner_join(portfolios, join_by(portfolio))

# Step 3: Use the final 100 months with the portfolio betas and the 
# period 3 portfolio returns to estimate the risk premium.

risk_premiums <- map(
  .x = joined_data %>% filter(date > cutoff2) %>% pull(date) %>% unique(),
  .f = function(t) {
    joined_data %>%
      filter(date == t) %>%
      group_by(portfolio, mkt_rf, RF, portfolio_beta) %>%
      summarize(portfolio_return = mean(ret), .groups = "drop") %>%
      lm(portfolio_return - RF ~ portfolio_beta, data = .) %>%
      broom::tidy() %>%
      slice(2) %>%
      select(risk_premium = estimate)
  }
) %>%
  bind_rows()

risk_premiums %>%
  summarize(
    lambda = mean(risk_premium),
    t_statistic = mean(risk_premium) / (sd(risk_premium) / sqrt(n()))
  )

We estimate the monthly risk premium to be ____%, with a t-statistic of ____.

Let’s contextualize this number. The risk premium is the coefficient on the regression ret - RF ~ beta, so it’s the extra return over the risk-free rate you can expect when beta increases by 1. In the last 25 years, the average risk-free rate has been .147% per month (a $1000 risk-free investment grows in 10 years to 1000 * (1.00147)^120 = 1192.76). If the risk premium is .761%, a portfolio with a beta of 1 (the S&P500) would earn in 10 years: 1000 * (1.00147 + .00761)^120 = 2958.50, and a very risky portfolio with a beta of 2 would be expected to earn in 10 years: 1000 * (1.00147 + .00761*2)^120 = 7288.3. Of course, the cost is that you’re bearing more risk.

Download this assignment

Here’s a link to download this assignment.