19 The Replication Crisis in Finance

In this assignment, we’ll explore two related problems that have shaken empirical finance: the replication crisis and survivorship bias.

The replication crisis says: When researchers test enough possible explanations, some will appear statistically significant purely by chance. If those chance results are the ones that get published, the literature fills with false discoveries and becomes hard to trust. The practice of testing many hypotheses until one is statistically significant is known as p-hacking.

Survivorship bias says: When a data set only includes the winners, any average computed from that data set will be too optimistic. Stocks of companies that went bankrupt disappear from the data. What remains is a sample that has been filtered for success.

In both cases (p-hacking and survivorship bias), a selection process removes failures from view, leaving behind a distorted picture of reality. We’ll start with a simulation of p-hacking, then prove the math behind multiple testing, and then investigate survivorship bias directly in stock return data.

Part 1: map() Simulation

Imagine a researcher who has monthly stock returns for 10 years (120 observations) and wants to find a factor that predicts them. The returns are pure random noise: there is no real factor in this world. Undaunted, the researcher tests 100 candidate factors, each one also pure random noise, running a separate regression for each. Any factor with p < 0.05 gets written up as a discovery.

Question 1: Using map() and rnorm(), simulate this researcher’s process. Generate 10 years of random monthly returns (120 observations) and 100 independent random factors, and record how many factors produce a regression coefficient that is statistically significant at the 5% level. In a short paragraph, explain what your simulation implies about a published paper that reports one significant factor out of a large number of tested candidates.

library(tidyverse)
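
The following is a minimal sketch of one way to set up the simulation. It assumes each factor gets its own regression through lm(), with the slope's p-value read from summary(). The seed and the variable names (returns, p_values, n_significant) are illustrative choices, not part of the assignment.

```r
library(purrr)

set.seed(42)  # arbitrary seed, used only so the run is reproducible

n_months  <- 120   # 10 years of monthly observations
n_factors <- 100   # number of candidate factors the researcher tests

# Returns are pure noise: no factor truly predicts them.
returns <- rnorm(n_months)

# For each candidate, draw an independent noise factor, regress returns
# on it, and keep the p-value of the slope coefficient.
p_values <- map_dbl(seq_len(n_factors), function(i) {
  factor_i <- rnorm(n_months)
  fit <- lm(returns ~ factor_i)
  summary(fit)$coefficients["factor_i", "Pr(>|t|)"]
})

# Number of "discoveries" at the 5% level, all of them false by construction.
n_significant <- sum(p_values < 0.05)
n_significant
```

Rerunning with different seeds gives a different count each time, which is a useful reminder that the number of false discoveries is itself random.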

Part 2: Econometric Proof

The simulation showed that false positives are inevitable when you test many factors. Let’s work through the math.

Question 2: Suppose a researcher tests $k$ completely unrelated factors, each against the same outcome variable. Each test uses a significance threshold of $\alpha = 0.05$. By definition, a test on a truly null factor has a 5% chance of returning a false positive, and a 95% chance of returning a true negative.

Let $k = 2$: the researcher tests 2 independent hypotheses. Argue why the probability of getting no false positives is $0.95^2 = 0.9025$.
Fill in the table below with the probability of getting no false positives given different values for $k$.

k	P(no false positives)
1	.95
2	.9025
3
4
5
10
15
20
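
The pattern in the table can be written in general form. This is a sketch that relies on the independence assumption stated in the question: if the $k$ tests are independent and every factor is truly null, the probabilities of a true negative multiply.

$$
\begin{aligned}
P(\text{no false positives in } k \text{ tests}) &= \prod_{i=1}^{k} (1 - \alpha) = (1 - \alpha)^k, \\
P(\text{at least one false positive}) &= 1 - (1 - \alpha)^k .
\end{aligned}
$$

With $\alpha = 0.05$ and $k = 2$ the first line gives $0.95^2 = 0.9025$, matching the table entry above. If the tests were correlated, the product rule would no longer hold exactly.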

How many hypotheses does the researcher need to run to get more than a 95% chance of getting at least one false positive?
When you run a single hypothesis test, the expected number of false positives is .05. When you run $k$ independent hypothesis tests, the expected number of false positives is $.05 k$. Fill in the table below.

k	Expected number of false positives
1	.05
2
3
10
20
100

A standard way to handle multiple hypothesis tests is to raise the bar so that the probability of even one false positive across all $k$ tests is at most 5%. So you want to find a p-value threshold $p^*$ such that $1 - (1 - p^*)^k = 0.05$. Fill in the table below with the appropriate p-value threshold for each number of hypothesis tests.

k	P-value threshold $p^*$
1	.05
2
3
10
20
100
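
Solving $1 - (1 - p^*)^k = 0.05$ for the threshold gives $(1 - p^*)^k = 0.95$, so $p^* = 1 - 0.95^{1/k}$. The sketch below turns that into a small R function and checks it by plugging the result back into the original equation. The function name fwer_threshold is hypothetical, and the formula assumes the $k$ tests are independent.

```r
# Per-test threshold p* that solves 1 - (1 - p*)^k = alpha,
# assuming the k tests are independent.
fwer_threshold <- function(k, alpha = 0.05) {
  1 - (1 - alpha)^(1 / k)
}

k <- c(1, 2, 3, 10, 20, 100)
p_star <- fwer_threshold(k)

# Sanity check: plugging p* back in should recover alpha for every k.
implied_fwer <- 1 - (1 - p_star)^k

data.frame(k = k, p_star = p_star, implied_fwer = implied_fwer)
```

The threshold shrinks as $k$ grows, which is the sense in which more tests require a higher bar for each individual test.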

Part 3: Stock Market Investigation

Survivorship bias shows up throughout finance, but one of the clearest places to see it is in stock return databases. The CRSP database records every stock that was ever listed on major U.S. exchanges, including stocks that later went bankrupt, were acquired, or were delisted for falling below exchange requirements. A researcher who restricts the sample to only the stocks still trading at the end of the period is working with a survivorship-biased dataset, and any average return they compute will be too high.

Question 3: In this part, you will measure how survivorship distorts reported returns.

a) Start by computing the average return for stocks across the whole data set.
b) Show that the average monthly return for “survivors” only (stocks that have a return listed for every month in the data) is 1.23%.
c) Show that the average monthly return for “non-survivors” is 0.633%.
d) Show that only 6.8% of stocks in this data are survivors.
e) Compounding returns: Suppose you invested $1000 at the start of the period. If you earned 0.763% per month by investing randomly across the whole stock market, how much would your investment be worth after 300 months? What if you earned 1.23% by investing in survivors? Why isn’t “invest in survivors” a viable investment strategy?
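
The survivor definition above (a return listed for every month) can be sketched on a toy panel. Everything about the data here is an assumption: the column names id, month, and ret are hypothetical, the numbers are made up, and the real stock_panel.csv may be laid out differently. The sketch also assumes one row per stock-month and averages over stock-months rather than over stocks.

```r
library(dplyr)

# Toy panel with hypothetical columns; not the assignment's data.
toy_panel <- data.frame(
  id    = c("A", "A", "A", "B", "B", "C", "C", "C"),
  month = c(1, 2, 3, 1, 2, 1, 2, 3),
  ret   = c(0.02, 0.01, 0.03, -0.05, -0.10, 0.00, 0.01, -0.01)
)

n_months <- n_distinct(toy_panel$month)

# A stock is a survivor if it has a non-missing return in every month.
flagged <- toy_panel %>%
  group_by(id) %>%
  mutate(survivor = sum(!is.na(ret)) == n_months) %>%
  ungroup()

# Average return by survivor status, averaging over stock-months.
by_status <- flagged %>%
  group_by(survivor) %>%
  summarise(mean_ret = mean(ret, na.rm = TRUE))

# Share of stocks (not stock-months) that are survivors.
share <- flagged %>%
  distinct(id, survivor) %>%
  summarise(share_survivors = mean(survivor))

by_status
share
```

In the toy data, stock B drops out after month 2, so only A and C count as survivors. The same grouping logic should carry over to the real panel once the actual column names are substituted.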

stock_panel <- read_csv("stock_panel.csv")

# a)


# b)


# c)


# d) 


# e)
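
For the compounding question in part e), one possible sketch is below. It assumes the monthly return is constant and fully reinvested, so the value after $n$ months is the initial amount times $(1 + r)^n$. The helper name future_value is hypothetical.

```r
# Value of an investment after n_months of compounding at a constant
# monthly return (an assumption; real returns vary month to month).
future_value <- function(principal, monthly_return, n_months) {
  principal * (1 + monthly_return)^n_months
}

initial <- 1000
months  <- 300

fv_market    <- future_value(initial, 0.00763, months)  # whole-market average
fv_survivors <- future_value(initial, 0.0123, months)   # survivors-only average

c(market = fv_market, survivors = fv_survivors)
```

A gap of less than half a percentage point per month compounds into a very large difference over 300 months, which is why the choice of sample matters so much for reported performance.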
