---
title: "5 Fulton Fish Market with ggplot2"
format:
  html:
    self-contained: TRUE
---

Start today's class by doing the handout on ggplot2.

Then, move on to this project, where we'll review ggplot2 by finishing our replication of Kathryn Graddy’s (2006) paper *Markets: The Fulton Fish Market*.

Use this cheat sheet as a reminder of the declarative programming tools we've learned so far (joins are coming up soon). For this classwork, focus on the ggplot decision tree in the lower left.

![](https://colleen.quarto.pub/the-beginners-econometrics-workbook/images/tidyverse_cheatsheet.jpg)

## Part 2: Ggplot Practice
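
As a warm-up, here is a minimal sketch of three common plot types, using the `mpg` data set that ships with ggplot2 rather than the fish data. The mapping from question to geom (one numeric variable to a histogram, two numeric variables to a scatterplot, one categorical and one numeric variable to a boxplot) is an assumption about typical choices, not a transcription of the cheat sheet.

```{r, message = F}
library(tidyverse)

# One numeric variable: a histogram shows its distribution.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 20)

# Two numeric variables: a scatterplot with a line of best fit.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = lm)

# One categorical and one numeric variable: side-by-side boxplots.
p_box <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()

p_hist
p_scatter
p_box
```

The same three patterns should carry over to the fish data by swapping in its column names.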

```{r, message = F}
library(tidyverse)

fish <- read_csv("https://raw.githubusercontent.com/cobriant/teaching-datasets/refs/heads/main/fish.csv") %>%
  mutate(
    weekday = case_when(
      mon == 1 ~ "M",
      tues == 1 ~ "T",
      wed == 1 ~ "W",
      thurs == 1 ~ "R",
      .default = "F"
    ),
    weekday = factor(weekday, levels = c("M","T","W","R","F"))
  ) %>%
  select(price, quantity_sold, buyer_race, weekday, wind_speed, wave_height, day)

# 1) Visualize the distribution of the price of whiting.



# 2) Visualize the relationship between price (y-axis) and quantity sold (x-axis). Add a line of best fit with geom_smooth(method = lm).



# 3) Use a plot to answer the question, "Does price change based on the weekday?"


  
# 4) Use a plot to answer the question, "Does price change based on wind speed?"



# 5) Use a plot to answer the question, "Does price change based on wave height?" Add a line of best fit.



# 6) Visualize the relationship between price and buyer race.



# 7) Visualize the distribution of quantity_sold.



# 8) Use a plot to answer the question, "Do we see larger sales on some weekdays relative to others?"



# 9) Visualize the distribution of wind_speed.



# 10) Visualize the distribution of wave_height.



# 11) Use a plot to answer the question, "what is the relationship between wind speed and wave height?" Add a line of best fit.


```


## Part 3: lm Practice

We can use `lm()` to fit a **linear model** by the method of least squares. The syntax is `lm(y ~ x + z, data = d)`, where `y` is the dependent variable, `x` and `z` are the explanatory variables, and `d` is the data set. Pipe the lm object into `broom::tidy()` to get a tidied version of the output (a tibble), which can then be piped into other tidyverse functions.

`broom::tidy()` reports three closely related measures of the statistical significance of each OLS estimate: the standard error, the t-statistic, and the p-value. Let's focus on the p-value: assuming the null hypothesis is true (the true value of the coefficient is 0), it is the *probability* of getting an estimate at least as far from 0 as the one we obtained, purely by chance (given how noisy the data are). If the p-value is less than 0.05, we can reject the null hypothesis at the 5% level: the explanatory variable seems to have a nonzero statistical relationship with the dependent variable, holding the other explanatory variables constant.
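
To make the link between the three measures concrete, here is a small sketch on the built-in `mtcars` data (not the fish data). The column names `t_by_hand`, `p_by_hand`, and `significant_5pct` are made up for this illustration. It recomputes the t-statistic as the estimate divided by its standard error, and the two-sided p-value from the t distribution with the model's residual degrees of freedom.

```{r}
library(tidyverse)

# Fit a linear model on mtcars: miles per gallon explained by
# weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Recompute the t-statistic and two-sided p-value by hand and
# flag the estimates that are significant at the 5% level.
check <- fit %>%
  broom::tidy() %>%
  mutate(
    t_by_hand = estimate / std.error,
    p_by_hand = 2 * pt(-abs(t_by_hand), df = df.residual(fit)),
    significant_5pct = p.value < 0.05
  )

check
```

The hand-computed columns should match `statistic` and `p.value`, which is why reading any one of the three measures leads to the same conclusion.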

```{r}
# 1) Use lm to show that asian buyers negotiate lower fish
# prices than white buyers, holding constant the weekday,
# wind speed, and wave height.

lm(price ~ ___ + weekday + wind_speed + wave_height, data = fish) %>%
  broom::tidy()

# Interpretation: Holding constant the day of the week, 
# the wind speed, and the wave height, white buyers pay 
# on average ___ cents per pound more for fish
# compared to asian buyers, which (is/is not)
# statistically significant.

# 2) Use lm to verify that wind speed and wave height
# predict lower quantities of fish sold at the fish market,
# holding constant weekday and buyer race.



# Interpretation: When wind speed increases by 1 mile/hr,
# quantity sold decreases by ___, which (is/is not)
# statistically significant. When wave height increases
# by 1 unit, quantity sold decreases by ___, which 
# (is/is not) statistically significant.
```

Now you can use the function `ivreg()` to run an **instrumental variables regression**. The supply shifters `wind_speed` and `wave_height` serve as instruments for price: the identifying assumption is that weather shifts the supply curve without directly affecting demand, so the resulting price changes trace out the demand curve. Because both quantity and price enter in logs, the coefficient on `log(price)` is interpreted as the percent change in quantity for a one percent increase in price. That is, the coefficient on `log(price)` is the price elasticity of demand. An elasticity of -1 means that when price increases by 1%, quantity demanded decreases by 1%. An elasticity of -2 means that when price increases by 1%, quantity demanded decreases by 2%: demand is elastic. An elasticity of -0.5 means that when price increases by 1%, quantity demanded decreases by 0.5%: demand is inelastic.
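
To see why an instrument is needed, here is a rough sketch on simulated data (a hypothetical market, not the fish data) where the true demand elasticity is set to -1.5. All variable names and parameter values are invented for illustration. It compares naive OLS with two-stage least squares done by hand using two calls to `lm()`, which is the idea behind instrumental variables regression.

```{r}
set.seed(1)
n <- 2000

# Hypothetical simulated market (not the fish data).
z <- rnorm(n)            # supply shifter, playing the role of weather
u <- rnorm(n, sd = 0.5)  # unobserved demand shock
v <- rnorm(n, sd = 0.5)  # unobserved supply shock

# Demand: log_q = 5 - 1.5 * log_p + u   (true elasticity is -1.5)
# Supply: log_q = 1 + 1.0 * log_p - z + v
# Setting demand equal to supply gives the equilibrium log price.
log_p <- (4 + u + z - v) / 2.5
log_q <- 5 - 1.5 * log_p + u

sim <- data.frame(log_q, log_p, z)

# Naive OLS: biased, because price is correlated with the demand shock u.
ols_est <- coef(lm(log_q ~ log_p, data = sim))[["log_p"]]

# Stage 1: keep only the part of price that is predicted by the shifter.
sim$log_p_hat <- fitted(lm(log_p ~ z, data = sim))

# Stage 2: regress quantity on the predicted price.
iv_est <- coef(lm(log_q ~ log_p_hat, data = sim))[["log_p_hat"]]

c(ols = ols_est, two_stage = iv_est)
```

The two-stage estimate should land near -1.5, while OLS is pulled toward zero. The standard errors from the by-hand second stage are not valid; `ivreg()` performs both stages in one call and reports appropriate standard errors.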

```{r, eval = F}
install.packages("ivreg")
```

```{r}
library(ivreg)

# 3) Use ivreg to estimate the price elasticity of demand 
# for asian buyers and then white buyers.

fish %>%
  filter(buyer_race == ___) %>%
  ivreg(log(quantity_sold) ~ log(price) + weekday | wave_height + wind_speed + weekday, data = .) %>%
  broom::tidy()

fish %>%
  filter(buyer_race == ___) %>%
  ivreg(log(quantity_sold) ~ log(price) + weekday | wave_height + wind_speed + weekday, data = .) %>%
  broom::tidy()

# Interpretation: 
# - When price increases by 1%, white buyers demand ___% fewer fish.
# - When price increases by 1%, asian buyers demand ___% fewer fish.
# - Perfectly inelastic buyers have a price elasticity of
#   0: when price increases by 1%, perfectly inelastic
#   buyers demand 0% fewer fish (their demand stays exactly
#   the same).
# - Perfectly elastic buyers have a price elasticity of
#   infinity: when price increases by 1%, perfectly elastic
#   buyers demand infinity % fewer fish (their demand goes
#   to 0).
# - Asian buyers respond more to price increases than 
#   white buyers: their demand is more (elastic/inelastic).
#   This makes sense because asian buyers resell fish in
#   low-income neighborhoods or use it for products like
#   fishballs, so are very sensitive to price changes. But
#   white buyers are more likely to sell to high-end
#   restaurants or suburban retailers, so they have less
#   elastic demand because they can pass higher costs on to
#   their customers.
```
