The purpose of this in-class lab is to use R to practice with two-stage least squares (2SLS) estimation. The lab should be completed in your group. To get credit, upload your .R script to the appropriate place on Canvas.
Open up a new R script (named ICL13_XYZ.R
, where XYZ
are your initials) and add the usual “preamble” to the top:
# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(broom)
library(AER)
library(magrittr)
library(estimatr)
library(modelsummary)
library(vtable)
We’re going to use data on working women.
df <- as_tibble(mroz)
Like last time, let’s use stargazer
to get a quick view of our data:
df %>% sumtable(out="return")
wage
and lwage
have 428 observations, but all of the other variables have 753 observations?Using the filter()
and is.na()
functions (or drop_na()
function), drop the observations with missing wages. (I suppress the code, since you should know how to do this.)
We want to estimate the return to education for women who are working, using mother’s and father’s education as instruments: \[ \log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 exper^2 + u \] where \(wage\) is the hourly rate of pay, \(educ\) is years of education, and \(exper\) is labor market experience (in years).
Let’s estimate the first stage regression, which is a regression of the endogenous variable (\(educ\)) on the instrument(s) (\(motheduc\) and \(fatheduc\)) and the exogenous explanatory variables (\(exper\) and \(exper^2\)).1
Run this regression (again, I suppress the code). Call the estimation object est.stage1
.
linearHypothesis(est.stage1,c("motheduc","fatheduc"))
In the second stage, we estimate the log wage equation above, but this time we include \(\widehat{educ}\) on the right hand side instead of \(educ\), where \(\widehat{educ}\) are the fitted values from the first stage.
In R, we can easily access the fitted values by typing fitted(est.stage1)
.
Let’s estimate the second stage regression:
est.stage2 <- lm(log(wage) ~ fitted(est.stage1) + exper + I(exper^2), data=df)
The standard errors from the above second stage regression will be incorrect.2 Instead, we should estimate these at the same time. We could do this with the ivreg()
function, just like in the previous lab. We could also use iv_robust()
from the estimatr
package, which will give us robust SEs.
est.2sls <- iv_robust(log(wage) ~ educ + exper + I(exper^2) |
motheduc + fatheduc + exper + I(exper^2),
data=df)
modelsummary(list("OLS" = est.ols, "IV By Hand" = est.stage2, "IV Automatic" = est.2sls))
Note that you can easily include the quadratic in experience as I(exper^2)
without having to create this variable in a mutate()
statement.↩︎
The reason is that error term in the second stage regression includes the residuals from the first stage, but the default standard errors fail to take this into account.↩︎