The purpose of this in-class lab is to use R to practice with instrumental variables estimation. The lab should be completed in your group. To get credit, upload your .R script to the appropriate place on Canvas.
You may need to install the packages
AER may have already been installed when you previously installed
Open up a new R script (named
XYZ are your initials) and add the usual “preamble” to the top:
# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(broom)
library(AER)
library(magrittr)
library(modelsummary)
We’re going to use data on the fertility of women in Botswana.
df <- as_tibble(fertil2)
Let’s look at summary statistics of our data by using the datasummary_skim() function from the modelsummary package. We can export the result to a Word document if we’d like:
df %>% datasummary_skim(histogram = FALSE, output = "myfile.docx")
##  "myfile.docx"
Suppose we want to see if education causes lower fertility (as can be seen when comparing more- and less-educated countries): \[ children = \beta_0 + \beta_1 educ + \beta_2 age + \beta_3 age^2 + u \] where \(children\) is the number of children born to the woman, \(educ\) is years of education, and \(age\) is age (in years).
est.ols <- lm(children ~ educ + age + I(age^2), data=df)
I(age^2) puts the quadratic term in automatically without us having to use mutate() to create a separate variable for age squared.
We can also use modelsummary() to examine the output. It reports the standard error of each coefficient in parentheses beneath the estimate.
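As a minimal sketch of that step, the chunk below re-estimates the OLS model and passes it to modelsummary(). It reads fertil2 directly from the wooldridge package so that it runs on its own; the final line is only a cross-check against base R's coefficient table and is not part of the original lab.

```r
library(wooldridge)    # provides the fertil2 data
library(modelsummary)

# Re-estimate the OLS model so this chunk is self-contained
est.ols <- lm(children ~ educ + age + I(age^2), data = fertil2)

# Regression table: standard errors appear in parentheses under each estimate
modelsummary(est.ols)

# Cross-check: the same estimates and standard errors as a plain matrix
coef(summary(est.ols))
```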
We know that education is endogenous (i.e. people choose the level of education that maximizes their utility). A possible instrument for education is \(firsthalf\), which is a dummy equal to 1 if the woman was born in the first half of the calendar year, and 0 otherwise.
Let’s create this variable:
df %<>% mutate(firsthalf = mnthborn<7)
We will assume that \(firsthalf\) is uncorrelated with \(u\).
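Exogeneity cannot be tested here, but the other requirement for an instrument, relevance, can. The sketch below runs the first-stage regression of \(educ\) on \(firsthalf\) and the exogenous controls. The object names est.stage1 and t.first are illustrative and not from the lab. With a single excluded instrument, the squared t-statistic equals the first-stage F-statistic, and a common rule of thumb treats values above 10 as a sign that the instrument is not weak.

```r
library(wooldridge)    # provides the fertil2 data

df <- fertil2
df$firsthalf <- df$mnthborn < 7    # same definition as in the lab

# First stage: endogenous regressor on the instrument plus exogenous controls
est.stage1 <- lm(educ ~ firsthalf + age + I(age^2), data = df)
summary(est.stage1)

# t-statistic on the instrument; its square is the F-statistic
# for the single excluded instrument
t.first <- coef(summary(est.stage1))["firsthalfTRUE", "t value"]
t.first^2
```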
Now let’s do the IV regression:
est.iv <- ivreg(children ~ educ + age + I(age^2) | firsthalf + age + I(age^2), data=df)
The variables to the right of the | are the instruments (including the \(x\)’s that we assume to be exogenous, like \(age\)). The endogenous \(x\), here \(educ\), is the first one after the ~; it is the only regressor that does not also appear to the right of the |.
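To see what ivreg() is doing, the following sketch reproduces the IV point estimate by running two-stage least squares by hand. The names stage1, stage2, and educ.hat are illustrative, and the complete-case filter is an added assumption so that both stages use exactly the same rows. Only the point estimates match: the standard errors from the manual second stage are not valid, which is one reason to use ivreg() in practice.

```r
library(wooldridge)    # provides the fertil2 data
library(AER)           # provides ivreg()

df <- fertil2
# Keep rows with no missing values in the variables used (assumption for this sketch)
df <- df[complete.cases(df[, c("children", "educ", "age", "mnthborn")]), ]
df$firsthalf <- df$mnthborn < 7

# IV estimate as in the lab
est.iv <- ivreg(children ~ educ + age + I(age^2) | firsthalf + age + I(age^2), data = df)

# Stage 1: predict educ from the instrument and the exogenous controls
stage1 <- lm(educ ~ firsthalf + age + I(age^2), data = df)
df$educ.hat <- fitted(stage1)

# Stage 2: replace educ with its first-stage fitted values
stage2 <- lm(children ~ educ.hat + age + I(age^2), data = df)

# The two point estimates on education should agree
c(ivreg = unname(coef(est.iv)["educ"]), manual = unname(coef(stage2)["educ.hat"]))
```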
Now we can compare the output for each of the models:
[Table: modelsummary output with columns Model 1, Model 2, and Model 3; the table contents were not preserved.]
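A side-by-side table of this kind can be produced by passing a named list of models to modelsummary(). The sketch below assumes the three columns are the OLS fit, the first stage, and the IV fit; the lab does not confirm which models were compared, and the list names are illustrative.

```r
library(wooldridge)    # provides the fertil2 data
library(AER)           # provides ivreg()
library(modelsummary)

df <- fertil2
df$firsthalf <- df$mnthborn < 7

est.ols    <- lm(children ~ educ + age + I(age^2), data = df)
est.stage1 <- lm(educ ~ firsthalf + age + I(age^2), data = df)   # assumed third model
est.iv     <- ivreg(children ~ educ + age + I(age^2) | firsthalf + age + I(age^2), data = df)

# A named list gives each column a descriptive header instead of "Model 1", ...
models <- list("OLS" = est.ols, "First stage" = est.stage1, "IV" = est.iv)
modelsummary(models)
```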
We can also save the output of modelsummary() to an image, a text file, or another format:
## save_kable will have the best result with magick installed.
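As a rough sketch of saving a table, modelsummary() picks the file format from the extension given in its output argument. The file name iv_results.txt is hypothetical, and the file is written to a temporary directory here so the example leaves nothing behind in the working directory.

```r
library(wooldridge)    # provides the fertil2 data
library(AER)           # provides ivreg()
library(modelsummary)

df <- fertil2
df$firsthalf <- df$mnthborn < 7

est.ols <- lm(children ~ educ + age + I(age^2), data = df)
est.iv  <- ivreg(children ~ educ + age + I(age^2) | firsthalf + age + I(age^2), data = df)

# Hypothetical output path; the extension determines the format
out.file <- file.path(tempdir(), "iv_results.txt")
modelsummary(list("OLS" = est.ols, "IV" = est.iv), output = out.file)

file.exists(out.file)
```

Swapping the extension for .docx or .png would request a Word document or an image instead; image output may need extra packages to be installed.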