The purpose of this in-class lab is to practice conducting joint hypothesis tests of regression parameters in R. We will do this using t-tests and F-tests. The lab may be completed in your group, but each group member should submit their own copy. To get credit, upload your .R script to the appropriate place on Canvas.

## For starters

Open up a new R script (named ICL8_XYZ.R, where XYZ are your initials) and add the usual “preamble” to the top:

# Add names of group members HERE
library(tidyverse)
library(broom)
library(wooldridge)
library(car)
library(magrittr)
library(modelsummary)

We’ll use a data set on earnings and ability, called htv. The data set contains a sample of 1,230 workers.

df <- as_tibble(htv)

Check out what’s in the data by typing

datasummary_df(df)
datasummary_skim(df,histogram=FALSE)

The main variables we’re interested in are: wages, education, ability, parental education, and region of residence (ne, nc, west, and south).

### Create regional factor variable

Let’s start by creating a factor variable from the four regional dummies. Borrowing code from lab 6, we have:

df %<>% mutate(region = case_when(ne==1 ~ "Northeast",
nc==1 ~ "NorthCentral",
west==1 ~ "West",
south==1 ~ "South")) %>%
mutate(region = factor(region))

## Regression and Hypothesis Testing

Estimate the following regression model: $educ = \beta_0 + \beta_1 motheduc + \beta_2 fatheduc + \beta_3 abil + \beta_4 abil^2 + \beta_5 region + u$ Note that $$abil$$ is in standard deviation units. You will need to use a mutate() function to create $$abil^2$$ (not shown here). Call it abilsq. $$region$$ represents the factor variable you created above.1

est <- lm(educ ~ motheduc + fatheduc + abil + abilsq + region, data=df)
tidy(est)
modelsummary(est)

### t-test

1. Test the hypothesis that $$abil$$ has a linear effect on $$educ$$.

### F-test (single parameter)

1. Now test that $$motheduc$$ and $$fatheduc$$ have equal effects on $$educ$$. In other words, test $$H_0: \beta_1=\beta_2; H_a: \beta_1 \neq \beta_2$$. To do this, you will need to obtain $$se(\beta_1 - \beta_2)$$. Luckily, R will do this for you with the linearHypothesis() function in the car package:
linearHypothesis(est, "motheduc = fatheduc")

The resulting p-value is that of an F test, but one would get an identical result by using a t-test, since this is a simple hypothesis (see Wooldridge (2015), pp. 125-126).

### F-test (multiple parameters)

The p-values from the previous regression might indicate that the three region dummies don’t contribute to education.

1. Test the hypothesis that they don’t; i.e. test $H_0: \text{all region dummies}=0; \\ H_a: \text{any region dummy}\neq 0$

The code to do this again comes from the linearHypothesis() function. The syntax is to encolose each component hypothesis in quotes and then surround them with c(), which is how R creates vectors.

linearHypothesis(est, c("regionNortheast=0", "regionSouth=0", "regionWest =0"))

or, more simply,

linearHypothesis(est, matchCoefs(est,"region"))

Alternatively, you can perform the F-test as follows (no need to put this in your R-script; I’m just showing you how to do it “by hand”):

est.restrict <- lm(educ ~ motheduc + fatheduc + abil + abilsq, data=df)
Fstat.numerator   <- (deviance(est.restrict)-deviance(est))/3
Fstat.denominator <- deviance(est)/1222
Fstat <- Fstat.numerator/Fstat.denominator
p.value <- 1-pf(Fstat,3,1222)

This gives the exact same answer as the linearHypothesis() code.

# References

Wooldridge, Jeffrey M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cengage Learning.

1. Here the notation of $$\beta_5 region$$ is not quite right. It more technically should be written $$\beta_5 region.NE + \beta_6 region.S + \beta_7 region.W$$, where each of the $$region.X$$ variables is a dummy. The way it is written above, $$\beta_5 region$$ implies that $$\beta_5$$ is a vector, not a scalar.↩︎