Regression Discontinuity Design

Load necessary packages

rm(list=ls())
# library(devtools)
# install_github("naoki-egami/DIDdesign", dependencies = TRUE)

## load package
library(DIDdesign)
library(tidyverse)

load data

data(anzia2012)

The Basic Difference-in-Differences Design with Panel Data

In the basic DID design, units receive the treatment at the same time. In anzia2012 dataset, the treatment assignment happens in 2007.

Assessing the output from did_check()

The output from did_check() function can be accessed by summary() function, which reports estimates for the pre-treatment parallel trends as well as the 95% standardized equivalence confidence interval.

## view estimates
summary(check_panel)
## ── Estimates for assessing parallel trends assumption ──────────────────────────
##    estimate lag std.error EqCI95_LB EqCI95_UB
## 1 -0.003613   1   0.00265   -0.1018    0.1018
## 2  0.003263   2   0.00231   -0.0926    0.0926
## 3 -0.000434   3   0.00271   -0.0674    0.0674
## visualize the estimates
plot(check_panel)

# Step 2: Estimate the treatment effect with the double DID estimator

After assessing the parallel trends assumption with did_check(), we can estimate the average treatment effect on the treated (ATT) via did().

## estimate treatment effect
set.seed(1234)
fit_panel <- did(
  formula = lnavgsalary_cpi ~ oncycle | teachers_avg_yrs_exper +
    ami_pc + asian_pc + black_pc + hisp_pc,
  data = anzia2012,
  id_unit = "district",
  id_time = "year",
  design = "did",
  is_panel = TRUE,
  option = list(n_boot = 200, parallel = TRUE, lead = 0:2, se_boot = TRUE)
)

Assessing the output from did()

Users can obtain the estimates via summary() function.

## view the estimates
summary(fit_panel)
## ── ATT Estimates ───────────────────────────────────────────────────────────────
##    estimator lead estimate std.error statistic p_value  ci.low  ci.high
## 1 Double-DID    0  -0.0065    0.0026      -2.5  0.0131 -0.0109 -0.00069
## 2        DID    0  -0.0062    0.0027      -2.3  0.0194 -0.0106 -0.00028
## 3       sDID    0  -0.0044    0.0044      -1.0  0.3179 -0.0114  0.00404
## 4 Double-DID    1  -0.0079    0.0032      -2.4  0.0146 -0.0134 -0.00136
## 5        DID    1  -0.0115    0.0036      -3.2  0.0016 -0.0173 -0.00371
## 6       sDID    1  -0.0031    0.0039      -0.8  0.4260 -0.0110  0.00513
## 7 Double-DID    2  -0.0049    0.0043      -1.1  0.2502 -0.0134  0.00202
## 8        DID    2  -0.0115    0.0049      -2.3  0.0196 -0.0206 -0.00164
## 9       sDID    2   0.0015    0.0049       0.3  0.7664 -0.0085  0.00932

Or even better, we can plot the event study plot with leads and lags

# plot only treatment effects
post_plot <- plot(fit_panel)

# plot treatment effects + pre-treatment assessment
pre_post_plot <- plot(fit_panel, check_fit = check_panel)

## show the plots side-by-side
require(patchwork)
(post_plot +
  ggplot2::theme(aspect.ratio = 1) +
  ggplot2::ylim(-0.015, 0.01) +
  ggplot2::labs(title = "Post-Treatment")) +
  (pre_post_plot +
    ggplot2::theme(aspect.ratio = 1) +
    ggplot2::ylim(-0.015, 0.01) +
    ggplot2::labs(title = "Pre- and Post-Treatment"))

The Staggered Adoption Design

DIDdesign supports the staggered adoption design where units receive the treatment at different periods of time. As an example, we analyze paglayan2019 dataset in the package (see ?paglayan2019 for more details about this dataset).

## data
require(dplyr)
require(tibble)

## format dataset
paglayan2019 <- paglayan2019 %>%
  filter(!(state %in% c("WI", "DC"))) %>%
  mutate(
    id_time = year,
    id_subject = as.numeric(as.factor(state)),
    log_expenditure = log(pupil_expenditure + 1),
    log_salary = log(teacher_salary + 1)
  )

Step 2: Estimate staggered-adoption average treatment effect

did() function can handle the staggered adoption design by setting the design argument to design = "sa".

## estimate time-weighted SA-ATE
set.seed(1234)
fit_sa <- did(
  formula = log_expenditure ~ treatment,
  data    = paglayan2019,
  id_unit = "id_subject",
  id_time = "id_time",
  design  = "sa",
  option  = list(n_boot = 200, parallel = TRUE, thres = 1, lead = 0:9)
)

## plot treatment effects + assessment statistic
sa_plot <- plot(fit_sa, check_sa, band = TRUE)

## show plot
sa_plot +
  ggplot2::ylim(-0.1, 0.1) +
  ggplot2::geom_vline(xintercept = 0, color = "red", linetype = "dotted")