Longitudinal causal inference
Many real questions are not single decisions. A patient is seen, treated, returns, is seen again, and is treated again. By visit two, what we observe is partly the consequence of visit one's treatment, so the same variable is at once a confounder for the next decision and a mediator of the previous one. Cross-sectional tools, which assume every covariate is measured before a single treatment, quietly miss this.
Same outcome, different graph. Conditioning on the time-varying covariate is not optional in the longitudinal world, and adjusting for it the wrong way blocks the very effect you're trying to measure.
One decision, one outcome.
L₂ is post-treatment for A₁ and pre-treatment for A₂.
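The double role of L₂ can be sketched with a small simulation. This is a minimal Python illustration, not the kit's code: the linear DGP and every coefficient in it are assumptions chosen for clarity, and A₁ is randomized so that the only problem is how L₂ is handled. Regressing on L₂ removes the part of A₁'s effect that flows through L₂, while leaving L₂ out lets it confound A₂.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear DGP (all coefficients are illustrative assumptions).
A1 = rng.binomial(1, 0.5, n)                    # first treatment, randomized here
L2 = 1.0 * A1 + rng.normal(size=n)              # covariate affected by A1 (mediator)
A2 = rng.binomial(1, 1 / (1 + np.exp(-L2)))     # second treatment driven by L2 (confounded)
Y = 0.5 * A1 + 1.0 * L2 + 0.7 * A2 + rng.normal(size=n)

def ols(columns, y):
    """Least-squares slopes of y on the given columns (intercept dropped)."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# True effect of always-treat vs never-treat: direct A1 + A1-via-L2 + A2.
true_joint = 0.5 + 1.0 * 1.0 + 0.7

b_adj = ols([A1, A2, L2], Y)    # conditions on L2: blocks the A1 -> L2 -> Y path
b_unadj = ols([A1, A2], Y)      # ignores L2: A2's coefficient absorbs confounding

print("true joint effect      :", true_joint)
print("adjusted   (A1, A2)    :", b_adj[:2], "sum", b_adj[:2].sum())
print("unadjusted (A1, A2)    :", b_unadj, "sum", b_unadj.sum())
```

In this sketch neither regression recovers the joint effect: the adjusted fit keeps only the direct part of A₁, and the unadjusted fit overstates A₂. That is the gap the longitudinal methods below are meant to close.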
The estimand grows with the decision sequence. Click each card to see the math.
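For reference, the standard g-formula identification for discrete covariates is written out here, first for one decision and then for two. This is the textbook form under consistency, positivity, and sequential exchangeability. The notation is an assumption and may differ from what the cards show.

```latex
\begin{align*}
% One decision, one outcome
\mathbb{E}\!\left[Y^{a}\right]
  &= \sum_{l} \mathbb{E}\!\left[Y \mid L = l,\, A = a\right] P(L = l) \\[4pt]
% Two decisions: L_2 is averaged over its distribution given the past,
% not conditioned on as a fixed covariate
\mathbb{E}\!\left[Y^{a_1, a_2}\right]
  &= \sum_{l_1} \sum_{l_2}
     \mathbb{E}\!\left[Y \mid L_1 = l_1,\, A_1 = a_1,\, L_2 = l_2,\, A_2 = a_2\right] \\
  &\qquad \times P(L_2 = l_2 \mid L_1 = l_1,\, A_1 = a_1)\; P(L_1 = l_1)
\end{align*}
```

Each extra decision adds one more conditional distribution to average over, which is the sense in which the estimand grows.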
A two-stage longitudinal DGP, intentionally gentle. The propensity barely depends on the covariates, so the naive mean (the average outcome among subjects whose observed treatments happen to match the regime) is close to the truth here. With stronger time-varying confounding the gap widens and can even flip the sign of the estimated effect. The point of the sandbox is the workflow, not the bias.
A two-stage longitudinal DGP with time-varying confounding. Pick a regime, redraw the sample, and watch the bars settle around the oracle truth.
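The oracle-versus-naive comparison can be sketched offline as follows. This is a hedged Python illustration under stated assumptions: the DGP, its coefficients, and the helper names (`simulate`, `naive_mean`, `g_formula`) are hypothetical and are not the kit's. It also swaps in plain nonparametric g-computation as a stand-in for LTMLE, and it uses stronger confounding than the gentle sandbox so that the gap is visible.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n, rng, regime=None):
    """Draw n subjects. With regime=(a1, a2) both treatments are forced (oracle world)."""
    L1 = rng.binomial(1, 0.5, n)
    if regime is None:
        A1 = rng.binomial(1, expit(-0.5 + 1.0 * L1))   # sicker patients treated more
    else:
        A1 = np.full(n, regime[0])
    L2 = rng.binomial(1, expit(-0.5 + 1.0 * L1 - 1.0 * A1))  # A1 improves L2
    if regime is None:
        A2 = rng.binomial(1, expit(-0.5 + 1.5 * L2))   # L2 drives the second decision
    else:
        A2 = np.full(n, regime[1])
    Y = 1.0 * L1 + 2.0 * L2 - 0.5 * A1 - 0.5 * A2 + rng.normal(size=n)
    return {"L1": L1, "A1": A1, "L2": L2, "A2": A2, "Y": Y}

def naive_mean(d, a1, a2):
    """Mean outcome among subjects whose observed treatments match the regime."""
    return d["Y"][(d["A1"] == a1) & (d["A2"] == a2)].mean()

def g_formula(d, a1, a2):
    """Nonparametric g-computation for binary L1 and L2."""
    total = 0.0
    for l1 in (0, 1):
        p_l1 = np.mean(d["L1"] == l1)
        s1 = (d["L1"] == l1) & (d["A1"] == a1)
        for l2 in (0, 1):
            p_l2 = np.mean(d["L2"][s1] == l2)          # P(L2 = l2 | L1 = l1, A1 = a1)
            s2 = s1 & (d["L2"] == l2) & (d["A2"] == a2)
            total += p_l1 * p_l2 * d["Y"][s2].mean()
    return total

rng = np.random.default_rng(123)
regime = (1, 1)                                         # always treat
obs = simulate(20_000, rng)                             # observational sample
oracle = simulate(200_000, rng, regime)["Y"].mean()     # truth by forcing the regime
naive = naive_mean(obs, *regime)
gcomp = g_formula(obs, *regime)

print(f"oracle {oracle:.3f}  naive {naive:.3f}  g-formula {gcomp:.3f}")
```

Redrawing with a different seed plays the role of the redraw button: the g-formula estimate should stay near the oracle while the naive mean stays offset, because treated subjects in this hypothetical DGP are systematically sicker.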
The oracle and naive estimates are recomputed on each redraw. The LTMLE point estimate is a snapshot from the kit's Rmd output (SL.glm, seed=123, n=1,000); download the kit to run LTMLE on your own data.
The same code that produced the snapshot above. Drop in your own dataset and re-knit.
Synthetic dataset, 1000 rows, columns L1, A1, L2, A2, Y
Six-column data dictionary
Runnable Rmd: naive contrasts, LTMLE for three regimes, oracle truth
Pre-knitted HTML preview (embedded below)
Coming next
We're going to swap the toy DGP out for a use case we're actively working on, with real time-varying confounding and a clean naive-vs-LTMLE contrast. This space is held for it.