Add Event Study (aka Dynamic Difference in Differences) functionality #584
base: main
Conversation
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
Codecov Report: ❌ Patch coverage is …
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #584      +/-  ##
==========================================
+ Coverage   93.21%   94.50%    +1.29%
==========================================
  Files          35       37        +2
  Lines        5511     6278      +767
  Branches      358      420       +62
==========================================
+ Hits         5137     5933      +796
+ Misses        246      205       -41
- Partials      128      140       +12

View full report in Codecov by Sentry.
The EventStudy class now requires a patsy-style formula to specify the outcome and fixed effects, removing the separate outcome_col argument. Design matrix construction uses patsy, and event-time dummies are appended. Input validation checks for formula presence, and tests and documentation are updated to reflect the new API and output format.
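For orientation, a minimal usage sketch of the new API (argument names beyond `formula` are assumptions pieced together from this thread, not a verbatim signature):

```python
import causalpy as cp

# Hypothetical sketch: the outcome and fixed effects come from a patsy-style
# formula; EventStudy appends the event-time dummies internally.
result = cp.EventStudy(
    data=df,                          # panel data (unit, time, treatment time)
    formula="y ~ C(unit) + C(time)",  # patsy formula: outcome ~ fixed effects
    treat_time_col="treat_time",      # column name seen elsewhere in this PR
)
```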
@cursor review
Expanded documentation to explain the patsy formula syntax, the role of unit and time fixed effects, and how event-time dummies ($\beta_k$) are automatically constructed by the EventStudy class. Added details on the event window and reference event time parameters for clearer guidance.
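For reference, the kind of specification being described is the standard event-study regression (notation assumed here, not quoted from the docs):

$$
y_{it} = \alpha_i + \lambda_t + \sum_{k \neq -1} \beta_k \, \mathbf{1}[t - T_i = k] + \varepsilon_{it}
$$

where $\alpha_i$ and $\lambda_t$ are the unit and time fixed effects, $T_i$ is unit $i$'s treatment time, and the reference event time (conventionally $k = -1$) is omitted.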
Added a warning in the EventStudy class and documentation that the implementation only supports simultaneous treatment timing and does not support staggered adoption. Introduced a validation to raise a DataException if treated units have different treatment times. Added a corresponding test to ensure staggered adoption raises an error, and updated the notebook to clarify estimator limitations.
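To illustrate the constraint with toy data (column names assumed):

```python
import numpy as np
import pandas as pd

# Simultaneous timing (supported): all treated units switch at t = 10
ok = pd.DataFrame({"unit": ["a", "b", "c"], "treat_time": [10, 10, np.nan]})

# Staggered adoption (rejected): treated units switch at different times,
# which the new validation is described as raising a DataException for
bad = pd.DataFrame({"unit": ["a", "b", "c"], "treat_time": [10, 12, np.nan]})
```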
Enhanced the _bayesian_plot and _ols_plot methods in EventStudy to support configurable figure size and HDI probability. Updated docstrings to document new parameters and improved plot labeling for clarity.
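A plausible call given those parameters (the exact argument names are an assumption):

```python
# Hypothetical usage: configurable figure size, plus an HDI probability for
# the Bayesian backend
fig, ax = result.plot(figsize=(10, 4), hdi_prob=0.94)
```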
Introduces event study support to effect_summary(), including parallel trends check and dynamic effect reporting. Updates event study class to allow HDI probability customization and reporting, and extends documentation with effect summary usage and interpretation.
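Sketch of the described reporting flow (the `hdi_prob` argument name is an assumption; the `table` and `text` attributes are mentioned in the integration tests below):

```python
# Hypothetical usage: dynamic effects plus a parallel-trends check
summary = result.effect_summary(hdi_prob=0.9)
print(summary.text)  # prose interpretation
summary.table        # per-event-time effect estimates
```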
The `generate_event_study_data` function now supports optional time-varying predictors generated as AR(1) processes, controlled by new parameters: `predictor_effects`, `ar_phi`, and `ar_scale`. Added the `generate_ar1_series` utility function. Updated docstrings and examples to reflect these changes. The event study PyMC notebook was updated with additional analysis and improved section headings.
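A minimal sketch of what the AR(1) utility presumably computes (illustrative implementation, not the package's actual code):

```python
import numpy as np

def ar1_series(n: int, phi: float = 0.8, scale: float = 1.0, seed=None):
    """Simulate x_t = phi * x_{t-1} + eps_t, with eps_t ~ Normal(0, scale)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, scale)
    return x
```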
Introduces integration tests for the EventStudy.effect_summary method using both PyMC and sklearn models. Tests verify the returned EffectSummary object, its table and text attributes, and key output elements.
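A hedged pytest-style sketch of such a test (the fixture and assertions are assumed, not copied from the PR):

```python
def test_effect_summary_attributes(fitted_event_study):
    # fitted_event_study: hypothetical fixture wrapping a fitted EventStudy
    summary = fitted_event_study.effect_summary()
    assert hasattr(summary, "table")
    assert hasattr(summary, "text")
    assert "parallel trends" in summary.text.lower()
```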
I will take a look in the next few days :)
@cursor review
PR Summary: Adds an Event Studies section with the …
Written by Cursor Bugbot for commit 786a5f8. This will update automatically on new commits.
bugbot run |
# (staggered adoption is not currently supported)
treated_times = self.data.loc[
    ~self.data[self.treat_time_col].isna(), self.treat_time_col
].unique()
Bug: Validation rejects valid data when using documented np.inf for controls
The docstring at line 81 states users can "Use NaN or np.inf for never-treated (control) units," but the staggered adoption validation only uses .isna() to filter out control units. Since np.inf is not detected by .isna(), control units marked with np.inf are included in treated_times. When combined with treated units having finite treatment times, .unique() returns both values (e.g., [10, inf]), causing a false DataException about staggered adoption when the data is actually valid. The same issue affects _compute_event_time() where treated_mask would incorrectly include np.inf control units.
Additional Locations (1)
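One possible fix, sketched (this runs inside the class, so `self` is assumed): treat both NaN and non-finite values as never-treated when building the treated mask.

```python
import numpy as np

# Possible fix sketch: exclude NaN *and* np.inf controls from treated_times
treat_times = self.data[self.treat_time_col]
treated_mask = treat_times.notna() & np.isfinite(treat_times)
treated_times = treat_times[treated_mask].unique()
```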
    }
    rows.append(row)

return pd.DataFrame(rows)
Bug: Unused round_to parameter in get_event_time_summary method
The round_to parameter in get_event_time_summary is documented as controlling "Number of decimals for rounding" but is never actually used in the function body. The returned DataFrame contains unrounded values from float(coeff.mean()), float(coeff.std()), and hdi computations. This means users who pass round_to=3 expecting rounded output will get full precision values instead.
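A minimal fix sketch consistent with that description (assuming the method builds its result from `rows` as shown above):

```python
# Possible fix sketch: actually apply the documented rounding
df = pd.DataFrame(rows)
if round_to is not None:
    df = df.round(round_to)
return df
```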
# Convert patsy output to DataFrames for manipulation
X_df = pd.DataFrame(
    X, columns=X.design_info.column_names, index=self.data.index
)
Bug: Index mismatch if patsy drops rows with NaN
When building the design matrix, dmatrices from patsy may drop rows containing NaN values (its default NA_action='drop' behavior). The code then creates X_df using index=self.data.index, which still has all original rows. If patsy dropped any rows, X will have fewer rows than self.data.index has elements, causing a pandas ValueError about shape mismatch. This affects users who provide data with missing values in the outcome or predictor columns used in the formula.
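One possible remedy, sketched: patsy's `dmatrices` accepts `NA_action="raise"`, which fails loudly on missing values instead of silently dropping rows, keeping the design matrix aligned with `self.data.index`.

```python
from patsy import dmatrices

# Possible fix sketch: raise on NaN rather than drop, so X stays aligned
# with self.data.index (self.formula / self.data are assumed class attributes)
y, X = dmatrices(
    self.formula, self.data, NA_action="raise", return_type="dataframe"
)
```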
Added Event Study to the experiment support table in reporting_statistics.md and updated AGENTS.md to instruct updating the table when adding new experiment types.
This commit adds extensive new tests to test_reporting.py and test_synthetic_data.py, covering error handling, experiment type detection, OLS statistics edge cases, prose and table generation for various models, and all synthetic data generation utilities. These tests improve coverage and robustness for reporting and data simulation functions.
Expanded documentation in both the EventStudy class and the event study PyMC notebook to explain the equivalence between indicator functions and dummy variables. Added details on how dummy variables are constructed for each event time, the omission of the reference period to avoid multicollinearity, and the interpretation of regression coefficients as ATT at each event time.
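An illustrative pandas construction of such dummies (a sketch with assumed column names, not the class's actual code):

```python
import pandas as pd

# Each dummy encodes the indicator 1[t - T_i = k] for one event time k;
# the reference period k = -1 is dropped to avoid multicollinearity.
event_time = (df["time"] - df["treat_time"]).astype("Int64")
dummies = pd.get_dummies(event_time, prefix="event_time")
dummies = dummies.drop(columns=["event_time_-1"], errors="ignore")
```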
This pull request adds support for event study analysis to the CausalPy package. The main changes include introducing the new EventStudy class, updating the package exports to include it, and providing a utility function for generating synthetic panel data suitable for event study and dynamic DiD analyses.

Consider adding the time-relative-to-treatment column in the notebook rather than hiding it in the experiment class.

📚 Documentation preview 📚: https://causalpy--584.org.readthedocs.build/en/584/