Implement interrupted time series method #41
I've had a look at https://causalpy.readthedocs.io/en/latest/notebooks/its_pymc.html and their models seem extremely simple. They have no autoregressive components, only an intercept, numeric variables (which can model a very basic trend), and categorical variables (which can model e.g. monthly variations). We can still use this approach by adding lags to the input data (using pandas), which makes the model autoregressive. That way it should be possible to use time-series regression and time-series random/boosted forests. ARIMA will be harder to implement this way. I'm not sure exactly how ARIMA works, but instead of using an ARIMA model directly, maybe we can extract ARIMA features and then run simple regression on them. Thoughts on this, @kleinlennart?
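A minimal sketch of the lag idea, assuming daily article counts in a pandas `Series` with a `DatetimeIndex`. `make_features` is a hypothetical helper, not part of CausalPy. It builds the intercept, trend, and month-dummy columns described above and appends autoregressive lag columns via `Series.shift`, so any regressor (linear regression, random forest, boosted trees) can consume the result once the first `n_lags` rows, whose lags are NaN, are dropped.

```python
import numpy as np
import pandas as pd


def make_features(y: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Hypothetical helper: ITS-style design matrix plus autoregressive lag columns."""
    X = pd.DataFrame(index=y.index)
    X["intercept"] = 1.0
    # Numeric time index: lets a linear model capture a basic trend
    X["trend"] = np.arange(len(y), dtype=float)
    # Month dummies model monthly variation; drop_first avoids collinearity with the intercept
    months = pd.get_dummies(
        pd.Series(y.index.month, index=y.index),
        prefix="month",
        drop_first=True,
        dtype=float,
    )
    X = X.join(months)
    # lag_k holds the value observed k steps earlier (NaN for the first k rows)
    for k in range(1, n_lags + 1):
        X[f"lag_{k}"] = y.shift(k)
    return X
```

For example, `make_features(counts, n_lags=7).dropna()` would give one week of lags per row. Note that these lag columns contain observed values, which is only valid for fitting on the pre-intervention period.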
I made some mistakes when using lags, so it's better to use a proper time-series library. The best one I know is https://github.com/Nixtla/statsforecast (plus mlforecast and neuralforecast).
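One general pitfall when building lags by hand (the comment does not say which mistakes were made, so this is only an illustration): after the intervention, the lag inputs must be the model's own predictions, not the observed values, otherwise the "counterfactual" is partly built from treated outcomes. Forecasting libraries such as statsforecast take care of this internally. The sketch below is a hand-rolled AR(p) fitted by least squares with NumPy, not statsforecast's API; `fit_ar` and `recursive_forecast` are hypothetical names.

```python
import numpy as np


def fit_ar(y_pre: np.ndarray, p: int) -> np.ndarray:
    """Least-squares fit of y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} on the pre-period."""
    # Each row: [1, y_{t-1}, ..., y_{t-p}] (most recent lag first)
    X = np.array([np.r_[1.0, y_pre[t - p:t][::-1]] for t in range(p, len(y_pre))])
    beta, *_ = np.linalg.lstsq(X, y_pre[p:], rcond=None)
    return beta  # [c, a_1, ..., a_p]


def recursive_forecast(y_pre: np.ndarray, beta: np.ndarray, h: int) -> np.ndarray:
    """Counterfactual forecast for the h steps after the end of the pre-period."""
    p = len(beta) - 1
    history = list(y_pre[-p:])
    preds = []
    for _ in range(h):
        lags = np.array(history[-p:][::-1])  # most recent first, matching fit_ar
        y_hat = beta[0] + beta[1:] @ lags
        preds.append(y_hat)
        history.append(y_hat)  # feed back the prediction, never the observed post value
    return np.array(preds)
```

The estimated effect is then `observed_post - recursive_forecast(y_pre, fit_ar(y_pre, p), len(observed_post))`.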
This works in principle using ARIMA. Unexpectedly, for my test data the impact begins on the day before the protest. The test data is:

```python
from datetime import date

# get_acled_events and get_mediacloud_counts are this project's data-source helpers
# (their import path is not shown here)
events = get_acled_events(
    countries=["Germany"], start_date=date(2023, 6, 1), end_date=date(2024, 2, 29)
)
# Keep only events organized by Last Generation
events = events[
    events["organizations"].apply(lambda x: "Last Generation (Germany)" in x)
]
article_counts = get_mediacloud_counts(
    '"Letzte Generation"', date(2023, 1, 1), date(2024, 3, 31)
)
```

This gives us the following impacts (x-axis = date, y-axis = number of articles about Last Generation):

[Figure: estimated impact, marking the protest date and the earlier date where the impact starts]

Here are the respective time series where the predictions start at the protest date, and one, two, three, and four days earlier. Note that the impact estimate is quite wrong for the first plot, because the model can already predict from the pre-protest data that the number of articles is going up.

[Figures: protest_date = 0, protest_date = 1, protest_date = 2, protest_date = 3, protest_date = 4]

The above are all with ARIMA. With the (non-optimized!) random forest code, the results are similar but less continuous:

[Figure: random forest results]
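The experiment of starting the forecast 0 to 4 days before the protest can be sketched as follows. A plain AR(1) fitted by least squares stands in for ARIMA here (a deliberate simplification), and `ar1_counterfactual` and `impact_by_offset` are hypothetical helpers, not the project's API. For each offset k, the counterfactual starts k steps before the protest, and the impact is summed from that start through a fixed window after the protest, so pre-protest days contribute only if the observed series already deviates from the forecast there.

```python
import numpy as np


def ar1_counterfactual(y_pre: np.ndarray, h: int) -> np.ndarray:
    """Fit y_t = c + a*y_{t-1} by least squares on y_pre, then forecast h steps recursively."""
    X = np.column_stack([np.ones(len(y_pre) - 1), y_pre[:-1]])
    (c, a), *_ = np.linalg.lstsq(X, y_pre[1:], rcond=None)
    preds, last = [], y_pre[-1]
    for _ in range(h):
        last = c + a * last  # feed each prediction back in as the next lag
        preds.append(last)
    return np.array(preds)


def impact_by_offset(y: np.ndarray, protest_idx: int, offsets, horizon: int) -> dict:
    """Cumulative observed-minus-counterfactual effect for each forecast start offset."""
    results = {}
    for k in offsets:
        start = protest_idx - k  # forecast begins k steps before the protest
        pred = ar1_counterfactual(y[:start], horizon + k)
        observed = y[start:start + horizon + k]
        results[k] = float(np.sum(observed - pred))
    return results
```

On a clean synthetic series where the effect begins exactly at the protest, all offsets give the same estimate. If the observed counts already rise before the protest, the k = 0 forecast extrapolates part of that rise, which is consistent with the distorted first plot described above.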
Interrupted time series is implemented and available via API. Still missing:
Issues to think about:
Todo:
This is mostly finished, but all data sources are currently down, so the tests don't run.
Part of #38