Add Adaptive Conformal Inference method for Time Series #341

Merged
Changes from 102 commits
116 commits
237f8a6
Add ACI
Aug 8, 2023
21e8e80
FIX: Fix lint and type-check passes.
Aug 11, 2023
96d47d0
FIX: fix type-check and add warning
thibaultcordier Aug 11, 2023
8647ec1
UPD: format code and update metainfo
thibaultcordier Aug 11, 2023
f6f171e
FIX: scikit-learn error
thibaultcordier Aug 11, 2023
1897b91
FIX: take account conformity score function
thibaultcordier Aug 11, 2023
e5e0400
FIX: redefine previous tests for beta optimization
thibaultcordier Aug 11, 2023
20b49e8
add cwc metrics
QMaphan Aug 21, 2023
0c18c9f
FIX: fix type-check
QMaphan Aug 22, 2023
ec0b07f
FIX: fix type check
QMaphan Aug 22, 2023
e13771d
FIX= fix test_metrics
QMaphan Aug 23, 2023
6b50c65
FIX: fix test_metrics
QMaphan Aug 23, 2023
34ff553
FIX: fix type_check
QMaphan Aug 23, 2023
35e620f
fix: fix typing error
QMaphan Aug 23, 2023
3db5ad2
FIX: fix static type and metrics
QMaphan Aug 24, 2023
24ae12c
FIX: fix linting
QMaphan Aug 24, 2023
21448bf
FIX: fix typing
QMaphan Aug 24, 2023
4bc21f8
FIX: pytest
QMaphan Aug 24, 2023
2f839e1
FIX: pytest
QMaphan Aug 24, 2023
f87c4e7
FIX: fix pytest
QMaphan Aug 24, 2023
69dbb27
FIX: pytest
QMaphan Aug 24, 2023
8894bf5
FIX: Pytest
QMaphan Aug 24, 2023
9cbe0f4
FIX: pytest
QMaphan Aug 24, 2023
06c41d5
FIX: correction coverage utils and test_metrics
QMaphan Aug 24, 2023
c97cf57
FIX: fix coverage test
QMaphan Aug 25, 2023
4b6e6b0
Apply suggestions from code review (clean code)
thibaultcordier Aug 29, 2023
38e6979
FIX: remove blank lines
thibaultcordier Aug 29, 2023
b410c1a
FIX: change reference value
thibaultcordier Aug 29, 2023
d04f987
TMP: add empty test for coverage
thibaultcordier Aug 29, 2023
11e6a4d
Merge remote-tracking branch 'origin/master' into 334-adaptive-confor…
thibaultcordier Aug 29, 2023
db56bbe
FIX: lint error
thibaultcordier Aug 29, 2023
ec823ac
FIX: adapt valid test with no return
thibaultcordier Aug 29, 2023
0112b86
FIX: convert pandas series to numpy array
thibaultcordier Aug 29, 2023
5739c07
FIX: use existing metrics instead of picp & pinaw
QMaphan Sep 13, 2023
f4722cf
Fix: lint
QMaphan Sep 13, 2023
fdc70b1
Fix: lint
QMaphan Sep 13, 2023
69416f4
Fix : Orignal Paper
QMaphan Sep 13, 2023
a606c8d
FIX: ACI aci function and tutorial
QMaphan Sep 26, 2023
8bb352c
Fix : Correct metrics cwc and exemple
QMaphan Sep 27, 2023
63354ae
FIX : Lint
QMaphan Sep 27, 2023
670f213
Fix : plot_ts-tutorial comment and plot
QMaphan Sep 27, 2023
389daa4
FIX: mypy "ndarray[Any, dtype[Any]]" has no attribute "values"
QMaphan Sep 27, 2023
d6aafaa
Merge branch 'master' into 334-adaptive-conformal-predictions-for-tim…
thibaultcordier Sep 27, 2023
327b05f
FIX: lint error after merging
thibaultcordier Sep 27, 2023
0e94e74
Fix: Doc error
QMaphan Sep 27, 2023
20720a3
Update examples/regression/2-advanced-analysis/plot-coverage-width-ba…
QMaphan Sep 29, 2023
0697a57
Update examples/regression/4-tutorials/plot_ts-tutorial.py
QMaphan Sep 29, 2023
602b300
Update examples/regression/4-tutorials/plot_ts-tutorial.py
QMaphan Sep 29, 2023
de003f8
Update mapie/metrics.py
QMaphan Sep 29, 2023
d932b17
Update mapie/metrics.py
QMaphan Sep 29, 2023
1abda74
Update mapie/tests/test_metrics.py
QMaphan Sep 29, 2023
453632e
Update mapie/tests/test_metrics.py
QMaphan Sep 29, 2023
2dded22
Update mapie/regression/time_series_regression.py
QMaphan Sep 29, 2023
a3083ed
Update mapie/utils.py
QMaphan Sep 29, 2023
0987952
Update mapie/regression/time_series_regression.py
QMaphan Sep 29, 2023
e6b3164
Update mapie/regression/time_series_regression.py
QMaphan Sep 29, 2023
c5e96f8
Update mapie/regression/time_series_regression.py
QMaphan Sep 29, 2023
143a908
Update mapie/regression/time_series_regression.py
QMaphan Sep 29, 2023
0166740
Update mapie/regression/time_series_regression.py
QMaphan Sep 29, 2023
cf80033
FIX: remove math latex in docstring
Sep 29, 2023
a3eb38d
FIX: correction documentation
QMaphan Sep 29, 2023
51274ae
Merge branch '334-adaptive-conformal-predictions-for-time-series' of …
QMaphan Sep 29, 2023
444d4e6
FIX: test error correction, raise wrong TS attribute
QMaphan Sep 29, 2023
6b9b929
FIX: doc maths equation
QMaphan Oct 2, 2023
6c6cdd2
Update mapie/tests/test_time_series_regression.py
QMaphan Oct 2, 2023
23a8aff
Update mapie/regression/time_series_regression.py
QMaphan Oct 2, 2023
9fdfdbb
Update exoplanets.ipynb
QMaphan Oct 2, 2023
6923e4c
Update exoplanets.ipynb
QMaphan Oct 2, 2023
e06ff19
Update tutorial_multilabel_classification_precision.ipynb
QMaphan Oct 2, 2023
332faec
Update tutorial_multilabel_classification_precision.ipynb
QMaphan Oct 2, 2023
c35cca3
Update tutorial_multilabel_classification_precision.ipynb
QMaphan Oct 2, 2023
ba06fb8
Update tutorial_regression.ipynb
QMaphan Oct 2, 2023
7d1aba2
Update ts-changepoint.ipynb
QMaphan Oct 4, 2023
ed03d3d
FIx: Correction based on Louis Lacombe remark
QMaphan Oct 23, 2023
6114566
Fix: Last update CP to aci
QMaphan Oct 23, 2023
49dbfbf
FIX: Lint error
QMaphan Oct 24, 2023
99ea1b4
FIX: mypy error, and correct doc writing
QMaphan Oct 25, 2023
0a9896e
Fix: lint error
QMaphan Oct 25, 2023
73746e3
Merge branch 'master' into 334-adaptive-conformal-predictions-for-tim…
thibaultcordier Nov 10, 2023
6ce4d77
UPD: latex formula
Nov 10, 2023
8845539
FIX: remove problematic formula
Nov 10, 2023
a236b77
Merge branch 'numpy-version-patch' into 334-adaptive-conformal-predic…
Nov 13, 2023
504b75f
UPD: improve metric tests
Nov 13, 2023
6bebaad
FIX: better value error raise
Nov 13, 2023
64bf885
UPD: improve docstring
Nov 13, 2023
ab76b99
UPD: style of test
Nov 13, 2023
f259f4f
UPD: format code example
Nov 13, 2023
f6a6e1d
UPD: better docstring
Nov 13, 2023
721d676
FIX: change mu to 1-alpha
Nov 14, 2023
bd013b9
FIX: ssl default
Nov 14, 2023
8052192
UPD: remove large files and transform nb to py file
Nov 14, 2023
60e8138
FIX: typo in results
Nov 14, 2023
fb848d4
UPD: add a header docstring
Nov 14, 2023
379db90
FIX: lint error
Nov 14, 2023
0d529ec
UPDATE : Add training with steps notebook ACI Zaffran
Dec 11, 2023
6c00dcf
UPD: iterative training notebook ACI Zaffran
Dec 11, 2023
75cc0a0
DEL: deplicated notebook wrt existing python file
Dec 12, 2023
2a055ec
Merge branch 'master' into 334-adaptive-conformal-predictions-for-tim…
thibaultcordier Dec 21, 2023
6d4b75b
FIX: lint
thibaultcordier Dec 21, 2023
c0d4337
UPD: consolidate ACI + reproduce experimental results
thibaultcordier Dec 21, 2023
31ea28d
UPD: adapt the aci preprocess + add corresponding tests
Dec 21, 2023
1e1911f
FIX: hyperlinks
thibaultcordier Dec 21, 2023
4bf1da2
UPD: else condition in update
Dec 21, 2023
e8c2ba4
UPD: change gamma value error message
Dec 21, 2023
176bbaf
UPD: change name of alpha method
Dec 21, 2023
e83389d
UPD: docstring
Dec 21, 2023
fc0522f
UPD: modify the user warning with a better function call
Dec 21, 2023
5a1c3ab
UPD: add DeprecationWarning for partial_fit
thibaultcordier Dec 21, 2023
b4c3029
FIX: add warnings and better indent
Dec 21, 2023
a0587a0
FIX: remove useless parameter
Dec 21, 2023
765b23f
UPD: change docstring and section names
Dec 21, 2023
a2fbd3a
UPD: allow infinite prediction intervals to be produced in regressor …
Dec 21, 2023
505a500
UPD: ACI doctring
thibaultcordier Dec 22, 2023
6b56953
UPD: add plot compared results
Dec 22, 2023
e6e7529
FIX: shorter hyperlink
thibaultcordier Jan 3, 2024
116eb2d
FIX: remove hyperlink
thibaultcordier Jan 3, 2024
3 changes: 3 additions & 0 deletions AUTHORS.rst
Expand Up @@ -32,6 +32,9 @@ Contributors
* Daniel Herbst <daniel.herbst@tum.de>
* Candice Moyet <cmoyet@quantmetry.com>
* Sofiane Ziane <sziane@quantmetry.com>
* Remi Colliaux <rcolliaux@quantmetry.com>
* Arthur Phan <aphan@quantmetry.com>
* Rafael Saraiva <rafael.saraiva.de@gmail.com>
* Mehdi Elion <mehdi.elion@gmail.com>

To be continued ...
2 changes: 2 additions & 0 deletions HISTORY.rst
Expand Up @@ -4,6 +4,8 @@ History

##### (##########)
------------------
* Add the Adaptive Conformal Inference (ACI) method for MapieTimeSeriesRegressor.
* Add the Coverage Width-based Criterion (CWC) metric.
* Allow to use more split methods for MapieRegressor (ShuffleSplit, PredefinedSplit).
* Integrate ConformityScore into MapieTimeSeriesRegressor.
* Add (extend) the optimal estimation strategy for the bounds of the prediction intervals for regression via ConformityScore.
2 changes: 1 addition & 1 deletion doc/notebooks_regression.rst
Expand Up @@ -12,7 +12,7 @@ This section lists a series of Jupyter notebooks hosted on the MAPIE Github repo
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------


3. Estimating prediction intervals for time series forecast with EnbPI : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


@@ -0,0 +1,249 @@
"""
================================================
Estimating coverage width based criterion
================================================
This example uses :class:`~mapie.regression.MapieRegressor`,
:class:`~mapie.regression.MapieQuantileRegressor` and
:mod:`mapie.metrics` to estimate the coverage width
based criterion of 1D heteroscedastic data using different strategies.
The coverage width based criterion is computed with the function
:func:`~mapie.metrics.coverage_width_based`.
"""

import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, QuantileRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from mapie.metrics import (coverage_width_based, regression_coverage_score,
                           regression_mean_width_score)
from mapie.regression import MapieQuantileRegressor, MapieRegressor
from mapie.subsample import Subsample

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore")


##############################################################################
# Estimating the aleatoric uncertainty of heteroscedastic noisy data
# ---------------------------------------------------------------------
#
# Let's define the :math:`x \times \sin(x)` function and another simple
# function that generates one-dimensional data, sampled uniformly in a
# given interval, with normal noise.

def x_sinx(x):
    """One-dimensional x*sin(x) function."""
    return x*np.sin(x)


def get_1d_data_with_heteroscedastic_noise(
    funct, min_x, max_x, n_samples, noise
):
    """
    Generate 1D noisy data uniformly from the given function
    and standard deviation for the noise.
    """
    np.random.seed(59)
    X_train = np.linspace(min_x, max_x, n_samples)
    np.random.shuffle(X_train)
    X_test = np.linspace(min_x, max_x, n_samples*5)
    y_train = (
        funct(X_train) +
        (np.random.normal(0, noise, len(X_train)) * X_train)
    )
    y_test = (
        funct(X_test) +
        (np.random.normal(0, noise, len(X_test)) * X_test)
    )
    y_mesh = funct(X_test)
    return (
        X_train.reshape(-1, 1), y_train, X_test.reshape(-1, 1), y_test, y_mesh
    )


##############################################################################
# We first generate noisy one-dimensional data uniformly on an interval.
# Here, the noise is *heteroscedastic*, since its standard deviation
# increases linearly with :math:`x`.

min_x, max_x, n_samples, noise = 0, 5, 300, 0.5
(
    X_train, y_train, X_test, y_test, y_mesh
) = get_1d_data_with_heteroscedastic_noise(
    x_sinx, min_x, max_x, n_samples, noise
)

##############################################################################
# Let's visualize our noisy function. As x increases, the data becomes more
# noisy.

plt.xlabel("x")
plt.ylabel("y")
plt.scatter(X_train, y_train, color="C0")
plt.plot(X_test, y_mesh, color="C1")
plt.show()

##############################################################################
# We fit our training data with a simple polynomial function. Here, we
# choose a degree equal to 10 so that the function is flexible enough to
# closely fit :math:`x \times \sin(x)`.

degree_polyn = 10
polyn_model = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=degree_polyn)),
        ("linear", LinearRegression())
    ]
)
polyn_model_quant = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=degree_polyn)),
        ("linear", QuantileRegressor(
            solver="highs",
            alpha=0,
        ))
    ]
)

##############################################################################
# We then estimate the prediction intervals for all the strategies very easily
# with a `fit` and `predict` process. The prediction interval's lower and
# upper bounds are then saved in a DataFrame. Here, we set an alpha value of
# 0.05 in order to obtain a 95% confidence for our prediction intervals.

STRATEGIES = {
    "naive": dict(method="naive"),
    "jackknife": dict(method="base", cv=-1),
    "jackknife_plus": dict(method="plus", cv=-1),
    "jackknife_minmax": dict(method="minmax", cv=-1),
    "cv": dict(method="base", cv=10),
    "cv_plus": dict(method="plus", cv=10),
    "cv_minmax": dict(method="minmax", cv=10),
    "jackknife_plus_ab": dict(method="plus", cv=Subsample(n_resamplings=50)),
    "conformalized_quantile_regression": dict(
        method="quantile", cv="split", alpha=0.05
    )
}
y_pred, y_pis = {}, {}
for strategy, params in STRATEGIES.items():
    if strategy == "conformalized_quantile_regression":
        mapie = MapieQuantileRegressor(polyn_model_quant, **params)
        mapie.fit(X_train, y_train, random_state=1)
        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test)
    else:
        mapie = MapieRegressor(polyn_model, **params)
        mapie.fit(X_train, y_train)
        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test, alpha=0.05)


##############################################################################
# Let’s compare the target confidence intervals with the prediction
# intervals obtained with the Jackknife+, Jackknife-minmax, CV+, CV-minmax,
# Jackknife+-after-Bootstrap, and CQR strategies.

def plot_1d_data(
    X_train,
    y_train,
    X_test,
    y_test,
    y_sigma,
    y_pred,
    y_pred_low,
    y_pred_up,
    ax=None,
    title=None
):
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.fill_between(X_test, y_pred_low, y_pred_up, alpha=0.3)
    ax.scatter(X_train, y_train, color="red", alpha=0.3, label="Training data")
    ax.plot(X_test, y_test, color="gray", label="True confidence intervals")
    ax.plot(X_test, y_test - y_sigma, color="gray", ls="--")
    ax.plot(X_test, y_test + y_sigma, color="gray", ls="--")
    ax.plot(
        X_test, y_pred, color="blue", alpha=0.5, label="Prediction intervals"
    )
    if title is not None:
        ax.set_title(title)
    ax.legend()


strategies = [
    "jackknife_plus",
    "jackknife_minmax",
    "cv_plus",
    "cv_minmax",
    "jackknife_plus_ab",
    "conformalized_quantile_regression"
]
n_figs = len(strategies)
fig, axs = plt.subplots(3, 2, figsize=(9, 13))
coords = [axs[0, 0], axs[0, 1], axs[1, 0], axs[1, 1], axs[2, 0], axs[2, 1]]
for strategy, coord in zip(strategies, coords):
    plot_1d_data(
        X_train.ravel(),
        y_train.ravel(),
        X_test.ravel(),
        y_mesh.ravel(),
        (1.96*noise*X_test).ravel(),
        y_pred[strategy].ravel(),
        y_pis[strategy][:, 0, 0].ravel(),
        y_pis[strategy][:, 1, 0].ravel(),
        ax=coord,
        title=strategy
    )
plt.show()


##############################################################################
# Let’s now conclude by summarizing, for each strategy, the *effective*
# coverage (namely the fraction of test points whose true values lie within
# the prediction intervals), the mean width of the prediction intervals,
# and the coverage width-based criterion.

coverage_score = {}
width_mean_score = {}
cwc_score = {}

for strategy in STRATEGIES:
    coverage_score[strategy] = regression_coverage_score(
        y_test,
        y_pis[strategy][:, 0, 0],
        y_pis[strategy][:, 1, 0]
    )
    width_mean_score[strategy] = regression_mean_width_score(
        y_pis[strategy][:, 0, 0],
        y_pis[strategy][:, 1, 0]
    )
    cwc_score[strategy] = coverage_width_based(
        y_test,
        y_pis[strategy][:, 0, 0],
        y_pis[strategy][:, 1, 0],
        eta=0.001,
        alpha=0.05
    )

results = pd.DataFrame(
    [
        [
            coverage_score[strategy],
            width_mean_score[strategy],
            cwc_score[strategy]
        ] for strategy in STRATEGIES
    ],
    index=STRATEGIES,
    columns=["Coverage", "Width average", "Coverage Width-based Criterion"]
).round(2)

print(results)


##############################################################################
# All the strategies reach the target coverage. However, the CQR strategy
# yields much narrower intervals than all the other methods, so with
# heteroscedastic noise CQR would be the preferred method.
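
Note on the example above: it combines coverage and interval width into a single score through `coverage_width_based`. The following is a minimal NumPy sketch of how such a criterion can be computed. It is not taken from this diff. The function name `cwc_sketch` is hypothetical, and the formula (one minus the range-normalized mean width, multiplied by an exponential penalty on the squared gap between effective and target coverage) is an assumption about the general form of the metric; the implementation in `mapie.metrics` should be checked for the authoritative definition.

```python
import numpy as np


def cwc_sketch(y_true, y_low, y_up, eta, alpha):
    """Hypothetical coverage width-based criterion (assumed formula)."""
    y_true, y_low, y_up = map(np.asarray, (y_true, y_low, y_up))
    # Effective coverage: fraction of targets that fall inside their interval.
    coverage = np.mean((y_low <= y_true) & (y_true <= y_up))
    # Mean interval width, normalized by the range of the observed targets
    # (the normalization choice is an assumption of this sketch).
    mean_width = np.mean(np.abs(y_up - y_low))
    norm_width = mean_width / (y_true.max() - y_true.min())
    # Narrow intervals are rewarded; deviation from the target coverage
    # (1 - alpha) is penalized with strength eta.
    penalty = np.exp(-eta * (coverage - (1 - alpha)) ** 2)
    return float((1 - norm_width) * penalty)


# Toy data: every target lies at the center of an interval of width 1.
y_true = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
cwc = cwc_sketch(y_true, y_true - 0.5, y_true + 0.5, eta=0.001, alpha=0.05)
print(cwc)
```

Under this assumed formula, a small `eta` such as the 0.001 used in the example makes the coverage penalty almost negligible, so the score is driven mainly by interval width.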