unquad is a wrapper for most PyOD detectors (see Supported Estimators) that enables uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.
pip install unquad
Mind the optional dependencies for deep learning models and the built-in datasets (see pyproject.toml).
Conformal Anomaly Detection (CAD) applies the principles of conformal inference (conformal prediction) to anomaly detection. It focuses on controlling error metrics such as the false discovery rate while maintaining statistical power.
CAD converts anomaly scores to p-values by comparing the scores of test data against calibration scores obtained from normal training data. The p-value of each test score is its normalized rank among the calibration scores. These statistically valid p-values enable error control through methods like Benjamini-Hochberg, replacing traditional anomaly estimates that carry no statistical guarantee.
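The rank-based conversion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the package's implementation: the function name `conformal_p_values`, the convention that higher scores are more anomalous, and the "+1" correction in numerator and denominator are assumptions made for the example.

```python
def conformal_p_values(calib_scores, test_scores):
    """Rank each test score among the calibration scores.

    Assumes higher scores mean "more anomalous" (hypothetical convention).
    """
    n = len(calib_scores)
    p_values = []
    for score in test_scores:
        # Count calibration scores at least as extreme as the test score.
        n_at_least = sum(1 for c in calib_scores if c >= score)
        # The +1 terms count the test point itself, so a p-value is never zero.
        p_values.append((1 + n_at_least) / (n + 1))
    return p_values


calib = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(conformal_p_values(calib, [0.05, 0.55, 0.95]))
```

A test score below every calibration score receives a p-value of 1, while a score above all of them receives the smallest attainable p-value, 1 / (n + 1). Larger calibration sets therefore allow finer-grained p-values.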
Using the default behavior of ConformalDetector() with default DetectorConfig().
from pyod.models.gmm import GMM
from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimation.conformal import ConformalDetector
from unquad.strategy.split import Split
from unquad.utils.metrics import false_discovery_rate, statistical_power
dl = DataLoader(dataset=Dataset.SHUTTLE)
x_train, x_test, y_test = dl.get_example_setup(random_state=1)
ce = ConformalDetector(
    detector=GMM(),
    strategy=Split(calib_size=1_000),
)
ce.fit(x_train)
estimates = ce.predict(x_test)
print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
Output:
Empirical FDR: 0.108
Empirical Power: 0.892
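For readers unfamiliar with the two reported metrics, the following sketch shows how an empirical false discovery rate and empirical power are commonly computed from binary labels and binary decisions. The helper names `empirical_fdr` and `empirical_power` are hypothetical, and the code is not taken from `unquad.utils.metrics`, whose implementation may differ in detail.

```python
def empirical_fdr(y, y_hat):
    """Share of flagged points that are actually normal (0.0 if nothing is flagged)."""
    flagged = sum(1 for pred in y_hat if pred)
    false_pos = sum(1 for true, pred in zip(y, y_hat) if pred and not true)
    return false_pos / flagged if flagged else 0.0


def empirical_power(y, y_hat):
    """Share of true anomalies that were flagged (0.0 if there are no anomalies)."""
    anomalies = sum(1 for true in y if true)
    true_pos = sum(1 for true, pred in zip(y, y_hat) if pred and true)
    return true_pos / anomalies if anomalies else 0.0


y_true = [0, 0, 1, 1, 1, 0]  # 1 marks a true anomaly
y_pred = [0, 1, 1, 1, 0, 0]  # 1 marks a point flagged by the detector
print(empirical_fdr(y_true, y_pred), empirical_power(y_true, y_pred))
```

In this toy case one of the three flagged points is a false alarm and two of the three anomalies are found.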
The behavior can be customized by changing the DetectorConfig():
@dataclass
class DetectorConfig:
    alpha: float = 0.2  # Nominal FDR value
    adjustment: Adjustment = Adjustment.BH  # Multiple Testing Procedure
    aggregation: Aggregation = Aggregation.MEDIAN  # Score Aggregation (if necessary)
    seed: int = 1
    silent: bool = True
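To illustrate what the `alpha` and `adjustment` settings control, here is a standard-library sketch of the Benjamini-Hochberg step-up procedure together with its more conservative Benjamini-Yekutieli variant. It assumes that `Adjustment.BH` and `Adjustment.BY` refer to these two procedures. The function `fdr_reject` is a hypothetical helper written for this example and does not reflect how the package applies the adjustment internally.

```python
def fdr_reject(p_values, alpha=0.2, method="bh"):
    """Return a list of booleans, True where a point is flagged as an anomaly."""
    m = len(p_values)
    # Benjamini-Yekutieli divides alpha by the harmonic sum c(m); BH uses c(m) = 1.
    c_m = sum(1.0 / i for i in range(1, m + 1)) if method == "by" else 1.0
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / (m * c(m)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


p = [0.01, 0.04, 0.03, 0.5, 0.9]
print(fdr_reject(p, alpha=0.1, method="bh"))
print(fdr_reject(p, alpha=0.1, method="by"))
```

On these five p-values the BH rule flags the three smallest, whereas the BY rule flags none, which shows the trade-off between the two settings: BY remains valid under arbitrary dependence between p-values at the cost of statistical power.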
Using ConformalDetector() with customized DetectorConfig().
The Bootstrap() strategy lets you set two of the three parameters resampling_ratio, n_bootstraps and n_calib. For any such combination, the remaining parameter is filled in automatically. This allows exact control of the calibration procedure when using a bootstrap strategy.
from pyod.models.iforest import IForest
from unquad.data.loader import DataLoader
from unquad.estimation.properties.configuration import DetectorConfig
from unquad.estimation.conformal import ConformalDetector
from unquad.strategy.bootstrap import Bootstrap
from unquad.utils.enums import Aggregation, Adjustment, Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power
dl = DataLoader(dataset=Dataset.SHUTTLE)
x_train, x_test, y_test = dl.get_example_setup(random_state=1)
ce = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Bootstrap(resampling_ratio=0.99, n_bootstraps=20, plus=True),
    config=DetectorConfig(alpha=0.1, adjustment=Adjustment.BY, aggregation=Aggregation.MEAN),
)
ce.fit(x_train)
estimates = ce.predict(x_test)
print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
Output:
Empirical FDR: 0.0
Empirical Power: 1.0
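The "two out of three" rule for the bootstrap parameters can be made concrete with a small sketch. The relation used here is an assumption for illustration only and is not confirmed by this README: each bootstrap round is taken to hold out a fraction `1 - resampling_ratio` of the training data for calibration, so that `n_calib` is roughly `n_bootstraps * n_train * (1 - resampling_ratio)`. The helper `complete_bootstrap_params` is hypothetical and not part of unquad.

```python
def complete_bootstrap_params(n_train, resampling_ratio=None, n_bootstraps=None, n_calib=None):
    """Hypothetical helper: fill in whichever of the three parameters is None.

    Assumed relation (not taken from the package):
        n_calib = n_bootstraps * n_train * (1 - resampling_ratio)
    """
    given = [resampling_ratio, n_bootstraps, n_calib]
    if sum(value is None for value in given) != 1:
        raise ValueError("Provide exactly two of the three parameters.")
    if n_calib is None:
        n_calib = round(n_bootstraps * n_train * (1 - resampling_ratio))
    elif n_bootstraps is None:
        n_bootstraps = max(1, round(n_calib / (n_train * (1 - resampling_ratio))))
    else:
        resampling_ratio = 1 - n_calib / (n_bootstraps * n_train)
    return resampling_ratio, n_bootstraps, n_calib


print(complete_bootstrap_params(10_000, resampling_ratio=0.99, n_bootstraps=20))
```

Under this assumed relation, 20 bootstraps with a resampling ratio of 0.99 on 10,000 training points would yield about 2,000 calibration scores. The actual rule applied by the package should be checked against its source.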
The package supports only anomaly estimators that are suitable for unsupervised one-class classification. Because these detectors are fitted exclusively on normal (non-anomalous) data, parameters like threshold are internally set to the smallest possible values.
Models that are currently supported include:
- Angle-Based Outlier Detection (ABOD)
- Autoencoder (AE)
- Cook's Distance (CD)
- Copula-based Outlier Detector (COPOD)
- Deep Isolation Forest (DIF)
- Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
- Gaussian Mixture Model (GMM)
- Histogram-based Outlier Detection (HBOS)
- Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
- Isolation Forest (IForest)
- Kernel Density Estimation (KDE)
- k-Nearest Neighbor (kNN)
- Kernel Principal Component Analysis (KPCA)
- Linear Model Deviation-based Outlier Detection (LMDD)
- Local Outlier Factor (LOF)
- Local Correlation Integral (LOCI)
- Lightweight Online Detector of Anomalies (LODA)
- Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
- GNN-based Anomaly Detection Method (LUNAR)
- Median Absolute Deviation (MAD)
- Minimum Covariance Determinant (MCD)
- One-Class SVM (OCSVM)
- Principal Component Analysis (PCA)
- Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
- Rotation-based Outlier Detection (ROD)
- Subspace Outlier Detection (SOD)
- Scalable Unsupervised Outlier Detection (SUOD)
Bug reporting: https://github.com/OliverHennhoefer/unquad/issues