Artificial Intelligence for Industry project on the Italian COVID-19 dataset.
In this project we explored the potential and limitations of Bayesian melding, a statistical technique that fits the input parameters of a deterministic model to stochastic observations.
The general idea behind the method is to merge different "opinions" about an observed phenomenon via statistical pooling:
- A prior probability on the outputs of the model ("what may reasonably happen")
- An induced probability computed by applying the deterministic model to some input prior distribution ("what we expect to observe according to the model")
- A likelihood probability on the inputs ("what we know has happened")
- A likelihood probability on the outputs ("what we actually observe").
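
As a rough sketch of the pooling step, assuming the logarithmic pooling commonly used in Bayesian melding: the notation below is introduced here for illustration only, with $M$ the deterministic model, $q_1$ the input prior, $q_1^*$ the output prior it induces through $M$, $q_2$ the output prior and $\alpha \in [0, 1]$ the pooling weight. The two opinions on the outputs are combined as a weighted geometric mean, and the result is mapped back onto the inputs:

$$
\tilde{q}^{[\phi]}(\phi) \propto q_1^*(\phi)^{\alpha} \, q_2(\phi)^{1-\alpha},
\qquad
\tilde{q}^{[\theta]}(\theta) \propto q_1(\theta) \left( \frac{q_2(M(\theta))}{q_1^*(M(\theta))} \right)^{1-\alpha}
$$

The two likelihoods then multiply the pooled prior in the usual Bayesian way.
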
To apply the pooling operation exactly, the model needs to be inverted. Since this is seldom possible, pooling is approximated with the SIR algorithm (sampling-importance-resampling, not to be confused with the susceptible-infected-removed model, also used in this repository):
- Extract a large number of random samples from the input prior distribution
- Weight each sample $\theta_i$ according to $w_i = \left( \frac{q_2(\phi_i)}{q_1^*(\phi_i)} \right)^{1-\alpha} L_1(\theta_i) \, L_2(\phi_i)$, where:
  - $\phi_i = M(\theta_i)$ is the output of the model $M$ applied to $\theta_i$
  - $\alpha$ is the pooling factor (usually 0.5)
  - $q_2$ is the output prior
  - $q_1^*$ is the induced output prior, i.e. the output distribution obtained by pushing the input prior through the model; it can be estimated by applying the model to each sample and then performing a kernel density estimation with a Gaussian kernel
  - $L_1$ is the input likelihood
  - $L_2$ is the output likelihood
- Resample a smaller subset of the samples, this time drawing with probability proportional to the computed weights instead of from the prior distribution
- The distribution of the resampled inputs approximates the true input distribution, and the usual operations can be performed on it (e.g. taking the mean to fit the model to the data and the variance to quantify uncertainty).
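
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this repository: the function `bayesian_melding_sir`, its parameters, the scalar toy model and all numeric values are hypothetical, and the weight formula assumes logarithmic pooling with pooling factor `alpha`.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm


def bayesian_melding_sir(model, sample_input_prior, output_prior_pdf,
                         input_likelihood, output_likelihood,
                         n_samples=2000, n_resamples=500, alpha=0.5, seed=0):
    """Approximate the melded input distribution with sampling-importance-resampling.

    Hypothetical helper: assumes scalar inputs and scalar outputs for brevity.
    """
    rng = np.random.default_rng(seed)
    # Step 1: draw a large number of samples from the input prior.
    theta = sample_input_prior(rng, n_samples)
    # Apply the deterministic model to every sample.
    phi = np.array([model(t) for t in theta])
    # Estimate the induced output density with a Gaussian KDE.
    induced_pdf = gaussian_kde(phi)(phi)
    # Step 2: importance weights (assumes logarithmic pooling).
    weights = ((output_prior_pdf(phi) / induced_pdf) ** (1.0 - alpha)
               * input_likelihood(theta) * output_likelihood(phi))
    weights = weights / weights.sum()
    # Step 3: resample a smaller subset according to the weights.
    idx = rng.choice(n_samples, size=n_resamples, replace=True, p=weights)
    return theta[idx]


# Toy setup (all values are illustrative): the model doubles its input,
# and a single noisy observation of the output is available at 3.0.
posterior = bayesian_melding_sir(
    model=lambda t: 2.0 * t,
    sample_input_prior=lambda rng, n: rng.uniform(0.0, 5.0, size=n),
    output_prior_pdf=lambda phi: norm.pdf(phi, loc=3.0, scale=2.0),
    input_likelihood=lambda theta: np.ones_like(theta),  # flat: no direct information on inputs
    output_likelihood=lambda phi: norm.pdf(3.0, loc=phi, scale=0.1),
)
# Step 4: summarize the resampled inputs.
print(posterior.mean(), posterior.std())
```

In this toy case the resampled inputs should concentrate around 1.5, the value whose model output matches the observation. The KDE evaluation costs on the order of `n_samples` squared operations, which is one reason the sampling stage becomes expensive as the number of samples grows.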
Bayesian melding was applied to three different epidemiological models:
- SIR: Susceptible-infected-removed
- SIRD: Susceptible-infected-recovered-deceased
- SEIRD: Susceptible-exposed-infected-recovered-deceased, extended with a hidden E compartment and a reinfection rate.
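
For readers unfamiliar with compartmental models, the following is a minimal sketch of a textbook SIRD model, which can serve as the deterministic function in Bayesian melding. The parametrization (`beta` for transmission, `gamma` for recovery, `mu` for mortality), the function name `sird_rhs` and all numeric values are assumptions made for illustration and may differ from the ones used in this repository.

```python
import numpy as np
from scipy.integrate import solve_ivp


def sird_rhs(t, y, beta, gamma, mu):
    """Right-hand side of a textbook SIRD model (hypothetical parametrization)."""
    s, i, r, d = y
    n = s + i + r + d                      # total population, constant over time
    new_infections = beta * s * i / n      # flow S -> I
    return [
        -new_infections,                   # dS/dt
        new_infections - (gamma + mu) * i, # dI/dt
        gamma * i,                         # dR/dt (recoveries)
        mu * i,                            # dD/dt (deaths)
    ]


# Illustrative values: 1000 people, one initial infection, 160 days.
y0 = [999.0, 1.0, 0.0, 0.0]
t_eval = np.arange(0, 161)
sol = solve_ivp(sird_rhs, (0, 160), y0, t_eval=t_eval, args=(0.3, 0.1, 0.01))

s, i, r, d = sol.y
print("peak infected:", i.max(), "on day", int(t_eval[i.argmax()]))
```

In a melding setting, the inputs to be fitted would be parameters such as `beta`, `gamma` and `mu`, and the outputs would be the simulated curves that are compared against the observed data.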
Because the first step of the algorithm (drawing a large number of samples from the input prior) is very slow and the search space suffers from the curse of dimensionality (especially for SEIRD), we also tried deterministic seeding to reduce the search space, with limited success.
The Beamer template for the slides was forked from UniBO beamer and modified for the AI course at DISI.
Authors: G. Tsiotas, L.S. Lorello.
We also maintain a public dataset of Italian regions' colors at: https://github.com/tsiotas/covid-19-zone.
This dataset is updated every day and contains the color of each region, starting from November 6th, 2020 (the first day on which the Government applied a color-based scheme).