---
title: "Base Ranker: Bayesian Confidence Propagation Neural Network"
author:
- name: Nan Xiao
url: https://nanx.me/
affiliation: Seven Bridges
affiliation_url: https://www.sevenbridges.com/
- name: Soner Koc
url: https://github.com/skoc
affiliation: Seven Bridges
affiliation_url: https://www.sevenbridges.com/
- name: Kaushik Ghose
url: https://kaushikghose.wordpress.com/
affiliation: Seven Bridges
affiliation_url: https://www.sevenbridges.com/
date: "`r Sys.Date()`"
output: distill::distill_article
bibliography: rankv.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, cache = TRUE)
```
# Data Model
The Bayesian confidence propagation neural network (BCPNN) method uses the information component (IC) to measure the association between a vaccine and a symptom. IC is the pointwise mutual information between two events, here a report mentioning the vaccine and a report mentioning the symptom. It compares how often the two co-occur with how often they would co-occur if they were independent.

Let $p_i$ be the probability that a report mentions exposure to the target vaccine $i$, $p_j$ the probability that a report mentions the target symptom $j$, and $p_{ij}$ the joint probability that a report mentions both the target vaccine $i$ and the target symptom $j$. Bate et al. [@bate1998] define the metric $\text{IC}_{ij}$ as
$$
\text{IC}_{ij} = \log_2\frac{p_{ij}}{p_i p_j}.
$$
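
As a quick numerical illustration of the definition, the R sketch below evaluates $\text{IC}_{ij}$ for assumed probabilities. The helper name `ic_from_probs` and the numbers are hypothetical. When $p_{ij} = p_i p_j$ (independence), the ratio is 1 and IC is 0; a pair reported twice as often as independence predicts has an IC of 1.

```r
# Hypothetical helper: IC from known (not estimated) probabilities
ic_from_probs <- function(p_ij, p_i, p_j) {
  log2(p_ij / (p_i * p_j))
}

# Assumed values: vaccine i appears in 10% of reports, symptom j in 5%
ic_independent <- ic_from_probs(p_ij = 0.005, p_i = 0.10, p_j = 0.05)  # ~0
ic_doubled     <- ic_from_probs(p_ij = 0.010, p_i = 0.10, p_j = 0.05)  # ~1
c(ic_independent, ic_doubled)
```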
Recall the contingency table for target vaccine $i$ and target symptom $j$:

| Target vaccine | Target symptom | All other symptoms | Total |
| :------------- | :------------- | :----------------------- | :-------- |
| Yes | $n_{ij}$ | $n_i - n_{ij}$ | $n_i$ |
| No | $n_j - n_{ij}$ | $n - n_i - n_j + n_{ij}$ | $n - n_i$ |
| Total | $n_j$ | $n - n_j$ | $n$ |
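
Every cell of the table is determined by $n_{ij}$, the margins $n_i$ and $n_j$, and the total $n$. The sketch below builds the table in R from made-up counts and checks that the margins are recovered. The helper `make_2x2` is hypothetical and not part of PhViD.

```r
# Hypothetical helper: assemble the 2 x 2 table from n_ij, n_i, n_j, and n
make_2x2 <- function(n_ij, n_i, n_j, n) {
  tab <- matrix(
    c(n_ij,       n_i - n_ij,
      n_j - n_ij, n - n_i - n_j + n_ij),
    nrow = 2, byrow = TRUE,
    dimnames = list(vaccine = c("Yes", "No"),
                    symptom = c("Target", "Other"))
  )
  stopifnot(all(tab >= 0))  # the four counts must be mutually consistent
  tab
}

# Made-up counts, not taken from VAERS
tab <- make_2x2(n_ij = 20, n_i = 200, n_j = 500, n = 10000)
tab
rowSums(tab)  # n_i and n - n_i
colSums(tab)  # n_j and n - n_j
```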

Let $n_{ij}$ denote the cell count for the vaccine-symptom pair $(i, j)$, that is, the number of reports mentioning both. The BCPNN data model assumes
$$
n_{ij} | p_{ij} \sim \text{Binomial}(n, p_{ij}),\\
p_{ij} \sim \text{Beta}(\alpha_{ij}, \beta_{ij})
$$
where
$$
\alpha_{ij} = 1,\\
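\beta_{ij} = \frac{1}{E(p_i \mid n_i) \, E(p_j \mid n_j)} - 1.
$$
This choice sets the prior mean of $p_{ij}$, $\alpha_{ij} / (\alpha_{ij} + \beta_{ij})$, equal to $E(p_i \mid n_i) \, E(p_j \mid n_j)$, the value expected if vaccine exposure and symptom were independent. Assuming independence, the marginal totals $n_i$ and $n_j$ of the $2 \times 2$ contingency table are modeled as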
$$
n_i | p_i \sim \text{Binomial}(n, p_i),\\
n_j | p_j \sim \text{Binomial}(n, p_j)
$$
where
$$
p_i \sim \text{Beta}(1, 1),\\
p_j \sim \text{Beta}(1, 1).
$$
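
Because the $\text{Beta}(1, 1)$ prior is conjugate to the binomial likelihood, the posterior of $p_i$ after observing $n_i$ of $n$ reports is $\text{Beta}(n_i + 1, n - n_i + 1)$, whose mean is $E(p_i \mid n_i) = (n_i + 1)/(n + 2)$; the same holds for $p_j$. A small R check of that mean against numerical integration, using illustrative counts and hypothetical helper names:

```r
# Conjugate update: Beta(1, 1) prior + Binomial(n, p_i) data -> Beta(1 + n_i, 1 + n - n_i)
posterior_shapes <- function(n_i, n) c(shape1 = 1 + n_i, shape2 = 1 + n - n_i)

# Closed-form posterior mean E(p_i | n_i) = (n_i + 1) / (n + 2)
posterior_mean <- function(n_i, n) (n_i + 1) / (n + 2)

# Illustrative counts (not from VAERS): 3 of 20 reports mention vaccine i
s <- posterior_shapes(n_i = 3, n = 20)
numeric_mean <- integrate(
  function(p) p * dbeta(p, s[["shape1"]], s[["shape2"]]),
  lower = 0, upper = 1
)$value
c(closed_form = posterior_mean(n_i = 3, n = 20), numerical = numeric_mean)
```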
The point estimate of IC is
$$
\widehat{\text{IC}}_{ij} = \log_2 \frac{(n_{ij} + 1) (n + 2)^2}{(n + 2)^2 + n (n_i + 1) (n_j + 1)}.
$$
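
One way to see where the point estimate comes from is to plug posterior means into the IC definition. The sketch assumes the prior $\beta_{ij} = \gamma - 1$ (with $\gamma$ as defined for the variance), which is $1/(E(p_i \mid n_i)\,E(p_j \mid n_j)) - 1$ written in terms of $\gamma$. It gives the posterior $p_{ij} \mid n_{ij} \sim \text{Beta}(n_{ij} + 1, n - n_{ij} + \gamma - 1)$ with mean $(n_{ij} + 1)/(n + \gamma)$. The helper names and counts are hypothetical. The two functions agree because $(n + \gamma)(n_i + 1)(n_j + 1) = n(n_i + 1)(n_j + 1) + (n + 2)^2$.

```r
# Hypothetical helper: IC estimate from posterior means
ic_estimate <- function(n_ij, n_i, n_j, n) {
  gamma <- (n + 2)^2 / ((n_i + 1) * (n_j + 1))  # assumes beta_ij = gamma - 1
  e_pij <- (n_ij + 1) / (n + gamma)             # posterior mean of p_ij
  e_pi  <- (n_i + 1) / (n + 2)                  # posterior mean of p_i
  e_pj  <- (n_j + 1) / (n + 2)                  # posterior mean of p_j
  log2(e_pij / (e_pi * e_pj))
}

# Same quantity written as a single fraction
ic_estimate_direct <- function(n_ij, n_i, n_j, n) {
  log2((n_ij + 1) * (n + 2)^2 / ((n + 2)^2 + n * (n_i + 1) * (n_j + 1)))
}

# Made-up counts: 200 reports for the vaccine, 500 for the symptom, 10000 in total,
# so about 10 co-reports are expected under independence
ic_estimate(20, 200, 500, 10000)         # observed 20: IC a bit below log2(2) = 1
ic_estimate_direct(20, 200, 500, 10000)
ic_estimate(10, 200, 500, 10000)         # observed about as expected: IC close to 0
```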
The variance is estimated by ($\log$ denotes the natural logarithm)
$$
\hat{\sigma}_{ij}^2 = \frac{1}{(\log 2)^2} \left( \frac{n - n_{ij} + \gamma - 1}{(n_{ij} + 1)(n + \gamma + 1)} + \frac{n - n_i + 1}{(n_i + 1)(n + 3)} + \frac{n - n_j + 1}{(n_j + 1)(n + 3)} \right)
$$
where
$$
\gamma = \frac{(n+2)^2}{(n_i + 1)(n_j + 1)}.
$$
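
The variance formula translates directly into R. The function name and counts are hypothetical. Scaling every count by 10 shows the estimate becoming more certain as the number of reports grows.

```r
# Hypothetical helper: estimated variance of IC (natural log in the 1 / log(2)^2 factor)
ic_variance <- function(n_ij, n_i, n_j, n) {
  gamma <- (n + 2)^2 / ((n_i + 1) * (n_j + 1))
  (1 / log(2)^2) * (
    (n - n_ij + gamma - 1) / ((n_ij + 1) * (n + gamma + 1)) +
      (n - n_i + 1) / ((n_i + 1) * (n + 3)) +
      (n - n_j + 1) / ((n_j + 1) * (n + 3))
  )
}

# Made-up counts, then the same counts scaled by 10
sd_small <- sqrt(ic_variance(20, 200, 500, 10000))
sd_large <- sqrt(ic_variance(200, 2000, 5000, 100000))
c(sd_small = sd_small, sd_large = sd_large)  # the larger sample gives a smaller sd
```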

# Computation
Load the packages for BCPNN-based signal detection and ranking:
```{r}
suppressMessages(library("PhViD"))
library("kableExtra")
```
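Load the preprocessed VAERS data and convert it to the input format expected by PhViD:
```{r}
# Preprocessed VAERS counts of vaccine-symptom pairs
df_p <- readRDS("data-processed/df_p.rds")
# as.PhViD() expects three columns: drug (vaccine) label, adverse event label, and count
df_p <- df_p[, 1:3]
# Build the PhViD input; MARGIN.THRES = 10 drops vaccines and symptoms with small marginal counts
df_v <- as.PhViD(df_p, MARGIN.THRES = 10)
```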
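Calculate the information component under the BCPNN model [@bate1998; @noren2006], together with the ranking statistic, the 2.5% quantile of the posterior distribution of IC:
```{r}
# Only pairs with at least 10 reports are considered (MIN.n11 = 10);
# RANKSTAT = 2 selects the 2.5% posterior quantile of IC as the ranking statistic
lst_bcpnn <- BCPNN(df_v, MIN.n11 = 10, DECISION = 3, RANKSTAT = 2)
# Sort the detected signals by that quantile, largest first, and keep the first five columns
df_bcpnn <- lst_bcpnn$SIGNALS[order(lst_bcpnn$SIGNALS$`Q_0.025(log(IC))`, decreasing = TRUE), 1:5]
row.names(df_bcpnn) <- NULL
```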
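
The sketch below ranks three hypothetical vaccine-symptom pairs by an approximate 2.5% quantile. This shows why a lower posterior quantile is a sensible ranking statistic. The quantile uses the closed-form IC estimate and variance from the Data Model section under a normal approximation. That is an assumption for illustration only, and not necessarily how `BCPNN()` computes `Q_0.025(log(IC))`. The vaccine and symptom labels and all counts are made up. The pair with only 2 reports has the largest point estimate but the widest uncertainty, so it drops to the bottom of the ranking.

```r
# Hypothetical pairs; all labels and counts are made up (n = 10000 reports in total)
pairs <- data.frame(
  vaccine = c("VAX_A", "VAX_A", "VAX_B"),
  symptom = c("SYM_1", "SYM_2", "SYM_3"),
  n_ij = c(2, 40, 20),
  n_i  = c(200, 200, 500),
  n_j  = c(20, 2000, 300),
  n    = 10000
)

# Closed-form IC estimate, its variance, and a normal-approximation 2.5% quantile
with_ic <- within(pairs, {
  g <- (n + 2)^2 / ((n_i + 1) * (n_j + 1))
  ic <- log2((n_ij + 1) * (n + 2)^2 / ((n + 2)^2 + n * (n_i + 1) * (n_j + 1)))
  v <- ((n - n_ij + g - 1) / ((n_ij + 1) * (n + g + 1)) +
          (n - n_i + 1) / ((n_i + 1) * (n + 3)) +
          (n - n_j + 1) / ((n_j + 1) * (n + 3))) / log(2)^2
  ic_025 <- qnorm(0.025, mean = ic, sd = sqrt(v))
})

# Rank by the lower quantile, largest first, mirroring the order() call above
ranked <- with_ic[order(with_ic$ic_025, decreasing = TRUE),
                  c("vaccine", "symptom", "n_ij", "ic", "ic_025")]
row.names(ranked) <- NULL
ranked
```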
View the top ranked vaccine-adverse event pairs:
```{r}
head(df_bcpnn) %>% kable() %>% kable_styling()
```
```{r,echo=FALSE}
saveRDS(df_bcpnn, file = "data-processed/df_bcpnn.rds")
```