---
title: "Tier II Data Validation Guide"
author: "Jason P. Pickering, David Huser"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Tier II Data Validation Guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
## Tier II Data Validation Guide
This manual guides you through the Tier II DataPack / DisaggTool validation process for COP18.
**Status: DRAFT**
### Set up your environment
- Follow the instructions here: [data-pack-importer-vagrant](https://github.com/davidhuser/data-pack-importer-vagrant)
- Update your [support files](https://www.pepfar.net/Project-Pages/collab-38/Shared%20Documents/COP18%20Target%20Setting%20Process%20Improvement/Import%20Team/) and download them to the `support_files` folder
### Download the support files
The files are Excel spreadsheets shared as links to SharePoint files (they must not be sent as attachments). Download them into the repository set up above. The country team *needs to indicate* which distribution method to use: either 2017 (FY17 Results) or 2018 (FY18 Targets).
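Since only two distribution methods are valid, it can help to check the value before starting a run. The following is a minimal sketch in base R; `check_distribution_year` is a hypothetical helper for illustration and is not part of `datapackimporter`.

```r
# Hypothetical helper (not part of datapackimporter): fail early if the
# distribution year indicated by the country team is not one of the two
# values this guide allows.
check_distribution_year <- function(distribution_year) {
  allowed <- c(2017, 2018)  # 2017 = FY17 Results, 2018 = FY18 Targets
  if (length(distribution_year) != 1 || !is.numeric(distribution_year) ||
      !(distribution_year %in% allowed)) {
    stop("distribution_year must be 2017 (FY17 Results) or 2018 (FY18 Targets)")
  }
  invisible(distribution_year)
}

check_distribution_year(2017)  # passes silently
```

Calling it with any other value, such as `2016` or the string `"2017"`, stops with an error before any file is parsed.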
### DisaggTool validation
Run a first validation. Adjust the following code, paste it into the RStudio console (http://localhost:8787), and press `ENTER`.
```{r eval=FALSE}
# ADJUST THIS --->
filename="DisaggTool_filename.xlsx"
distribution_year=2017
# DO NOT CHANGE
library(devtools)
install_github("jason-p-pickering/data-pack-importer", ref="prod")
library(datapackimporter)
support_files="/vagrant/support_files/"
check_support_files(support_files)
disagg_tools="/vagrant/disagg_tools/"
disagg_tool=paste0(disagg_tools, filename)
# Parse the PSNU data
d<-DataPackR(disagg_tool, distribution_method = distribution_year, support_files_path = support_files)
```
It creates the following files:
- **Site Level Review File:** `SiteLevelReview_*.xlsx` -> send it back to the country team via DATIM Support. If there are errors, *include the log* in your response so the country team can correct their DisaggTool and re-submit it. Then re-run the DisaggTool validation.
- **PSNU data:** `*_import_*.csv` -> depending on the issues found, continue with _Import File Validation_ or act as described below.
**TBD**
Warnings / error handling:
- *Hard stop:* Tier II does not proceed with anything else until this is fixed by the country team.
- *Continue:* Tier II proceeds with validation #2 of the PSNU CSV (`*_import_*.csv`)
Possible issues and their consequences:
- Negative values: *hard stop*
- Schema not valid: *hard stop*
- DisaggTool corrupted: *hard stop*
- Missing PSNUs: *continue*
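The rule behind this list is that a single hard-stop issue blocks the whole submission, while issues marked *continue* do not. As a rough sketch only (the list is still a draft, and the short issue keys below are hypothetical labels chosen for this example), the decision could be expressed in base R as:

```r
# Issue-to-action table copied from the list above.
# The short keys are hypothetical labels chosen for this sketch.
issue_actions <- c(
  negative_values      = "hard stop",
  schema_not_valid     = "hard stop",
  disaggtool_corrupted = "hard stop",
  missing_psnus        = "continue"
)

# Decide how to proceed given the issues found in one validation run:
# any hard-stop issue blocks everything, otherwise Tier II continues.
decide_action <- function(issues) {
  unknown <- setdiff(issues, names(issue_actions))
  if (length(unknown) > 0) {
    stop("Unknown issue(s): ", paste(unknown, collapse = ", "))
  }
  if (any(issue_actions[issues] == "hard stop")) "hard stop" else "continue"
}

decide_action(c("missing_psnus"))                     # "continue"
decide_action(c("missing_psnus", "negative_values"))  # "hard stop"
```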
### Site-Level Review Tool Validation
The country team returns the reviewed Site Level Review Tool, which needs another validation. Adjust the `filename` and re-run the script.
```{r eval=FALSE}
# ADJUST THIS --->
filename="SiteLevelReview.xlsx"
distribution_year=2017
# Re-use the rest of the script from above, starting at "DO NOT CHANGE"
```
### Import File Validation
The DisaggTool validation step above generated a PSNU CSV file. This step checks whether it is a valid DATIM import file. See these instructions for doing so: https://github.com/jason-p-pickering/datim-validation/blob/master/vignettes/validating_data.Rmd
The gist of it:
- Clear the console in RStudio
- Create a `~/.secrets/datim.json` file, or use the `/vagrant/datim.json`
```json
{
"dhis": {
"baseurl": "https://dev-de.datim.org/",
"username": "admin",
"password": "district"
}
}
```
Make it readable only by your user:
```bash
chmod 0600 ~/.secrets/datim.json
```
or
```bash
chmod 0600 /path/to/vagrant/datim.json
```
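The same permission change can also be made and verified from the R console. This is a minimal sketch using base R's `Sys.chmod()` and `file.info()`, run on a temporary stand-in file rather than the real secrets file; it assumes a Unix-like system such as the Vagrant box, since file modes behave differently on Windows.

```r
# Stand-in for datim.json so the sketch does not touch real credentials.
secrets_path <- tempfile(fileext = ".json")
writeLines('{"dhis": {}}', secrets_path)

# Equivalent of `chmod 0600`: read/write for the owner only.
# use_umask = FALSE applies the mode exactly as given.
Sys.chmod(secrets_path, mode = "0600", use_umask = FALSE)

# Verify the result; file.info() reports the mode as an octmode value.
mode_now <- format(file.info(secrets_path)$mode)
if (mode_now != "600") {
  warning("Secrets file is readable by other users: mode ", mode_now)
}
```

To apply it to the real file, replace `secrets_path` with the path to your own `datim.json`.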
- Install `datim-validation` and load secrets:
```{r eval=FALSE}
# ADJUST THIS -->
filename="/vagrant/disagg_tools/path_to_psnu.csv"
secrets_file="/vagrant/datim.json"
# DON'T CHANGE
devtools::install_github('jason-p-pickering/datim-validation')
require(datimvalidation)
secrets<-secrets_file
loadSecrets(secrets)
d<-d2Parser(file=filename,
type = "csv",
dataElementIdScheme = "id",
orgUnitIdScheme = "id",
idScheme = "id",
invalidData = TRUE)
```
Then run these one by one:
```{r eval=FALSE}
getInvalidMechanisms(d)
```
For site level tools, invoke this command:
```{r eval=FALSE}
getInvalidDataElements(d,c("eyI0UOWJnDk","l796jk9SW7q"))
```
For PSNU level tools, invoke this command:
```{r eval=FALSE}
getInvalidDataElements(d,c("msJImKkTJkV","pTuDWXzkAkJ"))
```
Neither of these commands should return any rows.
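When running several checks in a row, an empty result is easy to overlook in the console. The sketch below wraps the "no rows expected" rule in a small function. `assert_no_rows` is a hypothetical helper, and it assumes the check functions return a data frame (or `NULL`) of offending rows, which this guide implies but does not state.

```r
# Hypothetical helper: stop with a readable message if a check returned rows.
# Assumes the check result is a data frame, or NULL when nothing was found.
assert_no_rows <- function(result, label) {
  n <- NROW(result)  # NROW(NULL) is 0, so a NULL result counts as clean
  if (n > 0) {
    stop(label, ": ", n, " invalid row(s) found")
  }
  message(label, ": OK")
  invisible(TRUE)
}

# Stand-in for an empty result. With the real objects this would be e.g.
# assert_no_rows(getInvalidDataElements(d, c("msJImKkTJkV", "pTuDWXzkAkJ")),
#                "PSNU data elements")
empty_result <- data.frame(dataElement = character(0))
assert_no_rows(empty_result, "data elements")
```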
Then run the remaining code (either line by line or all at once should work):
```{r eval=FALSE}
checkValueTypeCompliance(d)
vr_violations<-validateData(data = d,
return_violations_only = TRUE,
parallel = FALSE)
vr_rules<-getValidationRules()
cop_18_des<-getValidDataElements(datasets=c("l796jk9SW7q","eyI0UOWJnDk","msJImKkTJkV"))
match<-paste(unique(cop_18_des$dataelementuid),sep="",collapse="|")
vr_filter<-vr_rules[grepl(match,vr_rules$leftSide.expression) & grepl(match,vr_rules$rightSide.expression),"id"]
vr_violations<-vr_violations[ vr_violations$id %in% vr_filter,]
diff<-gsub(" <= ","/",vr_violations$formula)
vr_violations$diff<-sapply(diff,function(x) { round( ( eval(parse(text=x)) -1 ) * 100 , 2) })
write.csv(vr_violations,file="validation_violations.csv")
```
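The `diff` column computed in the chunk above expresses, in percent, how far the left side of a violated `<=` rule exceeds the right side. Here is a small worked sketch of that step, assuming (as the `gsub()` call implies) that `formula` holds strings of the form `left <= right` with numeric values. The two formulas are made up for illustration.

```r
# Made-up formulas of the assumed form "left <= right".
formulas <- c("120 <= 100", "33 <= 30")

# Turn each "left <= right" into "left/right" ...
ratio_text <- gsub(" <= ", "/", formulas)

# ... then evaluate the ratio and convert it to a percentage excess.
pct_over <- sapply(ratio_text, function(x) {
  round((eval(parse(text = x)) - 1) * 100, 2)
}, USE.NAMES = FALSE)

pct_over  # 20 10
```

A value of 20 therefore means the left side is 20% larger than the right side allows.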
You can download the resulting file by checking the checkbox next to the appropriate file in the file panel, then by clicking More and selecting Export.
If the validation passes, route the file to Tier III.
If it does not pass, respond with the logs to the country team.