-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
131 lines (109 loc) · 5.51 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
message = FALSE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# njoaguof
<!-- badges: start -->
[![R-CMD-check](https://github.com/tor-gu/njoaguof/workflows/R-CMD-check/badge.svg)](https://github.com/tor-gu/njoaguof/actions)
<!-- badges: end -->
This is a cleaned-up version of the NJ OAG Use of Force database
available from https://www.njoag.gov/force/.
## Dataset Overview
* ```incident``` This is the main data table. Each use of force incident is recorded here. A single incident may involve multiple subjects.
* ```subject``` The subjects of the use of force -- includes persons and animals.
* ```incident```-specific data tables. Each of these data tables are for multi-value fields associated to incidents. For example -- a single incident may have both ```"Rain"``` and ```"Fog"``` weather conditions, and this will result in two rows in ```incident_weather```.
+ ```incident_contact_origin```
+ ```incident_lighting```
+ ```incident_location_type```
+ ```incident_officer_injury_type```
+ ```incident_officer_medical_treatment```
+ ```incident_planned_contact```
+ ```incident_type```
+ ```incident_video_type```
+ ```incident_weather```
* ```incident```-```subject``` data tables These data tables are for multi-value fields that should be associated to individual incident subjects. Unfortunately, these tables reflect some irreducible messiness of the source data. See the notes below.
+ ```incident_subject_action```
+ ```incident_subject_force_type```
+ ```incident_subject_injury```
+ ```incident_subject_medical_treatment```
+ ```incident_subject_perceived_condition```
+ ```incident_subject_reason_not_arrested```
+ ```incident_subject_resistance```
* ```officer_name_variants``` Includes every variation in spelling and capitalization of the officer names found in the source data.
* ```use_of_force_raw``` The source data.
## Examples
Every incident has a unique ```form_id```, and this field is used to link the ```subject``` ```incident_xxx``` and ```incident_subject_xxx``` tables to specific incidents:
```{r}
library(njoaguof)
library(dplyr)
# Summarize video_type by agency_county
incident %>%
select(form_id, agency_county) %>%
right_join(incident_video_type, by = "form_id") %>%
count(agency_county, video_type)
```
```{r}
library(njoaguof)
library(dplyr)
# Summarize subject gender by officer gender
incident %>%
select(form_id, officer_gender) %>%
right_join(subject, by="form_id") %>%
count(officer_gender, subject_gender=gender)
```
## Notes
The raw data from the NJ OAG is available in table ```use_of_force_raw```, which has one row for each use of force incident. Fields with multiple values are recorded as comma separated lists. For example:
```{r}
use_of_force_raw %>% count(SubjectGender) %>% head(5)
```
These fields are broken out into new data tables in the following categories.
#### Subject Fields
Several fields contain one value for each subject. We presume that the order is preserved, so that we may create one row for each subject in the ```subject``` table.
```{r}
use_of_force_raw %>% filter(FormID == 16301) %>%
select(
FormID,
SubjectArrested,
SubjectType,
SubjectAge,
SubjectRaceEthnicity,
SubjectGender
)
subject %>% filter(form_id == 16301)
```
#### Multi-value incident fields
Some fields contain multiple values which apply to the entire incident. For each such column, we create a separate table expressing this many-to-one relationship. For example, this row in the source data has three values for ```incident_type```, and this results in three rows in the ```incident_type``` table.
```{r}
library(tidyverse)
use_of_force_raw %>% filter(FormID == 16301) %>%
select(IncidentType)
incident_type %>% filter(form_id == 16301)
```
#### Multi-value Incident-Subject Fields
Some fields contain multiple values which apply to individual subjects, but there is no reliable way to assign the values to subjects. For example, in this row of the raw data, there are two subjects and three values in the ```SubResist``` field. In this case, we create three rows in the ```incident_subject_resistance``` table, indicating the position of each item in the list with the ```index``` value.
```{r}
use_of_force_raw %>%
filter(FormID == 19542) %>%
select(SubjectType,SubjectResistance)
incident_subject_resistance %>% filter(form_id == 19542)
```
Note that it is not clear in the source data if "Aggressive resistance" should be associated to the first subject or to the second subject.
All of the ```incident_subject_xxx``` data tables are of this form, with an ```index``` column included so the order information is not lost.
#### Officer name variants
In the raw data, there are two fields which identify the officer: ```officer_name``` (an ID field) and ```Officer_Name2``` (a name field). A single ```officer_name``` ID can be associated different spellings in the ```Officer_Name2``` field. When building the ```incident``` table, we ensure that every ```officer_name_id``` is associated with a single spelling of the officer name by choosing the most common form.
But all variants of the officer names appearing in the source data are preserved in the ```officer_name_variants``` table.
## Installation
You can install the latest version of njoaguof from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("tor-gu/njoaguof")
```