This is an R package that you can install with the following R code:
# install devtools if you haven't already
install.packages("devtools")
library(devtools)
devtools::install_github("ID529/ID529data")
Once you’ve installed the package, you can load the package and the
nhanes
dataset built into it. This will be our primary dataset for
teaching examples.
library(ID529data)
data("nhanes_id529", package = 'ID529data')
The documentation for the dataset can be accessed by running ?nhanes
after loading the data.
The data comes with a built-in data dictionary:
labelled::generate_dictionary(nhanes_id529, details = 'full')
pos variable label col_type values
1 id Unique identifier for individuals in NHANES chr range: 73568 - 83726
2 race_ethnicity Race/Ethnicity fct Non-Hispanic White
Non-Hispanic Black
Hispanic
3 sex_gender Sex/Gender [as binary] fct Male
Female
4 age Age [in years] at screening int range: 12 - 80
5 poverty_ratio Ratio of Household Income to US Federal Poverty Line. The value was not computed if the respondent only reported income as < $20,000 or ≥ $20,000. If family income~ dbl range: 0 - 5
6 days_dental_floss Days [Reported] Respondents Floss per Week [days/week] int range: 0 - 7
7 PFAS_total Perfluoroalkyl and Polyfluoroalkyl Substances [ng/mL] dbl range: 0.49 - 1423.1
8 PFOS Perfluorooctane Sulfonate [ng/mL] dbl range: 0.14 - 1403
9 PFOA Perfluorooctanoic Acid [ng/mL] dbl range: 0.14 - 85.27
10 PFNA Perfluorononanoic Acid [ng/mL] dbl range: 0.07 - 16.3
11 PFHS Perfluorohexane Sulfonate [ng/mL] dbl range: 0.07 - 33.9
12 PFDE Perfluorodecanoic Acid [ng/mL] dbl range: 0.07 - 51.3
13 total_energy Calories in Daily Dietary Intake [reported] [kCal] int range: 130 - 12108
14 fast_food_energy_no_popcorn_no_seafood Calories from fast food excluding popcorn and seafood in daily dietary recall [kCal] int range: 0 - 7539
15 restaurant_energy_no_popcorn_no_seafood Calories from food at restaurants excluding popcorn and seafood in daily dietary recall [kCal] int range: 0 - 3424
16 non_fast_food_or_restaurant_energy_no_popcorn_no_seafood Calories from food that is not from fast food or restaurants excluding popcorn and seafood in dietary recall [kCal] int range: 0 - 10765
17 popcorn_energy Calories from popcorn in dietary recall [kCal] int range: 0 - 852
18 shellfish_energy Calories from shellfish in dietary recall [kCal] int range: 0 - 1039
19 fish_energy Calories from fish in dietary recall [kCal] int range: 0 - 1518
20 mean_BP Mean Systolic Blood Pressure [mm Hg] dbl range: 83.3333333333333 - 216.666666666667
21 weight Weight [kg] dbl range: 29.8 - 173.7
22 height Height [cm] dbl range: 135.4 - 197
For any single variable, the description can be retrieved using
var_label
from the labelled
package.
labelled::var_label(x = nhanes_id529$poverty_ratio)
## [1] "Ratio of Household Income to US Federal Poverty Line. The value was not computed if the respondent only reported income as < $20,000 or ≥ $20,000. If family income was reported as a more detailed category, the midpoint of the range was used to compute the ratio. Values at or above 5.00 were coded as 5.00 or more because of disclosure concerns."
library(ggcorrplot)
cor_mat <- cor(nhanes_id529 %>% dplyr::select(where(is.numeric)), use = 'pairwise.complete.obs')
ggcorrplot::ggcorrplot(
cor_mat,
hc.order = TRUE,
type = "lower",
outline.color = "white",
ggtheme = ggplot2::theme_gray,
colors = c("#6D9EC1", "white", "#E46726"),
lab = TRUE)