A package containing datasets for the ID529 course

ID529/ID529data

Teaching Dataset for ID529 — Data Management and Analytic Workflows in R

This is an R package that you can install with the following R code:

# install devtools if you haven't already
install.packages("devtools")

library(devtools)
devtools::install_github("ID529/ID529data")

Once you’ve installed the package, you can load it along with the nhanes_id529 dataset built into it. This will be our primary dataset for teaching examples.

library(ID529data)
data("nhanes_id529", package = 'ID529data')

Documentation

The documentation for the dataset can be accessed by running ?nhanes_id529 after loading the package.

Data Labels

The data comes with a built-in data dictionary:

labelled::generate_dictionary(nhanes_id529, details = 'full')
 pos variable                                                 label                                                                                                                                                                col_type values                                    
 1   id                                                       Unique identifier for individuals in NHANES                                                                                                                          chr      range: 73568 - 83726                      
 2   race_ethnicity                                           Race/Ethnicity                                                                                                                                                       fct      Non-Hispanic White                        
                                                                                                                                                                                                                                            Non-Hispanic Black                        
                                                                                                                                                                                                                                            Hispanic                                  
 3   sex_gender                                               Sex/Gender [as binary]                                                                                                                                               fct      Male                                      
                                                                                                                                                                                                                                            Female                                    
 4   age                                                      Age [in years] at screening                                                                                                                                          int      range: 12 - 80                            
 5   poverty_ratio                                            Ratio of Household Income to US Federal Poverty Line. The value was not computed if the respondent only reported income as < $20,000 or ≥ $20,000. If family income~ dbl      range: 0 - 5                              
 6   days_dental_floss                                        Days [Reported] Respondents Floss per Week [days/week]                                                                                                               int      range: 0 - 7                              
 7   PFAS_total                                               Perfluoroalkyl and Polyfluoroalkyl Substances [ng/mL]                                                                                                                dbl      range: 0.49 - 1423.1                      
 8   PFOS                                                     Perfluorooctane Sulfonate [ng/mL]                                                                                                                                    dbl      range: 0.14 - 1403                        
 9   PFOA                                                     Perfluorooctanoic Acid [ng/mL]                                                                                                                                       dbl      range: 0.14 - 85.27                       
 10  PFNA                                                     Perfluorononanoic Acid [ng/mL]                                                                                                                                       dbl      range: 0.07 - 16.3                        
 11  PFHS                                                     Perfluorohexane Sulfonate [ng/mL]                                                                                                                                    dbl      range: 0.07 - 33.9                        
 12  PFDE                                                     Perfluorodecanoic Acid [ng/mL]                                                                                                                                       dbl      range: 0.07 - 51.3                        
 13  total_energy                                             Calories in Daily Dietary Intake [reported] [kCal]                                                                                                                   int      range: 130 - 12108                        
 14  fast_food_energy_no_popcorn_no_seafood                   Calories from fast food excluding popcorn and seafood in daily dietary recall [kCal]                                                                                 int      range: 0 - 7539                           
 15  restaurant_energy_no_popcorn_no_seafood                  Calories from food at restaurants excluding popcorn and seafood in daily dietary recall [kCal]                                                                       int      range: 0 - 3424                           
 16  non_fast_food_or_restaurant_energy_no_popcorn_no_seafood Calories from food that is not from fast food or restaurants excluding popcorn and seafood in dietary recall [kCal]                                                  int      range: 0 - 10765                          
 17  popcorn_energy                                           Calories from popcorn in dietary recall [kCal]                                                                                                                       int      range: 0 - 852                            
 18  shellfish_energy                                         Calories from shellfish in dietary recall [kCal]                                                                                                                     int      range: 0 - 1039                           
 19  fish_energy                                              Calories from fish in dietary recall [kCal]                                                                                                                          int      range: 0 - 1518                           
 20  mean_BP                                                  Mean Systolic Blood Pressure [mm Hg]                                                                                                                                 dbl      range: 83.3333333333333 - 216.666666666667
 21  weight                                                   Weight [kg]                                                                                                                                                          dbl      range: 29.8 - 173.7                       
 22  height                                                   Height [cm]                                                                                                                                                          dbl      range: 135.4 - 197                        

For any single variable, the description can be retrieved using var_label from the labelled package.

labelled::var_label(x = nhanes_id529$poverty_ratio)
## [1] "Ratio of Household Income to US Federal Poverty Line. The value was not computed if the respondent only reported income as < $20,000 or ≥ $20,000. If family income was reported as a more detailed category, the midpoint of the range was used to compute the ratio. Values at or above 5.00 were coded as 5.00 or more because of disclosure concerns."
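The labelled package stores each description in the column's "label" attribute, so labels can also be collected for every variable at once. The following is a minimal base R sketch of that idea. The `toy` data frame and its values are hypothetical stand-ins for nhanes_id529, used so the example runs without the package installed.

```r
# Hypothetical stand-in for nhanes_id529 (values invented for illustration)
toy <- data.frame(age = c(34L, 51L), weight = c(70.2, 88.5))

# Attach variable labels the same way labelled stores them: as a "label" attribute
attr(toy$age, "label") <- "Age [in years] at screening"
attr(toy$weight, "label") <- "Weight [kg]"

# Collect every column's label into a named character vector,
# using NA for any column that has no label
labels <- vapply(toy, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}, character(1))

print(labels)
```

With the real data, calling labelled::var_label(nhanes_id529) on the whole data frame should return the same information as a named list.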

Correlation among quantitative variables

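library(ggcorrplot)
library(dplyr)  # provides %>% and select(), used below

# pairwise.complete.obs: each correlation uses all rows where both variables are observed
cor_mat <- cor(nhanes_id529 %>% dplyr::select(where(is.numeric)), use = 'pairwise.complete.obs')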

ggcorrplot::ggcorrplot(
  cor_mat,
  hc.order = TRUE,
  type = "lower",
  outline.color = "white",
  ggtheme = ggplot2::theme_gray,
  colors = c("#6D9EC1", "white", "#E46726"),
  lab = TRUE)
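The use = 'pairwise.complete.obs' argument matters when variables contain missing values. The sketch below illustrates the difference on a small hypothetical data frame (`toy`, with invented values, not taken from nhanes_id529), using only base R.

```r
# Hypothetical data with a few missing values (invented for illustration)
toy <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 6, 8, 10),
  z = c(5, NA, 3, 2, 1)
)

# Default (use = "everything"): any pair involving a column with NA yields NA
cor_default <- cor(toy)

# Pairwise: each correlation is computed from the rows where both columns are observed
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

print(cor_default)
print(cor_pairwise)
```

One consequence of the pairwise approach is that different cells of the matrix may be based on different subsets of rows, which is worth keeping in mind when comparing correlations.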
