GitHub - amrosicka/multivariate_analyses: Gender differences in modifiable risk factors associated with cognition: A machine-learning approach

amrosicka / multivariate_analyses Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

Gender differences in modifiable risk factors associated with cognition: A machine-learning approach

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
ML_pipeline_gender.R		ML_pipeline_gender.R
ML_pipeline_gender_sensitivity_analyses.R		ML_pipeline_gender_sensitivity_analyses.R
ML_preprocessing_final.R		ML_preprocessing_final.R
external_validation.R		external_validation.R
helper_script.R		helper_script.R
readme		readme

Repository files navigation

README

Project: "Gender differences in modifiable risk factors associated with cognition: A machine-learning approach"
Authors: Anna Marie Rosická, Claire Gillan, Sharon Chi Tak Lee
Permutation analyses based on legacy code from Klaas E Stpehan, Jakob Heinzle(?)
--------------------------------------------------------
Data comes from completers of the Risk Factors challenge, gathered with the app Neureka.

Aims of the project:
1) Learn about the relative importance a set of previously established demetia risk factors for objective and subjective cognition, when considered all together
2) Learn how much variance they together explain
3) Test the interactions of these risk factors with gender

Data overview:
10,660 people completed at least a part of the Risk Factors module.
Features available:
* 13 risk factors variables used in previous paper (https://osf.io/preprints/psyarxiv/rc4sy, see page 14 for overview)
* covariates: age, gender

Scripts:
* helper_script.R = installs and loads packages using groundhog, loads custom functions and ggplot themes
* ML_preprocessing_final.R = preprocesses data for the ML pipeline:
1. Load data and packages
2. Data preparation
3. Get descriptive statistics & select full completers
* excluding people with missing data on any of the following cognitive measures:
* subjective memory problems, working memory, cognitive flexibility (=Trails B), and processing speed (=Trails A; this measure was not included in the previous paper but we can include it here if we want more cognitive data; missingness is comparable to Trails B)
* Compare gender groups on important variables
4. Split data to train/test set - 80/20 split
* stratified based on age and gender
* done in full sample and then also creating subsets for each gender
* create gender-balanced subsets of both train and test set
5. Missing data imputation
* Impute missing data for training dataset and then carry over estimated parameters to the test dataset
* Median for numerical variables, mode for categorical variables
6. Drop rarely endorsed items
7. Scale variables
* Calculation:
* scaled_train = (train - train_mean) / train_std_deviation
* scaled_test = (test - train_mean) / train_std_deviation
* i.e., carrying over scaling values from train to test set
8. Generate composite cognitive scores (avg z-scores)
9. Export preprocessed data
* ML_pipeline_gender.R = main ML analyses and plots; note that for running this you will need the folder structure already in place otherwise plots and files won't save
1. Load data
2. Define feature lists
3. Elastic nets with repeated nested cross-validation (10 inner & 10 outer folds and 10 repeats)
4. Multicollinearity checks
5. Plot SHAP values
* external_validation.R = generate performance metrics in both train and test set
& run permutation analyses to check if performance differs significantly or not
* ML_pipeline_gender_sensitivity_analyses.R = combines ML pipeline with external validation for a gender-balanced subset of the data