TemporalVAE: atlas-assisted temporal mapping of time-series single-cell transcriptomes during embryogenesis
Contact: Yuanhua Huang, Yijun Liu
Email: yuanhua@hku.hk
A user-oriented repo is at https://github.com/StatBiomed/TemporalVAE-release with more features to be added.
TemporalVAE is a deep generative model in a dual-objective setting to infer the biological time of cells from a compressed latent space. We demonstrated its scalability to millions of cells in the mouse development atlas and its high accuracy in atlas-based cell staging on mouse organogenesis across platforms and during human peri-implantation between in vivo and in vitro conditions. Furthermore, we showed that our atlas-based time predictor can effectively support RNA velocity modeling over short-time cell differentiation, including hematopoiesis and neuronal development.
A preprint describing TemporalVAE's algorithms and results is at [bioRxiv](https://;.
- Latest Updates
- Installations
- [Reproduce the result in manuscript](#Reproduce the result in manuscript)
- v0.1 (May, 2024): Initial release.
TemporalVAE requires Python 3.9. To install it, follow the instructions below:
- Install Miniconda3 if not already available.
- Clone this repository:
git clone https://github.com/StatBiomed/TemporalVAE
- Navigate to the TemporalVAE directory:
cd TemporalVAE
- (5-10 minutes) Create a conda environment with the required dependencies:
conda env create -f environment.yml
- Activate the TemporalVAE environment you just created:
conda activate TemporalVAE
- Install PyTorch: refer to the PyTorch installation instructions as needed. For example, the command to install a CPU-only build of PyTorch is:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
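After the steps above, a quick sanity check of the activated environment can save a failed run later. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only tests that the interpreter is Python 3.9 and that `torch` can be found, without importing it.

```python
import importlib.util
import sys


def check_environment(required=(3, 9), packages=("torch",)):
    """Report whether the interpreter and packages match the install notes."""
    report = {
        # The install notes ask for Python 3.9; compare major and minor only.
        "python_ok": sys.version_info[:2] == tuple(required),
        "python_version": ".".join(str(p) for p in sys.version_info[:3]),
    }
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated conda environment; a `False` for `python_ok` or `torch` suggests that the environment was not activated or that the PyTorch step was skipped.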
The code for each figure is in a folder named after the figure index.
Compare TemporalVAE with baseline methods on the three small datasets cited in the Psupertime manuscript.
- Preprocess the three datasets with the code described in preprocess_data_fromPsupertimeManuscript.md.
- Run the code for each benchmarking method, then run plotFig2_check_corr.py to generate Fig2.
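The script name plotFig2_check_corr.py suggests that the methods are compared by the correlation between predicted and reference time, but the exact metric is not stated here. As a hedged illustration only, the sketch below computes a Spearman rank correlation in plain Python. The function names and the toy values are assumptions and are not taken from the repository.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (made up): known time labels and one method's predicted time.
true_time = [1, 1, 2, 3, 3, 4]
predicted = [0.9, 1.2, 2.5, 2.4, 3.8, 4.1]
print(f"Spearman correlation: {spearman(true_time, predicted):.3f}")
```

Tied time labels are common when many cells share one sampling time point, which is why the ranks are averaged over ties.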
- Preprocess the mouse atlas data and mouse stereo data by
python -u Fig3_mouse_data/preprocess_data_mouse_embryonic_development_combineData.py
python -u Fig3_mouse_data/preprocess_data_mouse_embryo_stereo.py
- Reproduce the results of Figure 3.A&B; the results are saved in the folder results/230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809
python -u Fig3_mouse_data/TemporalVAE_kFoldOn_mouseAtlas.py \
  --result_save_path=230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809 \
  --vae_param_file=supervise_vae_regressionclfdecoder_mouse_stereo \
  --file_path=/mouse_embryonic_development/preprocess_adata_JAX_dataset_combine_minGene100_minCell50_hvg1000 \
  --time_standard_type=embryoneg5to5 \
  --train_epoch_num=100 --kfold_test --train_whole_model \
  > logs/log.log
- Plot Figure 3.A&B from the results in results/230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809; see Fig3_mouse_data/plot_figure3AB.ipynb.
- Figure 3.C: compare TemporalVAE with LR, PCA, and RF on the mouse atlas data; see Fig3_mouse_data/LR_PCA_RF_kFoldOn_mouseAtlas.ipynb.
- Figure 3.D&E: models are trained on the mouse atlas data and predict on the mouse stereo-seq data; see Fig3_mouse_data/TemporalVAE_LR_PCA_RF_directlyPredictOn_mouseStereo.ipynb or run Fig3_mouse_data/TemporalVAE_LR_PCA_RF_directlyPredictOn_mouseStereo.py from the console.
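The k-fold training command for Figure 3.A&B is long, so it can be convenient to assemble it in Python before launching it. The sketch below is a convenience wrapper under stated assumptions: the helper name `build_kfold_command` is hypothetical, and every script path and flag is copied from the command above rather than from the script's argument parser.

```python
import shlex

RESULT_DIR = "230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809"


def build_kfold_command(python="python"):
    """Return the Figure 3.A&B k-fold training command as an argument list."""
    return [
        python, "-u", "Fig3_mouse_data/TemporalVAE_kFoldOn_mouseAtlas.py",
        f"--result_save_path={RESULT_DIR}",
        "--vae_param_file=supervise_vae_regressionclfdecoder_mouse_stereo",
        "--file_path=/mouse_embryonic_development/"
        "preprocess_adata_JAX_dataset_combine_minGene100_minCell50_hvg1000",
        "--time_standard_type=embryoneg5to5",
        "--train_epoch_num=100",
        "--kfold_test",
        "--train_whole_model",
    ]


cmd = build_kfold_command()
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))

# To launch it from the repository root and capture the log, something like:
#   import subprocess
#   with open("logs/log.log", "w") as log:
#       subprocess.run(cmd, stdout=log, check=True)
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` makes a failed training run raise an error instead of passing silently.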
- Preprocess the raw dataset by
python -u Fig4_human_data/preprocess_humanEmbryo_xiang2019data.py
python -u Fig4_human_data/preprocess_humanEmbryo_PLOS.py
python -u Fig4_human_data/preprocess_humanEmbryo_CS7_Tyser.py
- Figure 4.A: k-fold test on the xiang19 dataset; see Fig4_human_data/vae_humanEmbryo_xiang19.ipynb or run the code from the console:
python -u Fig4_human_data/TemporalVAE_humanEmbryo_kFoldOn_xiang19.py --file_path=/240322Human_embryo/xiang2019/hvg500/
- Figure 4.B: TemporalVAE trained on the xiang19 dataset and predicting on the Lv19 dataset; see Fig4_human_data/LR_PCA_RF_directlyPredictOn_humanEmbryo_PLOS.ipynb or run Fig4_human_data/LR_PCA_RF_directlyPredictOn_humanEmbryo_PLOS.py from the console.
- Figure 4.C&D: train on four in vitro datasets and predict on one in vivo dataset; see Fig4_human_data/vae_humanEmbryo_Melania.ipynb or run the code from the console:
python -u Fig4_human_data/vae_humanEmbryo_Melania.py --file_path=/240405_preimplantation_Melania/
- The data is from paper .
- 1. Figure 5.C&E use the hematopoiesis cell data; see Fig5_RNA_velocity/VAE_mouse_fineTune_Train_on_U_pairs_S_hematopoiesis.ipynb or run the code from the console:
python -u Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py --sc_file_name=240108mouse_embryogenesis/hematopoiesis --clf_weight=0.2
- 2. Figure 5.D&F use the neuron cell data; see Fig5_RNA_velocity/VAE_mouse_fineTune_Train_on_U_pairs_S_neuron.ipynb or run the code from the console:
python -u Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py --sc_file_name=240108mouse_embryogenesis/neuron --clf_weight=0.1
- The scVelo results in Figure 5.E&F are based on the .ipynb code provided by the dataset's paper; see Fig5_RNA_velocity/scVelo_hematopoiesis.ipynb and Fig5_RNA_velocity/scVelo_neuron.ipynb
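The two fine-tuning runs for Figure 5 use the same script and differ only in the dataset name and the `--clf_weight` value. A minimal sketch that generates both commands from one table is shown below. The names `FIG5_RUNS` and `fig5_commands` are hypothetical, and the script path, flags, and values are copied from the two commands above.

```python
FIG5_SCRIPT = "Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py"

# Dataset name -> clf_weight, exactly as given in the two console commands.
FIG5_RUNS = {
    "hematopoiesis": 0.2,
    "neuron": 0.1,
}


def fig5_commands(prefix="240108mouse_embryogenesis"):
    """Build one argument list per Figure 5 fine-tuning run."""
    commands = []
    for dataset, clf_weight in FIG5_RUNS.items():
        commands.append([
            "python", "-u", FIG5_SCRIPT,
            f"--sc_file_name={prefix}/{dataset}",
            f"--clf_weight={clf_weight}",
        ])
    return commands


for cmd in fig5_commands():
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root to run the two datasets one after the other.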