ESICCC: A systematic computational framework for evaluation, selection and integration of cell-cell communication inference methods

SunXQlab/ESICCC

Introduction

We benchmark two types of CCC inference methods: methods that predict LR pairs from scRNA-seq data, and methods that predict ligand/receptor-target regulations.

For the first benchmark, we evaluated the accuracy, stability and usability of 18 LR inference methods.

In terms of accuracy, paired ST datasets, CAGE expression/proteomics data and sampled scRNA-seq datasets were used to benchmark the 18 methods. First, 11 scRNA-seq datasets were used as input for the methods to predict intercellular communication, and two indices defined for this purpose, the similarity index (SI, a modified Jaccard index) and the rank-based similarity index (RSI), were used to compare the LR pairs predicted by different methods. Furthermore, we benchmarked the 18 methods using 11 paired ST datasets, under the hypothesis that the mutual information (MI) of LR pairs is greater in the close group than in the distant group. In addition, three PBMC datasets from the 10X Genomics website were used as input for the methods to predict LR pairs, and CAGE expression/proteomics data were used as pseudo gold standards to benchmark the 18 methods.

In terms of stability, we randomly sampled cells at different ratios from every scRNA-seq dataset, yielding 70 sampled datasets that, together with the 14 original datasets, served as input for the methods. We calculated the Jaccard index between the LR pairs predicted from the sampled datasets and those predicted from the original datasets, and defined a stability value to test the robustness of each method to the sampling rate of the scRNA-seq data.

In terms of usability, we recorded the running time and maximum memory usage of each method on all 84 scRNA-seq datasets.
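The stability comparison described above rests on the Jaccard index between two sets of predicted LR pairs. The following is a minimal Python sketch of that index only, not the repository's R implementation; the function name `jaccard_index`, the toy LR pairs, and the convention for two empty predictions are assumptions made for illustration. The stability value and the SI/RSI indices are not reproduced here because their definitions are not given in this README.

```python
def jaccard_index(pairs_a, pairs_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two collections of LR pairs."""
    a, b = set(pairs_a), set(pairs_b)
    if not a and not b:
        # Assumption: two empty predictions are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Toy (ligand, receptor) pairs, for illustration only.
original = {
    ("TGFB1", "TGFBR1"),
    ("CXCL12", "CXCR4"),
    ("JAG1", "NOTCH1"),
    ("DLL4", "NOTCH1"),
}
sampled = {
    ("TGFB1", "TGFBR1"),
    ("CXCL12", "CXCR4"),
    ("IGF1", "IGF1R"),
}

score = jaccard_index(original, sampled)
print(f"Jaccard index: {score:.2f}")  # 2 shared pairs out of 5 distinct pairs
```

In a real run, each element would likely also carry the sender and receiver cell types, so that the same LR pair predicted between different cell types counts as a different interaction.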

For the second benchmark, 8 ST datasets were used as input for 5 LR-Target inference tools to predict ligand/receptor-target regulations. Cell line perturbation datasets were used for evaluation, covering knockout/mutant conditions for 5 receptors and treatment conditions for 10 ligands. The differentially expressed genes (DEGs) in each cell line perturbation dataset were used as the ground truth for ligand/receptor-target regulations. The scores of the ligand/receptor-target regulations predicted by each tool were compared with the differential expression status (DEG or non-DEG) of the corresponding targets to calculate AUROC and AUPRC. In addition, we recorded the running time and maximum memory usage of each method on all the ST datasets.
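To make the AUROC/AUPRC evaluation concrete, here is a small, dependency-free Python sketch that scores predicted target genes against a binary DEG label. It is an illustration under stated assumptions rather than the repository's code: the function names, the toy scores and labels are hypothetical, AUROC is computed from pairwise comparisons, and AUPRC is estimated as average precision, which is one common estimator and may differ from the one used in the benchmark.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random DEG outscores a random non-DEG.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one DEG and one non-DEG")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(scores, labels):
    """Average precision: mean precision at the rank of each DEG.

    Tied scores are ordered arbitrarily by the sort.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, is_deg) in enumerate(ranked, start=1):
        if is_deg:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one DEG")
    return total / hits


# Toy data: predicted regulation scores for five targets of one ligand,
# and whether each target is a DEG (1) or not (0) in the perturbation data.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

auroc_value = auroc(scores, labels)
auprc_value = auprc(scores, labels)
print(f"AUROC = {auroc_value:.3f}, AUPRC = {auprc_value:.3f}")
```

The pairwise AUROC is quadratic in the number of targets, which is fine for a sketch; a rank-based formula or an established library would be preferable for genome-scale target lists.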

Workflow

  • Step0_LRToolsFunction contains the R/Python/Shell scripts that wrap the code for running the 19 methods into functions that take Seurat objects as input.
  • Step1_LRPredictionResult contains the R/Shell scripts to run 19 methods for inferring LR pairs from the 14 scRNA-seq datasets.
  • Step2_PreSTForLRBench contains the R scripts to select different ratios (e.g. top 10%, 20%, 30%, 40%) of cell-type-specific close and distant cell pairs in each dataset, in preparation for the benchmarking based on mutual information.
  • Step3_MIForLRBench contains the R scripts to calculate the MI of the LR interactions predicted by each method in the different ratios of cell-type-specific close and distant groups, and to calculate the DLRC index of each method in each dataset.
  • Step4_SIRSIForLRBench contains the R scripts to benchmark the similarity (SI and RSI) of the LR interactions predicted by each pair of methods.
  • Step5_BenchBasedCAGEProteomic contains the R scripts to benchmark the 18 LR inference methods using the CAGE expression and proteomics data.
  • Step6_LRBenchSampling contains the R/Shell scripts to run the 18 LR inference methods for inferring LR pairs from 70 sampled scRNA-seq datasets.
  • Step7_LRBenchSamplingBench contains the R/Shell scripts to calculate Jaccard index between the LR pairs predicted based on the sampled datasets and the original datasets, and record the running time and maximum memory usage of methods in each dataset.
  • Step8_LRTToolsFunction contains the R/Python/Shell scripts to run the 5 LR-Target inference methods for predicting ligand/receptor-targets using ST datasets as input.
  • Step9_LRTBench contains the R scripts to benchmark the 5 LR-Target inference methods using cell line perturbation datasets for evaluation, and record the running time and maximum memory usage of methods in each dataset.
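
Steps 2 and 3 above can be sketched as follows. This Python snippet is only an illustration of the idea, not a port of the R scripts: it assumes Euclidean distances between spatial coordinates, selects the closest and most distant fraction of cell pairs between two cell types, and uses a plug-in mutual information estimate over discretized (on/off) expression. The function names, the discretization and the toy data are all assumptions, and the DLRC index is not reproduced because it is not defined in this README.

```python
import math
from collections import Counter
from itertools import product


def split_close_distant(coords_a, coords_b, ratio=0.1):
    """Return (close, distant) lists of (i, j) index pairs between two cell types.

    `close` holds the `ratio` fraction of pairs with the smallest distance,
    `distant` the same fraction with the largest distance.
    """
    dists = sorted(
        (math.dist(coords_a[i], coords_b[j]), i, j)
        for i, j in product(range(len(coords_a)), range(len(coords_b)))
    )
    k = max(1, int(len(dists) * ratio))
    close = [(i, j) for _, i, j in dists[:k]]
    distant = [(i, j) for _, i, j in dists[-k:]]
    return close, distant


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Toy spatial coordinates for cells of two types (illustrative only).
coords_a = [(0, 0), (10, 0)]
coords_b = [(1, 0), (0, 2), (20, 0), (30, 0)]
close, distant = split_close_distant(coords_a, coords_b, ratio=0.25)

# Toy on/off expression of a ligand (sender cells) and a receptor
# (receiver cells) across four cell pairs.
mi_coupled = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
mi_independent = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])
print(close, distant, mi_coupled, mi_independent)
```

Under the benchmark's hypothesis, a well-predicted LR pair would show higher MI when computed over the close pairs than over the distant pairs.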

Datasets

  • scRNA-seq and ST datasets
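| Tissue (Disease) | SampleID (scRNA-seq) | SampleID (ST) | Literature PMID | Download URL (scRNA-seq) | Download URL (ST) | Evaluation purpose |
|---|---|---|---|---|---|---|
| Heart Tissue (Health) | CK357 | control_P7 | 35948637 | URL | URL | LR interactions, LR-Target regulations |
| Heart Tissue (Health) | CK358 | control_P8 | 35948637 | URL | URL | LR interactions, LR-Target regulations |
| Heart Tissue (ICM) | CK368 | FZ_GT_P19 | 35948637 | URL | URL | LR interactions |
| Heart Tissue (ICM) | CK162 | FZ_GT_P4 | 35948637 | URL | URL | LR interactions |
| Heart Tissue (ICM) | CK362 | RZ_P11 | 35948637 | URL | URL | LR interactions |
| Heart Tissue (AMI) | CK361 | IZ_P10 | 35948637 | URL | URL | LR interactions |
| Heart Tissue (AMI) | CK161 | IZ_P3 | 35948637 | URL | URL | LR interactions |
| Heart Tissue (AMI) | CK165 | IZ_BZ_P2 | 35948637 | URL | URL | LR interactions |
| Tumor Tissue (Breast cancer) | CID44971 | CID44971 | 34493872 | URL | URL | LR interactions, LR-Target regulations |
| Tumor Tissue (Breast cancer) | CID4465 | CID4465 | 34493872 | URL | URL | LR interactions, LR-Target regulations |
| Mouse embryo | —— | Slide14 | 34210887 | —— | URL | LR interactions |
| PBMC | PBMC4K | —— | —— | URL | —— | LR interactions |
| PBMC | PBMC6K | —— | —— | URL | —— | LR interactions |
| PBMC | PBMC8K | —— | —— | URL | —— | LR interactions |
| Tumor Tissue (Gliomas) | —— | UKF243_T_ST | 35700707 | —— | URL | LR-Target interactions |
| Tumor Tissue (Gliomas) | —— | UKF260_T_ST | 35700707 | —— | URL | LR-Target interactions |
| Tumor Tissue (Gliomas) | —— | UKF266_T_ST | 35700707 | —— | URL | LR-Target interactions |
| Tumor Tissue (Gliomas) | —— | UKF334_T_ST | 35700707 | —— | URL | LR-Target interactions |
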
  • Cell line perturbation datasets
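| Datasets | Ligand/Receptor | Type | Condition | Cell Line | Disease |
|---|---|---|---|---|---|
| GSE120268 | AXL | receptor | Knockdown | MDA-MB-231 | Breast Cancer |
| GSE157680 | NRP1 | receptor | Knockdown | MDA-MB-231 | Breast Cancer |
| GSE15893 | CXCR4 | receptor | Mutant | MDA-MB-231 | Breast Cancer |
| GSE15893 | CXCL12 | ligand | Treatment | MDA-MB-231 | Breast Cancer |
| GSE160990 | TGFB1 | ligand | Treatment | MDA-MB-231 | Breast Cancer |
| GSE36051 | DLL4(1) | ligand | Treatment | MCF7 | Breast Cancer |
| GSE36051 | DLL4(2) | ligand | Treatment | MDA-MB-231 | Breast Cancer |
| GSE36051 | JAG1 | ligand | Treatment | MDA-MB-231 | Breast Cancer |
| GSE65398 | IGF1(1) | ligand | Treatment | MCF7 | Breast Cancer |
| GSE7561 | IGF1(2) | ligand | Treatment | MCF7 | Breast Cancer |
| GSE69104 | CSF1R | receptor | Inhibit | TAMs | Gliomas |
| GSE116414 | FGFR1 | receptor | Inhibit | GSLC | Gliomas |
| GSE206947 | EFNB2 | ligand | Treatment | cardiac fibroblasts | Health |
| GSE181575 | TGFB1 | ligand | Treatment | cardiac fibroblasts | Health |
| GSE123018 | TGFB1 | ligand | Treatment | cardiac fibroblasts | Health |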

Tools for inferring intercellular LR pairs

  • CellPhoneDB (Python, version: 3.0.0)
  • CellTalker (R, version: 0.0.4.9000)
  • Connectome (R, version: 1.0.1)
  • NATMI (Python)
  • ICELLNET (R, version: 1.0.1)
  • scConnect (Python, version: 1.0.3)
  • CellChat (R, version: 1.4.0)
  • SingleCellSignalR (R, version: 1.2.0)
  • CytoTalk (R, version: 0.99.9)
  • CellCall (R, version: 0.0.0.9000)
  • scSeqComm (R, version: 1.0.0)
  • NicheNet (R, version: 1.1.0)
  • Domino (R, version: 0.1.1)
  • scMLnet (R, version: 0.2.0)
  • PyMINEr (Python, version: 0.10.0)
  • iTALK (R, version: 0.1.0)
  • cell2cell (Python, version: 0.5.10)
  • RNAMagnet (R, version: 0.1.0)

Tools for predicting ligand/receptor-targets regulations

  • CytoTalk (R, version: 0.99.9)
  • NicheNet (R, version: 1.1.0)
  • stMLnet (R, version: 0.1.0)
  • MISTy (R, version: 1.3.8)
  • HoloNet (Python, version: 0.0.5)

Citation

Please cite ESICCC as follows:

Luo J, Deng M, Zhang X, Sun X*. ESICCC as a systematic computational framework for evaluation, selection and integration of cell-cell communication inference methods. Genome Research. 2023. doi: 10.1101/gr.278001.123

Contact

If you encounter any problems, please contact sunxq6@mail.sysu.edu.cn.
