GitHub - gutierrez-arcelus-lab/hla

README.pdf
README.qmd
compile_results.R
index_hlapers.sh
run_hlamapper.slurm
run_hlapers.slurm
run_kourami.slurm
run_qc.slurm
run_star.slurm
sample_description.tsv
setup.R
star_index.slurm

---
title: "In-silico HLA typing from RNA-seq data"
authors:
  - name: Vitor R. C. Aguiar
    affiliations:
      - ref: aff1
      - ref: aff2
  - name: Maria Gutierrez-Arcelus
    affiliations:
      - ref: aff1
      - ref: aff2
affiliations:
  - id: aff1
    name: Division of Immunology, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
  - id: aff2
    name: Broad Institute of MIT and Harvard, Cambridge, MA, USA

filters:
  - authors-block

format: pdf
bibliography: misc.bib
csl: nature.csl
---


# Methods

### RNA-seq data processing

We used trim_galore (<https://github.com/FelixKrueger/TrimGalore>) to perform adapter and quality trimming on the RNA-seq reads. We then aligned reads to the human reference genome GRCh38 with STAR [@Dobin2012].

### HLA typing

To call HLA alleles from the RNA-seq data, we used two independent approaches: (1) HLApers [@Aguiar2019; @Aguiar2020], which integrates STAR [@Dobin2012] and Salmon [@Patro2017] to map reads to HLA allele sequences and infer the most likely HLA genotypes; and (2) Kourami [@Lee2018], a graph-guided approach that assembles HLA allele sequences.

We used HLApers v1.2_dev with HLA reference sequences from the IPD-IMGT/HLA database [@Robinson2019] v3.52.0. We used `HLApers bam2fq` to extract unmapped reads and reads mapping to the MHC region from STAR's BAM files, and used these reads as input for `HLApers genotype` to infer HLA types.

We used Kourami v0.9.6 with its built-in HLA database, which is based on IPD-IMGT/HLA v3.24.0. As a first step, we used hla-mapper [@Castelli2018] v4.3 to correct for mapping bias at HLA genes in the BAM files from STAR. We then extracted reads mapped to HLA genes with the script `alignAndExtract_hs38.sh` and ran Kourami on these reads realigned to its built-in HLA panel.

### Code availability

Code is available at <https://github.com/gutierrez-arcelus-lab/hla_mis-c>.

# Results

@tbl-1 shows the HLA genotypes inferred by HLApers and Kourami. Kourami reports "G" groups, that is, sets of alleles that are identical at the exons encoding the antigen recognition sites, whereas HLApers reports single alleles. The two methods are almost completely concordant at one-field resolution. The only exception is HLA-A in sample "Key_75": both methods call an A\*02:01, but for the second allele HLApers calls an A\*03 whereas Kourami calls another A\*02. We visually inspected the read alignments at HLA-A and confirmed that many reads are compatible with an A\*03 and incompatible with both alleles being A\*02.
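
A comparison at one-field resolution can be sketched as follows. This is a minimal illustration rather than the code used to produce the results: the helper names `one_field` and `concordant_one_field` are hypothetical, and the allele calls are invented for the example and are not taken from `results.csv`. Each allele name is truncated at its first colon, so a single allele and a "G" group from the same allele family reduce to the same one-field name.

```r
# Truncate an HLA allele name to its first field,
# e.g. "A*02:01:01G" becomes "A*02".
one_field <- function(allele) {
  sub(":.*$", "", allele)
}

# Compare two unphased genotypes (character vectors of two alleles each)
# at one-field resolution. Sorting makes the result independent of the
# order in which each method lists the two alleles.
concordant_one_field <- function(calls_a, calls_b) {
  identical(sort(one_field(calls_a)), sort(one_field(calls_b)))
}

# Hypothetical calls, for illustration only (not from results.csv).
hlapers_calls <- c("A*02:01:01", "A*03:01:01")
kourami_calls <- c("A*02:01:01G", "A*02:05:01G")

# Discordant: A*02/A*03 versus A*02/A*02.
concordant_one_field(hlapers_calls, kourami_calls)

# Concordant despite different allele order and "G" group suffixes.
concordant_one_field(c("B*07:02:01", "B*44:02:01"),
                     c("B*44:02:01G", "B*07:02:01G"))
```

Sorting before comparing treats the genotype as an unordered pair, which is appropriate here because neither method reports phase.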

```{r}
#| echo: false
#| warning: false
#| message: false
#| label: tbl-1
#| tbl-cap: Class I HLA genotypes inferred by HLApers and Kourami.

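library(tidyverse)

# Escape the asterisk in allele names so that Markdown does not read it as
# emphasis, and prefix locus names with "HLA-" before rendering the table.
read.csv("./results.csv") |>
    mutate(across(c(allele_hlapers, allele_kourami), ~sub("\\*", "\\\\*", .x)),
           locus = paste0("HLA-", locus)) |>
    knitr::kable()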

```

# References