As stated in "Crypt4GH: A secure method for sharing human genetic data":
Crypt4GH, a new standard file container format from the Global Alliance for Genomics and Health (GA4GH), allows genomic data to remain secure throughout their lifetime, from initial sequencing to sharing with professionals at external organizations.
While Crypt4GH addresses the security of data at rest and data in transit, data in use still needs to be addressed. We propose using Trusted Execution Environment (TEE) technology, specifically its Intel SGX implementation, for protected in-memory processing of genomic data. This approach also provides a means to implement a Key Management System (KMS) using a remote Configuration and Attestation Service (CAS).
This example provides two services:
- encrypt - to encrypt a VCF file, using a Crypt4GH Python library
- process - to process the encrypted file: extract the ID column.
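The core of the "process" step, extracting the ID column, can be sketched in plain Python. This is a minimal illustration that works on already decrypted VCF text and is not the code shipped in this example; the function name `extract_ids` and the sample records are hypothetical. It relies only on the VCF convention that ID is the third tab-separated column of each data line.

```python
import io
from typing import Iterable, Iterator


def extract_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the ID field (third column) of every VCF data line."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines, meta-information lines (##...) and the #CHROM header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore malformed lines that have fewer than three columns.
        if len(fields) < 3:
            continue
        yield fields[2]


# Hypothetical sample input, used for illustration only.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\t.\n"
    "20\t17330\t.\tT\tA\t3\tq10\t.\n"
)

if __name__ == "__main__":
    for variant_id in extract_ids(io.StringIO(SAMPLE_VCF)):
        print(variant_id)
```

Records without an assigned identifier carry `.` in the ID column; the sketch passes that value through unchanged.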
Encryption keys are:
- Generated by SCONE Configuration and Attestation Service
- Not exposed to any users, including administrators
- Only visible inside particular Intel SGX enclaves.
We used Azure Confidential Computing to run this example, but it can run on any other SGX-capable hardware, including bare metal.
- Host with Intel Software Guard Extensions (SGX) enabled, and SGX driver installed
- Docker access to the Scontain Registry https://sconedocs.github.io/registry/
- Docker and Docker Compose installed
- Build utilities installed: awk, curl, make, openssl
To build the Docker images for the services, and to generate and upload the session to CAS, run:
$ make
Build and start the SCONE Local Attestation Service (LAS) container:
$ docker-compose up -d las
Run the "encrypt" service to encrypt the input VCF file:
$ cat input.vcf | docker-compose run --rm encrypt > ./input.c4gh
Run the "process" service to extract the VCF record IDs from the encrypted input:
$ cat input.c4gh | docker-compose run --rm process 2>/dev/null
$ pip3 install -r requirements.txt
$ export SENDER_KEY=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
$ export RECIPIENT_KEY=bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
$ python app encrypt input.vcf | python app process
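The key values above are placeholders. As a minimal sketch, random keys of the same shape can be generated with the Python standard library. This assumes the application accepts any 32-byte key encoded as 64 hexadecimal characters, which is inferred only from the length of the placeholders; the helper name `generate_hex_key` is hypothetical.

```python
import secrets

# Assumption: keys are 32 random bytes, hex-encoded to 64 characters,
# matching the length of the placeholder values shown above.
KEY_BYTES = 32


def generate_hex_key(num_bytes: int = KEY_BYTES) -> str:
    """Return a cryptographically strong random key as a hex string."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print shell export lines that can be pasted into the terminal.
    print(f"export SENDER_KEY={generate_hex_key()}")
    print(f"export RECIPIENT_KEY={generate_hex_key()}")
```

Keys exported this way are visible to the local shell, so this applies only to running without SGX; in the SGX setup the keys are generated by CAS and are never exposed to users.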
- The SGX memory limit causes overhead for memory-intensive tasks on SGX VMs
- Software needs additional adaptation to run in the enclave
- Close to impossible to use the fork system call
- Integrate htslib-crypt4gh so that popular bioinformatics tools such as samtools and bcftools can work in the enclave
- Demonstrate remote block fetch of encrypted SAM/VCF files for secure processing