- Analysis: Pipeline for running chopchop from the command shell to design guides for CRISPR screening.
- Date: 11/04/23
- Author: Agustín Sánchez Belmonte (asanchezb@cnio.es)
- Institution: Spanish National Cancer Research Centre (CNIO)
The bash script chop_pipeline.sh can be used to obtain CRISPR target sites.
The file config_env.yml is used to create the working environment.
The output of this bash script includes <gen>.txt, which contains the CRISPR target sites for each gene.
Other, less important outputs, such as off-target sequences, are also produced.
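Results files can be long. A minimal sketch for counting the guides and viewing only the best-ranked ones, assuming <gen>.txt is a tab-separated table whose first line is a header and whose rows are already sorted by rank (check this against your own output); the function names are hypothetical:

```bash
# ASSUMPTION: tab-separated results, header on line 1, rows sorted by rank.

# Number of guide rows (all lines except the header; 0 for an empty file).
count_guides() {
  awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$1"
}

# Header plus the first N guide rows (default 5); N guides end at line N+1.
top_guides() {
  local file="$1" n="${2:-5}"
  head -n "$((n + 1))" "$file"
}
```

For example, top_guides results/TP53.txt 10 would show the ten top-ranked guides for TP53.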
If you do not have git installed:
conda install -c anaconda git
Then clone the entire repository to your local machine:
git clone https://bitbucket.org/valenlab/chopchop.git
cd chopchop
conda update --all
conda env create -f config_env.yml
conda activate chopchop
If this step does not work, follow step 2b instead:
conda update --all
conda create -n chopchop python=2.7
conda activate chopchop
conda install -c anaconda biopython pandas numpy scipy argparse mysql-python scikit-learn=0.18.1
chopchop.py needs a gene table to look up genomic coordinates if you want to supply gene names rather than coordinates. To get an example genePred table from the UCSC Table Browser:
- Select organism and assembly
- Select group: Genes and Gene Predictions
- Select track: RefSeq Genes or Ensembl Genes
- Select table: refGene or ensGene
- Select region: genome
- Select output format: all fields from selected table
- Set the output file name with the extension ".gene_table", e.g. danRer10.gene_table
- Get output
mkdir genePred_folder
Save the *.gene_table file inside genePred_folder.
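Gene names must be spelled exactly as they appear in this table, so it can help to check a name before running chopchop.py. A minimal sketch, assuming a refGene table exported with all fields, where the gene symbol (name2) is the 13th tab-separated column and the header line starts with "#"; other tables may use a different layout, and the function names are hypothetical:

```bash
# ASSUMPTION: refGene layout, gene symbol (name2) in column 13.

# Exit status 0 if the exact symbol is present, 1 otherwise.
gene_in_table() {
  local gene="$1" table="$2"
  awk -F'\t' -v g="$gene" '
    /^#/ { next }                  # skip the "#bin ..." header line
    $13 == g { found = 1; exit }   # exact, case-sensitive match
    END { exit !found }
  ' "$table"
}

# List distinct symbols that match ignoring case, to catch capitalisation typos.
similar_genes() {
  local gene="$1" table="$2"
  awk -F'\t' -v g="$gene" '!/^#/ && tolower($13) == tolower(g) { print $13 }' "$table" | sort -u
}
```

For example, gene_in_table TP53 genePred_folder/hg38.gene_table || similar_genes TP53 genePred_folder/hg38.gene_table.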
Download the compressed *.2bit genome:
- Select the organism in the complete annotation sets section
- Select Full data set
- Download the *.2bit file
mkdir 2bit_folder
wget -P 2bit_folder http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/latest/hg38.2bit
Create a FASTA version of the genome by running twoBitToFa on the *.2bit file:
./twoBitToFa 2bit_folder/hg38.2bit hg38.fasta
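A quick sanity check after the conversion is to list the sequence names in the FASTA file and confirm that the expected chromosomes are present. A minimal sketch; the function name is hypothetical:

```bash
# Print the name of every sequence in a FASTA file: the text after ">"
# up to the first space (any description is dropped).
fasta_names() {
  grep '^>' "$1" | cut -c2- | cut -d' ' -f1
}
```

For example, fasta_names hg38.fasta | head lists the first sequence names, and fasta_names hg38.fasta | wc -l counts them.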
Build the Bowtie index of the genome from the new *.fasta file:
mkdir ebwt_folder
./bowtie/bowtie-build hg38.fasta ebwt_folder/hg38
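Building the index takes a long time, so it is worth confirming that it finished. A minimal sketch, assuming a standard Bowtie 1 small index, which consists of six files: PREFIX.1.ebwt to PREFIX.4.ebwt plus PREFIX.rev.1.ebwt and PREFIX.rev.2.ebwt (very large genomes get .ebwtl files instead, which this sketch does not cover); the function name is hypothetical:

```bash
# Return 0 if all six index files exist and are non-empty; report any missing.
check_bowtie_index() {
  local prefix="$1" missing=0 suffix
  for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
    if [ ! -s "${prefix}.${suffix}" ]; then
      echo "missing or empty: ${prefix}.${suffix}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_bowtie_index ebwt_folder/hg38 && echo "index complete".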
Edit config.json and replace the paths with your own for the .2bit genome files, the Bowtie (.ebwt) index files and the *.gene_table files.
See config.json for an example.
Make sure all these files and programs have the proper access rights; you can change permissions with the chmod command. Some programs may also need to be compiled for your operating system.
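Permission problems are easier to spot before running the pipeline. A minimal sketch that reports data files that cannot be read and programs that cannot be executed; the function names are hypothetical and the paths in the usage line are only examples:

```bash
# Report every argument that is not readable; return 1 if any fail.
check_readable() {
  local f status=0
  for f in "$@"; do
    [ -r "$f" ] || { echo "not readable: $f" >&2; status=1; }
  done
  return "$status"
}

# Report every argument that is not executable; return 1 if any fail.
check_executable() {
  local f status=0
  for f in "$@"; do
    [ -x "$f" ] || { echo "not executable: $f (try: chmod +x $f)" >&2; status=1; }
  done
  return "$status"
}
```

For example: check_readable 2bit_folder/hg38.2bit genePred_folder/hg38.gene_table and check_executable ./chopchop.py ./twoBitToFa.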
Run the following in your terminal, replacing <gen> with the name of the gene of interest. Be careful to spell the gene name exactly as it appears in the gene table: some genes have several names, but only one of them is recognized.
mkdir -p results
./chopchop.py -G hg38 -o results -Target <gen> --scoringMethod DOENCH_2016 -consensusUnion -t CODING > results/<gen>.txt
- -G: genome to search
- -o: output folder
- -Target: target genes or regions
- --scoringMethod: method used to score the guides (here DOENCH_2016)
- -t: part of the gene to target: CODING/WHOLE/UTR5/UTR3/SPLICE
- -consensusUnion: use the union of the gene's isoforms
When the gene is very small, guide design with -t CODING may fail; in that case -t WHOLE is recommended.
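The CODING-then-WHOLE advice can be automated. A minimal sketch, assuming that a failed design shows up either as a non-zero exit status or as an output file with no lines beyond a header line (check how failures look in your own runs); the function name design_guides and the CHOPCHOP variable are hypothetical:

```bash
# Hypothetical: path to chopchop.py, overridable from the environment.
CHOPCHOP="${CHOPCHOP:-./chopchop.py}"

# Try -t CODING first, then -t WHOLE; print the region that produced guides.
design_guides() {
  local gene="$1" out="results/$1.txt" region
  mkdir -p results   # the > redirect needs this folder to exist
  for region in CODING WHOLE; do
    # ASSUMPTION: success = exit status 0 and more than one (header) line.
    if "$CHOPCHOP" -G hg38 -o results -Target "$gene" \
         --scoringMethod DOENCH_2016 -consensusUnion -t "$region" > "$out" &&
       [ "$(wc -l < "$out")" -gt 1 ]; then
      echo "$region"
      return 0
    fi
  done
  return 1
}
```

For example, design_guides <gen> prints CODING or WHOLE, depending on which run produced guides, and leaves that run's output in results/<gen>.txt.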
To target the promoter instead, run the following in your terminal, again replacing <gen> with the exact gene name:
./chopchop.py -G hg38 -o results -Target <gen> --scoringMethod DOENCH_2016 -consensusUnion -t PROMOTER -TDP 0 -TUP 300 > results/<gen>.txt
- -t PROMOTER: target the promoter region
- -TDP: how many bp downstream of the TSS to target
- -TUP: how many bp upstream of the TSS to target
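Upstream and downstream are defined relative to the gene's strand, so the targeted interval flips for minus-strand genes. A minimal sketch of that arithmetic; it illustrates the definitions only and is not taken from chopchop.py, whose exact coordinate conventions (0- or 1-based, inclusive ends) may differ. The function name is hypothetical:

```bash
# Print "start end" of the window around a TSS for a given strand, TUP and TDP.
promoter_window() {
  local tss="$1" strand="$2" tup="$3" tdp="$4"
  case "$strand" in
    +) echo "$((tss - tup)) $((tss + tdp))" ;;   # upstream = lower coordinates
    -) echo "$((tss - tdp)) $((tss + tup))" ;;   # upstream = higher coordinates
    *) echo "strand must be + or -" >&2; return 1 ;;
  esac
}
```

With the values used above (-TUP 300 -TDP 0), promoter_window 1000 + 300 0 gives 700 1000, and the same TSS on the minus strand gives 1000 1300.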
To process several genes at once, run the pipeline script in your terminal with the genes of interest separated by spaces:
bash chop_pipeline.sh <gen1> <gen2> <gen3> <gen4>
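The contents of chop_pipeline.sh are not reproduced here. As a hypothetical sketch of what such a wrapper can look like, the function below runs the CODING command from this guide once per gene and writes one results file per gene; the function name run_genes and the CHOPCHOP variable are assumptions, and the real script may differ:

```bash
# Hypothetical: path to chopchop.py, overridable from the environment.
CHOPCHOP="${CHOPCHOP:-./chopchop.py}"

# Run chopchop.py for every gene given as an argument.
run_genes() {
  local gene failed=0
  mkdir -p results   # the > redirect below needs this folder to exist
  for gene in "$@"; do
    echo "Designing guides for $gene" >&2
    if ! "$CHOPCHOP" -G hg38 -o results -Target "$gene" \
         --scoringMethod DOENCH_2016 -consensusUnion -t CODING \
         > "results/${gene}.txt"; then
      echo "chopchop.py failed for $gene" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed gene(s) failed" >&2
  [ "$failed" -eq 0 ]   # status 0 only if every gene succeeded
}
```

A standalone script built from this sketch would end with run_genes "$@", so that the genes typed on the command line are passed to the loop.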
chopchop.py has many more functionalities and arguments that you can change. Check them in the chopchop repository or by running:
./chopchop.py --help