- Extract Wellderly genotypes only, remove variants that are not found in the Wellderly, and transform to VCF
python Create_job_extract_wellderly_vcf.py
- Parse all of the data to VCF
python Create_jobs_parse_genomeComb.py
- Sort wellderly vcf
python Create_jobs_sort_vcf.py
- Extract the variants with at least one VQHIGH in white individuals
python remove_vqlow.py
*Extract variants that are clustered in >0.1 of the Wellderly or Inova samples
python create_jobs_remove_clustered.py
*Extract the variants with AF >0.01: python create_jobs_0.01AF.py ---> DIDN'T work with vcftools, will have to do it manually
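Since the vcftools route failed, a minimal sketch of the manual AF filter (assumes the INFO column carries an AF tag; file names are illustrative, not what create_jobs_0.01AF.py actually does):

    import gzip

    # Keep only sites with AF > 0.01; assumes an AF=<value[,value...]> tag in INFO.
    with gzip.open("vcf_snps.chr1.vcf.gz", "rt") as vcf_in, \
         gzip.open("vcf_snps_AF0.01.chr1.vcf.gz", "wt") as vcf_out:
        for line in vcf_in:
            if line.startswith("#"):            # copy header lines unchanged
                vcf_out.write(line)
                continue
            info = line.split("\t")[7]
            af = 0.0
            for field in info.split(";"):
                if field.startswith("AF="):
                    # use the highest AF across alternate alleles
                    af = max(float(x) for x in field[3:].split(","))
                    break
            if af > 0.01:
                vcf_out.write(line)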
*Extract the repeats, homopolymers, etc.: python Create_jobs_extractRepeats_etc.py
*Count the number of VQHIGH variants that passed the filters, by AF: python create_jobs_count_totalVQHIGH_byAF.py
*Count the rest of the filters (in the Count_filters folder)
*Remove variants with >10% missing in either wellderly or inova python Create_jobs_extract_missing.py
*Remove variants with coverage <10 or >100 python create_jobs_remove_coverage.py
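A combined sketch of the two site filters above (>10% missing genotypes in either cohort, coverage outside 10-100), assuming GT and DP FORMAT subfields, per-cohort sample lists, and a mean-depth reading of the coverage cutoff; all names are illustrative, not what the Create_jobs scripts do:

    import gzip

    # Illustrative cohort files: one sample ID per line; every VCF sample is
    # assumed to belong to exactly one of the two cohorts.
    wellderly = set(open("wellderly_samples.txt").read().split())
    inova = set(open("inova_samples.txt").read().split())

    def keep_site(fields, sample_names):
        """True if the site passes the missingness (<=10% per cohort) and depth (10-100) filters."""
        fmt = fields[8].split(":")
        gt_i, dp_i = fmt.index("GT"), fmt.index("DP")
        depths = []
        missing = {"wellderly": 0, "inova": 0}
        for name, call in zip(sample_names, fields[9:]):
            sub = call.split(":")
            if sub[gt_i].replace("|", "/") in (".", "./."):
                missing["wellderly" if name in wellderly else "inova"] += 1
            if dp_i < len(sub) and sub[dp_i] not in (".", ""):
                depths.append(int(sub[dp_i]))
        mean_dp = sum(depths) / float(len(depths)) if depths else 0
        return (missing["wellderly"] <= 0.1 * len(wellderly)
                and missing["inova"] <= 0.1 * len(inova)
                and 10 <= mean_dp <= 100)

    with gzip.open("input.chr1.vcf.gz", "rt") as fin, \
         gzip.open("filtered.chr1.vcf.gz", "wt") as fout:
        samples = []
        for line in fin:
            if line.startswith("#CHROM"):
                samples = line.rstrip("\n").split("\t")[9:]
            if line.startswith("#") or keep_site(line.rstrip("\n").split("\t"), samples):
                fout.write(line)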
- Extract snp position based on rsID python ./snps_of_interest/Extract_position_of_snp.py
*Extract the snps of interest python Exract_snps_of_interest.py
*Extracting separately SNPs and delins with AF>0.01: python Extract_snpsOnly_AFmoreThen0.01.py
*Concatenating the VCF files by chromosome into a final one: vcf-concat vcf_snps_AF0.01.chr1.vcf.gz vcf_snps_AF0.01.chr2.vcf.gz vcf_snps_AF0.01.chr3.vcf.gz vcf_snps_AF0.01.chr4.vcf.gz vcf_snps_AF0.01.chr5.vcf.gz vcf_snps_AF0.01.chr6.vcf.gz vcf_snps_AF0.01.chr7.vcf.gz vcf_snps_AF0.01.chr8.vcf.gz vcf_snps_AF0.01.chr9.vcf.gz vcf_snps_AF0.01.chr10.vcf.gz vcf_snps_AF0.01.chr11.vcf.gz vcf_snps_AF0.01.chr12.vcf.gz vcf_snps_AF0.01.chr13.vcf.gz vcf_snps_AF0.01.chr14.vcf.gz vcf_snps_AF0.01.chr15.vcf.gz vcf_snps_AF0.01.chr16.vcf.gz vcf_snps_AF0.01.chr17.vcf.gz vcf_snps_AF0.01.chr18.vcf.gz vcf_snps_AF0.01.chr19.vcf.gz vcf_snps_AF0.01.chr20.vcf.gz vcf_snps_AF0.01.chr21.vcf.gz vcf_snps_AF0.01.chr22.vcf.gz | gzip -c >final_vcf_allChrom_snps_AF0.01.vcf.gz
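The same command can be generated instead of typed out; a small sketch (file names follow the command above):

    import subprocess

    # Build and run the vcf-concat call for chr1-chr22 programmatically.
    files = " ".join("vcf_snps_AF0.01.chr%d.vcf.gz" % c for c in range(1, 23))
    cmd = "vcf-concat %s | gzip -c > final_vcf_allChrom_snps_AF0.01.vcf.gz" % files
    subprocess.run(cmd, shell=True, check=True)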
*Run the first step of the association on all data: python create_job_association_final.py
*Second step: python run_association.py
*The association shows the biggest p-values in repeat regions; removing all of them --> skipped this for the final analysis: python ./association/create_jobs_remove_ALLrepeats_association.py
*Extract the related individuals with vcftools <-- this doesn't work (run from /gpfs/group/stsi/data/projects/wellderly/GenomeComb/vcf_snps_AFmore0.01): vcftools --gzvcf final_vcf_allChrom_snps_AF0.01.vcf.gz --remove eliminate_individuals.txt --out final_vcf_allChroms_snps_AD0.01_noRelated.vcf.gz
*Extract the related individuals, better version: python Excluded_related.py
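A minimal sketch of what dropping the related samples straight from the VCF could look like (not taken from Excluded_related.py; the sample list and file names follow the vcftools attempt above):

    import gzip

    # Samples to drop, one ID per line.
    related = set(open("eliminate_individuals.txt").read().split())
    keep_cols = None

    with gzip.open("final_vcf_allChrom_snps_AF0.01.vcf.gz", "rt") as fin, \
         gzip.open("final_vcf_allChrom_snps_AF0.01.noRelated.vcf.gz", "wt") as fout:
        for line in fin:
            if line.startswith("##"):
                fout.write(line)
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                # keep the 9 fixed VCF columns plus every sample not in the related list
                keep_cols = [i for i, name in enumerate(fields) if i < 9 or name not in related]
            fout.write("\t".join(fields[i] for i in keep_cols) + "\n")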
*Concatenate files python concatenate.py
*Extracted inova coverage
*Concatenating v1 (still need to concatenate chr1-chr4) vcf-concat final_vcf_nokmer_snps_AF0.01.noRelated.chr5.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr6.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr7.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr8.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr9.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr10.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr11.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr12.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr13.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr14.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr16.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr17.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr18.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr19.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr20.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr21.vcf.gz final_vcf_nokmer_snps_AF0.01.noRelated.chr22.vcf.gz | gzip -c >final_vcf_nokmer_snps_AF0.01.noRelated.temp.vcf.gz
*Extract snps of interest from vcf file python Extract_snps_of_interest.vcf.py
*Extract coverage by individual for the SNPs of interest: python Extract_coverage_SnpSOfInterest_wellderly.py
*Calculate AF, median coverage (from the median coverage file), missing genotypes, and VQHIGH for the SNPs of interest, for both Wellderly and Inova: python Calculate_AF_welldVsInova.py
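For the AF part of this step, a small self-contained sketch of a per-cohort allele frequency from GT strings (illustrative genotypes, not the script's actual I/O):

    # Allele frequency = alt allele count / called allele count, ignoring missing alleles.
    def allele_freq(genotypes):
        alt = called = 0
        for gt in genotypes:
            for allele in gt.replace("|", "/").split("/"):
                if allele == ".":
                    continue
                called += 1
                alt += allele != "0"
        return alt / float(called) if called else float("nan")

    print(allele_freq(["0/1", "1/1", "0/0", "./."]))   # 3 alt / 6 called -> 0.5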
- Match p-values from association based on location final_association.sh
*Extract rare variants python Create_jobs_extract_rareVariants.py
*Extract AF and p-values (from the association) for the SNPs of interest: python Extract_AF_p-values.py
ASSOCIATION ./association/final_association.sh
PATHWAY ANALYSIS *Extract genes/positions for the pathway analysis (in ~/wellderly/resources): mysql -h genome-mysql.cse.ucsc.edu -u genome -D hg19 -N -A -e 'select kgXref.kgID, kgXref.geneSymbol,knownGene.name,knownGene.chrom,knownGene.txStart,knownGene.txEnd from kgXref, knownGene where knownGene.name=kgXref.kgID' >genes_positions.txt
- Extract genes with 100kb interval (from pathway analysis folder) python split_UCSC_geneByChrom.py
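A sketch of the 100kb padding and per-chromosome split (column order assumed to match the mysql SELECT above; output names are illustrative, not split_UCSC_geneByChrom.py itself):

    # Pad each gene interval by 100 kb on both sides and write one file per chromosome.
    PAD = 100000
    by_chrom = {}
    with open("genes_positions.txt") as fh:
        for line in fh:
            kg_id, symbol, name, chrom, tx_start, tx_end = line.rstrip("\n").split("\t")[:6]
            by_chrom.setdefault(chrom, []).append((symbol, max(0, int(tx_start) - PAD), int(tx_end) + PAD))

    for chrom, genes in by_chrom.items():
        with open("genes_%s_100kb.txt" % chrom, "w") as out:
            for symbol, start, end in sorted(genes, key=lambda g: g[1]):
                out.write("%s\t%d\t%d\n" % (symbol, start, end))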
*Add gene to bim file (not needed in the end) python add_gene_to_bim.py
*Generate 10k *.pheno files and run the simulation python generate_pheno_files.py
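A sketch of the phenotype-permutation idea behind the 10k simulations (the .fam file name and output pattern are assumptions, not generate_pheno_files.py itself):

    import random

    # Read FID, IID and phenotype (column 6) from the original .fam file.
    samples = []
    with open("wellderly_inova.fam") as fam:
        for line in fam:
            f = line.split()
            samples.append((f[0], f[1], f[5]))

    # Write 10k PLINK .pheno files (FID IID PHENO) with permuted case/control labels.
    labels = [pheno for _, _, pheno in samples]
    for i in range(10000):
        random.shuffle(labels)
        with open("sim_%05d.pheno" % i, "w") as out:
            for (fid, iid, _), pheno in zip(samples, labels):
                out.write("%s %s %s\n" % (fid, iid, pheno))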
*For pathway analysis read ./pathway_analysis/pathway_analysis.sh
TABLE 1 and 2 *Extract all filters: python Create_jobs_apply_all_filters.py
RARE VARIANTS *Extracting ALL clustered variants: python ./Rare_variant_analysis/Create_jobs_remove_ALL_clustered.py
*Extract the variants removed by allele depth filter python ./Table2/Create_jobs_filter_by_AD.py
*Extract the AF after all filters except the AD filter, after removing all of the variants with 0.0 AF in both populations: python ./Table2/Create_jobs_extract_AF_by_var.py
*Generate table2 counts python ./Table2/Create_jobs_table2.py
*Extracting snp position for cognitive snps python ./pathway_analysis/CognitiveSnps/Extract_snp_position.py
*Extract cognitive SNP p-values from all 10k simulations: python ./pathway_analysis/CognitiveSnps/Create_jobs_extract_sim_pvalues.py
*Count 36mers: python ./Count_filters/Create_jobs_count_36mers.py
*Count hwe: python ./Count_filters/Create_jobs_count_hwe.py
*Filter out HWE: python ./Create_jobs/Create_job_filter_HWE.py
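For reference, a minimal chi-square Hardy-Weinberg check from genotype counts (the actual filter script may use an exact test instead; this needs scipy, and the example counts are made up):

    from scipy.stats import chi2

    def hwe_pvalue(n_hom_ref, n_het, n_hom_alt):
        """Chi-square HWE test (1 df) from the three genotype counts at one site."""
        n = n_hom_ref + n_het + n_hom_alt
        p = (2.0 * n_hom_ref + n_het) / (2 * n)        # reference allele frequency
        q = 1 - p
        expected = (n * p * p, 2 * n * p * q, n * q * q)
        observed = (n_hom_ref, n_het, n_hom_alt)
        chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
        return chi2.sf(chisq, df=1)

    print(hwe_pvalue(300, 150, 50))   # strong departure from HWE -> small p-value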
*Split annotations by chromosomes python ./Create_jobs/Create_jobs_split_annotation_by_chrom.py
*Calculate the AF on the filtered out dataset with plink: python ./Create_jobs/Create_jobs_calcAF_cases_controls.py
*Supplemental table 2: python ./Create_jobs/Count_wellderly_characteristics.py
*Pathway analysis with variants inside genes only: python ./pathway_analysis/Reassign_genes/combine_simulations.py
*Redo pathway analysis
- Apply the filters: missing/uncertain genotype >10% in either Wellderly or Inova, coverage <10 or >100, whites only (testing 0.85 and 0.95 white)
python Extract_white_filter.py
- Run the plink analysis, maf > 0.01, in LD
python data_analysis.py
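A sketch of what the plink side of this step could look like (MAF filter, LD pruning, then association); the input prefix, pruning window, and r2 cutoff are assumptions, not read from data_analysis.py:

    import subprocess

    def run(cmd):
        subprocess.run(cmd, shell=True, check=True)

    # MAF > 0.01 filter plus LD pruning (50-SNP window, step 5, r^2 0.5).
    run("plink --bfile wellderly_inova --maf 0.01 --indep-pairwise 50 5 0.5 --out pruned")
    # Association on the pruned variant set.
    run("plink --bfile wellderly_inova --extract pruned.prune.in --assoc --out assoc_maf0.01_pruned")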
- Add filters to the association file (repeat, homopolymer, segDup, microsat)
python Add_filters.py
- Test different filters to see which ones work better