Variant to MIGRATE_N format
1. First, convert the VCF data to tabular format using vcf-to-tab from VCFtools:
cat MAF_Filtered.vcf.gatk6262 | vcf-to-tab > MAF_Filtered.vcf.gatk6262.tab
output
#CHROM POS REF 1-PA37-Port.fq 10-S7Castl-1-Port.fq
scaffold_1 13090 A A/C C/C C/C C/C
scaffold_1 14491 C C/A C/C C/C C/C
scaffold_1 14502 C C/T C/C C/C C/C
scaffold_1 14658 T T/T ./. T/G T/G
.
.
.
2. Once the table is generated, use the vcf_tab_to_fasta_alignment.pl script (https://code.google.com/archive/p/vcf-tab-to-fasta/) to convert it to a FASTA alignment:
vcf_tab_to_fasta_alignment.pl -i MAF_Filtered.vcf.gatk6262.tab > MAF_Filtered.vcf.gatk6262.fasta
output
>1-PA37-Port.fq
MMYTGKTACRGRATGGTCCGCYCKYARYGYMTATCACRYGYAGTMRRGYYMRYRMCMTGTYGAMRAAGSGTYTGGRTYGYCCGCGCTCRCYCCYMAGGCG
YKGKKCAGAYYCYWYACGCKTRKWSRRARGGAYGGTRAGAGKGCRTRSGTTRCGRWRSCATTATTTCACRYGCYGGGRATACMGTAGYTSCGCMGRARCC
ACGGCGMYAGCGYTACCACYGYKRTARSTAYCCCGCACCRGSCGTGCGARRRGGGGYTARRAYRGWAGARGMCAWCTGRCCYCGYGGYGRYCRYRKTCRT
CYGG
>10-S7Castl-1-Port.fq
CCCGWACRRGRRRGGTCCSCYYKYRYGYATRWCAYGYRTAGTAGAGYYMATGCCMTGKCRAMRACASRTYTRGRWCGYYYRYRYYCGCYSCTCRGGCGTK
GKTSRRMCCYYWTACRYKTRKSRAAGGGACGGTRAGWGGGSRWGGGYTGCGGWGGCAYTRCYYCACATGCYGGGARYRMMGYRRYWGCGYMGRGGCCAYG
GTGTAGCGTTMACMYTGYKGTARSTACCYCGCRCCRGSCGWRCGRGGGKGGGYTAARAYRRWARARGAYWWCWGGYYYCKCGGYGAYYRYRGWCGTCTAR
GAACCRGRCGCRTYTGGTGYGGRRRYYYYGYTYYTRSYCWMYRTCRGYYCCRYYWRWCCGCCCGRGCRRSAAYGCCWYGRAGGCCTTTGACRYGGTTCSC
MCYGGGYYCAYSYAGAYKWWGRAWYRCGRTMTTTSTCSRMKRCCYKAYYCTGCTGCATAYCYAYYATAKWYRRCYRGYRCGGCRKGYRYWSYRYRYGAGC
YRYAAA
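The FASTA output above is consistent with heterozygous genotypes being written as IUPAC ambiguity codes: the first four genotypes of the first sample column (A/C, C/A, C/T, T/T) line up with the leading MMYT of the first sequence. The sketch below only illustrates that mapping. The function name gt_to_iupac is hypothetical, and printing N for anything that is not a pair of single bases (such as ./.) is an assumption, not necessarily what the Perl script does.

```shell
# Hypothetical helper: map a diploid genotype such as "A/C" to one IUPAC code.
# Assumption: anything that is not a pair of single bases (e.g. "./.") becomes N.
gt_to_iupac() {
  awk -v gt="$1" 'BEGIN {
    # IUPAC codes for the unordered base pairs
    code["AA"] = "A"; code["CC"] = "C"; code["GG"] = "G"; code["TT"] = "T"
    code["AC"] = "M"; code["AG"] = "R"; code["AT"] = "W"
    code["CG"] = "S"; code["CT"] = "Y"; code["GT"] = "K"
    n = split(gt, a, "/")
    if (n != 2) { print "N"; exit }
    # order the two alleles so that C/A and A/C give the same key
    key = (a[1] < a[2]) ? a[1] a[2] : a[2] a[1]
    print ((key in code) ? code[key] : "N")
  }'
}

# First four genotypes of the first sample column in the table from step 1
for gt in A/C C/A C/T T/T; do gt_to_iupac "$gt"; done | tr -d '\n'; echo
```

Sites that carry ambiguity codes are still ordinary alignment columns, so the downstream steps treat them like any other character.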
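3. Align the FASTA sequences using MUSCLE and convert the alignment to PHYLIP format. The aligned FASTA file is referred to below as MAF_Filtered.vcf.gatk6262.fasta.aligned.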
4. Create a list of samples per population, for example for Taiwan:
cat MAF_Filtered.vcf.gatk6262.fasta | grep "Tai" > Taiwan.txt
Taiwan.txt
>100-TW20-NTaiw.fq
>101-TW21-NTaiw.fq
>102-TW22-NTaiw.fq
>103-TW27-NTaiw.fq
>104-TW80-NETaiw.fq
>106-TW82bis-NETaiw.fq
>107-TW84-NETaiw.fq
>138-TW203-STaiw.fq
>139-TW206-STaiw.fq
>140-TW207-STaiw.fq
>141-TW210-STaiw.fq
>142-TW212-STaiw.fq
## Consider renaming the samples, because MIGRATE-N may later throw an error if names are longer than 10 characters
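One possible way to act on the note above is sketched here. It assumes the sample names follow the number-ID-region pattern seen in Taiwan.txt, so the second "-"-separated field (for example TW20) is short enough and still unique. Both function names are hypothetical.

```shell
# Hypothetical helper: keep only the second "-"-separated field of each FASTA
# header, so ">100-TW20-NTaiw.fq" becomes ">TW20". Sequence lines are unchanged.
shorten_fasta_names() {
  awk -F"-" '/^>/ { print ">" $2; next } { print }'
}

# Hypothetical check: print every sample name longer than 10 characters.
long_names() {
  awk '/^>/ { name = substr($0, 2); if (length(name) > 10) print name }'
}

# Small demonstration on two headers taken from Taiwan.txt (sequences are dummies)
printf '>100-TW20-NTaiw.fq\nACGT\n>104-TW80-NETaiw.fq\nACGA\n' | shorten_fasta_names
```

After renaming, it is worth confirming that no two samples collapsed to the same short name, for instance by piping the headers through sort | uniq -d and checking that nothing is printed.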
5. Extract the FASTA records for these samples into a separate FASTA file per population:
cut -c 2- Taiwan.txt | xargs -n 1 samtools faidx MAF_Filtered.vcf.gatk6262.fasta.aligned > Taiwan.fasta
awk -F"-" '{print $2}' Vietnam.txt # print the second "-"-separated field of each name, e.g. to shorten sample names
6. fasta2tab <.fa>
Convert each per-population FASTA file to a tab-delimited text file whose columns are the sample ID and the sequence, respectively.
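If a fasta2tab tool is not at hand, the same conversion can be approximated with awk. This is a minimal sketch, not the fasta2tab implementation: the function name fasta_to_tab is hypothetical. It joins wrapped sequence lines, which matters because the FASTA files shown earlier are wrapped across several lines.

```shell
# Hypothetical stand-in for fasta2tab: read FASTA on stdin and write one
# "id<TAB>sequence" line per record, joining wrapped sequence lines.
fasta_to_tab() {
  awk '
    /^>/ { if (id != "") print id "\t" seq; id = substr($0, 2); seq = ""; next }
         { seq = seq $0 }
    END  { if (id != "") print id "\t" seq }
  '
}

# Small demonstration with dummy sequences wrapped over two lines
printf '>TW20\nACGT\nAC\n>TW21\nACGA\nAT\n' | fasta_to_tab
```

A typical call would be fasta_to_tab < Taiwan.fasta > Taiwan.tab, where the .tab file name is only a suggested convention.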
## MIGRATE
MIGRATE needs two files:
1. An input file formatted as below:
<n-POP> <NUMBER OF LOCI> <title for the output file>
<length of sequence>
<Number of samples for pop1> <Popname>
id-1 seq1
id-2 seq2
2. A parameter file, which typically specifies the input file, whether or not to estimate theta and M, the MCMC run settings, the burn-in, the type of data, etc.
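The input file layout in item 1 can be assembled from the per-population tab files with a few lines of shell. The sketch below rests on several assumptions that are not stated in these notes: the whole SNP alignment is treated as a single locus, each population name is taken from the tab file name, and sample IDs are padded to 10 characters in line with the name-length remark earlier. The function name make_migrate_infile and the sample VN01 are hypothetical.

```shell
# Hypothetical helper. Usage: make_migrate_infile "<title>" pop1.tab pop2.tab ...
# Each .tab file holds "id<TAB>sequence" lines for one population.
make_migrate_infile() {
  title=$1; shift
  # Assumption: all sequences have the same length, read from the first record
  seqlen=$(awk -F'\t' 'NR == 1 { print length($2) }' "$1")
  echo "$# 1 $title"        # <n-POP> <NUMBER OF LOCI> <title>, one locus assumed
  echo "$seqlen"            # <length of sequence>
  for tab in "$@"; do
    pop=$(basename "$tab" .tab)
    n=$(awk 'END { print NR }' "$tab")
    echo "$n $pop"          # <Number of samples> <Popname>
    # Assumption: IDs are padded to 10 characters before the sequence
    awk -F'\t' '{ printf "%-10s %s\n", $1, $2 }' "$tab"
  done
}

# Small demonstration with dummy sequences in a temporary directory
tmp=$(mktemp -d)
printf 'TW20\tACGTAC\nTW21\tACGAAT\n' > "$tmp/Taiwan.tab"
printf 'VN01\tACGTAT\n' > "$tmp/Vietnam.tab"
make_migrate_infile "SNP alignment" "$tmp/Taiwan.tab" "$tmp/Vietnam.tab"
rm -r "$tmp"
```

The exact column rules should be checked against the MIGRATE-N manual before a real run, since the padding and the single-locus choice here are guesses.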