diff --git a/template/canada_covid19/exampleInput/README.md b/template/canada_covid19/exampleInput/README.md
index 5167b231..f7f43816 100644
--- a/template/canada_covid19/exampleInput/README.md
+++ b/template/canada_covid19/exampleInput/README.md
@@ -1,7 +1,7 @@
 # CanCOGeN Example Input Data
 
 This directory contains example input/test data for the Canadian COVID Genomics Network (CanCOGeN) DataHarmonizer application template: `CanCOGeN Covid-19`. This data is appropriate for testing up to the version appended to the end of the file name, for example:
-- `validTestData_0-15-4.csv` is _valid_ for version `0.15.4` of the DataHarmonizer.
+- `validTestData_0-15-5.csv` is _valid_ for version `0.15.5` of the DataHarmonizer.
 
 ## Valid Test Data
 
diff --git a/template/canada_covid19/exampleInput/invalidTestData_0-15-4.csv b/template/canada_covid19/exampleInput/invalidTestData_0-15-5.csv
similarity index 63%
rename from template/canada_covid19/exampleInput/invalidTestData_0-15-4.csv
rename to template/canada_covid19/exampleInput/invalidTestData_0-15-5.csv
index c837fc92..a9dd3743 100644
--- a/template/canada_covid19/exampleInput/invalidTestData_0-15-4.csv
+++ b/template/canada_covid19/exampleInput/invalidTestData_0-15-5.csv
@@ -1,5 +1,5 @@
 Database Identifiers,,,,,,,,,,,,Sample collection and processing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Host Information,,,,,,,,,,,,,,,,,Host vaccination information,,,,,,,,,,,Host exposure information,,,,,,,,,,,,,,Host reinfection information,,,,,,Sequencing,,,,,,,,,,,,Bioinformatics and QC metrics,,,,,,,,,,,,,,,,,,,,,Lineage and Variant information,,,,,,Pathogen diagnostic testing,,,,,,,,,Contributor acknowledgement,
 specimen collector sample ID,third party lab service provider name,third party lab sample ID,case ID,Related specimen primary ID,IRIDA sample name,umbrella bioproject accession,bioproject accession,biosample accession,SRA accession,GenBank accession,GISAID accession,sample collected by,sample collector contact email,sample collector 
contact address,sequence submitted by,sequence submitter contact email,sequence submitter contact address,sample collection date,sample collection date precision,sample received date,geo_loc_name (country),geo_loc_name (state/province/territory),geo_loc_name (city),organism,isolate,purpose of sampling,purpose of sampling details,NML submitted specimen type,Related specimen relationship type,anatomical material,anatomical part,body product,environmental material,environmental site,collection device,collection method,collection protocol,specimen processing,specimen processing details,lab host,passage number,passage method,biomaterial extracted,host (common name),host (scientific name),host health state,host health status details,host health outcome,host disease,host age,host age unit,host age bin,host gender,host residence geo_loc name (country),host residence geo_loc name (state/province/territory),host subject ID,symptom onset date,signs and symptoms,pre-existing conditions and risk factors,complications,host vaccination status,number of vaccine doses received,vaccination dose 1 vaccine name,vaccination dose 1 vaccination date,vaccination dose 2 vaccine name,vaccination dose 2 vaccination date,vaccination dose 3 vaccine name,vaccination dose 3 vaccination date,vaccination dose 4 vaccine name,vaccination dose 4 vaccination date,vaccination history,location of exposure geo_loc name (country),destination of most recent travel (city),destination of most recent travel (state/province/territory),destination of most recent travel (country),most recent travel departure date,most recent travel return date,travel point of entry type,border testing test day type,travel history,exposure event,exposure contact level,host role,exposure setting,exposure details,prior SARS-CoV-2 infection,prior SARS-CoV-2 infection isolate,prior SARS-CoV-2 infection date,prior SARS-CoV-2 antiviral treatment,prior SARS-CoV-2 antiviral treatment agent,prior SARS-CoV-2 antiviral treatment 
date,purpose of sequencing,purpose of sequencing details,sequencing date,library ID,amplicon size,library preparation kit,flow cell barcode,sequencing instrument,sequencing protocol name,sequencing protocol,sequencing kit number,amplicon pcr primer scheme,raw sequence data processing method,dehosting method,consensus sequence name,consensus sequence filename,consensus sequence filepath,consensus sequence software name,consensus sequence software version,breadth of coverage value,depth of coverage value,depth of coverage threshold,r1 fastq filename,r2 fastq filename,r1 fastq filepath,r2 fastq filepath,fast5 filename,fast5 filepath,number of base pairs sequenced,consensus genome length,Ns per 100 kbp,reference genome accession,bioinformatics protocol,lineage/clade name,lineage/clade analysis software name,lineage/clade analysis software version,variant designation,variant evidence,variant evidence details,gene name 1,diagnostic pcr protocol 1,diagnostic pcr Ct value 1,gene name 2,diagnostic pcr protocol 2,diagnostic pcr Ct value 2,gene name 3,diagnostic pcr protocol 3,diagnostic pcr Ct value 3,authors,DataHarmonizer provenance -sample123,Switch Health,abc12345,case4444,NMLsample2222,prov_rona_99,PRJNA623807,PRJNA608651,SAMN14180202,SRR11177792,MN908947.3,EPI_ISL_436489,,switch@email.ca,"123 Main Street, City, Province",National Microbiology Laboratory (NML),RespLab@lab.ca,"123 Sunnybrooke St, Toronto, Ontario, M4P 1L6, Canada",2018-03-01,,30-Apr,Canda,BC,Thunder Bay,Severe acute respiratory syndrome coronavirus 2,hCov-19/CANADA/BC-prov_rona_99/2020,Surveillance testing,Not Provided,Not Applicable, Reinfection testing,Not Applicable,Lungs,Not Applicable,Not Applicable,Not Applicable,Swab,Not Applicable,SOP123,Not Provided,Not Provided,Not Applicable,Not Applicable,Not Applicable,Not Provided,Batman,Homo chiroptera,Sick, Hospitalized (ICU),Recovered,,89,,80 - 89,Female,Cnada,British Columbia,PHN1234,2022-02-23,Cough;Fever,Not Provided,Not Provided,Fully 
Vaccinated,3,Pfizer-BioNTech (Comirnaty),2021-07-01,Pfizer-BioNTech (Comirnaty),2021-11-02,Moderna (Spikevax),2022-02-01,,,,United States of America,Portland,Oregon,United States of America,2022-03-02,05-2020,Air,day 10,,Occupational exposure (retail),direct human to human,Attendee,"Occupational, Residency or Patronage Exposure",,Prior infection,SARS-CoV-2/human/USA/CA-CDPH-001/2020,2021-06-01,Prior antiviral treatment,remdesivir,2021-06-05, Surveillance of international border crossing by air travel,Not Provided,,XYZ_123345,1200bp,Nextera XT,FAB06069, Illumina NextSeq 2000,SeqProt1234,"Genomes were generated through amplicon sequencing of 1200 bp amplicons with Freed schema primers. Libraries were created using Illumina DNA Prep kits, and sequence data was produced using Miseq Micro v2 (500 cycles) sequencing kits.",1234546,Freed,Trimmomatic 0.38,,ncov123assembly3,ncov123assembly.fasta,User/Documents/RespLab/Data/ncov123assembly.fasta,iVar,1.3,95%,400x,100x,ABC123_S1_L001_R1_001.fastq.gz,ABC123_S1_L001_R2_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R1_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R2_001.fastq.gz,rona123assembly.fast5,User/Documents/RespLab/Data/rona123assembly.fast5,387566,38677,330,NC_045512.2,https://github.com/phac-nml/ncov2019-artic-nf,B.1.1.7,Pangolin,2.1.10,VOC,Sequencing,"Lineage-defining mutations: ORF1ab (K1655N), Spike (K417N, E484K, N501Y, D614G, A701V), N (T205I), E (P71L).",E gene (orf4),,21.2,Spike (orf2),,19.2,,,,"Tejinder Singh, Fei Hu, Joe Blogs",DataHarmonizer provenance: v0.15.3 -sample1234,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,DataHarmonizer provenance: v0.15.3 -sample1234,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,DataHarmonizer provenance: v0.15.3 +sample123,Switch 
Health,abc12345,case4444,NMLsample2222,prov_rona_99,PRJNA623807,PRJNA608651,SAMN14180202,SRR11177792,MN908947.3,EPI_ISL_436489,SharEd hospital Laboratory,switch@email.ca,"123 Main Street, City, Province",National Microbiology Laboratory (NML),RespLab@lab.ca,"123 Sunnybrooke St, Toronto, Ontario, M4P 1L6, Canada",2018-03-01,,30-Apr,Canda,BC,Thunder Bay,Severe acute respiratory syndrome coronavirus 2,hCov-19/CANADA/BC-prov_rona_99/2020,Surveillance testing,Not Provided,Not Applicable, Reinfection testing,Not Applicable,Lungs,Not Applicable,Not Applicable,Not Applicable,Swab,Not Applicable,SOP123,Not Provided,Not Provided,Not Applicable,Not Applicable,Not Applicable,Not Provided,Batman,Homo chiroptera,Sick, Hospitalized (ICU),Recovered,,89,,80 - 89,Female,Cnada,British Columbia,PHN1234,2022-02-23,Cough;Fever,Not Provided,Not Provided,Fully Vaccinated,3,Pfizer-BioNTech (Comirnaty),2021-07-01,Pfizer-BioNTech (Comirnaty),2021-11-02,Moderna (Spikevax),2022-02-01,,,,United States of America,Portland,Oregon,United States of America,2022-03-02,05-2020,Air,day 10,,Occupational exposure (retail),direct human to human,Attendee,"Occupational, Residency or Patronage Exposure",,Prior infection,SARS-CoV-2/human/USA/CA-CDPH-001/2020,2021-06-01,Prior antiviral treatment,remdesivir,2021-06-05, Surveillance of international border crossing by air travel,Not Provided,,XYZ_123345,1200bp,Nextera XT,FAB06069, Illumina NextSeq 2000,SeqProt1234,"Genomes were generated through amplicon sequencing of 1200 bp amplicons with Freed schema primers. 
Libraries were created using Illumina DNA Prep kits, and sequence data was produced using Miseq Micro v2 (500 cycles) sequencing kits.",1234546,Freed,Trimmomatic 0.38,,ncov123assembly3,ncov123assembly.fasta,User/Documents/RespLab/Data/ncov123assembly.fasta,iVar,1.3,95%,400x,100x,ABC123_S1_L001_R1_001.fastq.gz,ABC123_S1_L001_R2_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R1_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R2_001.fastq.gz,rona123assembly.fast5,User/Documents/RespLab/Data/rona123assembly.fast5,387566,38677,330,NC_045512.2,https://github.com/phac-nml/ncov2019-artic-nf,B.1.1.7,Pangolin,2.1.10,VOC,Sequencing,"Lineage-defining mutations: ORF1ab (K1655N), Spike (K417N, E484K, N501Y, D614G, A701V), N (T205I), E (P71L).",E gene (orf4),,21.2,Spike (orf2),,19.2,,,,"Tejinder Singh, Fei Hu, Joe Blogs",DataHarmonizer provenance: v0.15.4 +sample1234,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,DataHarmonizer provenance: v0.15.4 +sample1234,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,DataHarmonizer provenance: v0.15.4 diff --git a/template/canada_covid19/exampleInput/validTestData_0-15-4.xlsx b/template/canada_covid19/exampleInput/validTestData_0-15-4.xlsx deleted file mode 100644 index badc87fe..00000000 Binary files a/template/canada_covid19/exampleInput/validTestData_0-15-4.xlsx and /dev/null differ diff --git a/template/canada_covid19/exampleInput/validTestData_0-15-4.csv b/template/canada_covid19/exampleInput/validTestData_0-15-5.csv similarity index 62% rename from template/canada_covid19/exampleInput/validTestData_0-15-4.csv rename to template/canada_covid19/exampleInput/validTestData_0-15-5.csv index 9f2a2cb6..0e682667 100644 --- a/template/canada_covid19/exampleInput/validTestData_0-15-4.csv +++ 
b/template/canada_covid19/exampleInput/validTestData_0-15-5.csv @@ -1,3 +1,3 @@ Database Identifiers,,,,,,,,,,,,Sample collection and processing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Host Information,,,,,,,,,,,,,,,,,Host vaccination information,,,,,,,,,,,Host exposure information,,,,,,,,,,,,,,Host reinfection information,,,,,,Sequencing,,,,,,,,,,,,Bioinformatics and QC metrics,,,,,,,,,,,,,,,,,,,,,Lineage and Variant information,,,,,,Pathogen diagnostic testing,,,,,,,,,Contributor acknowledgement, specimen collector sample ID,third party lab service provider name,third party lab sample ID,case ID,Related specimen primary ID,IRIDA sample name,umbrella bioproject accession,bioproject accession,biosample accession,SRA accession,GenBank accession,GISAID accession,sample collected by,sample collector contact email,sample collector contact address,sequence submitted by,sequence submitter contact email,sequence submitter contact address,sample collection date,sample collection date precision,sample received date,geo_loc_name (country),geo_loc_name (state/province/territory),geo_loc_name (city),organism,isolate,purpose of sampling,purpose of sampling details,NML submitted specimen type,Related specimen relationship type,anatomical material,anatomical part,body product,environmental material,environmental site,collection device,collection method,collection protocol,specimen processing,specimen processing details,lab host,passage number,passage method,biomaterial extracted,host (common name),host (scientific name),host health state,host health status details,host health outcome,host disease,host age,host age unit,host age bin,host gender,host residence geo_loc name (country),host residence geo_loc name (state/province/territory),host subject ID,symptom onset date,signs and symptoms,pre-existing conditions and risk factors,complications,host vaccination status,number of vaccine doses received,vaccination dose 1 vaccine name,vaccination dose 1 vaccination date,vaccination dose 2 
vaccine name,vaccination dose 2 vaccination date,vaccination dose 3 vaccine name,vaccination dose 3 vaccination date,vaccination dose 4 vaccine name,vaccination dose 4 vaccination date,vaccination history,location of exposure geo_loc name (country),destination of most recent travel (city),destination of most recent travel (state/province/territory),destination of most recent travel (country),most recent travel departure date,most recent travel return date,travel point of entry type,border testing test day type,travel history,exposure event,exposure contact level,host role,exposure setting,exposure details,prior SARS-CoV-2 infection,prior SARS-CoV-2 infection isolate,prior SARS-CoV-2 infection date,prior SARS-CoV-2 antiviral treatment,prior SARS-CoV-2 antiviral treatment agent,prior SARS-CoV-2 antiviral treatment date,purpose of sequencing,purpose of sequencing details,sequencing date,library ID,amplicon size,library preparation kit,flow cell barcode,sequencing instrument,sequencing protocol name,sequencing protocol,sequencing kit number,amplicon pcr primer scheme,raw sequence data processing method,dehosting method,consensus sequence name,consensus sequence filename,consensus sequence filepath,consensus sequence software name,consensus sequence software version,breadth of coverage value,depth of coverage value,depth of coverage threshold,r1 fastq filename,r2 fastq filename,r1 fastq filepath,r2 fastq filepath,fast5 filename,fast5 filepath,number of base pairs sequenced,consensus genome length,Ns per 100 kbp,reference genome accession,bioinformatics protocol,lineage/clade name,lineage/clade analysis software name,lineage/clade analysis software version,variant designation,variant evidence,variant evidence details,gene name 1,diagnostic pcr protocol 1,diagnostic pcr Ct value 1,gene name 2,diagnostic pcr protocol 2,diagnostic pcr Ct value 2,gene name 3,diagnostic pcr protocol 3,diagnostic pcr Ct value 3,authors,DataHarmonizer provenance -sample1234,Switch 
Health,abc12345,case4444,NMLsample2222,prov_rona_99,PRJNA623807,PRJNA608651,SAMN14180202,SRR11177792,MN908947.3,EPI_ISL_436489,Switch Health,switch@email.ca,"123 Main Street, City, Province",National Microbiology Laboratory (NML),RespLab@lab.ca,"123 Sunnybrooke St, Toronto, Ontario, M4P 1L6, Canada",2022-03-01,day,2022-03-15,Canada,British Columbia,Thunder Bay,Severe acute respiratory syndrome coronavirus 2,hCov-19/CANADA/BC-prov_rona_99/2020,Diagnostic testing,Not Provided,Not Applicable, Reinfection testing,Not Applicable,Nasopharynx (NP); Oropharynx (OP),Not Applicable,Not Applicable,Not Applicable,Swab,Not Applicable,SOP123,Not Provided,Not Provided,Not Applicable,Not Applicable,Not Applicable,Not Provided,Human,Homo sapiens,Symptomatic, Hospitalized (ICU),Recovered,COVID-19,34,year,30 - 39,Female,Canada,British Columbia,PHN1234,2022-02-23,Cough;Fever,Not Provided,Not Provided,Fully Vaccinated,3,Pfizer-BioNTech (Comirnaty),2021-07-01,Pfizer-BioNTech (Comirnaty),2021-11-02,Moderna (Spikevax),2022-02-01,,,,United States of America,Portland,Oregon,United States of America,2022-03-02,2022-03-11,Air,day 10,, Convention," Close contact (face-to-face, no direct contact)",Attendee,"Occupational, Residency or Patronage Exposure",,Prior infection,SARS-CoV-2/human/USA/CA-CDPH-001/2020,2021-06-01,Prior antiviral treatment,remdesivir,2021-06-05, Surveillance of international border crossing by air travel,Not Provided,2022-04-04,XYZ_123345,1200bp,Nextera XT,FAB06069, Illumina NextSeq 2000,SeqProt1234,"Genomes were generated through amplicon sequencing of 1200 bp amplicons with Freed schema primers. 
Libraries were created using Illumina DNA Prep kits, and sequence data was produced using Miseq Micro v2 (500 cycles) sequencing kits.",1234546,Freed,Trimmomatic 0.38,Nanostripper,ncov123assembly3,ncov123assembly.fasta,User/Documents/RespLab/Data/ncov123assembly.fasta,iVar,1.3,95%,400x,100x,ABC123_S1_L001_R1_001.fastq.gz,ABC123_S1_L001_R2_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R1_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R2_001.fastq.gz,rona123assembly.fast5,User/Documents/RespLab/Data/rona123assembly.fast5,387566,38677,330,NC_045512.2,https://github.com/phac-nml/ncov2019-artic-nf,B.1.1.7,Pangolin,2.1.10,Variant of Concern (VOC),Sequencing,"Lineage-defining mutations: ORF1ab (K1655N), Spike (K417N, E484K, N501Y, D614G, A701V), N (T205I), E (P71L).",E gene (orf4),,21.2, RdRp gene (nsp12),,19.2,,,,"Tejinder Singh, Fei Hu, Joe Blogs",DataHarmonizer provenance: v0.15.4 +sample1234,Switch Health,abc12345,case4444,NMLsample2222,prov_rona_99,PRJNA623807,PRJNA608651,SAMN14180202,SRR11177792,MN908947.3,EPI_ISL_436489,Shared Hospital Laboratory,shl@email.ca,"123 Main Street, City, Province",National Microbiology Laboratory (NML),RespLab@lab.ca,"123 Sunnybrooke St, Toronto, Ontario, M4P 1L6, Canada",2022-03-01,day,2022-03-15,Canada,British Columbia,Thunder Bay,Severe acute respiratory syndrome coronavirus 2,hCov-19/CANADA/BC-prov_rona_99/2020,Diagnostic testing,Not Provided,Not Applicable, Reinfection testing,Not Applicable, Nasopharynx (NP); Oropharynx (OP),Not Applicable,Not Applicable,Not Applicable,Swab,Not Applicable,SOP123,Not Provided,Not Provided,Not Applicable,Not Applicable,Not Applicable,Not Provided,Human,Homo sapiens,Symptomatic, Hospitalized (ICU),Recovered,COVID-19,34,year,30 - 39,Female,Canada,British Columbia,PHN1234,2022-02-23,Cough;Fever,Not Provided,Not Provided,Fully Vaccinated,3,Pfizer-BioNTech (Comirnaty),2021-07-01,Pfizer-BioNTech (Comirnaty),2021-11-02,Moderna (Spikevax),2022-02-01,,,,United States of 
America,Portland,Oregon,United States of America,2022-03-02,2022-03-11,Air,day 10,, Convention," Close contact (face-to-face, no direct contact)",Attendee,"Occupational, Residency or Patronage Exposure",,Prior infection,SARS-CoV-2/human/USA/CA-CDPH-001/2020,2021-06-01,Prior antiviral treatment,remdesivir,2021-06-05, Surveillance of international border crossing by air travel,Not Provided,2022-04-04,XYZ_123345,1200bp,Nextera XT,FAB06069, Illumina NextSeq 2000,SeqProt1234,"Genomes were generated through amplicon sequencing of 1200 bp amplicons with Freed schema primers. Libraries were created using Illumina DNA Prep kits, and sequence data was produced using Miseq Micro v2 (500 cycles) sequencing kits.",1234546,Freed,Trimmomatic 0.38,Nanostripper,ncov123assembly3,ncov123assembly.fasta,User/Documents/RespLab/Data/ncov123assembly.fasta,iVar,1.3,95%,400x,100x,ABC123_S1_L001_R1_001.fastq.gz,ABC123_S1_L001_R2_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R1_001.fastq.gz,/User/Documents/RespLab/Data/ABC123_S1_L001_R2_001.fastq.gz,rona123assembly.fast5,User/Documents/RespLab/Data/rona123assembly.fast5,387566,38677,330,NC_045512.2,https://github.com/phac-nml/ncov2019-artic-nf,B.1.1.7,Pangolin,2.1.10,Variant of Concern (VOC),Sequencing,"Lineage-defining mutations: ORF1ab (K1655N), Spike (K417N, E484K, N501Y, D614G, A701V), N (T205I), E (P71L).",E gene (orf4),,21.2, RdRp gene (nsp12),,19.2,,,,"Tejinder Singh, Fei Hu, Joe Blogs",DataHarmonizer provenance: v0.15.4 diff --git a/template/canada_covid19/exampleInput/validTestData_0-15-4.tsv b/template/canada_covid19/exampleInput/validTestData_0-15-5.tsv similarity index 62% rename from template/canada_covid19/exampleInput/validTestData_0-15-4.tsv rename to template/canada_covid19/exampleInput/validTestData_0-15-5.tsv index d622341c..93c63a15 100644 --- a/template/canada_covid19/exampleInput/validTestData_0-15-4.tsv +++ b/template/canada_covid19/exampleInput/validTestData_0-15-5.tsv @@ -1,3 +1,3 @@ Database Identifiers 
Sample collection and processing Host Information Host vaccination information Host exposure information Host reinfection information Sequencing Bioinformatics and QC metrics Lineage and Variant information Pathogen diagnostic testing Contributor acknowledgement specimen collector sample ID third party lab service provider name third party lab sample ID case ID Related specimen primary ID IRIDA sample name umbrella bioproject accession bioproject accession biosample accession SRA accession GenBank accession GISAID accession sample collected by sample collector contact email sample collector contact address sequence submitted by sequence submitter contact email sequence submitter contact address sample collection date sample collection date precision sample received date geo_loc_name (country) geo_loc_name (state/province/territory) geo_loc_name (city) organism isolate purpose of sampling purpose of sampling details NML submitted specimen type Related specimen relationship type anatomical material anatomical part body product environmental material environmental site collection device collection method collection protocol specimen processing specimen processing details lab host passage number passage method biomaterial extracted host (common name) host (scientific name) host health state host health status details host health outcome host disease host age host age unit host age bin host gender host residence geo_loc name (country) host residence geo_loc name (state/province/territory) host subject ID symptom onset date signs and symptoms pre-existing conditions and risk factors complications host vaccination status number of vaccine doses received vaccination dose 1 vaccine name vaccination dose 1 vaccination date vaccination dose 2 vaccine name vaccination dose 2 vaccination date vaccination dose 3 vaccine name vaccination dose 3 vaccination date vaccination dose 4 vaccine name vaccination dose 4 vaccination date vaccination history location of exposure geo_loc 
name (country) destination of most recent travel (city) destination of most recent travel (state/province/territory) destination of most recent travel (country) most recent travel departure date most recent travel return date travel point of entry type border testing test day type travel history exposure event exposure contact level host role exposure setting exposure details prior SARS-CoV-2 infection prior SARS-CoV-2 infection isolate prior SARS-CoV-2 infection date prior SARS-CoV-2 antiviral treatment prior SARS-CoV-2 antiviral treatment agent prior SARS-CoV-2 antiviral treatment date purpose of sequencing purpose of sequencing details sequencing date library ID amplicon size library preparation kit flow cell barcode sequencing instrument sequencing protocol name sequencing protocol sequencing kit number amplicon pcr primer scheme raw sequence data processing method dehosting method consensus sequence name consensus sequence filename consensus sequence filepath consensus sequence software name consensus sequence software version breadth of coverage value depth of coverage value depth of coverage threshold r1 fastq filename r2 fastq filename r1 fastq filepath r2 fastq filepath fast5 filename fast5 filepath number of base pairs sequenced consensus genome length Ns per 100 kbp reference genome accession bioinformatics protocol lineage/clade name lineage/clade analysis software name lineage/clade analysis software version variant designation variant evidence variant evidence details gene name 1 diagnostic pcr protocol 1 diagnostic pcr Ct value 1 gene name 2 diagnostic pcr protocol 2 diagnostic pcr Ct value 2 gene name 3 diagnostic pcr protocol 3 diagnostic pcr Ct value 3 authors DataHarmonizer provenance -sample1234 Switch Health abc12345 case4444 NMLsample2222 prov_rona_99 PRJNA623807 PRJNA608651 SAMN14180202 SRR11177792 MN908947.3 EPI_ISL_436489 Switch Health switch@email.ca 123 Main Street, City, Province National Microbiology Laboratory (NML) RespLab@lab.ca 123 
Sunnybrooke St, Toronto, Ontario, M4P 1L6, Canada 2022-03-01 day 2022-03-15 Canada British Columbia Thunder Bay Severe acute respiratory syndrome coronavirus 2 hCov-19/CANADA/BC-prov_rona_99/2020 Diagnostic testing Not Provided Not Applicable Reinfection testing Not Applicable Nasopharynx (NP); Oropharynx (OP) Not Applicable Not Applicable Not Applicable Swab Not Applicable SOP123 Not Provided Not Provided Not Applicable Not Applicable Not Applicable Not Provided Human Homo sapiens Symptomatic Hospitalized (ICU) Recovered COVID-19 34 year 30 - 39 Female Canada British Columbia PHN1234 2022-02-23 Cough;Fever Not Provided Not Provided Fully Vaccinated 3 Pfizer-BioNTech (Comirnaty) 2021-07-01 Pfizer-BioNTech (Comirnaty) 2021-11-02 Moderna (Spikevax) 2022-02-01 United States of America Portland Oregon United States of America 2022-03-02 2022-03-11 Air day 10 Convention Close contact (face-to-face, no direct contact) Attendee Occupational, Residency or Patronage Exposure Prior infection SARS-CoV-2/human/USA/CA-CDPH-001/2020 2021-06-01 Prior antiviral treatment remdesivir 2021-06-05 Surveillance of international border crossing by air travel Not Provided 2022-04-04 XYZ_123345 1200bp Nextera XT FAB06069 Illumina NextSeq 2000 SeqProt1234 Genomes were generated through amplicon sequencing of 1200 bp amplicons with Freed schema primers. Libraries were created using Illumina DNA Prep kits, and sequence data was produced using Miseq Micro v2 (500 cycles) sequencing kits. 
1234546 Freed Trimmomatic 0.38 Nanostripper ncov123assembly3 ncov123assembly.fasta User/Documents/RespLab/Data/ncov123assembly.fasta iVar 1.3 95% 400x 100x ABC123_S1_L001_R1_001.fastq.gz ABC123_S1_L001_R2_001.fastq.gz /User/Documents/RespLab/Data/ABC123_S1_L001_R1_001.fastq.gz /User/Documents/RespLab/Data/ABC123_S1_L001_R2_001.fastq.gz rona123assembly.fast5 User/Documents/RespLab/Data/rona123assembly.fast5 387566 38677 330 NC_045512.2 https://github.com/phac-nml/ncov2019-artic-nf B.1.1.7 Pangolin 2.1.10 Variant of Concern (VOC) Sequencing Lineage-defining mutations: ORF1ab (K1655N), Spike (K417N, E484K, N501Y, D614G, A701V), N (T205I), E (P71L). E gene (orf4) 21.2 RdRp gene (nsp12) 19.2 Tejinder Singh, Fei Hu, Joe Blogs DataHarmonizer provenance: v0.15.4 +sample1234 Switch Health abc12345 case4444 NMLsample2222 prov_rona_99 PRJNA623807 PRJNA608651 SAMN14180202 SRR11177792 MN908947.3 EPI_ISL_436489 Shared Hospital Laboratory shl@email.ca 123 Main Street, City, Province National Microbiology Laboratory (NML) RespLab@lab.ca 123 Sunnybrooke St, Toronto, Ontario, M4P 1L6, Canada 2022-03-01 day 2022-03-15 Canada British Columbia Thunder Bay Severe acute respiratory syndrome coronavirus 2 hCov-19/CANADA/BC-prov_rona_99/2020 Diagnostic testing Not Provided Not Applicable Reinfection testing Not Applicable Nasopharynx (NP); Oropharynx (OP) Not Applicable Not Applicable Not Applicable Swab Not Applicable SOP123 Not Provided Not Provided Not Applicable Not Applicable Not Applicable Not Provided Human Homo sapiens Symptomatic Hospitalized (ICU) Recovered COVID-19 34 year 30 - 39 Female Canada British Columbia PHN1234 2022-02-23 Cough;Fever Not Provided Not Provided Fully Vaccinated 3 Pfizer-BioNTech (Comirnaty) 2021-07-01 Pfizer-BioNTech (Comirnaty) 2021-11-02 Moderna (Spikevax) 2022-02-01 United States of America Portland Oregon United States of America 2022-03-02 2022-03-11 Air day 10 Convention Close contact (face-to-face, no direct contact) Attendee Occupational, Residency 
or Patronage Exposure Prior infection SARS-CoV-2/human/USA/CA-CDPH-001/2020 2021-06-01 Prior antiviral treatment remdesivir 2021-06-05 Surveillance of international border crossing by air travel Not Provided 2022-04-04 XYZ_123345 1200bp Nextera XT FAB06069 Illumina NextSeq 2000 SeqProt1234 Genomes were generated through amplicon sequencing of 1200 bp amplicons with Freed schema primers. Libraries were created using Illumina DNA Prep kits, and sequence data was produced using Miseq Micro v2 (500 cycles) sequencing kits. 1234546 Freed Trimmomatic 0.38 Nanostripper ncov123assembly3 ncov123assembly.fasta User/Documents/RespLab/Data/ncov123assembly.fasta iVar 1.3 95% 400x 100x ABC123_S1_L001_R1_001.fastq.gz ABC123_S1_L001_R2_001.fastq.gz /User/Documents/RespLab/Data/ABC123_S1_L001_R1_001.fastq.gz /User/Documents/RespLab/Data/ABC123_S1_L001_R2_001.fastq.gz rona123assembly.fast5 User/Documents/RespLab/Data/rona123assembly.fast5 387566 38677 330 NC_045512.2 https://github.com/phac-nml/ncov2019-artic-nf B.1.1.7 Pangolin 2.1.10 Variant of Concern (VOC) Sequencing Lineage-defining mutations: ORF1ab (K1655N), Spike (K417N, E484K, N501Y, D614G, A701V), N (T205I), E (P71L). 
E gene (orf4) 21.2 RdRp gene (nsp12) 19.2 Tejinder Singh, Fei Hu, Joe Blogs DataHarmonizer provenance: v0.15.4 diff --git a/template/canada_covid19/exampleInput/validTestData_0-15-4.xls b/template/canada_covid19/exampleInput/validTestData_0-15-5.xls similarity index 92% rename from template/canada_covid19/exampleInput/validTestData_0-15-4.xls rename to template/canada_covid19/exampleInput/validTestData_0-15-5.xls index e0d864fa..8658c791 100644 Binary files a/template/canada_covid19/exampleInput/validTestData_0-15-4.xls and b/template/canada_covid19/exampleInput/validTestData_0-15-5.xls differ diff --git a/template/canada_covid19/exampleInput/validTestData_0-15-5.xlsx b/template/canada_covid19/exampleInput/validTestData_0-15-5.xlsx new file mode 100644 index 00000000..21780fc3 Binary files /dev/null and b/template/canada_covid19/exampleInput/validTestData_0-15-5.xlsx differ