include "ignore this xxx" in the sample and taxon sheets #103

Open
tobiasgf opened this issue May 7, 2024 · 2 comments

tobiasgf commented May 7, 2024

Users often have control samples or other samples that they want to exclude from the dataset published to GBIF but keep in the uploaded data for completeness.
The same goes for specific taxa (contaminants).
Could we have an optional "ignore" field in both the sample and taxon sheets, so that any non-empty value in that field excludes the corresponding row entirely from the data that eventually goes into the DwC-A?
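
A minimal sketch of the requested behaviour, assuming each sheet is read into a list of dicts and the optional column is literally named `ignore`. Both the data structure and the column name are hypothetical; nothing here reflects the tool's actual code.

```python
def drop_ignored(rows, ignore_field="ignore"):
    """Return only the rows whose ignore field is missing or blank."""
    kept = []
    for row in rows:
        value = row.get(ignore_field)
        # Any non-whitespace content marks the row for exclusion.
        if value is None or str(value).strip() == "":
            kept.append(row)
    return kept


# Hypothetical sample sheet rows
samples = [
    {"id": "S1", "ignore": ""},
    {"id": "S2", "ignore": "field blank"},
    {"id": "S3"},  # column absent: row is kept
]
kept_ids = [row["id"] for row in drop_ignored(samples)]
print(kept_ids)
```

The same function would apply unchanged to the taxon sheet, since the rule only looks at one column.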

Potential problems:

  • Excluding taxa/sequences may heavily influence the total read count per sample if this number is calculated after the removal.
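
To illustrate the read count concern, here is a small sketch with made-up numbers. The table layout, taxon names, and the `totals` helper are assumptions for illustration only.

```python
# Hypothetical read table: reads[sample][taxon] = number of reads
reads = {
    "S1": {"Fagus sylvatica": 900, "Homo sapiens": 100},
    "S2": {"Fagus sylvatica": 50, "Homo sapiens": 950},
}
excluded_taxa = {"Homo sapiens"}  # flagged as a contaminant


def totals(table, skip=frozenset()):
    """Total reads per sample, leaving out any taxa listed in skip."""
    return {
        sample: sum(n for taxon, n in counts.items() if taxon not in skip)
        for sample, counts in table.items()
    }


totals_before = totals(reads)                     # computed before removal
totals_after = totals(reads, skip=excluded_taxa)  # computed after removal

# Relative abundance of the remaining taxon in S2 under each choice
share_before = reads["S2"]["Fagus sylvatica"] / totals_before["S2"]
share_after = reads["S2"]["Fagus sylvatica"] / totals_after["S2"]
print(totals_before, totals_after, share_before, share_after)
```

In this example the S2 total drops from 1000 to 50 reads, and the apparent share of the remaining taxon rises from 5% to 100%, so the order of the two steps matters.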

tobiasgf commented Jun 28, 2024

The new field for the taxon sheet could be called e.g. excludedTaxon, exclusionCriterion, or similar. (Or do we already have something that could be used, with caution, e.g. identificationRemarks?)
Values could be:
Known contamination
Suspected contamination
Spurious detection
Habitat mismatch
non-indigenous
Suspicious sequence
non-target
low abundance
low frequency
positive control
other
...

For the sample data we could include the two existing fields: neg_cont_type and pos_cont_type, and exclude (from the generated DwC) all samples that carry any value in either of these fields.
Values for neg_cont_type could be:
field blank
blank filter
extraction blank
PCR non template control
other

Values for pos_cont_type could be free text (a string of taxon names, the name of a known positive control mock sample, etc.).
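
The sample rule proposed above could be sketched as follows, assuming sample rows are dicts keyed by column name. The `is_control` helper and the example rows are hypothetical; only the two field names come from this issue.

```python
CONTROL_FIELDS = ("neg_cont_type", "pos_cont_type")


def is_control(sample):
    """True if either control field carries a non-blank value."""
    return any(str(sample.get(field) or "").strip() for field in CONTROL_FIELDS)


# Hypothetical sample sheet rows
samples = [
    {"id": "S1"},
    {"id": "NC1", "neg_cont_type": "extraction blank"},
    {"id": "PC1", "pos_cont_type": "mock community A"},
    {"id": "S2", "neg_cont_type": "", "pos_cont_type": None},
]

# Controls stay in the uploaded data; only non-controls go into the generated DwC.
published = [s["id"] for s in samples if not is_control(s)]
controls = [s["id"] for s in samples if is_control(s)]
print(published, controls)
```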

tobiasgf commented

This issue needs to be updated and aligned with the work in the FAIR eDNA project before we start working on it. The overall approach and aim remain the same.
