Can a custom reference dataset be a probe set? #20

Open
vincianem opened this issue Jan 15, 2025 · 3 comments

Comments

@vincianem

vincianem commented Jan 15, 2025

Hej Hej,

I have just started working with capture data, and I wish to use Captus to process them.

I have two datasets: one with flowering plants (Teucrium) and one with sea animals (Octocorallia). For the flowering plants it is simple, as Captus comes bundled with Mega353, but I wonder how to process the Octocorallia data. Can a probe set be used as a target file if I properly format the sequence names?

I am new to the Captus pipeline and this type of data, so please correct me if I've gotten something wrong.

Best wishes,
Vinciane

@edgardomortiz
Owner

Hi @vincianem

You can provide any lineage set from the BUSCO database (https://busco-data.ezlab.org/v5/data/lineages/): just download the tar.gz file and provide its path to Captus in the extraction step with -n.
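
As a rough illustration of that step, the Python sketch below only assembles the command line and prints it. The `-n` option comes from the reply above; the `captus extract` executable name, the `-a` option for the assemblies directory, and both paths are assumptions that should be checked against the help output of the installed Captus version.

```python
import shlex


def build_extract_command(assemblies_dir, lineage_tarball):
    """Return an argument list for a Captus extraction run.

    Hypothetical helper: only "-n" is confirmed by the thread; the
    executable name and "-a" are assumptions.
    """
    return ["captus", "extract", "-a", assemblies_dir, "-n", lineage_tarball]


# Placeholder paths, not real file names from the BUSCO server.
cmd = build_extract_command("02_assemblies", "lineage.tar.gz")
print(shlex.join(cmd))
```

Printing the command before running it makes it easy to confirm that the path to the downloaded tar.gz is the one actually passed to `-n`.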

Now, if you have a custom probe set, you must provide the sequences (full locus sequences, e.g. CDS) from which the probes (120 bp segments) were derived.
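
One quick way to tell whether a FASTA file holds probes rather than full loci is to look at the sequence lengths. The sketch below is a minimal check under that assumption: it flags every record that is no longer than the 120 bp probe length mentioned above. The function names and the example records are hypothetical, and a short record is only a hint, not proof, that it is a probe.

```python
PROBE_LENGTH = 120  # probe length mentioned in the reply above


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def probe_like_records(lines, probe_length=PROBE_LENGTH):
    """Return the names of sequences no longer than a single probe."""
    return [name for name, seq in read_fasta(lines) if len(seq) <= probe_length]


# Made-up records: locusA is 400 bp, locusB is exactly 120 bp.
example = [">locusA", "ACGT" * 100, ">locusB", "ACGT" * 30]
print(probe_like_records(example))
```

With a real file, `probe_like_records(open(path))` would work the same way, since an open text file iterates over its lines.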

I hope this helps. Do not hesitate to ask me if something is not clear.

Edgardo

@vincianem
Author

vincianem commented Jan 16, 2025

Hi @edgardomortiz

Thank you for your answer!

We used the octocoral v.2 probe set, but the target file was not made available with the probe set. How would one proceed to create a robust target file from the probe set? For example, how did you proceed to create the SeedPlantsPTD?

I also have a question regarding the ploidy level. I have at least 2n and 4n in my dataset. How is variation in ploidy taken into account in Captus?

Best wishes,
Vinciane

@edgardomortiz
Owner

Regarding the octocoral v2 probe set, I am not familiar with it, but perhaps you can contact the authors, or the file may be available as supplementary material with the paper where it was published. About ploidy, Captus can recover any number of divergent copies of a single locus, as long as they are different enough to be assembled as separate contigs.

In the case of the plastome proteins I downloaded all the plastome proteins available in GenBank and then clustered and manually curated the clusters.

Edgardo
