Add metadata to extracted overhang reads #22

Open
Adamtaranto opened this issue Nov 11, 2023 · 1 comment
Labels: enhancement (New feature or request)


@Adamtaranto (Owner)

teloclip-extract writes reads with terminal clipped overhangs to FASTA files. To aid sanity checks, we should include some metadata in the FASTA header.

Proposed fields:

  • Anchor length: how much of the read is aligned to the reference sequence
  • Overhang length: how long the overhang is
  • Reference name: which sequence the read is aligned to
  • Motifs: motifs or regex patterns used in filtering (comma-delimited list)
  • Motif counts: total count of motif matches in the overhang (list in the same order as Motifs)
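A minimal sketch of how the proposed fields could be packed into a FASTA header, assuming a space-separated key=value layout after the read name. The OverhangRead record, the function names, and the key names (anchor_len, overhang_len, ref, motifs, motif_counts) are hypothetical illustrations, not existing teloclip code.

```python
import re
from dataclasses import dataclass


@dataclass
class OverhangRead:
    """Hypothetical container for one read with a terminal clipped overhang."""
    name: str
    ref_name: str     # reference sequence the read is aligned to
    anchor_len: int   # bases of the read aligned to the reference
    overhang: str     # clipped sequence extending past the contig end
    sequence: str     # full read sequence written to the FASTA


def count_motifs(seq: str, motifs: list[str]) -> list[int]:
    """Count non-overlapping regex matches of each motif in seq, in motif order."""
    return [sum(1 for _ in re.finditer(m, seq)) for m in motifs]


def format_header(read: OverhangRead, motifs: list[str]) -> str:
    """Build a FASTA header carrying the proposed metadata as key=value pairs.

    Note: a motif regex that itself contains a comma would make the
    comma-delimited motifs field ambiguous.
    """
    counts = count_motifs(read.overhang, motifs)
    fields = [
        f"anchor_len={read.anchor_len}",
        f"overhang_len={len(read.overhang)}",
        f"ref={read.ref_name}",
        f"motifs={','.join(motifs)}",
        f"motif_counts={','.join(map(str, counts))}",
    ]
    return f">{read.name} " + " ".join(fields)


def to_fasta_record(read: OverhangRead, motifs: list[str]) -> str:
    """Return a two-line FASTA record (header + sequence)."""
    return format_header(read, motifs) + "\n" + read.sequence + "\n"
```

For a read named read1 on chr1 with a 16 bp overhang TTAGGGTTAGGGTTAG and motifs TTAGGG,CCCTAA, this yields the header `>read1 anchor_len=1200 overhang_len=16 ref=chr1 motifs=TTAGGG,CCCTAA motif_counts=2,0`. Key=value pairs keep the header greppable and tolerant of fields being added later.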

Other behaviour changes:

  • Sort FASTA output by overhang length
  • Include the reference segment in the output FASTA (starting from the earliest alignment start coordinate)
  • Log total read counts for each end of each reference sequence
  • Print a histogram of overhang depths
  • Log a warning if overhang depths are unbalanced (e.g. most ends have 5 reads and one has 500)
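A minimal standard-library sketch of four of these changes: sorting by overhang length, per-end read counts, a text histogram of depth per contig end, and an imbalance warning. The Overhang record, the function names, the logger name, and the rule that flags an end when its depth exceeds 10x the median depth across ends are all hypothetical choices for illustration, not part of teloclip.

```python
import logging
import statistics
from collections import Counter
from typing import NamedTuple

log = logging.getLogger("overhang_summary")  # hypothetical logger name


class Overhang(NamedTuple):
    """Hypothetical record for one extracted read."""
    ref: str      # reference sequence name
    end: str      # which end of the reference: "left" or "right"
    length: int   # overhang length in bases
    name: str     # read name


def sort_by_overhang(reads):
    """Return reads ordered from longest to shortest overhang."""
    return sorted(reads, key=lambda r: r.length, reverse=True)


def count_per_end(reads):
    """Count overhang reads for each (reference, end) pair and log the totals."""
    counts = Counter((r.ref, r.end) for r in reads)
    for (ref, end), n in sorted(counts.items()):
        log.info("%s %s end: %d overhang reads", ref, end, n)
    return counts


def depth_histogram(counts, width=40):
    """Text bar chart of read depth per contig end, scaled to the deepest end.

    Assumes `counts` is non-empty.
    """
    peak = max(counts.values())
    rows = []
    for (ref, end), n in sorted(counts.items()):
        bar = "#" * max(1, round(n / peak * width))  # at least one '#' per end
        rows.append(f"{ref}:{end}\t{n}\t{bar}")
    return "\n".join(rows)


def warn_if_unbalanced(counts, fold=10.0):
    """Warn about ends whose depth exceeds `fold` x the median depth (hypothetical rule)."""
    median = statistics.median(counts.values())
    flagged = sorted(k for k, n in counts.items() if n > fold * median)
    for ref, end in flagged:
        log.warning(
            "Unbalanced overhangs: %s %s end has %d reads (median %.1f)",
            ref, end, counts[(ref, end)], median,
        )
    return flagged
```

With the example from the issue (three ends with 5 reads and one with 500), the median depth is 5, so only the 500-read end is flagged. Comparing against the median rather than the mean keeps a single very deep end from raising the threshold it is tested against.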
Adamtaranto added the enhancement label (New feature or request) on Nov 11, 2023
Adamtaranto self-assigned this on Nov 11, 2023
Adamtaranto added this to the v0.0.5 release milestone on Nov 12, 2023
@Adamtaranto (Owner, Author)

Include an option to output only the longest overhang per contig end.
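A minimal sketch of the selection such an option might perform, assuming each extracted read is tagged with its reference name, contig end, and overhang length. The Overhang record and longest_per_end function are hypothetical; no option name is implied. On ties, the first read seen is kept.

```python
from typing import NamedTuple


class Overhang(NamedTuple):
    """Hypothetical record for one extracted read."""
    ref: str      # reference sequence name
    end: str      # "left" or "right" end of the reference
    length: int   # overhang length in bases
    name: str     # read name


def longest_per_end(reads):
    """Keep only the read with the longest overhang for each (ref, end) pair."""
    best = {}
    for r in reads:
        key = (r.ref, r.end)
        # Strict '>' means the earliest read wins a tie.
        if key not in best or r.length > best[key].length:
            best[key] = r
    return list(best.values())
```

This is a single pass over the reads with one dictionary entry per contig end, so memory stays proportional to the number of ends rather than the number of reads.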
