This repository has been archived by the owner on Mar 7, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Added zenodo badge.
* Preparing for v0.1.0
* Fixed flake8 warnings.
* Fixed typo.
* Removed dead code (by vulture)
* Fixed black and added CodeQL CD (#23)
* Create codeql-analysis.yml
* Update README.md
* Fixed black issues.
* Compiling pattern to regex after process submission
* Fixed typo.
* Moved common to separate sub-sub-module
* Now supporting rich.
* Fixed quality filter logging.
* Updated final log.
* Updated requirements.
* Updated with poetry details and switch flake8 with black.
* Updated GitHub Actions format check workflow.
* Fixing workflow.
* Simplified.
* Improved rich log.
* Implemented --flagstats option
* Clarified help page
* Fixed flagstat sorting.
* Added --split-by option
* Removed common submodule.
* Implemented trimming by length or by quality.
* Added emoticons at final message.
* Setting up flag filter script.
* Implemented flag_filter script.
* Implemented flag subcommands.
* Added logs when required flag is missing.
* Prepared for v0.1.0
* Added docs folder.
* Set theme jekyll-theme-cayman
* Replicated basic README
* Added usage page.
* Added links and shortened feature list on GitHub repo README
* Added filler text to showcase link bookmarks
* Fixed link
* Added minimum usage bash
* Added TOCs
* Moved zenodo badge
* Added more badges.
* Updated badges.
* Updated badges.
* Reordered badges.
* Fixed mypy stub warning for rich and added compression level support.
* Simplified compression level passage.
Showing 34 changed files with 2,731 additions and 672 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
theme: jekyll-theme-cayman | ||
title: "fastx-barber" | ||
description: "A package to trim and extract flags from FASTA and FASTQ files." | ||
google_analytics: "UA-99031010-8" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
<!-- MarkdownTOC --> | ||
|
||
- [Requirements](#requirements) | ||
- [Install](#install) | ||
- [Features](#features) | ||
- [Usage](#usage) | ||
- [Contributing](#contributing) | ||
- [License](#license) | ||
|
||
<!-- /MarkdownTOC --> | ||
|
||
--- | ||
|
||
[![DOI](https://zenodo.org/badge/281703558.svg)](https://zenodo.org/badge/latestdoi/281703558) ![](https://img.shields.io/librariesio/github/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/license/ggirelli/fastx-barber.svg?style=flat) | ||
![](https://github.com/ggirelli/fastx-barber/workflows/Python%20package/badge.svg?branch=master&event=push) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/fastx-barber) ![PyPI - Format](https://img.shields.io/pypi/format/fastx-barber) ![PyPI - Status](https://img.shields.io/pypi/status/fastx-barber) | ||
![](https://img.shields.io/github/release/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/release-date/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/languages/code-size/ggirelli/fastx-barber.svg?style=flat) | ||
![](https://img.shields.io/github/watchers/ggirelli/fastx-barber.svg?label=Watch&style=social) ![](https://img.shields.io/github/stars/ggirelli/fastx-barber.svg?style=social) | ||
|
||
A Python 3.6.1+ package to trim and extract flags from FASTA and FASTQ files. | ||
|
||
## Requirements | ||
|
||
`fastx-barber` has been tested with Python 3.6.1, 3.7, and 3.8. We recommend installing it using `pipx` (see [below](#install)) to avoid dependency conflicts with other packages. The packages it depends on are listed in our [dependency graph](https://github.com/ggirelli/fastx-barber/network/dependencies). We use [`poetry`](https://github.com/python-poetry/poetry) to handle our dependencies. | ||
|
||
## Install | ||
|
||
We recommend installing `fastx-barber` using [`pipx`](https://github.com/pipxproject/pipx). Check how to install `pipx` [here](https://github.com/pipxproject/pipx#install-pipx) if you don't have it yet! Once you have `pipx` ready on your system, install the latest stable release of `fastx-barber` by running: `pipx install fastx-barber`. If you see the stars (✨ 🌟 ✨), then the installation went well! | ||
|
||
## Features | ||
|
||
* Works on both FASTA and FASTQ files. | ||
* [Selects](usage#match) reads based on a pattern (regex). | ||
* [Trims](usage#trim) reads [by pattern](usage#trim-by-regular-expression) (regex), [length](usage#trim-by-length), or single-base [quality](usage#trim-by-quality). | ||
* [Extracts](usage#extract-flags) parts ([flags](usage#flags)) of reads based on a pattern, and stores them in the read headers. | ||
  - Optionally extracts the corresponding portions of the quality string (only for FASTQ files). | ||
  - Optionally filters based on quality score of extracted flags (only for FASTQ files). | ||
    + Supports the Sanger QSCORE definition (not the old Solexa/Illumina one). | ||
    + Supports custom PHRED offset. | ||
    + Optionally exports reads that do not pass the specified filters. | ||
  - Optionally splits output based on flag value. | ||
  - Optionally calculates the frequency of each value of a set of flags (flagstats). | ||
  - [Filtering by flag quality](usage#filter-by-flag-quality), [splitting by flag value](usage#split-by-flag-value), and [calculating flag value frequency](usage#calculate-flag-value-frequency) are also available as separate scripts, so these operations can be run on files with previously extracted flags. | ||
* [Filters a FASTX file with extracted flags by applying patterns to different flags](usage#match-flags-with-regular-expressions). | ||
* Regular expressions support [*fuzzy* matching](https://pypi.org/project/regex/#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109) (*fuzzy matching* might affect the barber's speed). | ||
* Optionally exports reads that do not match the provided pattern(s). | ||
* Parallelizes processing by splitting the FASTX file into chunks. | ||
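
The chunk-based parallelization in the last feature can be pictured with the short Python sketch below. It is only an illustration of the general idea, not the code `fastx-barber` uses: the `chunked` helper is a hypothetical name, and a real run would hand each chunk to a separate worker.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(records: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `chunk_size` records (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        # islice consumes up to chunk_size items from the shared iterator
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Each chunk could then be processed independently, e.g. by a pool of workers.
chunks = list(chunked(["r1", "r2", "r3", "r4", "r5"], 2))
```

Smaller chunks spread work more evenly across workers, while larger chunks reduce scheduling overhead; the `--chunk-size` and `--threads` options listed on the usage page presumably control this trade-off.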
|
||
## Usage | ||
|
||
Run: | ||
|
||
* `fbarber` to access the barber's services. | ||
* `fbarber flag` to extract or manipulate read flags. | ||
* `fbarber match` to select reads based on a pattern (regular expression). | ||
* `fbarber trim` to trim your reads. | ||
|
||
Add `-h` to see the full help page of a command or visit the [usage page](usage)! | ||
|
||
## Contributing | ||
|
||
We welcome any contributions to `fastx-barber`. In short, we use [`black`](https://github.com/psf/black) to standardize code format. Any code change also needs to pass `mypy` checks. For more details, please refer to our [contribution guidelines](https://github.com/ggirelli/fastx-barber/blob/master/CONTRIBUTING.md) if this is your first time contributing! Also, check out our [code of conduct](https://github.com/ggirelli/fastx-barber/blob/master/CODE_OF_CONDUCT.md). | ||
|
||
## License | ||
|
||
`MIT License - Copyright (c) 2020 Gabriele Girelli` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,140 @@ | ||
# Usage | ||
|
||
<!-- MarkdownTOC --> | ||
|
||
- [Match](#match) | ||
- [Trim](#trim) | ||
  - [Trim by length](#trim-by-length) | ||
  - [Trim by quality](#trim-by-quality) | ||
  - [Trim by regular expression](#trim-by-regular-expression) | ||
- [Flags](#flags) | ||
  - [Extract flags](#extract-flags) | ||
  - [After flag extraction](#after-flag-extraction) | ||
    - [Filter by flag quality](#filter-by-flag-quality) | ||
    - [Match flags with regular expressions](#match-flags-with-regular-expressions) | ||
    - [Split by flag value](#split-by-flag-value) | ||
    - [Calculate flag value frequency](#calculate-flag-value-frequency) | ||
|
||
<!-- /MarkdownTOC --> | ||
|
||
## Match | ||
|
||
```bash | ||
usage: fbarber match [-h] [--pattern PATTERN] [--version] [--unmatched-output UNMATCHED_OUTPUT] | ||
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE] [--chunk-size CHUNK_SIZE] | ||
[--threads THREADS] [--temp-dir TEMP_DIR] | ||
in.fastx[.gz] out.fastx[.gz] | ||
``` | ||
|
||
Lorem ipsum dolor sit amet, consectetur adipisicing, elit. Id ab, quod repellendus, autem obcaecati illo alias, ipsam vel asperiores iure dicta voluptatem nostrum suscipit, doloremque dolores tenetur omnis recusandae repudiandae! | ||
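
As a rough illustration of what pattern-based read selection involves, here is a minimal Python sketch. It is not the `fbarber match` implementation: the `match_reads` function and the `(header, sequence)` record type are hypothetical, and it uses the standard-library `re` module, which lacks the fuzzy matching mentioned in the feature list.

```python
import re
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header, sequence); hypothetical minimal record type


def match_reads(
    records: Iterable[Record], pattern: str
) -> Tuple[List[Record], List[Record]]:
    """Split records into those whose sequence matches `pattern` and the rest."""
    compiled = re.compile(pattern)
    matched: List[Record] = []
    unmatched: List[Record] = []
    for header, seq in records:
        if compiled.search(seq):
            matched.append((header, seq))
        else:
            unmatched.append((header, seq))
    return matched, unmatched


reads = [("read1", "GATCAAAT"), ("read2", "TTTTGGGG"), ("read3", "GATCCCCC")]
kept, discarded = match_reads(reads, r"^GATC")
```

The second list plays the role that `--unmatched-output` appears to play in the usage line above: reads that fail the pattern are set aside rather than lost.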
|
||
## Trim | ||
|
||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
|
||
### Trim by length | ||
|
||
```bash | ||
usage: fbarber trim length [-h] [-l LENGTH] [-s {3,5}] [--version] [--compress-level COMPRESS_LEVEL] | ||
[--log-file LOG_FILE] [--chunk-size CHUNK_SIZE] [--threads THREADS] | ||
[--temp-dir TEMP_DIR] | ||
in.fastx[.gz] out.fastx[.gz] | ||
``` | ||
|
||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
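
Conceptually, trimming by length removes a fixed number of bases from one end of each read. The Python sketch below shows that idea under the assumption that `-l` is the number of bases to remove and `-s` selects the 5' or 3' end; the `trim_by_length` function is hypothetical and not part of the `fastx-barber` API.

```python
from typing import Optional, Tuple


def trim_by_length(
    seq: str, qual: Optional[str], length: int, side: int
) -> Tuple[str, Optional[str]]:
    """Remove `length` bases from the 5' or 3' end (hypothetical helper).

    `qual` is None for FASTA records and is trimmed alongside `seq` for FASTQ.
    """
    if side not in (3, 5):
        raise ValueError("side must be 3 or 5")
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return seq, qual
    if side == 5:
        return seq[length:], None if qual is None else qual[length:]
    return seq[:-length], None if qual is None else qual[:-length]


trimmed_5 = trim_by_length("ACGTACGT", "IIIIHHHH", 3, 5)
trimmed_3 = trim_by_length("ACGTACGT", "IIIIHHHH", 3, 3)
```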
|
||
### Trim by quality | ||
|
||
```bash | ||
usage: fbarber trim quality [-h] [-q QSCORE] [-s {3,5}] [--version] [--phred-offset PHRED_OFFSET] | ||
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE] | ||
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR] | ||
in.fastq[.gz] out.fastq[.gz] | ||
``` | ||
|
||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
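
Quality trimming relies on decoding each quality character into a score. Under the Sanger definition, the score is the character's ASCII code minus the PHRED offset (33 for standard Sanger FASTQ). The Python sketch below assumes that trimming removes bases from the chosen end until the first base with a score of at least `QSCORE`; that rule and the function names are assumptions for illustration, so check `fbarber trim quality -h` for the actual behaviour.

```python
from typing import List, Tuple


def phred_scores(qual: str, phred_offset: int = 33) -> List[int]:
    """Decode a quality string into Sanger-style scores."""
    return [ord(char) - phred_offset for char in qual]


def trim_by_quality(
    seq: str, qual: str, min_qscore: int, side: int, phred_offset: int = 33
) -> Tuple[str, str]:
    """Drop leading (side=5) or trailing (side=3) bases scoring below `min_qscore`."""
    if side not in (3, 5):
        raise ValueError("side must be 3 or 5")
    scores = phred_scores(qual, phred_offset)
    if side == 3:
        scores = scores[::-1]  # scan from the 3' end inwards
    n_trim = 0
    for score in scores:
        if score >= min_qscore:
            break
        n_trim += 1
    if n_trim == 0:
        return seq, qual
    if side == 5:
        return seq[n_trim:], qual[n_trim:]
    return seq[:-n_trim], qual[:-n_trim]


from_5 = trim_by_quality("ACGTACGT", "##II#III", 20, 5)
from_3 = trim_by_quality("ACGTACGT", "IIIIII##", 20, 3)
```

Note that the low-quality base in the middle of the first read is kept: only the run of low-quality bases at the selected end is removed in this sketch.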
|
||
### Trim by regular expression | ||
|
||
```bash | ||
usage: fbarber trim regex [-h] [--pattern PATTERN] [--version] [--unmatched-output UNMATCHED_OUTPUT] | ||
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE] | ||
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR] | ||
in.fastx[.gz] out.fastx[.gz] | ||
``` | ||
|
||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
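
A plausible reading of regex-based trimming is that the portion of the read matched by the pattern is removed, and reads without a match are reported separately. The Python sketch below illustrates that reading only; `trim_by_regex` is a hypothetical function, and it uses the standard-library `re` module rather than whatever engine `fbarber` relies on.

```python
import re
from typing import Optional, Tuple


def trim_by_regex(
    seq: str, qual: Optional[str], pattern: str
) -> Optional[Tuple[str, Optional[str]]]:
    """Remove the first region matching `pattern`; return None if nothing matches."""
    match = re.search(pattern, seq)
    if match is None:
        return None
    start, end = match.span()
    trimmed_seq = seq[:start] + seq[end:]
    # The quality string, when present, is cut at the same coordinates.
    trimmed_qual = None if qual is None else qual[:start] + qual[end:]
    return trimmed_seq, trimmed_qual


trimmed = trim_by_regex("GATCAAATTT", "IIIIHHHHHH", r"^GATC")
no_match = trim_by_regex("TTTTAAAA", None, r"^GATC")
```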
|
||
## Flags | ||
|
||
Lorem ipsum dolor sit amet, consectetur, adipisicing elit. Voluptate consectetur adipisci maxime ducimus voluptatem vero illo recusandae accusamus dolores rerum nemo similique vel amet, quidem possimus eligendi veniam quae officia. | ||
|
||
### Extract flags | ||
|
||
```bash | ||
usage: fbarber flag extract [-h] [--pattern PATTERN] [--version] [--unmatched-output UNMATCHED_OUTPUT] | ||
[--flag-delim FLAG_DELIM] | ||
[--selected-flags SELECTED_FLAGS [SELECTED_FLAGS ...]] | ||
[--flagstats FLAGSTATS [FLAGSTATS ...]] [--split-by SPLIT_BY] | ||
[--filter-qual-flags FILTER_QUAL_FLAGS [FILTER_QUAL_FLAGS ...]] | ||
[--filter-qual-output FILTER_QUAL_OUTPUT] [--phred-offset PHRED_OFFSET] | ||
[--no-qual-flags] [--comment-space COMMENT_SPACE] | ||
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE] | ||
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR] | ||
in.fastx[.gz] out.fastx[.gz] | ||
``` | ||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
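
To make the idea of flag extraction concrete, the Python sketch below pulls named groups out of a read and appends them to the header. Several details are assumptions made for illustration: the `name~~FLAG~value` header layout, the default `~` delimiter, the removal of the matched prefix, and the `extract_flags` function itself. The pattern uses Python's `(?P<name>...)` named-group syntax from the standard-library `re` module.

```python
import re
from typing import Optional, Tuple


def extract_flags(
    header: str, seq: str, pattern: str, flag_delim: str = "~"
) -> Optional[Tuple[str, str]]:
    """Move named groups matched in `seq` into the header (hypothetical layout)."""
    match = re.search(pattern, seq)
    if match is None:
        return None
    suffix = "".join(
        f"{flag_delim}{flag_delim}{name}{flag_delim}{value}"
        for name, value in match.groupdict().items()
    )
    # Keep only the part of the read that follows the matched region.
    return header + suffix, seq[match.end():]


pattern = r"^(?P<UMI>[ACGT]{4})(?P<BC>[ACGT]{3})"
flagged = extract_flags("read1", "ACGTTTAGGGCC", pattern)
```

Storing the flags in the header keeps them attached to the read through later steps, which is what allows the filtering, splitting, and statistics subcommands to work on previously processed files.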
### After flag extraction | ||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
#### Filter by flag quality | ||
```bash | ||
usage: fbarber flag filter [-h] [--version] [--flag-delim FLAG_DELIM] [--comment-space COMMENT_SPACE] | ||
[--filter-qual-flags FILTER_QUAL_FLAGS [FILTER_QUAL_FLAGS ...]] | ||
[--filter-qual-output FILTER_QUAL_OUTPUT] [--phred-offset PHRED_OFFSET] | ||
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE] | ||
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR] | ||
in.fastx[.gz] out.fastx[.gz] | ||
``` | ||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
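
One way to think about filtering by flag quality is to count how many bases of a flag fall below a minimum score and reject the read when that fraction is too high. The Python sketch below implements that rule as an assumption; the `flag_passes` function, its parameters, and the exact criterion are hypothetical, so refer to `fbarber flag filter -h` for how `--filter-qual-flags` is actually specified.

```python
def flag_passes(
    flag_qual: str,
    min_qscore: int,
    max_low_fraction: float,
    phred_offset: int = 33,
) -> bool:
    """Return True if at most `max_low_fraction` of the bases score below `min_qscore`."""
    if not flag_qual:
        raise ValueError("flag quality string must not be empty")
    # Sanger-style score: ASCII code minus the PHRED offset.
    n_low = sum(1 for char in flag_qual if ord(char) - phred_offset < min_qscore)
    return n_low / len(flag_qual) <= max_low_fraction


good_flag = flag_passes("IIII", 30, 0.25)
bad_flag = flag_passes("II##", 30, 0.25)
```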
#### Match flags with regular expressions | ||
```bash | ||
usage: fbarber flag regex [-h] [--pattern PATTERN [PATTERN ...]] [--version] | ||
[--unmatched-output UNMATCHED_OUTPUT] [--flag-delim FLAG_DELIM] | ||
[--comment-space COMMENT_SPACE] [--compress-level COMPRESS_LEVEL] | ||
[--log-file LOG_FILE] [--chunk-size CHUNK_SIZE] [--threads THREADS] | ||
[--temp-dir TEMP_DIR] | ||
in.fastx[.gz] out.fastx[.gz] | ||
``` | ||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
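
Matching flags with regular expressions can be sketched as checking each requested flag against its own pattern and keeping the read only if all of them match. In the Python sketch below, the flags are assumed to have already been parsed into a dictionary; the `flags_match` function and the treatment of missing flags as failures are assumptions for illustration.

```python
import re
from typing import Dict


def flags_match(flags: Dict[str, str], patterns: Dict[str, str]) -> bool:
    """Return True if every flag named in `patterns` exists and matches its regex."""
    for name, pattern in patterns.items():
        value = flags.get(name)
        if value is None or re.search(pattern, value) is None:
            return False
    return True


read_flags = {"UMI": "ACGTACGT", "BC": "GATC"}
accepted = flags_match(read_flags, {"BC": r"^GATC$", "UMI": r"^[ACGT]{8}$"})
rejected = flags_match(read_flags, {"BC": r"^TTTT$"})
```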
#### Split by flag value | ||
```bash | ||
usage: fbarber flag split [-h] [--version] [--flag-delim FLAG_DELIM] [--comment-space COMMENT_SPACE] | ||
[--split-by SPLIT_BY] [--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE] | ||
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR] | ||
in.fastx[.gz] out.fastx[.gz] | ||
``` | ||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! | ||
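
Splitting by flag value amounts to grouping reads by the value of one flag, with each group destined for its own output file. The Python sketch below shows the grouping step only. The `name~~FLAG~value` header layout and both helper functions are hypothetical, and flag values containing the delimiter are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (header, sequence); hypothetical minimal record type


def get_flag(header: str, name: str, flag_delim: str = "~") -> Optional[str]:
    """Read one flag from a header shaped like `read~~FLAG~value~~FLAG~value`."""
    for part in header.split(flag_delim * 2)[1:]:
        flag_name, _, value = part.partition(flag_delim)
        if flag_name == name:
            return value
    return None


def split_by_flag(
    records: Iterable[Record], name: str
) -> Dict[Optional[str], List[Record]]:
    """Group records by the value of flag `name` (None when the flag is missing)."""
    groups: Dict[Optional[str], List[Record]] = defaultdict(list)
    for header, seq in records:
        groups[get_flag(header, name)].append((header, seq))
    return dict(groups)


reads = [("r1~~BC~AAA", "ACGT"), ("r2~~BC~CCC", "GGGG"), ("r3~~BC~AAA", "TTTT")]
by_barcode = split_by_flag(reads, "BC")
```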
#### Calculate flag value frequency | ||
```bash | ||
usage: fbarber flag stats [-h] [--version] [--flag-delim FLAG_DELIM] [--comment-space COMMENT_SPACE] | ||
[--flagstats FLAGSTATS [FLAGSTATS ...]] [--compress-level COMPRESS_LEVEL] | ||
[--log-file LOG_FILE] [--chunk-size CHUNK_SIZE] [--threads THREADS] | ||
[--temp-dir TEMP_DIR] | ||
in.fastx[.gz] | ||
``` | ||
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed! |
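
Calculating flag value frequency is essentially a counting problem. The Python sketch below tallies values that have already been read from the headers and sorts them from most to least frequent; the `flag_frequency` function and the sort order are assumptions for illustration, not the output format of `fbarber flag stats`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def flag_frequency(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Count each flag value, most frequent first, ties broken alphabetically."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


barcodes = ["AAA", "CCC", "AAA", "GGG"]
frequencies = flag_frequency(barcodes)
```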