This repository has been archived by the owner on Mar 7, 2023. It is now read-only.

v0.1.0 (#29)
* Added zenodo badge.

* Preparing for v0.1.0

* Fixed flake8 warnings.

* Fixed typo.

* Removed dead code (by vulture)

* Fixed black and added CodeQL CD (#23)

* Create codeql-analysis.yml

* Update README.md

* Fixed black issues.

* Compiling pattern to regex after process submission

* Fixed typo.

* Moved common to separate sub-sub-module

* Now supporting rich.

* Fixed quality filter logging.

* Updated final log.

* Updated requirements.

* Updated with poetry details and switch flake8 with black.

* Updated GitHub Actions format check workflow.

* Fixing workflow.

* Simplified.

* Improved rich log.

* Implemented --flagstats option

* Clarified help page

* Fixed flagstat sorting.

* Added --split-by option

* Removed common submodule.

* Implemented trimming by length or by quality.

* Added emoticons at final message.

* Setting up flag filter script.

* Implemented flag_filter script.

* Implemented flag subcommands.

* Added logs when required flag is missing.

* Prepared for v0.1.0

* Added docs folder.

* Set theme jekyll-theme-cayman

* Replicated basic README

* Added usage page.

* Added links and shortened feature list on GitHub repo README

* Added filler text to showcase link bookmarks

* Fixed link

* Added minimum usage bash

* Added TOCs

* Moved zenodo badge

* Added more badges.

* Updated badges.

* Updated badges.

* Reordered badges.

* Fixed mypy stub warning for rich and added compression level support.

* Simplified compression level passage.
ggirelli authored Sep 29, 2020
1 parent 1c3e1b7 commit 2bb42e3
Showing 34 changed files with 2,731 additions and 672 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/pythonpackage.yml
@@ -37,7 +37,8 @@ jobs:
- uses: dschep/install-poetry-action@v1.3
- name: Install package with poetry
run: poetry install
- uses: lgeiger/black-action@v1.0.1
- name: "Running black"
uses: lgeiger/black-action@v1.0.1
with:
args: ". --check"
- name: Lint with flake8
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -6,7 +6,30 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

## [0.1.0]
### Added
- `--split-by` option to split output by flag during flag extraction.
- `flag_filter` script to apply quality filters after flag extraction.
- `trim length` to trim by length.
- `trim quality` to trim by quality.
- `flag split` to split file based on flag after flag extraction.
- `flag stats` to calculate flag stats after flag extraction.
- `flag regex` to filter flags based on regular expression after flag extraction.

### Changed
- Using rich for richer logging.
- Removed default pattern. Switched with example pattern in help page.
- Moved `trim` by pattern as command of `trim regex`.
- Moved `extract` command as sub-command of `flag`.

### Fixed
- Parallelization now working on Python 3.6+.
- Output compression now dependent only on output file extension.
- Logging proper number of reads passing flag quality filters.


## [0.0.1] - 2020-08-03

[Unreleased]: https://github.com/ggirelli/fastx-barber
[0.1.0]: https://github.com/ggirelli/fastx-barber/releases/tag/v0.1.0
[0.0.1]: https://github.com/ggirelli/fastx-barber/releases/tag/v0.0.1
8 changes: 5 additions & 3 deletions CONTRIBUTING.md
@@ -28,12 +28,14 @@ If you would like to see a new feature implemented in `fastx-barber`, or to have

# Style your contributions

We like to have `fastx-barber` code styled with [`black`](https://github.com/psf/black) and checked with `mypy`. `mypy` and `flake8` conforming checks are automatically ran on all pull requests through GitHub Actions.
We like to have `fastx-barber` code styled with [`black`](https://github.com/psf/black) and checked with `mypy`. `mypy`, `flake8`, and `black` conformance checks are automatically run on all pull requests through GitHub Actions.

# Change dependencies
# Changing dependencies

If your code changes `fastx-barber` dependencies, we recommend changing them in the `pyproject.toml` file and then regenerate `requirements.txt` by running:
If your code changes `fastx-barber` dependencies, we recommend to change them in the `pyproject.toml` file and then regenerate `requirements.txt` by running:

```
poetry export -f requirements.txt -o requirements.txt --without-hashes
```

See [poetry](https://github.com/python-poetry/poetry)'s documentation for more details on the format of the `pyproject.toml` file.
41 changes: 22 additions & 19 deletions README.md
@@ -1,25 +1,28 @@
# fastx-barber

A Python3.6+ package to trim and extract flags from FASTA and FASTQ files.

## Features

* Supports both FASTA and FASTQ files.
* Select your reads based on a pattern (regular expression).
* Trim your reads based on a pattern (regular expression).
* Extract parts (flags) of reads based on a pattern and store them in the read headers.
- Extract the corresponding portions of the quality string too (only for fastq files).
* All patterns use the `regex` Python package to support [*fuzzy* matching](https://pypi.org/project/regex/#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109).
- Using fuzzy matching might affect the barber's speed).
* Export reads that do not match the provided pattern.
* Parallelized processing by splitting the fastx file in chunks.
* Filter reads based on quality score of extracted flags.
- Supports Sanger QSCORE definition (not old Solexa/Illumina one).
- Allows to specify different PHRED offsets.
[![DOI](https://zenodo.org/badge/281703558.svg)](https://zenodo.org/badge/latestdoi/281703558) ![](https://img.shields.io/librariesio/github/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/license/ggirelli/fastx-barber.svg?style=flat)
![](https://github.com/ggirelli/fastx-barber/workflows/Python%20package/badge.svg?branch=master&event=push) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/fastx-barber) ![PyPI - Format](https://img.shields.io/pypi/format/fastx-barber) ![PyPI - Status](https://img.shields.io/pypi/status/fastx-barber)
![](https://img.shields.io/github/release/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/release-date/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/languages/code-size/ggirelli/fastx-barber.svg?style=flat)
![](https://img.shields.io/github/watchers/ggirelli/fastx-barber.svg?label=Watch&style=social) ![](https://img.shields.io/github/stars/ggirelli/fastx-barber.svg?style=social)

[PyPi](https://pypi.org/project/fastx-barber/) | [docs](https://ggirelli.github.io/fastx-barber/)

A Python3.6.1+ package to trim and extract flags from FASTA and FASTQ files.

## Features (in short)

* Works on both FASTA and FASTQ files.
* Selects reads based on a pattern (regex).
* Trims reads by pattern (regex), length, or single-base quality.
* Extracts parts (flags) of reads based on a pattern and stores them in the read headers.
* Regular expressions support [*fuzzy* matching](https://pypi.org/project/regex/#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109) (*fuzzy matching* might affect the barber's speed).
* Parallelizes processing by splitting the fastx file in chunks.

For more available features, check out our [docs](https://ggirelli.github.io/fastx-barber/)!

## Requirements

`fastx-barber` has been tested with Python 3.6, 3.7, and 3.8. We recommend installing it using `pipx` (see [below](https://github.com/ggirelli/fastx-barber#install)) to avoid dependency conflicts with other packages. The packages it depends on are listed in our [dependency graph](https://github.com/ggirelli/fastx-barber/network/dependencies). We use [`poetry`](https://github.com/python-poetry/poetry) to handle our dependencies.
`fastx-barber` has been tested with Python 3.6.1, 3.7, and 3.8. We recommend installing it using `pipx` (see [below](https://github.com/ggirelli/fastx-barber#install)) to avoid dependency conflicts with other packages. The packages it depends on are listed in our [dependency graph](https://github.com/ggirelli/fastx-barber/network/dependencies). We use [`poetry`](https://github.com/python-poetry/poetry) to handle our dependencies.

## Install

@@ -32,9 +35,9 @@ Once you have `pipx` ready on your system, install the latest stable release of
Run:

* `fbarber` to access the barber's services.
* `fbarber trim` to trim your reads.
* `fbarber flag` to extract or manipulate read flags.
* `fbarber match` to select reads based on a pattern (regular expression).
* `fbarber extract` to extract parts of a read and store them in the read name, and then trim it.
* `fbarber trim` to trim your reads.

Add `-h` to see the full help page of a command!

4 changes: 4 additions & 0 deletions docs/_config.yml
@@ -0,0 +1,4 @@
theme: jekyll-theme-cayman
title: "fastx-barber"
description: "A package to trim and extract flags from FASTA and FASTQ files."
google_analytics: "UA-99031010-8"
65 changes: 65 additions & 0 deletions docs/index.md
@@ -0,0 +1,65 @@
<!-- MarkdownTOC -->

- [Requirements](#requirements)
- [Install](#install)
- [Features](#features)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)

<!-- /MarkdownTOC -->

---

[![DOI](https://zenodo.org/badge/281703558.svg)](https://zenodo.org/badge/latestdoi/281703558) ![](https://img.shields.io/librariesio/github/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/license/ggirelli/fastx-barber.svg?style=flat)
![](https://github.com/ggirelli/fastx-barber/workflows/Python%20package/badge.svg?branch=master&event=push) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/fastx-barber) ![PyPI - Format](https://img.shields.io/pypi/format/fastx-barber) ![PyPI - Status](https://img.shields.io/pypi/status/fastx-barber)
![](https://img.shields.io/github/release/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/release-date/ggirelli/fastx-barber.svg?style=flat) ![](https://img.shields.io/github/languages/code-size/ggirelli/fastx-barber.svg?style=flat)
![](https://img.shields.io/github/watchers/ggirelli/fastx-barber.svg?label=Watch&style=social) ![](https://img.shields.io/github/stars/ggirelli/fastx-barber.svg?style=social)

A Python3.6.1+ package to trim and extract flags from FASTA and FASTQ files.

## Requirements

`fastx-barber` has been tested with Python 3.6.1, 3.7, and 3.8. We recommend installing it using `pipx` (see [below](#install)) to avoid dependency conflicts with other packages. The packages it depends on are listed in our [dependency graph](https://github.com/ggirelli/fastx-barber/network/dependencies). We use [`poetry`](https://github.com/python-poetry/poetry) to handle our dependencies.

## Install

We recommend installing `fastx-barber` using [`pipx`](https://github.com/pipxproject/pipx). Check how to install `pipx` [here](https://github.com/pipxproject/pipx#install-pipx) if you don't have it yet! Once you have `pipx` ready on your system, install the latest stable release of `fastx-barber` by running: `pipx install fastx-barber`. If you see the stars (✨ 🌟 ✨), then the installation went well!

## Features

* Works on both FASTA and FASTQ files.
* [Selects](usage#match) reads based on a pattern (regex).
* [Trims](usage#trim) reads [by pattern](usage#trim-by-regular-expression) (regex), [length](usage#trim-by-length), or single-base [quality](usage#trim-by-quality).
* [Extracts](usage#extract-flags) parts ([flags](usage#flags)) of reads based on a pattern, and stores them in the read headers.
    - Optionally extracts the corresponding portions of the quality string (only for fastq files).
    - Optionally filters based on quality score of extracted flags (only for fastq files).
        + Supports Sanger QSCORE definition (not old Solexa/Illumina one).
        + Supports custom PHRED offset.
        + Optionally exports reads that do not pass the specified filters.
    - Optionally splits output based on flag value.
    - Optionally calculates the frequency of each value of a set of flags (flagstats).
    - [Filtering by flag quality](usage#filter-by-flag-quality), [splitting by flag value](usage#split-by-flag-value), and [calculating flag value frequency](usage#calculate-flag-value-frequency) are also available as separate scripts. This allows performing these operations on files with previously extracted flags.
* [Filters a FASTX file with extracted flags by applying patterns to different flags](usage#match-flags-with-regular-expressions).
* Regular expressions support [*fuzzy* matching](https://pypi.org/project/regex/#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109) (*fuzzy matching* might affect the barber's speed).
* Optionally exports reads that do not match the provided pattern(s).
* Parallelizes processing by splitting the fastx file in chunks.
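
As a rough illustration of the flag-extraction feature listed above, the sketch below uses named groups to pull flags out of a read, append them to the header, and trim the matched prefix. It is not the `fastx-barber` implementation: the pattern, the `extract_flags` helper, the `~` delimiter, and the `name=value` header layout are all assumptions made for this example. It also relies on the standard `re` module rather than `regex`, so it does no fuzzy matching.

```python
import re
from typing import Optional, Pattern, Tuple

# Hypothetical pattern: an 8-nt UMI, an 8-nt barcode, then a fixed "GATC" site.
PATTERN = re.compile(r"^(?P<UMI>.{8})(?P<BC>.{8})(?P<CS>GATC)")


def extract_flags(
    header: str, seq: str, pattern: Pattern[str] = PATTERN, delim: str = "~"
) -> Optional[Tuple[str, str]]:
    """Return (tagged header, trimmed sequence), or None if the read does not match."""
    match = pattern.search(seq)
    if match is None:
        return None  # an unmatched read; a caller could export it separately
    # Each named group becomes one flag appended to the header.
    flags = "".join(
        f"{delim}{name}={value}" for name, value in match.groupdict().items()
    )
    # Everything up to the end of the match is trimmed from the sequence.
    return header + flags, seq[match.end():]


if __name__ == "__main__":
    print(extract_flags("@read1", "AAAAAAAACCCCCCCCGATCTTTT"))
```

Returning `None` for unmatched reads keeps the decision about what to do with them (drop or export) in the caller.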

## Usage

Run:

* `fbarber` to access the barber's services.
* `fbarber flag` to extract or manipulate read flags.
* `fbarber match` to select reads based on a pattern (regular expression).
* `fbarber trim` to trim your reads.

Add `-h` to see the full help page of a command or visit the [usage page](usage)!

## Contributing

We welcome any contributions to `fastx-barber`. In short, we use [`black`](https://github.com/psf/black) to standardize code format. Any code change also needs to pass `mypy` checks. For more details, please refer to our [contribution guidelines](https://github.com/ggirelli/fastx-barber/blob/master/CONTRIBUTING.md) if this is your first time contributing! Also, check out our [code of conduct](https://github.com/ggirelli/fastx-barber/blob/master/CODE_OF_CONDUCT.md).

## License

`MIT License - Copyright (c) 2020 Gabriele Girelli`
140 changes: 140 additions & 0 deletions docs/usage.md
@@ -0,0 +1,140 @@
# Usage

<!-- MarkdownTOC -->

- [Match](#match)
- [Trim](#trim)
    - [Trim by length](#trim-by-length)
    - [Trim by quality](#trim-by-quality)
    - [Trim by regular expression](#trim-by-regular-expression)
- [Flags](#flags)
    - [Extract flags](#extract-flags)
    - [After flag extraction](#after-flag-extraction)
        - [Filter by flag quality](#filter-by-flag-quality)
        - [Match flags with regular expressions](#match-flags-with-regular-expressions)
        - [Split by flag value](#split-by-flag-value)
        - [Calculate flag value frequency](#calculate-flag-value-frequency)

<!-- /MarkdownTOC -->

## Match

```bash
usage: fbarber match [-h] [--pattern PATTERN] [--version] [--unmatched-output UNMATCHED_OUTPUT]
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE] [--chunk-size CHUNK_SIZE]
[--threads THREADS] [--temp-dir TEMP_DIR]
in.fastx[.gz] out.fastx[.gz]
```

Lorem ipsum dolor sit amet, consectetur adipisicing, elit. Id ab, quod repellendus, autem obcaecati illo alias, ipsam vel asperiores iure dicta voluptatem nostrum suscipit, doloremque dolores tenetur omnis recusandae repudiandae!

## Trim

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

### Trim by length

```bash
usage: fbarber trim length [-h] [-l LENGTH] [-s {3,5}] [--version] [--compress-level COMPRESS_LEVEL]
[--log-file LOG_FILE] [--chunk-size CHUNK_SIZE] [--threads THREADS]
[--temp-dir TEMP_DIR]
in.fastx[.gz] out.fastx[.gz]
```

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

### Trim by quality

```bash
usage: fbarber trim quality [-h] [-q QSCORE] [-s {3,5}] [--version] [--phred-offset PHRED_OFFSET]
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE]
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR]
in.fastq[.gz] out.fastq[.gz]
```

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!
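
To make the `-q`, `-s`, and `--phred-offset` options more concrete, here is a minimal sketch of one plausible reading of quality trimming: bases are removed from the chosen end for as long as their Sanger quality score (character code minus the PHRED offset) stays below the threshold. The `trim_by_quality` function and its parameter names are hypothetical, and the actual behavior of `fbarber trim quality` may differ.

```python
from typing import Tuple


def trim_by_quality(
    seq: str, qual: str, min_qscore: int, side: int = 3, phred_offset: int = 33
) -> Tuple[str, str]:
    """Trim consecutive low-quality bases from the 3' or 5' end of a read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    # Sanger definition: QSCORE = ASCII code of the quality character - offset.
    scores = [ord(char) - phred_offset for char in qual]
    if side == 3:
        keep = len(scores)
        while keep > 0 and scores[keep - 1] < min_qscore:
            keep -= 1
        return seq[:keep], qual[:keep]
    if side == 5:
        start = 0
        while start < len(scores) and scores[start] < min_qscore:
            start += 1
        return seq[start:], qual[start:]
    raise ValueError("side must be 3 or 5")


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 with a PHRED offset of 33.
    print(trim_by_quality("ACGTAC", "IIII##", min_qscore=10, side=3))
```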

### Trim by regular expression

```bash
usage: fbarber trim regex [-h] [--pattern PATTERN] [--version] [--unmatched-output UNMATCHED_OUTPUT]
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE]
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR]
in.fastx[.gz] out.fastx[.gz]
```

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

## Flags

Lorem ipsum dolor sit amet, consectetur, adipisicing elit. Voluptate consectetur adipisci maxime ducimus voluptatem vero illo recusandae accusamus dolores rerum nemo similique vel amet, quidem possimus eligendi veniam quae officia.

### Extract flags

```bash
usage: fbarber flag extract [-h] [--pattern PATTERN] [--version] [--unmatched-output UNMATCHED_OUTPUT]
[--flag-delim FLAG_DELIM]
[--selected-flags SELECTED_FLAGS [SELECTED_FLAGS ...]]
[--flagstats FLAGSTATS [FLAGSTATS ...]] [--split-by SPLIT_BY]
[--filter-qual-flags FILTER_QUAL_FLAGS [FILTER_QUAL_FLAGS ...]]
[--filter-qual-output FILTER_QUAL_OUTPUT] [--phred-offset PHRED_OFFSET]
[--no-qual-flags] [--comment-space COMMENT_SPACE]
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE]
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR]
in.fastx[.gz] out.fastx[.gz]
```

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

### After flag extraction

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

#### Filter by flag quality

```bash
usage: fbarber flag filter [-h] [--version] [--flag-delim FLAG_DELIM] [--comment-space COMMENT_SPACE]
[--filter-qual-flags FILTER_QUAL_FLAGS [FILTER_QUAL_FLAGS ...]]
[--filter-qual-output FILTER_QUAL_OUTPUT] [--phred-offset PHRED_OFFSET]
[--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE]
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR]
in.fastx[.gz] out.fastx[.gz]
```

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

#### Match flags with regular expressions

```bash
usage: fbarber flag regex [-h] [--pattern PATTERN [PATTERN ...]] [--version]
[--unmatched-output UNMATCHED_OUTPUT] [--flag-delim FLAG_DELIM]
[--comment-space COMMENT_SPACE] [--compress-level COMPRESS_LEVEL]
[--log-file LOG_FILE] [--chunk-size CHUNK_SIZE] [--threads THREADS]
[--temp-dir TEMP_DIR]
in.fastx[.gz] out.fastx[.gz]
```

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

#### Split by flag value

```bash
usage: fbarber flag split [-h] [--version] [--flag-delim FLAG_DELIM] [--comment-space COMMENT_SPACE]
[--split-by SPLIT_BY] [--compress-level COMPRESS_LEVEL] [--log-file LOG_FILE]
[--chunk-size CHUNK_SIZE] [--threads THREADS] [--temp-dir TEMP_DIR]
in.fastx[.gz] out.fastx[.gz]
```

Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!

#### Calculate flag value frequency

```bash
usage: fbarber flag stats [-h] [--version] [--flag-delim FLAG_DELIM] [--comment-space COMMENT_SPACE]
[--flagstats FLAGSTATS [FLAGSTATS ...]] [--compress-level COMPRESS_LEVEL]
[--log-file LOG_FILE] [--chunk-size CHUNK_SIZE] [--threads THREADS]
[--temp-dir TEMP_DIR]
in.fastx[.gz]
```
Lorem ipsum dolor sit, amet consectetur, adipisicing elit. Veritatis eaque modi ipsam sit laudantium consequatur accusamus voluptatibus aut suscipit! Autem iste minima laborum, quam magni doloribus consequatur eligendi asperiores sed!
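
The idea behind flag value frequency can be sketched with a plain counter over read headers. This is only an illustration: the `flag_value_counts` helper and the `~name=value` header layout are assumptions made for this example, and they do not describe how `fbarber flag stats` stores or parses flags.

```python
from collections import Counter
from typing import Iterable


def flag_value_counts(
    headers: Iterable[str], flag_name: str, delim: str = "~"
) -> Counter:
    """Count how often each value of one flag occurs across read headers."""
    counts: Counter = Counter()
    for header in headers:
        # The first field is the read name; the rest are "name=value" flags.
        for field in header.split(delim)[1:]:
            name, _, value = field.partition("=")
            if name == flag_name:
                counts[value] += 1
    return counts


if __name__ == "__main__":
    headers = [
        "@r1~UMI=AAAA~BC=CC",
        "@r2~UMI=AAAA~BC=GG",
        "@r3~UMI=TTTT~BC=CC",
    ]
    # most_common() sorts values by decreasing frequency.
    print(flag_value_counts(headers, "BC").most_common())
```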
3 changes: 2 additions & 1 deletion fastx_barber/__init__.py
@@ -5,14 +5,15 @@

from fastx_barber.const import __version__
from fastx_barber import const, scripts
from fastx_barber import io, seqio
from fastx_barber import io, scriptio, seqio
from fastx_barber import flag, match, qual, trim

__all__ = [
"__version__",
"const",
"scripts",
"io",
"scriptio",
"seqio",
"flag",
"match",