Olgabot/split kmer #29

Open: wants to merge 143 commits into base: master

Commits (143):
af7b902
Don't lint the pipeline for now
olgabot Apr 17, 2019
50e6739
PRs to master don't need to be from dev branch
olgabot Apr 17, 2019
2d00479
Don't change to 'test' directory
olgabot Apr 17, 2019
92a6f47
Change back to original direcotry
olgabot Apr 17, 2019
5bb2e52
remove sra id test for now
olgabot Apr 17, 2019
a29f6be
Add encrypted ncbi api key
olgabot Apr 17, 2019
9bc3407
un-comment sra id for test
olgabot Apr 17, 2019
56cd63c
use small sra id
olgabot Apr 17, 2019
190d49f
Don't compute DNA if not specified
olgabot Jul 1, 2019
ed6e87c
Fix not_dna flag
olgabot Jul 1, 2019
46843fc
Update dockerfile to use olgabot/dayhoff branch"
olgabot Jul 1, 2019
e4ab6b9
Don't install sourmash via conda for now because of version conflicts
olgabot Jul 1, 2019
a688c6b
Use olgabot/dayhoff tagged docker image
olgabot Jul 1, 2019
d1fe3e5
Use dash instead of slash in container name
olgabot Jul 1, 2019
8faa0bc
Use olgabot/dayhoff tagged docker image
olgabot Jul 1, 2019
c4739cd
Use olgabot-dayhoff version
olgabot Jul 4, 2019
589ce92
Don't specify container in main.nf
olgabot Jul 4, 2019
c4574be
Put container name in nextflow.config
olgabot Jul 4, 2019
7f895ad
Use Olga's dayhoff branch on sourmash to build
olgabot Jul 5, 2019
bf45db7
Add changelog
olgabot Jul 5, 2019
035c660
Merge pull request #26 from czbiohub/olgabot/docker-sourmash-dayhoff
olgabot Jul 5, 2019
751dfa4
Merge branch 'dev' into olgabot/fix-tests
olgabot Jul 5, 2019
16f5e8c
Merge pull request #9 from czbiohub/olgabot/fix-tests
olgabot Jul 5, 2019
c394f78
Add summary section and single ended reads
olgabot Jul 1, 2019
80d3f31
Use defulat optoins
olgabot Jul 5, 2019
045dc49
Add test configuration
olgabot Jul 1, 2019
10e3b1b
Add more escaped slashes
olgabot Jul 1, 2019
e22296b
Add docs and changelog to make 'nf-core lint' happy
olgabot Jul 3, 2019
48f7b26
demux --> nf-kmer-similarity
olgabot Jul 3, 2019
6419235
Add outputs for execution reports
olgabot Jul 3, 2019
ee85dff
Add date from initial commit
olgabot Jul 3, 2019
55054c7
Don't lint for now
olgabot Jul 3, 2019
eec23be
Add .github folder
olgabot Jul 3, 2019
342294c
Rename to nf-kmer-similarity
olgabot Jul 3, 2019
7e0b63a
Allow PRs to master from anywhere
olgabot Jul 3, 2019
0fd88d3
Fix sample input processing
olgabot Jul 5, 2019
902cb10
Add comment about read_paths
olgabot Jul 3, 2019
02760aa
Add a bunch of documentatin
olgabot Jul 3, 2019
3b1ca28
Add default value for log2_sketch_size"
olgabot Jul 3, 2019
b629b89
Use dev tag on docker image (Makefile)
olgabot Jul 5, 2019
d38a6a0
Move timeline to later
olgabot Jul 5, 2019
fd76e43
Use reorganizedd inputs
olgabot Jul 5, 2019
2859209
Re-add PRs to master only from dev restriction
olgabot Jul 4, 2019
268f3cd
No docker push for now
olgabot Jul 5, 2019
a1e766e
Actually let's keep docker_push for now
olgabot Jul 4, 2019
df4b424
Add workflow.onComplete from rnaseq
olgabot Jul 4, 2019
1b6bcd7
Add code of conduct
olgabot Jul 4, 2019
f4b3eee
reduce test size
olgabot Jul 4, 2019
87142a9
fix test data reading
olgabot Jul 4, 2019
065fd28
Don't use dayhoff encoding yet
olgabot Jul 4, 2019
e1da42d
Fix default parameters'
olgabot Jul 5, 2019
71c7cec
Remove samples_*
olgabot Jul 5, 2019
d9eedc7
No reference genome for this summary
olgabot Jul 5, 2019
eeac91d
Remove makeHISATIndex
olgabot Jul 4, 2019
47dd5bb
Add scrape_software_versions
olgabot Jul 4, 2019
6934a41
Remove biohub-specific aws configuration and add manifest
olgabot Jul 5, 2019
d0eeb5c
Ignore pycache
olgabot Jul 5, 2019
1abe9f2
Add assets
olgabot Jul 4, 2019
f6d4fe9
Remove read_paths_singles line"
olgabot Jul 5, 2019
2682612
Add molecule, kmer size, log2 sketch size to summary
olgabot Jul 5, 2019
45eb47c
Add one_signature_per_record to Summary
olgabot Jul 5, 2019
344412a
Remove rnaseq-specific checks in workflow.onComplete
olgabot Jul 5, 2019
3b17c09
Use dev tag on docker image
olgabot Jul 5, 2019
8d6faf6
Use dash instead of slash in container name
olgabot Jul 1, 2019
edfb9d6
Remove trace dir stuff from beginning of nextflow.config
olgabot Jul 9, 2019
31ee9f1
Print name of file when reporting it isn't found
olgabot Jul 9, 2019
e0e6270
Use 8GB for low memory
olgabot Jul 9, 2019
df36fd7
Use base config labels for memory resources
olgabot Jul 9, 2019
a5a5022
Use awsbatch and test configs
olgabot Jul 15, 2019
47f8398
Remove git cruft
olgabot Jul 15, 2019
46ae6e5
Use more traditional nf-core dockerfiel
olgabot Jul 15, 2019
9bc14ed
Double escape sourmash compute commands
olgabot Jul 15, 2019
13ed6b6
Actually attach custom configs
olgabot Jul 17, 2019
955fb45
Actually activate conda environment
olgabot Jul 17, 2019
204cd05
Add splitKmer field and start writing split kmer analysis
olgabot Jul 17, 2019
07349ce
Add splitKmer test configuration
olgabot Jul 17, 2019
598c182
Output ska splitkmer vs sourmash sketches to separate directories
olgabot Jul 17, 2019
41032c1
Wrap compare into splitKmer if/else
olgabot Jul 17, 2019
e1968a8
Use separate channels for ska vs sourmash sketches
olgabot Jul 17, 2019
1a64c42
Docker still not building :(
olgabot Jul 18, 2019
6095a3c
Remove } and don't have docker enabled by default
olgabot Jul 18, 2019
055215a
Get .travis.yml from olgabot/dayhoff
olgabot Jul 18, 2019
837b738
Use nf-core dockerfile
olgabot Jul 17, 2019
9e17b66
remove manual sourmash installs
olgabot Jul 18, 2019
adee8cd
Use apt to install gcc and friends
olgabot Jul 22, 2019
6c169ea
Tests finally work!!!
olgabot Jul 24, 2019
832e945
with compare sketches
phoenixAja Jul 26, 2019
ee63906
better outdir naming
phoenixAja Jul 30, 2019
2688ee9
no pipefail
olgabot Aug 1, 2019
86e5070
Comment out process shell
olgabot Aug 1, 2019
d2191dc
Merge branch 'dev' into olgabot/dayhoff
olgabot Aug 1, 2019
6bd3594
split kmer with different ksizes
phoenixAja Aug 2, 2019
e97e72a
fixed comparisons step for multiple ksizes
phoenixAja Aug 4, 2019
a8ed8f1
changed publishDir path in comparisons to be conistant with sketch pu…
phoenixAja Aug 4, 2019
3de12e9
multiple ska output files included
phoenixAja Aug 5, 2019
3274ea0
added truncate process if fastqs are too large
phoenixAja Aug 6, 2019
55ac8a6
Merge pull request #1 from czbiohub/olgabot/dayhoff
olgabot Aug 23, 2019
a574c35
Increase max time to 16h
olgabot Aug 26, 2019
7aa217a
added docker run options
phoenixAja Sep 10, 2019
ea758fd
changed truncate step to use seqtk
phoenixAja Sep 10, 2019
9d74582
added params.truncate to info
phoenixAja Sep 10, 2019
992d547
readme and changelog updates
phoenixAja Sep 10, 2019
ba75d41
Dockerfile and environment.yml for ska portion of pipeline
phoenixAja Sep 10, 2019
bce840e
remove scratch files to avoid confusion
olgabot Sep 15, 2019
8c1da07
Build Docker image with environment
olgabot Sep 18, 2019
4f5f60c
Remove docker enabled and set default kmer sizes
olgabot Sep 19, 2019
b2e2224
Rename truncate --> subsample
olgabot Sep 19, 2019
373a742
Don't allow for 'protein' molecule specified when --splitKmer is set
olgabot Sep 19, 2019
a1d70b2
Don't set default ksizes in nextflow config as it changes when splitk…
olgabot Sep 19, 2019
5069e2c
Add default value for subsampling the fastq files
olgabot Sep 19, 2019
13a2103
Don't specify container in main.nf
olgabot Sep 19, 2019
27f6df4
Remove commented code
olgabot Sep 19, 2019
a1df8e4
Remove commented test-y stuff
olgabot Sep 19, 2019
b0fe3e6
Indent split kmer sketching
olgabot Sep 19, 2019
13c83ea
Update readme with lots of info .. maybe should be in usage/docs.md ?
olgabot Sep 19, 2019
276e122
Add note about divisibility by 3
olgabot Oct 8, 2019
03316d9
Add bash type and remove seqtk subsampling from first example
olgabot Oct 8, 2019
bad3510
Add a bunch more explanation about split kmers
olgabot Oct 8, 2019
d0314a4
Add separate ska test configuration
olgabot Oct 8, 2019
c3f07b0
Add note about PRs to dev
olgabot Oct 8, 2019
5936a60
Use nfcore/kmermaid container
olgabot Oct 8, 2019
e2112e8
Add travis matrix to separate linting, etc
olgabot Oct 8, 2019
d872b9a
Remove 'env' section since this is now covered by the matrix
olgabot Oct 8, 2019
7476637
Add if statement to make exit code 0 to assert --splitKmer + molecule…
olgabot Oct 8, 2019
41b9e1c
Use --user to install with pip
olgabot Oct 8, 2019
e0de975
Add markdownlint
olgabot Oct 8, 2019
8f99b31
Add test_ska config to git
olgabot Oct 8, 2019
c8e3564
Don't put if statement logic into FLAGS variableg
olgabot Oct 8, 2019
cf58ca1
Add PR to dev on checklist
olgabot Oct 8, 2019
0ec026e
'script' not under a dash
olgabot Oct 8, 2019
952c46e
Can't use 'user' in this virtualenv??
olgabot Oct 8, 2019
c26c952
Organize output directories
olgabot Oct 8, 2019
61892ef
Fix markdownlint
olgabot Oct 8, 2019
a0af972
Separate out ska and sourmash outputs
olgabot Oct 8, 2019
c72ef56
Add ska to environment.yml'
olgabot Oct 20, 2019
be04112
Update changelog with ska, seqtk
olgabot Oct 20, 2019
f03dd69
Merge pull request #11 from nf-core/olgabot/add-ska-to-container
olgabot Oct 20, 2019
622116d
Merge branch 'dev' into olgabot/split-kmer
olgabot Oct 20, 2019
7767621
Add ska subsampling to test suite
olgabot Oct 20, 2019
9d77643
Markdownlint fixes
olgabot Oct 20, 2019
54331bd
Whitespace and capitalize
olgabot Oct 20, 2019
94e4ad7
Add java version back to minimum nextflow version
olgabot Oct 20, 2019
fbc2556
Remove nf-core/tools install from non-linting stuff
olgabot Oct 20, 2019
5 changes: 4 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -1,8 +1,11 @@
Many thanks to contributing to nf-core/kmer-similarity!
Many thanks to contributing to nf-core/kmermaid!

To ensure that your build passes, please make sure your pull request is to the `dev` branch rather than to `master`. Thank you!

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).

## PR checklist
- [ ] PR is to `dev` rather than `master`
- [ ] This comment contains a description of changes (with reason)
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If necessary, also make a PR on the [nf-core/kmer-similarity branch on the nf-core/test-datasets repo]( https://github.com/nf-core/test-datasets/pull/new/nf-core/kmer-similarity)
3 changes: 2 additions & 1 deletion .gitignore
@@ -4,6 +4,7 @@
work/
results
my-results
__pycache__

# Nextflow outputs
timeline.html*
@@ -89,4 +90,4 @@ atlassian-ide-plugin.xml
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
fabric.properties
43 changes: 31 additions & 12 deletions .travis.yml
@@ -6,32 +6,51 @@ python: '3.6'
cache: pip
matrix:
fast_finish: true
include:
- name: "Minimum Nextflow version, regular test suite"
env: NXF_VER='0.32.0' SUITE=test FLAGS=
language: java
jdk: openjdk8
- name: "Latest Nextflow version, regular test suite"
env: NXF_VER='' SUITE=test FLAGS=
- name: "Latest Nextflow version, regular test suite with splitKmers, ensure that `protein` can't be specified"
# Check exit code to make sure it is nonzero for --splitKmer + --molecules protein
script:
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --splitKmer ; if [ $? -eq 0 ]; then echo "--splitKmer + --molecules protein should fail but did not" && exit 1 ; else echo "Correctly failed --splitKmer + --molecules protein" ; fi
- name: "Latest Nextflow version, split k-mer test suite"
env: NXF_VER='' SUITE=test_ska FLAGS=
- name: "Latest Nextflow version, split k-mer test suite, test subsampling"
env: NXF_VER='' SUITE=test_ska FLAGS=--subsample 10
- name: "Lint the pipeline code"
install:
# Install nf-core/tools
- pip install --upgrade pip
- pip install nf-core
script: nf-core lint ${TRAVIS_BUILD_DIR}
python: '3.6'
jdk: openjdk8
- name: "Lint the documentation"
script: markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml
python: '3.6'

before_install:
# PRs to master are only ok if coming from dev branch
- '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ])'
# Pull the docker image first so the test doesn't wait for this
- docker pull czbiohub/nf-kmer-similarity:dev
- docker pull nfcore/kmermaid:dev
# Fake the tag locally so that the pipeline runs properly
- docker tag czbiohub/nf-kmer-similarity:dev czbiohub/nf-kmer-similarity:dev
- docker tag nfcore/kmermaid:dev nfcore/kmermaid:dev

install:
# Install Nextflow
- mkdir /tmp/nextflow && cd /tmp/nextflow
- wget -qO- get.nextflow.io | bash
- sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow
# Install nf-core/tools
- pip install nf-core
# Reset
- mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests

env:
- NXF_VER='19.03.0-edge' # Specify a minimum NF version that should be tested and work
- NXF_VER='' # Plus: get the latest NF version and check that it works
# Install markdownlint-cli
- sudo apt-get install npm && npm install -g markdownlint-cli

script:
# Lint the pipeline code
# Skip linting for now since container is built by czbiohub
# - nf-core lint ${TRAVIS_BUILD_DIR}
# Run the pipeline with the test profile
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker
- nextflow run ${TRAVIS_BUILD_DIR} -profile ${SUITE},docker ${FLAGS}
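
Note on the `--splitKmer` + `protein` matrix entry above: it inverts the exit status inline by testing `$?`. As a minimal sketch only, the same check could be factored into a small helper. The name `expect_failure` is hypothetical and is not defined anywhere in this repository.

```bash
# Hypothetical helper (not part of this PR): succeed only when the wrapped
# command fails, so that CI passes when an invalid option combination is
# correctly rejected.
expect_failure() {
    if "$@"; then
        echo "Command should have failed but did not: $*" >&2
        return 1
    fi
    echo "Correctly failed: $*"
}

# Assumed usage in a Travis `script:` entry, mirroring the inline check above:
#   expect_failure nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --splitKmer
```

Running the command inside `if` keeps a `set -e` shell from aborting before the exit status can be inspected.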
18 changes: 17 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,20 @@
# nf-core/nf-kmer-similarity: Changelog

## v1.0dev - 6 March 2019
## v1.1dev

* Add option to use Dayhoff encoding for sourmash
* Add `ska` and `seqtk` to container dependencies

## v1.0 - 6 March 2019

Initial release of nf-core/nf-kmer-similarity, created with the [nf-core](http://nf-co.re/) template.

## v1.1dev - 9 September 2019

#### Pipeline Updates
* Added fastq subsampling/truncating optional parameter using [seqtk](https://github.com/lh3/seqtk)
* Added support for kmer comparisons using Split Kmer Analysis [SKA](https://github.com/simonrharris/SKA)

#### Dependency Updates
* seqtk -> 1.3
* ska -> 1.0
43 changes: 14 additions & 29 deletions Dockerfile
@@ -1,20 +1,17 @@
FROM continuumio/anaconda3
MAINTAINER olga.botvinnik@czbiohub.org
FROM nfcore/base
LABEL description="Docker image containing all requirements for nf-core/kmermaid pipeline"

COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH /opt/conda/envs/nfcore-kmermaid-0.1dev/bin:$PATH

# Suggested tags from https://microbadger.com/labels
ARG VCS_REF
LABEL org.label-schema.vcs-ref=$VCS_REF \
org.label-schema.vcs-url="e.g. https://github.com/czbiohub/nf-kmer-similarity"

org.label-schema.vcs-url="e.g. https://github.com/nf-core/kmermaid"

WORKDIR /home

USER root

# Add user "main" because that's what is expected by this image
RUN useradd -ms /bin/bash main


ENV PACKAGES zlib1g git g++ make ca-certificates gcc zlib1g-dev libc6-dev procps

### don't modify things below here for version updates etc.
@@ -25,28 +22,16 @@ RUN apt-get update && \
apt-get install -y --no-install-recommends ${PACKAGES} && \
apt-get clean

RUN conda install --yes Cython bz2file pytest numpy matplotlib scipy sphinx alabaster

RUN which -a pip
RUN which -a python
ENV SOURMASH_VERSION 'olgabot/dayhoff'
RUN cd /home && \
git clone https://github.com/dib-lab/khmer.git -b master && \
cd khmer && \
python3 setup.py install

# Check that khmer was installed properly
RUN trim-low-abund.py --help
RUN trim-low-abund.py --version


# Required for multiprocessing of 10x bam file
# RUN pip install pathos bamnostic

# ENV SOURMASH_VERSION master
RUN cd /home && \
git clone https://github.com/dib-lab/sourmash.git && \
git clone --branch $SOURMASH_VERSION https://github.com/czbiohub/sourmash.git && \
cd sourmash && \
python3 setup.py install
python setup.py install

RUN which -a sourmash

RUN which -a python3
RUN python3 --version
RUN sourmash info
COPY docker/sysctl.conf /etc/sysctl.conf
7 changes: 7 additions & 0 deletions Dockerfile.ska
@@ -0,0 +1,7 @@
FROM nfcore/base
LABEL authors="Phoenix Logan" \
description="Docker image containing all requirements for ska portion of the nfcore/kmermaid pipeline"

COPY environment.ska.yml /
RUN conda env create -f environment.ska.yml && conda clean -a
ENV PATH /opt/conda/envs/nf-core-splitkmeranalysis-1.0.0/bin:$PATH
43 changes: 34 additions & 9 deletions README.md
@@ -1,33 +1,58 @@
# nf-kmer-similarity

This is a [Nextflow](nextflow.io) workflow for running k-mer similarity
This is a [Nextflow](nextflow.io) workflow for running k-mer similarity.

[![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/czbiohub/nf-kmer-similarity.svg)](https://cloud.docker.com/u/czbiohub/repository/docker/czbiohub/nf-kmer-similarity)

## Usage

### With a samples.csv file:
By default, this pipeline creates a [MinHash](https://en.wikipedia.org/wiki/MinHash) sketch of each sample's sequencing reads using [sourmash](https://sourmash.readthedocs.io), then compares all of the sketches using a [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index). Here are the default parameters:

```
- Log2 sketch sizes of 10, 12, 14, 16 (as if `--log2_sketch_sizes 10,12,14,16` was specified on the command line), so 2^10, 2^12, 2^14, 2^16 = 1024, 4096, 16384, 65536 hashed k-mers for each sample.
- Compute both DNA and protein signatures (as if `--molecules dna,protein` was specified on the command line). The protein k-mers are obtained by doing [six-frame translation](https://en.wikipedia.org/wiki/Reading_frame#/media/File:Open_reading_frame.jpg) on the DNA k-mers.
- K-mer sizes of 21, 27, 33, 51 (as if `--ksizes 21,27,33,51` was specified on the command line).
- If using the `--splitKmer` option, keep in mind that the k-mer size refers to each half of the split k-mer, which you can visualize as `[---ksize---]N[---ksize---]`. The default k-mer sizes for `--splitKmer` are therefore 9 and 15, for total sequence unit sizes of `2*9+1 = 19` and `2*15+1 = 31`, as if you had specified `--splitKmer --ksizes 9,15` on the command line. Additionally, k-mer sizes with `--splitKmer` must be divisible by 3 (yes, this is inconvenient).

### With a samples.csv file

This is where you'd have a csv file with a `sample_id,read1,read2` header containing the sample id and paths to each of your R1 and R2 read files.

```bash
nextflow run czbiohub/nf-kmer-similarity --outdir s3://olgabot-maca/nf-kmer-similarity/ --samples samples.csv
```

### With R1, R2 read pairs:
### With R1, R2 read pairs

```
```bash
nextflow run czbiohub/nf-kmer-similarity --outdir s3://olgabot-maca/nf-kmer-similarity/ \
--read_pairs 's3://olgabot-maca/sra/homo_sapiens/smartseq2_quartzseq/*{R1,R2}*.fastq.gz,s3://olgabot-maca/sra/danio_rerio/smart-seq/whole_kidney_marrow_prjna393431/*{1,2}.fastq.gz'
```

### With SRA ids:
### With SRA ids

```
```bash
nextflow run czbiohub/nf-kmer-similarity --outdir s3://olgabot-maca/nf-kmer-similarity/ --sra SRP016501
```

### With fasta files:
### With fasta files

```
```bash
nextflow run czbiohub/nf-kmer-similarity --outdir s3://olgabot-maca/nf-kmer-similarity/ \
--fastas '*.fasta'
```

### With Split Kmer Analysis [SKA](https://github.com/simonrharris/SKA)

Note: the meaning of `ksize` is different with split k-mers. The value specified by `--ksizes` is the length of each half, so it is just under half of the total sampled sequence size; the middle base can be any base (`N`): `[---ksize---]N[---ksize---]`. Also note that `--splitKmer` works only with DNA sequence and cannot be combined with `protein` in `--molecules`.

```bash
nextflow run czbiohub/nf-kmer-similarity --outdir s3://olgabot-maca/nf-kmer-similarity/ --samples samples.csv --splitKmer
```
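
Note on the README text in this diff: the arithmetic behind the defaults (sketch sizes as powers of two, the `2*ksize+1` span of a split k-mer, and the divisible-by-3 rule) can be checked with a few lines of shell. This is a sketch for illustration; the function names are hypothetical and the pipeline itself may validate these values differently.

```bash
# Hypothetical helpers for checking the README's default values.
# None of these functions exist in the pipeline.

# Number of hashed k-mers kept for a given log2 sketch size (2^n).
sketch_size() {
    echo $(( 1 << $1 ))
}

# Total bases covered by one split k-mer: [--ksize--]N[--ksize--] = 2*ksize + 1.
split_kmer_span() {
    echo $(( 2 * $1 + 1 ))
}

# Return 0 only if every comma-separated k-mer size is divisible by 3,
# which the README states is required when --splitKmer is set.
valid_split_ksizes() {
    for ksize in $(echo "$1" | tr ',' ' '); do
        [ $(( ksize % 3 )) -eq 0 ] || return 1
    done
    return 0
}

sketch_size 10                                 # prints 1024
split_kmer_span 15                             # prints 31
valid_split_ksizes 9,15 && echo "ksizes OK"    # prints ksizes OK
```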

### With Split Kmer Analysis [SKA](https://github.com/simonrharris/SKA) and fastq subsampling with [seqtk](https://github.com/lh3/seqtk)

The `--subsample` option is often necessary because the `ska` tool uses ALL the reads rather than a MinHash subsampling of them. If your input files are large, the `ska` sketching command (`ska fastq`) can run out of memory or take so long that it is untenable. `--subsample` specifies the number of reads to use.

```bash
nextflow run czbiohub/nf-kmer-similarity --outdir s3://olgabot-maca/nf-kmer-similarity/ --samples samples.csv --splitKmer --subsample 1000
```
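
Note on `--subsample`: this diff does not show how the pipeline calls seqtk, so the following is a sketch under assumptions. It uses seqtk's documented `seqtk sample -s<seed> <in.fq> <n>` form and a hypothetical helper, `subsample_pair_cmds`, that only prints the commands for one read pair. The output file names and the default seed of 100 are also assumptions.

```bash
# Hypothetical helper (not the pipeline's implementation): print the seqtk
# commands that would subsample a pair of FASTQ files to n_reads reads each.
subsample_pair_cmds() {
    r1=$1
    r2=$2
    n_reads=$3
    seed=${4:-100}
    # Both mates must use the same seed so that the sampled reads stay paired.
    for fastq in "$r1" "$r2"; do
        printf 'seqtk sample -s%s %s %s | gzip > subsampled_%s\n' \
            "$seed" "$fastq" "$n_reads" "$(basename "$fastq")"
    done
}

subsample_pair_cmds sample_R1.fastq.gz sample_R2.fastq.gz 1000
```

Printing the commands rather than running them keeps the example usable without seqtk installed.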
4 changes: 2 additions & 2 deletions conf/base.config
@@ -13,15 +13,15 @@ process {

cpus = { check_max( 2, 'cpus' ) }
memory = { check_max( 8.GB * task.attempt, 'memory' ) }
time = { check_max( 2.h * task.attempt, 'time' ) }
time = { check_max( 16.h * task.attempt, 'time' ) }

errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'terminate' }
maxRetries = 1
maxErrors = '-1'

// Process-specific resource requirements
withLabel: low_memory {
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
memory = { check_max( 8.GB * task.attempt, 'memory' ) }
}
withLabel: mid_memory {
memory = { check_max( 32.GB * task.attempt, 'memory' ) }
2 changes: 1 addition & 1 deletion conf/test.config
@@ -19,7 +19,7 @@ params {
// fastas = 'testing/fastas/*.fasta'
ksizes = '3,9'
log2_sketch_sizes = '2,4'
molecules = 'dna,protein'
molecules = 'dna,protein,dayhoff'
// read_pairs = 'testing/fastqs/*{1,2}.fastq.gz'
// sra = "SRP016501"
read_paths = [
27 changes: 27 additions & 0 deletions conf/test_ska.config
@@ -0,0 +1,27 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/rnaseq -profile test
*/

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on Travis
max_cpus = 2
max_memory = 6.GB
max_time = 48.h
// Input data
ksizes = '3,6'
molecules = 'dna'
splitKmer = true
read_paths = [
['SRR4050379', ['https://github.com/czbiohub/test-datasets/raw/kmer-similarity/testdata/SRR4050379_pass_1.fastq.gz',
'https://github.com/czbiohub/test-datasets/raw/kmer-similarity/testdata/SRR4050379_pass_2.fastq.gz']],
['SRR4050380', ['https://github.com/czbiohub/test-datasets/raw/kmer-similarity/testdata/SRR4050380_pass_1.fastq.gz',
'https://github.com/czbiohub/test-datasets/raw/kmer-similarity/testdata/SRR4050380_pass_2.fastq.gz']],
]
}
6 changes: 3 additions & 3 deletions docs/installation.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# nf-core/nf-kmer-similarity: Installation
# nf-core/kmermaid: Installation

To start using the nf-core/nf-kmer-similarity pipeline, follow the steps below:
To start using the nf-core/kmermaid pipeline, follow the steps below:

1. [Install Nextflow](#1-install-nextflow)
2. [Install the pipeline](#2-install-the-pipeline)
@@ -72,7 +72,7 @@ Be warned of two important points about this default configuration:
#### 3.1) Software deps: Docker
First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)

Then, running the pipeline with the option `-profile standard,docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from dockerhub (https://hub.docker.com/r/nfcore/nf-kmer-similarity).
Then, running the pipeline with the option `-profile standard,docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from [dockerhub](https://hub.docker.com/r/nfcore/nf-kmer-similarity).

#### 3.1) Software deps: Singularity
If you're not able to use Docker then [Singularity](http://singularity.lbl.gov/) is a great alternative.