Nextflow Pipeline: Parameters Reference

Nextflow Pipeline: Parameters Reference#

This document provides a comprehensive reference for all parameters available in the Nextflow pipeline for CAZyme annotation in microbiome data. Parameters are organized by functional category.

Note

Before running the pipeline, make sure you have cloned the repository. See Nextflow Pipeline: Usage for installation instructions. All examples assume you are in the dbcan-nf directory or use the full path to main.nf.

Input/Output Options#

Input/Output Parameters#
Parameter	Type	Required	Description
`--input`	string	Yes	Path to comma-separated file containing information about the samples in the experiment. Must have 3 columns (sample, fastq_1, fastq_2) and a header row. See Nextflow Pipeline: Usage for samplesheet format details.
`--outdir`	string	Yes	The output directory where the results will be saved. Use absolute paths for Cloud infrastructure.
`--email`	string	No	Email address for completion summary. Set this to receive a summary email when the workflow exits. Can be set in `~/.nextflow/config` to avoid specifying on every run.
`--email_on_fail`	string	No	Email address for completion summary, only sent when pipeline fails.
`--plaintext_email`	boolean	No	Send plain-text email instead of HTML. Default: `false`.

Analysis Mode Selection#

Mode Selection Parameters#
Parameter	Type	Default	Description
`--type`	string	`shortreads`	Analysis mode to use. Options: `shortreads`, `longreads`, `assemfree`. See mode-specific documentation for details.

Quality Control Options#

Quality Control Parameters#
Parameter	Type	Default	Description
`--skip_fastqc`	boolean	`false`	Skip FastQC quality control analysis. When enabled, FastQC steps are bypassed.
`--skip_trimming`	boolean	`false`	Skip TrimGalore adapter trimming. When enabled, trimming steps are bypassed.

Kraken2 Options#

Kraken2 Parameters#
Parameter	Type	Default	Description
`--skip_kraken_extraction`	boolean	`false`	Skip Kraken2 taxonomic classification and read extraction. When enabled, all reads are used without filtering.
`--kraken_db`	string	`null`	Path to custom Kraken2 database directory. If not specified, the pipeline will build a standard database automatically.
`--kraken_tax`	string	`9606`	NCBI taxonomy ID for taxonomic filtering. Default `9606` corresponds to Homo sapiens. Reads matching this taxon are extracted for downstream analysis.

Assembly Options (Short Reads Mode)#

These parameters are only applicable when --type shortreads is used.

Short Reads Assembly Parameters#
Parameter	Type	Default	Description
`--subsample`	boolean	`false`	Enable subsampling mode. Downsample each sample before assembly to reduce computational requirements. Mutually exclusive with `--coassembly`. See Short Reads: Subsampling Mode for details.
`--subsample_size`	integer	`20000000`	Number of reads per file to retain when subsampling is enabled. Only used when `--subsample` is set.
`--coassembly`	boolean	`false`	Enable co-assembly mode. Combine all samples and perform a single MEGAHIT assembly. Requires at least 2 samples. Mutually exclusive with `--subsample`. See Short Reads: Co-assembly Mode for details.

Long Reads Options#

These parameters are only applicable when --type longreads is used.

Long Reads Parameters#
Parameter	Type	Default	Description
`--flye_mode`	string	`--pacbio-hifi`	Flye assembly mode. Options include `--pacbio-hifi`, `--pacbio-raw`, `--nano-raw`, `--nano-hq`. See Flye documentation for details.

Database Options#

Database Parameters#
Parameter	Type	Default	Description
`--dbcan_db`	string	`null`	Path to custom dbCAN database directory. If not specified, the pipeline will download the database automatically.

MultiQC Options#

MultiQC Parameters#
Parameter	Type	Default	Description
`--multiqc_title`	string	`null`	MultiQC report title. Printed as page header, used for filename if not otherwise specified.
`--multiqc_config`	string	`null`	Custom config file to supply to MultiQC.
`--multiqc_logo`	string	`null`	Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file.
`--multiqc_methods_description`	string	`null`	Custom MultiQC YAML file containing HTML including a methods description.
`--max_multiqc_email_size`	string	`25.MB`	File size limit when attaching MultiQC reports to summary emails.

Generic Options#

These options are common to all nf-core pipelines and are typically set in a Nextflow config file.

Generic Parameters#
Parameter	Type	Default	Description
`--publish_dir_mode`	string	`copy`	Method used to save pipeline results to output directory. Options: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`.
`--monochrome_logs`	boolean	`false`	Do not use coloured log outputs.
`--hook_url`	string	`null`	Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
`--validate_params`	boolean	`true`	Boolean whether to validate parameters against the schema at runtime.

Parameter Usage Examples#

Basic short reads analysis:

nextflow run main.nf \
  --input samplesheet.csv \
  --outdir results \
  --type shortreads \
  -profile docker

Short reads with subsampling:

nextflow run main.nf \
  --input samplesheet.csv \
  --outdir results \
  --type shortreads \
  --subsample \
  --subsample_size 5000000 \
  -profile docker

Long reads analysis:

nextflow run main.nf \
  --input samplesheet.csv \
  --outdir results \
  --type longreads \
  --flye_mode --nano-raw \
  -profile docker

Assembly-free analysis:

nextflow run main.nf \
  --input samplesheet.csv \
  --outdir results \
  --type assemfree \
  -profile docker

With custom databases:

nextflow run main.nf \
  --input samplesheet.csv \
  --outdir results \
  --type shortreads \
  --dbcan_db /path/to/dbcan_db \
  --kraken_db /path/to/kraken_db \
  -profile docker

Skipping quality control steps:

nextflow run main.nf \
  --input samplesheet.csv \
  --outdir results \
  --type shortreads \
  --skip_fastqc \
  --skip_trimming \
  -profile docker

Using parameter files:

nextflow run main.nf \
  -profile docker \
  -params-file params.yaml

With params.yaml:

input: './samplesheet.csv'
outdir: './results/'
type: 'shortreads'
subsample: true
subsample_size: 5000000
skip_kraken_extraction: false
kraken_tax: '9606'