Configuration

Pipeline parameters are passed on the command line as --param value. Options with a single dash (-profile, -resume, -work-dir) are Nextflow's own options. Boolean parameters are set with true or false.


Execution mode

| Parameter | Type | Default | Description |
|---|---|---|---|
| -profile | string | local | Execution profile. local: up to 4 threads per task. local_highCPU: up to 24 threads. slurm: submit all tasks as SLURM jobs. |
| -resume | flag | | Resume from the last completed step. Requires the work-dir to be intact. |
| -work-dir | path | ./work | Directory for temporary and intermediate files. Use a fast scratch filesystem for large datasets. |
| --outdir | path | ./nf_output | Directory for final output files. |
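
As a rough illustration of how these options combine, the wrapper below launches a run with the slurm profile. The pipeline location (my_pipeline/main.nf), the scratch path, and the function name are placeholders that do not come from this page; substitute your own.

```shell
# Hypothetical pipeline location; replace with the real pipeline name or path.
PIPELINE="${PIPELINE:-my_pipeline/main.nf}"

# Launch (or resume) a run on SLURM.
# Single-dash options (-profile, -resume, -work-dir) belong to Nextflow;
# double-dash options (--outdir) are pipeline parameters.
run_on_slurm() {
    nextflow run "$PIPELINE" \
        -profile slurm \
        -resume \
        -work-dir /scratch/my_project/work \
        --outdir ./nf_output \
        "$@"    # further pipeline parameters, e.g. --reference and --ploidy
}
```

Extra parameters are appended after the fixed ones, for example run_on_slurm --reference /data/ref/genome.fasta --ploidy 1. With -resume, Nextflow reuses cached results from the work directory, so the same -work-dir must be given on every invocation.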

Read input options

Provide --reads, --SRA_index, or both. Alternatively, provide --bam_input if you already have pre-processed BAM files; it cannot be combined with --reads or --SRA_index. See below for details.

Local FASTQ files

| Parameter | Type | Default | Description |
|---|---|---|---|
| --reads | glob | | Glob pattern matching paired-end FASTQ files (paired-end reads only). Must be single-quoted so that the shell does not expand it before Nextflow sees it. |
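
The quoting requirement can be demonstrated with a small experiment. The sketch below creates two empty files in a throwaway directory and counts how many arguments a command receives with and without quotes. The file names are made up for the demonstration and say nothing about the naming scheme the pipeline expects.

```shell
# Throwaway directory with two dummy read files (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/s1_R1.fastq.gz" "$demo_dir/s1_R2.fastq.gz"

# Print the number of arguments received.
count_args() { echo "$#"; }

# Unquoted: the shell expands the glob into one argument per matching file.
unquoted=$(cd "$demo_dir" && count_args *.fastq.gz)
# Single-quoted: the pattern is passed through as one literal argument.
quoted=$(cd "$demo_dir" && count_args '*.fastq.gz')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -rf "$demo_dir"
```

Only the quoted form hands the pattern itself to the pipeline, which is what --reads needs.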

SRA / ENA accessions

| Parameter | Type | Default | Description |
|---|---|---|---|
| --SRA_index | path | | Path to a plain-text file listing NCBI/ENA single or paired-end Illumina read accessions (one per line). Accepts SRR, SRX, SRP, PRJNA, ERR, etc. |
| --NCBI_API_key | string | | Required with --SRA_index. Your personal NCBI API key. |
| --SRR_sample_map | path | false | CSV file mapping SRR IDs to sample names (SRR_ID,Sample_Name). Allows merging multiple runs per sample and renaming samples. See Getting started for format. |
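
A minimal sketch of the two input files follows. The accessions and sample names are invented placeholders, and the file names are arbitrary. This page does not say whether the sample map needs a header row; the sketch omits one, so check Getting started before relying on it.

```shell
# Accession list for --SRA_index: one accession per line (placeholder IDs).
cat > sra_accessions.txt <<'EOF'
SRR0000001
SRR0000002
SRR0000003
EOF

# Sample map for --SRR_sample_map in SRR_ID,Sample_Name order.
# ASSUMPTION: no header row. The first two runs share a sample name,
# which is how several runs are assigned to one sample for merging.
cat > srr_sample_map.csv <<'EOF'
SRR0000001,isolate_A
SRR0000002,isolate_A
SRR0000003,isolate_B
EOF

# Show how many runs map to each sample name (second CSV column).
cut -d, -f2 srr_sample_map.csv | sort | uniq -c
```

Remember that --NCBI_API_key must be supplied whenever --SRA_index is used.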

Pre-processed BAM files

| Parameter | Type | Default | Description |
|---|---|---|---|
| --bam_input | glob | | Glob pattern for pre-existing BAM files. Skips all read-processing steps (trimming, mapping, deduplication) and starts directly with variant calling. Cannot be combined with --reads or --SRA_index. |

BAM file requirements

BAM files provided via --bam_input must be:

  • Coordinate-sorted
  • Containing @RG read group information in the header
  • Accompanied by a .bai index file in the same directory

These requirements are not validated by the pipeline — ensure your files comply before running.
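
Because the pipeline does not check these requirements, a pre-flight check can save a failed run. The function below is one possible sketch, not part of the pipeline: it assumes samtools is installed and on the PATH, and it accepts either sample.bam.bai or sample.bai as the index name, since this page does not say which form is expected. The function name check_bam is hypothetical.

```shell
# Check one BAM file against the three requirements listed above.
# Prints a message per problem and returns non-zero if any is found.
check_bam() {
    bam=$1
    rc=0

    # `samtools view -H` prints only the header lines of the file.
    header=$(samtools view -H "$bam") || { echo "$bam: cannot read header"; return 1; }

    # Coordinate-sorted files declare SO:coordinate on the @HD line.
    printf '%s\n' "$header" | grep '^@HD' | grep -q 'SO:coordinate' \
        || { echo "$bam: header does not declare SO:coordinate"; rc=1; }

    # At least one @RG line must be present.
    printf '%s\n' "$header" | grep -q '^@RG' \
        || { echo "$bam: no @RG read group in header"; rc=1; }

    # Index in the same directory (ASSUMPTION: either naming form is fine).
    [ -f "$bam.bai" ] || [ -f "${bam%.bam}.bai" ] \
        || { echo "$bam: no .bai index found"; rc=1; }

    return "$rc"
}
```

A typical use is a loop over the files the glob would match, for example: for f in bams/*.bam; do check_bam "$f"; done. Note that the header check only confirms what the file declares about its sort order, not that the records are actually sorted.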


Reference genome options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --reference | path | | Required. Absolute path to the reference genome in FASTA format. Must have the .fasta extension (not .fa, .fna, or .fas). |
| --reference_segments | integer | 0 | Size in bp of genome segments used for parallel variant calling. 0 disables segmentation. Smaller values increase parallelism at the cost of overhead. |
| --min_contig_length | integer | false | Remove reference contigs shorter than this value (bp). Useful for excluding small scaffolds. false disables filtering. |
| --bwa_index | path | | Path prefix of pre-built BWA-mem2 index files. Skips BWA indexing. The pipeline expects the index files (.amb, .ann, .bwt.2bit.64, .pac, .0123) to be co-located with the path prefix. |
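
The path rules for --reference and --bwa_index can be verified before launching. The two helper functions below are a sketch with hypothetical names. They assume that each index file is named as the prefix followed directly by the extension (for example genome.fasta.amb for the prefix genome.fasta), which is how the co-location rule above is read here.

```shell
# Check the --reference value: absolute path, .fasta extension, file exists.
check_reference() {
    ref=$1
    case $ref in
        /*) ;;
        *) echo "reference must be an absolute path: $ref"; return 1 ;;
    esac
    case $ref in
        *.fasta) ;;
        *) echo "reference must end in .fasta: $ref"; return 1 ;;
    esac
    [ -f "$ref" ] || { echo "reference not found: $ref"; return 1; }
}

# Check that all five BWA-mem2 index files exist next to the given prefix.
# ASSUMPTION: files are named <prefix><extension>.
check_bwa_index() {
    prefix=$1
    missing=0
    for ext in .amb .ann .bwt.2bit.64 .pac .0123; do
        [ -f "$prefix$ext" ] || { echo "missing index file: $prefix$ext"; missing=1; }
    done
    return "$missing"
}
```

For example, check_reference /data/ref/genome.fasta && check_bwa_index /data/ref/genome.fasta succeeds only if the reference and the complete index set are in place.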

Genotyping options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --ploidy | integer | | Required. Ploidy level for GATK HaplotypeCaller. 1 for haploid (fungi, bacteria), 2 for diploid. Higher values can represent pooled samples. |
| --call_invar_sites | boolean | false | When true, GATK HaplotypeCaller also emits invariant (monomorphic) sites. Substantially increases output size. Useful for some downstream analyses requiring full genome coverage. |

Output options

| Parameter | Type | Default | Description |
|---|---|---|---|
| --keep_bam | boolean | false | When true, saves final per-sample BAM files (after duplicate marking) to <outdir>/bam_files/. |
| --keep_gvcf | boolean | false | When true, saves per-sample GVCF files to <outdir>/gvcf_files/. |

SLURM and concurrency

These settings are found in nextflow.config and can be edited directly.

| Setting | Default | Description |
|---|---|---|
| executor.queueSize | 300 | Maximum number of tasks submitted to SLURM at once. |
| executor.submitRateLimit | '120/1min' | Maximum task submission rate (prevents overwhelming the scheduler). |
| SRA download maxForks | 10 | Maximum concurrent SRA downloads (PE and SE each). Reduce if NCBI rate-limits your connection. |
| fastp maxForks | 20 | Maximum concurrent trimming tasks (PE and SE each; I/O intensive). |
| GATK HC maxForks | 150 | Maximum concurrent HaplotypeCaller tasks. |
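
To find where these values are set before editing them, a small helper can list the matching lines with their line numbers. This is only a convenience sketch: the function name is hypothetical, and the pattern assumes the names queueSize, submitRateLimit, and maxForks appear literally in nextflow.config.

```shell
# Print the concurrency-related lines of a Nextflow config file.
# Uses ./nextflow.config when no path is given.
show_concurrency_settings() {
    grep -nE 'queueSize|submitRateLimit|maxForks' "${1:-nextflow.config}"
}
```

Run show_concurrency_settings from the pipeline directory, or pass the path to the config file as the first argument.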