Provide --reads, --SRA_index, or both. Alternatively, provide --bam_input if you already have pre-processed BAM files; --bam_input cannot be combined with --reads or --SRA_index. See below for details.
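The combination rule above can be sketched as a small check. This is only an illustration of the rule as stated here, not the pipeline's own validation code; the function name `validate_input_mode` and the returned mode labels are hypothetical.

```python
def validate_input_mode(reads=None, sra_index=None, bam_input=None):
    """Return "reads" or "bam" for a valid option combination, else raise ValueError.

    Hypothetical helper: it mirrors the documented rule, nothing more.
    """
    has_read_source = bool(reads) or bool(sra_index)
    if bam_input and has_read_source:
        raise ValueError("--bam_input cannot be combined with --reads or --SRA_index")
    if bam_input:
        return "bam"
    if has_read_source:
        # --reads and --SRA_index may be given alone or together.
        return "reads"
    raise ValueError("provide --reads and/or --SRA_index, or --bam_input")
```

Calling it with both a read source and `bam_input` raises an error, which is the one combination the text rules out.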
Path to a plain-text file listing NCBI/ENA accessions for single-end or paired-end Illumina reads (one accession per line). Accepts SRR, SRX, SRP, PRJNA, ERR, etc.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --NCBI_API_key | string | none | Required with --SRA_index. Your personal NCBI API key. |
| --SRR_sample_map | path | false | CSV file mapping SRR IDs to sample names (SRR_ID,Sample_Name). Allows merging multiple runs per sample and renaming samples. See Getting started for format. |
Glob pattern for pre-existing BAM files. Skips all read-processing steps (trimming, mapping, deduplication) and starts directly with variant calling. Cannot be combined with --reads or --SRA_index.
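To illustrate how a file passed to --SRR_sample_map lets several runs be merged under one sample name, here is a minimal sketch that groups run IDs by sample. It assumes the two-column SRR_ID,Sample_Name layout described above and treats a first row starting with the literal `SRR_ID` as an optional header; both the helper name `group_runs_by_sample` and the header handling are assumptions, so check Getting started for the exact format.

```python
import csv
import io
from collections import defaultdict


def group_runs_by_sample(csv_text):
    """Map each sample name to the list of run IDs assigned to it."""
    runs_by_sample = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected SRR_ID,Sample_Name but got: {row}")
        run_id, sample = row[0].strip(), row[1].strip()
        if run_id == "SRR_ID":
            continue  # assumed optional header row
        runs_by_sample[sample].append(run_id)
    return dict(runs_by_sample)
```

Two runs that share a sample name end up in the same list, which is the merge case; a run listed under a new name is simply renamed.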
BAM file requirements
Each BAM file provided via --bam_input must:
- Be coordinate-sorted
- Contain @RG read group information in the header
- Have a .bai index file in the same directory
The pipeline does not validate these requirements, so make sure your files comply before running.
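Since the pipeline does not check these requirements, a quick pre-flight check can help. The sketch below uses only the Python standard library: it reads the BAM header text (BAM is BGZF-compressed, which `gzip` can read) and looks for `SO:coordinate` on the `@HD` line, at least one `@RG` line, and an index file. Note the assumptions: the `@HD` tag only reports the declared sort order and does not prove the records are sorted, and accepting either `sample.bam.bai` or `sample.bai` as the index name is a guess about what the pipeline accepts. The function names are hypothetical.

```python
import gzip
import os
import struct


def read_bam_header_text(bam_path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    with gzip.open(bam_path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{bam_path} does not look like a BAM file")
        (text_length,) = struct.unpack("<I", handle.read(4))
        text = handle.read(text_length)
    return text.decode("utf-8", errors="replace").rstrip("\x00")


def check_bam(bam_path):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    lines = read_bam_header_text(bam_path).splitlines()
    hd_line = next((line for line in lines if line.startswith("@HD")), "")
    if "SO:coordinate" not in hd_line.split("\t"):
        problems.append("header does not declare coordinate sorting (@HD SO:coordinate)")
    if not any(line.startswith("@RG\t") for line in lines):
        problems.append("no @RG read group line in header")
    stem = os.path.splitext(bam_path)[0]
    # Assumed index naming: sample.bam.bai or sample.bai next to the BAM.
    if not (os.path.exists(bam_path + ".bai") or os.path.exists(stem + ".bai")):
        problems.append("no .bai index found in the same directory")
    return problems
```

Running `check_bam` over every file matched by the glob before launching gives an early warning for the three documented requirements.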
Required. Absolute path to the reference genome in FASTA format. Must have the .fasta extension (not .fa, .fna, .fas).
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --reference_segments | integer | 0 | Size in bp of genome segments used for parallel variant calling. 0 disables segmentation. Smaller values increase parallelism at the cost of more per-segment overhead. |
| --min_contig_length | integer | false | Exclude reference contigs shorter than this value (bp). Useful for excluding small scaffolds. false disables filtering. |
| --bwa_index | path | none | Path prefix of pre-built bwa-mem2 index files. Skips BWA indexing. The pipeline expects the index files (.amb, .ann, .bwt.2bit.64, .pac, .0123) to sit next to each other and share this path prefix. |
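The reference options above can be pictured with a short sketch. The first function shows one plausible way a segment size and a minimum contig length translate into intervals for parallel variant calling; the pipeline's actual segmentation scheme is not described here, so the per-contig windows, the 1-based inclusive coordinates, and the name `split_into_segments` are all assumptions. The second function lists which of the documented index files are missing for a given --bwa_index prefix.

```python
import os

# Suffixes taken from the --bwa_index description above.
BWA_MEM2_INDEX_SUFFIXES = (".amb", ".ann", ".bwt.2bit.64", ".pac", ".0123")


def split_into_segments(contig_lengths, segment_size=0, min_contig_length=False):
    """Return (contig, start, end) intervals, 1-based and inclusive.

    segment_size <= 0 keeps each contig whole; a falsy min_contig_length
    disables length filtering.
    """
    segments = []
    for contig, length in contig_lengths.items():
        if min_contig_length and length < min_contig_length:
            continue  # contig is too short, drop it
        if segment_size <= 0:
            segments.append((contig, 1, length))
            continue
        for start in range(1, length + 1, segment_size):
            end = min(start + segment_size - 1, length)
            segments.append((contig, start, end))
    return segments


def missing_index_files(prefix):
    """Return the expected index files that do not exist for this prefix."""
    return [prefix + suffix for suffix in BWA_MEM2_INDEX_SUFFIXES
            if not os.path.exists(prefix + suffix)]
```

With a segment size of 100, a 250 bp contig yields three intervals (two full windows and a 50 bp remainder), which shows why very small values multiply the number of jobs and their overhead.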
Required. Ploidy level for GATK HaplotypeCaller. Use 1 for haploid organisms (e.g. many fungi, bacteria) and 2 for diploid organisms. Higher values can represent pooled samples.
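| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --call_invar_sites | boolean | false | When true, GATK HaplotypeCaller also emits invariant (monomorphic) sites. This substantially increases output size. Useful for downstream analyses that need calls at every position rather than at variant sites only. |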