Changelog
v1.0.9 — 2026-05-13
Improvements
- SLURM partition compatibility.
  Added the `--slurm_queue` parameter (required with `-profile slurm`) to specify the target SLURM partition. The pipeline now exits immediately with a clear error message if this parameter is missing when running on SLURM. The partition should allow a maximum walltime of at least 7 days; shorter limits (1–2 days) may work for small genomes or low-depth sequencing.
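The fail-fast behaviour described above could be sketched roughly as follows. Only `--slurm_queue` and `-profile slurm` come from this entry; the placement of the check and the wording of the message are assumptions, not the pipeline's actual code.

```groovy
// main.nf (illustrative sketch, not the pipeline's real implementation)
workflow {
    // workflow.profile holds the comma-separated names passed to -profile
    def profiles = workflow.profile.tokenize(',')

    // Abort before any task is submitted if the partition was not given
    if (profiles.contains('slurm') && !params.slurm_queue) {
        error "Missing required parameter --slurm_queue (needed with -profile slurm)"
    }

    log.info "Submitting jobs to SLURM partition: ${params.slurm_queue}"
}
```

With a check like this, a run started as `nextflow run <pipeline> -profile slurm --slurm_queue <partition>` proceeds, while omitting `--slurm_queue` stops the run before any job reaches the scheduler.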
v1.0.8 — 2026-05-06
Bug fixes
- BAM files are now deleted mid-run after all GATKHC tasks complete.
  A new `cleanupBAMs` process receives a count-based sentinel that resolves only after every `GATKHC` task across all intervals has emitted. This guarantees that no BAM is removed while any interval job still needs it, yet reclaims the space as soon as the last GATKHC task finishes, without waiting for the full pipeline to succeed. BAMs from `--bam_input` runs are never touched.
- Fixed zero-length vector crash in `r_process_summary_fastp.R` for single-end (SE) samples.
  `as.numeric(NULL)` produces a zero-length vector, which silently corrupted the summary matrix. Both `duplication$rate` and `insert_size$peak` now use an explicit `length(v) == 0` guard before assignment.
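One way a count-based sentinel of this kind can be expressed in Nextflow is with the `count` operator, which emits a single value only after its source channel has closed. The sketch below is an assumption about the general shape, not the pipeline's code: the process bodies are stand-ins, and `params.bams` and the interval names are hypothetical.

```groovy
// Stand-in for variant calling: one task per (sample, interval) pair
process GATKHC {
    input:
    tuple val(sample), path(bam), val(interval)

    output:
    tuple val(sample), val(interval)

    script:
    """
    echo "calling ${interval} for ${sample} from ${bam}"
    """
}

process cleanupBAMs {
    input:
    val n_done      // sentinel: number of GATKHC tasks that have emitted
    val bam_paths   // absolute paths as plain values, so the real files are removed

    script:
    """
    echo "all ${n_done} GATKHC tasks finished, removing BAMs"
    rm -f ${bam_paths.join(' ')}
    """
}

workflow {
    def bams      = channel.fromPath(params.bams).map { bam -> tuple(bam.baseName, bam) }
    def intervals = channel.of('chr1', 'chr2')

    GATKHC(bams.combine(intervals))

    // count() emits once, after the last GATKHC output has arrived
    def sentinel  = GATKHC.out.count()
    def bam_paths = bams.map { sample, bam -> bam.toString() }.collect()

    // User-supplied BAMs (--bam_input) are left alone
    if (!params.bam_input) {
        cleanupBAMs(sentinel, bam_paths)
    }
}
```

Because the sentinel depends on every `GATKHC` output, the cleanup task cannot start while any interval job is still running, yet it does not wait for downstream steps.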
v1.0.7 — 2026-05-05
New features
- Replaced `CombineGVCFs` with `GenomicsDBImport` for GVCF consolidation.
  The legacy `CombineGVCFs` step has been replaced by GATK's `GenomicsDBImport`, which uses on-disk TileDB storage instead of in-memory merging. This eliminates GC thrashing at large sample counts (tested at 1,605 samples). Batch size is configurable via `--genomicsdb_batch_size` (default: 50, the Broad production default).
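As an illustration of where the batch size ends up, a per-interval import process might look like the sketch below. The process name, the input tuple, and the use of a sample-name map with absolute GVCF paths are assumptions; `--batch-size`, `--sample-name-map`, `--genomicsdb-workspace-path` and `-L` are standard `GenomicsDBImport` options.

```groovy
// Illustrative sketch: consolidate GVCFs for one interval into a GenomicsDB workspace
process genomicsDBImport {
    input:
    // sample_map: tab-separated "sample<TAB>/absolute/path/to/sample.g.vcf.gz" lines (assumed)
    tuple val(interval), path(sample_map)

    output:
    tuple val(interval), path("genomicsdb")

    script:
    """
    gatk GenomicsDBImport \
        --sample-name-map ${sample_map} \
        --genomicsdb-workspace-path genomicsdb \
        --batch-size ${params.genomicsdb_batch_size} \
        -L ${interval}
    """
}
```

With a batch size of 50, GATK reads the GVCFs in groups of 50 instead of opening all of them at once, which bounds memory use and the number of open files as the sample count grows.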
Compatibility fixes (Nextflow ≥ 26.x)
- `log.info` banner moved inside the `workflow` block; top-level executable statements are no longer permitted in Nextflow 26.x.
- C-style `for (int i = ...)` and `while` loops replaced with Groovy functional expressions (`takeWhile`/`collect`, `.step().each`).
- `${HOME}` environment variable interpolation in `nextflow.config` replaced with `env('HOME')`; bare env-var interpolation was removed in Nextflow 26.x.
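The kind of rewrite these fixes involve can be sketched as follows. The variable names and values are hypothetical and chosen only to show the pattern; they are not taken from the pipeline.

```groovy
// main.nf (illustrative sketch)
workflow {
    // The banner now lives inside the workflow block
    log.info "pipeline starting"

    def total = 10
    def step  = 4

    // Was: for (int i = 0; i < total; i += step) { ... }
    (0..<total).step(step).each { start ->
        log.info "chunk starts at ${start}"        // 0, 4, 8
    }

    // Was: a while loop that doubled a value until it reached a limit
    def limit = 50
    def sizes = (0..10).collect { i -> 2 ** i }.takeWhile { v -> v < limit }
    log.info "sizes: ${sizes}"                     // [1, 2, 4, 8, 16, 32]
}

// In nextflow.config, a path formerly written as "${HOME}/some/dir"
// becomes "${env('HOME')}/some/dir".
```

The functional forms avoid loop counters and mutable state, which is what makes them acceptable to the stricter parser.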
Improvements
- JVM flags `-XX:-UsePerfData --enable-native-access=ALL-UNNAMED` added to all GATK processes to suppress Java 17+ module-restriction warnings and `/tmp/hsperfdata_*` lock conflicts under concurrent Singularity jobs.
- `gatkIndex` base memory increased from 2 GB to 4 GB to prevent a `-Xmx0g` crash on the first attempt.
- Fixed `r_process_summary_fastp.R` crash on single-end samples: `j$insert_size$peak` is `NULL` for SE data; `as.numeric(NULL)` produces a zero-length vector that silently failed the matrix assignment. Now handled with an explicit length check.
v1.0.6 — 2026-04-10
New features
- Automatic Singularity image download.
  All container images are pulled automatically by Nextflow on the first run; no manual `singularity pull` step is needed. Images are fetched from quay.io/biocontainers and cached in `$HOME/.singularity/cache` (configurable via `singularity.cacheDir`). Subsequent runs reuse the cached images without re-downloading.
- Bundled example dataset.
  The repository ships with a ready-to-run E. coli LTEE example (`example/`) that lets you verify your setup end-to-end. Three public SRA samples (SRR2589044, SRR2584863, SRR2584866) mapped against the REL606 reference genome serve as the test case. See Getting started → Step 4 for instructions.
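To move the image cache off the home directory (for example onto scratch storage), `singularity.cacheDir` can be overridden in a user-level config file. The fragment below is a sketch; the scratch path is a hypothetical placeholder, and the file would be passed to Nextflow with `-c`.

```groovy
// custom.config (illustrative fragment)
singularity {
    enabled  = true
    // Hypothetical location; any directory writable from the compute nodes works
    cacheDir = '/scratch/shared/singularity_cache'
}
```

Images already present in the chosen directory are reused, so pointing several users at one shared cache avoids repeated downloads.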
Improvements
- Switched all R processes to `rocker/tidyverse:4.4.3` for a reliable, actively maintained R environment.
- Improved QC HTML report: accessible Okabe-Ito colour palette, version and runtime metadata in the header, auto-zoomed BWA mapping-rate axis, violin + jitter distribution plot.
- Fastp JSON summary parsing rewritten with `jsonlite` to correctly separate before- and after-filtering statistics.
- Fixed `geom_vline(linewidth=)` for ggplot2 ≥ 3.5 compatibility.
- Singularity `autoMounts` enabled and environment whitelist added for seamless APPTAINER/SINGULARITY work-directory binding on HPC.
- Pipeline statistics report (`pipeline_statistics.nf`) consolidated from per-process output files.
v1.0.5 and earlier
See the GitHub commit history for changes prior to v1.0.6.