Reusable Snakemake workflows for RNA-seq quality control, preprocessing, transcript quantification, de novo transcriptome assembly, and reference-guided assembly.
The recommended entry point is the seqworkflow command. It creates a run-specific config file, links FASTQs into the expected layout, and launches the selected Snakemake workflow. In normal use, you do not need to edit YAML config files by hand.
Choose one execution style:
- Local: Snakemake and all workflow tools installed on your machine.
- Apptainer: Apptainer plus the seqworkflows.sif image.
- OrbStack/Docker: OrbStack or Docker plus the versioned ghcr.io/natmurad/seqworkflows:1.0.0 image.
Add the repository command-line tools to your PATH:
export PATH="/path/to/seqWorkflows/bin:$PATH"

Show all modes and options:
seqworkflow --help

Run a paired-end reference preprocessing workflow:
seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
--ref-genome genome.fa \
--gtf-file annotation.gtf \
--sample sample1 \
--threads 10 \
--jobs 4 \
--strandedness reverse

Run a dry-run first:
seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
--ref-genome genome.fa \
--gtf-file annotation.gtf \
--dry-run

seqworkflows also works as an alias, but seqworkflow is the preferred command name.
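When several samples share one reference, the dry-run-then-run pattern above can be scripted. The following is a minimal sketch, not part of the repository: `build_preprocess_pe` and the `SAMPLES` mapping are hypothetical names, and the FASTQ paths are placeholders. It uses only the flags shown in the commands above and prints each command rather than executing it.

```python
import shlex


def build_preprocess_pe(r1, r2, outdir, genome, gtf, sample, dry_run=False):
    """Return a seqworkflow preprocessPE command as an argument list.

    Hypothetical helper: it only assembles flags documented in this README.
    """
    cmd = [
        "seqworkflow", "preprocessPE", r1, r2, outdir,
        "--ref-genome", genome,
        "--gtf-file", gtf,
        "--sample", sample,
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Hypothetical sample sheet: sample name -> (R1, R2). Replace with real paths.
SAMPLES = {
    "sample1": ("sample1_R1.fastq.gz", "sample1_R2.fastq.gz"),
    "sample2": ("sample2_R1.fastq.gz", "sample2_R2.fastq.gz"),
}

for name, (r1, r2) in SAMPLES.items():
    # Emit the dry-run command first, then the real one.
    for dry_run in (True, False):
        cmd = build_preprocess_pe(
            r1, r2, f"results/{name}", "genome.fa", "annotation.gtf",
            name, dry_run=dry_run,
        )
        print(shlex.join(cmd))
```

To execute instead of print, each list can be passed to `subprocess.run(cmd, check=True)`; with `check=True`, a failing dry run raises before the real run starts.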
| Mode | Input | Purpose |
|---|---|---|
| qcPE | paired-end FASTQs | FastQC, trimming, fastp, MultiQC |
| preprocessPE | paired-end FASTQs + reference | QC, trimming, STAR, RSEM |
| preprocessSE | single-end FASTQ + reference | QC, trimming, STAR, RSEM |
| denovoPE | paired-end FASTQs | Trinity de novo assembly, annotation, abundance, DE/GO |
| denovoSE | single-end FASTQ | Trinity de novo assembly, annotation, abundance, DE/GO |
| refguidedPE | paired-end FASTQs + reference | STAR/RSEM, genome-guided Trinity, annotation, DE/GO |

| Area | Status | Notes |
|---|---|---|
| seqworkflow CLI | Supported | Creates per-run config and input symlinks for all listed modes |
| qcPE | Supported | Paired-end QC/trimming workflow; qC_PE remains a legacy alias |
| preprocessPE | Supported | Main paired-end reference workflow |
| preprocessSE | Supported | Main single-end reference workflow |
| denovoPE | Supported | Requires valid Trinity sample/contrast files for real DE analysis |
| denovoSE | Supported | Requires valid Trinity sample/contrast files for real DE analysis |
| refguidedPE | Supported | Paired-end reference-guided workflow |
| SRA rules | Experimental | Rule files are present, but not exposed as a top-level CLI mode yet |
| single-cell rules | Experimental | Cell Ranger is not bundled in the unified container |
| SignalP step | Optional/external | SignalP is not bundled because it usually requires separate licensing |

Paired-end QC only:
seqworkflow qcPE R1.fastq.gz R2.fastq.gz results/qc \
--sample sample1 \
--jobs 4

Single-end preprocessing:
seqworkflow preprocessSE reads.fastq.gz results/preprocessSE \
--ref-genome genome.fa \
--gtf-file annotation.gtf \
--sample sample1 \
--strandedness reverse

Paired-end de novo workflow:
seqworkflow denovoPE R1.fastq.gz R2.fastq.gz results/denovoPE \
--sample sample1 \
--sample-file sample_file.txt \
--contrast-file contrast_file.txt \
--jobs 4

Reference-guided paired-end workflow:
seqworkflow refguidedPE R1.fastq.gz R2.fastq.gz results/refguidedPE \
--ref-genome genome.fa \
--gtf-file annotation.gtf \
--sample sample1 \
--sample-file sample_file.txt \
--contrast-file contrast_file.txt

For mode-specific options:
seqworkflow preprocessPE --help
seqworkflow denovoPE --help

Minimum inputs:
- Raw FASTQ files: .fastq.gz, .fq.gz, .fastq, or .fq.
- Reference workflows: genome FASTA and annotation GTF/GFF.
- De novo and reference-guided workflows: sample_file.txt and contrast_file.txt for Trinity differential expression steps.
If --sample-file or --contrast-file is omitted for a de novo mode, seqworkflow creates a minimal placeholder file so the DAG can be built. For real differential expression analysis, pass proper sample and contrast files.
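For a real differential expression run, the two files can be generated rather than typed by hand. The sketch below assumes that seqworkflow hands them to Trinity's utilities unchanged, so it follows Trinity's tab-delimited layout: one line per replicate (condition, replicate name, left reads, optional right reads) in the sample file, and one pair of conditions per line in the contrast file. The helper names, condition names, and FASTQ paths are hypothetical.

```python
def render_samples(rows):
    """Render (condition, replicate, left_fastq[, right_fastq]) rows as TSV."""
    return "".join("\t".join(row) + "\n" for row in rows)


def render_contrasts(pairs, rows):
    """Render condition pairs as TSV, rejecting conditions absent from rows."""
    conditions = {row[0] for row in rows}
    for first, second in pairs:
        missing = {first, second} - conditions
        if missing:
            raise ValueError(f"contrast uses unknown condition(s): {sorted(missing)}")
    return "".join(f"{first}\t{second}\n" for first, second in pairs)


# Hypothetical design: two conditions with two replicates each.
rows = [
    ("control", "control_rep1", "ctl1_R1.fastq.gz", "ctl1_R2.fastq.gz"),
    ("control", "control_rep2", "ctl2_R1.fastq.gz", "ctl2_R2.fastq.gz"),
    ("treated", "treated_rep1", "trt1_R1.fastq.gz", "trt1_R2.fastq.gz"),
    ("treated", "treated_rep2", "trt2_R1.fastq.gz", "trt2_R2.fastq.gz"),
]

samples_text = render_samples(rows)
contrasts_text = render_contrasts([("control", "treated")], rows)
print(samples_text, end="")
print(contrasts_text, end="")
```

Writing `samples_text` to sample_file.txt and `contrasts_text` to contrast_file.txt gives files that can be passed through --sample-file and --contrast-file. For single-end data, each row would omit the fourth column.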
Each run writes into the OUTDIR you provide. Reference workflows keep the
generated RSEM/STAR index in a shared sibling directory so later runs can reuse
it:
PARENT_DIR/
  ref/rsemRef/          shared RSEM/STAR reference index, when needed
  OUTDIR/
    config/             generated YAML config for this run
    input/              symlinks to input FASTQs using pipeline naming
    trimmed/            trimmed FASTQs
    qC/                 FastQC, fastp, MultiQC and logs
    map/                STAR alignments
    counts/             RSEM outputs and count matrices
    assembly_trinity/   Trinity outputs, when needed
Actual subdirectories depend on the selected mode.
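Downstream scripts often need these locations without hard-coding them. The following sketch derives them from an OUTDIR and a mode name, assuming the default layout shown above with the shared index in a ref/rsemRef directory next to OUTDIR. `expected_paths` is a hypothetical helper, not a seqworkflow API, and it does not account for a custom --rsem-ref-dir or for directories that a given mode does not create.

```python
from pathlib import PurePosixPath


def expected_paths(outdir, mode):
    """Map layout names to the paths implied by the default directory layout."""
    outdir = PurePosixPath(outdir)
    return {
        "config": outdir / "config" / f"{mode}.yaml",
        "input": outdir / "input",
        "trimmed": outdir / "trimmed",
        "qc": outdir / "qC",
        "map": outdir / "map",
        "counts": outdir / "counts",
        "assembly": outdir / "assembly_trinity",
        # Assumption: the shared index sits beside OUTDIR, in its parent.
        "shared_reference": outdir.parent / "ref" / "rsemRef",
    }


paths = expected_paths("results/preprocessPE", "preprocessPE")
for name, path in paths.items():
    print(f"{name:17}{path}")
```

`PurePosixPath` keeps the example independent of the file system; swapping in `pathlib.Path` would allow existence checks such as `paths["counts"].is_dir()` after a run.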
Use a distinct --rsem-ref-dir for each genome build and annotation pair. Do
not reuse an index generated from a different FASTA or GTF/GFF file. The CLI
stores SHA-256 hashes in ref/rsemRef/.seqworkflow-reference.json and stops
before running Snakemake if a later run tries to reuse the directory with
incompatible inputs.
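The idea behind this guard can be illustrated with a short sketch. It is not the CLI's actual implementation: the JSON keys (`genome_sha256`, `annotation_sha256`) and the function names are assumptions, and the real manifest may record more or different fields. Only the manifest file name and the use of SHA-256 come from the description above.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = ".seqworkflow-reference.json"  # file name taken from this README


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large genomes are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_or_record_reference(ref_dir, genome, annotation):
    """Record input hashes on first use; raise if later inputs differ.

    Returns True when a new manifest was written, False when an existing
    manifest matched. The key names are hypothetical.
    """
    ref_dir = Path(ref_dir)
    manifest = ref_dir / MANIFEST_NAME
    current = {
        "genome_sha256": sha256_of(genome),
        "annotation_sha256": sha256_of(annotation),
    }
    if manifest.exists():
        recorded = json.loads(manifest.read_text())
        if recorded != current:
            raise ValueError(
                f"{ref_dir} was built from different inputs; "
                "use a separate --rsem-ref-dir for this genome/annotation pair"
            )
        return False
    ref_dir.mkdir(parents=True, exist_ok=True)
    manifest.write_text(json.dumps(current, indent=2) + "\n")
    return True
```

Hashing file contents rather than comparing paths or timestamps means a renamed but identical FASTA still matches, while an edited annotation under the same file name is rejected.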
The project has one unified container recipe, defined by these files:
containers/seqworkflows.def
containers/seqworkflows.Dockerfile
env/seqworkflows.yml
env/seqworkflows-linux-64.lock.yml
env/seqworkflows.yml is the readable package recipe. env/seqworkflows-linux-64.lock.yml is the pinned Linux container lockfile exported from a successful build and used by the Docker/Apptainer recipes.
Build:
apptainer build seqworkflows.sif containers/seqworkflows.def

Run:
seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
--ref-genome genome.fa \
--gtf-file annotation.gtf \
--runtime apptainer \
--container-image seqworkflows.sif

Build with OrbStack or Docker running:
scripts/build_orbstack.sh

Run the workflow inside the image:
docker run --rm -it \
-v "$PWD:/work" \
-w /work \
ghcr.io/natmurad/seqworkflows:1.0.0 \
bin/seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
--ref-genome genome.fa \
--gtf-file annotation.gtf \
--runtime orbstack

The build script uses docker if it is available in PATH. If not, it uses OrbStack's bundled Docker CLI at /Applications/OrbStack.app/Contents/MacOS/xbin/docker. The image is built as linux/amd64 because Bioconda has broader package support there than on macOS ARM.
Versioned images are published to GHCR by .github/workflows/publish-container.yml
when a release tag such as v1.0.0 is pushed.
In OrbStack/Docker mode, Snakemake runs inside the already-started container. In Apptainer mode, Snakemake uses the configured container image for each rule.
SignalP and Cell Ranger are not bundled in the unified image because they usually require separate licensing or vendor downloads.
- FastQC
- Trimmomatic
- fastp
- MultiQC
- STAR
- RSEM
- samtools
- Trinity
- Bowtie2
- BLAST+
- CD-HIT
- BUSCO
- TransDecoder
- HMMER
- Trinotate
- DESeq2 / GOseq
Run the lightweight CLI tests:
python3 tests/test_seqworkflow_cli.py

These tests validate the command-line interface, generated configs, input symlinks, and per-mode wiring without requiring Snakemake. Full workflow validation still requires running --dry-run or real jobs in an environment with Snakemake and the workflow tools available.
The generated config file is written to:
OUTDIR/config/<mode>.yaml
For the QC mode, qcPE is the public CLI name and the legacy Snakefile/config name remains qC_PE.
You can still run Snakemake manually with a custom config:
SEQWORKFLOWS_CONFIG=config/my_dataset.yaml snakemake -s preprocessPE -j 4

Or with Apptainer through Snakemake:
SEQWORKFLOWS_CONFIG=config/my_dataset.yaml snakemake -s preprocessPE -j 4 --sdm apptainer

Detailed paired-end preprocessing documentation is available in docs/preprocessPE.md. Container build notes are in containers/README.md. Recent project changes are summarized in CHANGELOG.md.
Workflows are composed from rule files under:
rules/PE/
rules/SE/
rules/BOTH/
rules/SRA/
rules/SINGLECELL/
New workflows can be built by composing existing rules in a top-level Snakefile.