seqWorkflows

Reusable Snakemake workflows for RNA-seq quality control, preprocessing, transcript quantification, de novo transcriptome assembly, and reference-guided assembly.

The recommended entry point is the seqworkflow command. It creates a run-specific config file, links FASTQs into the expected layout, and launches the selected Snakemake workflow. In normal use, you do not need to edit YAML config files by hand.

Requirements

Choose one execution style:

Local: Snakemake and all workflow tools installed on your machine.
Apptainer: Apptainer plus the seqworkflows.sif image.
OrbStack/Docker: OrbStack or Docker plus the versioned ghcr.io/natmurad/seqworkflows:1.0.0 image.

Quick Start

Add the repository command-line tools to your PATH:

export PATH="/path/to/seqWorkflows/bin:$PATH"

Show all modes and options:

seqworkflow --help

Run a paired-end reference preprocessing workflow:

seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
  --ref-genome genome.fa \
  --gtf-file annotation.gtf \
  --sample sample1 \
  --threads 10 \
  --jobs 4 \
  --strandedness reverse

Before launching a real run, preview the planned jobs with a dry run:

seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
  --ref-genome genome.fa \
  --gtf-file annotation.gtf \
  --dry-run
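
When several samples share one reference, the dry-run-then-run pattern above can be scripted. The following is a minimal Python sketch and not part of seqWorkflows. The helper names `build_command` and `run_sample` are hypothetical, it uses only flags shown in this section, and it assumes `seqworkflow` is on your PATH.

```python
import subprocess
from pathlib import Path


def build_command(r1, r2, outdir, ref_genome, gtf_file, sample, dry_run=False):
    """Assemble a seqworkflow preprocessPE command from documented flags only."""
    cmd = [
        "seqworkflow", "preprocessPE", str(r1), str(r2), str(outdir),
        "--ref-genome", str(ref_genome),
        "--gtf-file", str(gtf_file),
        "--sample", sample,
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_sample(**kwargs):
    """Run a dry run first; start the real run only if the dry run succeeds."""
    subprocess.run(build_command(dry_run=True, **kwargs), check=True)
    subprocess.run(build_command(dry_run=False, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch is safe to try.
    print(" ".join(build_command(
        "R1.fastq.gz", "R2.fastq.gz", Path("results/preprocessPE"),
        "genome.fa", "annotation.gtf", "sample1", dry_run=True,
    )))
```

Because `check=True` raises on a non-zero exit code, a failed dry run stops `run_sample` before any real job starts.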

seqworkflows also works as an alias, but seqworkflow is the preferred command name.

Modes

| Mode | Input | Purpose |
| --- | --- | --- |
| `qcPE` | paired-end FASTQs | FastQC, trimming, fastp, MultiQC |
| `preprocessPE` | paired-end FASTQs + reference | QC, trimming, STAR, RSEM |
| `preprocessSE` | single-end FASTQ + reference | QC, trimming, STAR, RSEM |
| `denovoPE` | paired-end FASTQs | Trinity de novo assembly, annotation, abundance, DE/GO |
| `denovoSE` | single-end FASTQ | Trinity de novo assembly, annotation, abundance, DE/GO |
| `refguidedPE` | paired-end FASTQs + reference | STAR/RSEM, genome-guided Trinity, annotation, DE/GO |

Support Status

| Area | Status | Notes |
| --- | --- | --- |
| `seqworkflow` CLI | Supported | Creates per-run config and input symlinks for all listed modes |
| `qcPE` | Supported | Paired-end QC/trimming workflow; `qC_PE` remains a legacy alias |
| `preprocessPE` | Supported | Main paired-end reference workflow |
| `preprocessSE` | Supported | Main single-end reference workflow |
| `denovoPE` | Supported | Requires valid Trinity sample/contrast files for real DE analysis |
| `denovoSE` | Supported | Requires valid Trinity sample/contrast files for real DE analysis |
| `refguidedPE` | Supported | Paired-end reference-guided workflow |
| SRA rules | Experimental | Rule files are present, but not exposed as a top-level CLI mode yet |
| single-cell rules | Experimental | Cell Ranger is not bundled in the unified container |
| SignalP step | Optional/external | SignalP is not bundled because it usually requires separate licensing |

Examples

Paired-end QC only:

seqworkflow qcPE R1.fastq.gz R2.fastq.gz results/qc \
  --sample sample1 \
  --jobs 4

Single-end preprocessing:

seqworkflow preprocessSE reads.fastq.gz results/preprocessSE \
  --ref-genome genome.fa \
  --gtf-file annotation.gtf \
  --sample sample1 \
  --strandedness reverse

Paired-end de novo workflow:

seqworkflow denovoPE R1.fastq.gz R2.fastq.gz results/denovoPE \
  --sample sample1 \
  --sample-file sample_file.txt \
  --contrast-file contrast_file.txt \
  --jobs 4

Reference-guided paired-end workflow:

seqworkflow refguidedPE R1.fastq.gz R2.fastq.gz results/refguidedPE \
  --ref-genome genome.fa \
  --gtf-file annotation.gtf \
  --sample sample1 \
  --sample-file sample_file.txt \
  --contrast-file contrast_file.txt

For mode-specific options:

seqworkflow preprocessPE --help
seqworkflow denovoPE --help

Inputs

Minimum inputs:

Raw FASTQ files: .fastq.gz, .fq.gz, .fastq, or .fq.
Reference workflows: genome FASTA and annotation GTF/GFF.
De novo and reference-guided workflows: sample_file.txt and contrast_file.txt for Trinity differential expression steps.
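
A quick pre-flight check on file names can catch a mistyped path before a workflow is launched. The sketch below is illustrative only: `is_fastq` and `unsupported_inputs` are hypothetical helpers, and the case-sensitive match is an assumption, since this section does not say how seqworkflow treats upper-case extensions.

```python
from pathlib import Path

# Extensions listed in the Inputs section above.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def is_fastq(path):
    """Return True if the file name ends with one of the listed extensions."""
    return Path(path).name.endswith(FASTQ_SUFFIXES)


def unsupported_inputs(paths):
    """Return the paths whose names do not look like FASTQ files."""
    return [str(p) for p in paths if not is_fastq(p)]


if __name__ == "__main__":
    print(unsupported_inputs(["R1.fastq.gz", "R2.fq", "aligned.bam"]))
```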

If --sample-file or --contrast-file is omitted for a de novo mode, seqworkflow creates a minimal placeholder file so the DAG can be built. For real differential expression analysis, pass proper sample and contrast files.
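
For orientation, Trinity's differential expression scripts conventionally read tab-delimited files: one row per replicate in the samples file (condition, replicate name, then FASTQ paths) and one row per comparison in the contrasts file. The sketch below generates text in that layout. It is an assumption that seqworkflow passes these files to Trinity unchanged, the helper names are hypothetical, and the condition and file names are made up for illustration, so compare the result with the files your Trinity version expects.

```python
def samples_text(replicates):
    """One tab-delimited row per replicate: condition, replicate name, FASTQ path(s)."""
    return "".join(
        "\t".join([condition, replicate, *fastqs]) + "\n"
        for condition, replicate, fastqs in replicates
    )


def contrasts_text(contrasts):
    """One tab-delimited row per comparison: condition_A, condition_B."""
    return "".join(f"{a}\t{b}\n" for a, b in contrasts)


if __name__ == "__main__":
    # Illustrative design: two conditions with one paired-end replicate each.
    replicates = [
        ("control", "control_rep1", ["control_rep1_R1.fastq.gz", "control_rep1_R2.fastq.gz"]),
        ("treated", "treated_rep1", ["treated_rep1_R1.fastq.gz", "treated_rep1_R2.fastq.gz"]),
    ]
    print(samples_text(replicates), end="")
    print(contrasts_text([("control", "treated")]), end="")
```

Writing the two strings to `sample_file.txt` and `contrast_file.txt` gives files that can be passed with `--sample-file` and `--contrast-file`.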

Output Layout

Each run writes into the OUTDIR you provide. Reference workflows keep the generated RSEM/STAR index in a shared sibling directory so later runs can reuse it:

PARENT_DIR/
  ref/rsemRef/            shared RSEM/STAR reference index, when needed
  OUTDIR/
    config/               generated YAML config for this run
    input/                symlinks to input FASTQs using pipeline naming
    trimmed/              trimmed FASTQs
    qC/                   FastQC, fastp, MultiQC and logs
    map/                  STAR alignments
    counts/               RSEM outputs and count matrices
    assembly_trinity/     Trinity outputs, when needed

Actual subdirectories depend on the selected mode.
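
The layout above implies that two runs whose output directories share a parent also share one reference index. A small sketch of that path arithmetic follows; `default_rsem_ref_dir` is a hypothetical helper that mirrors the tree shown above, not a function exported by seqWorkflows.

```python
import os
from pathlib import Path


def default_rsem_ref_dir(outdir):
    """Return PARENT_DIR/ref/rsemRef, where PARENT_DIR is the parent of OUTDIR."""
    parent = Path(os.path.abspath(outdir)).parent
    return parent / "ref" / "rsemRef"


if __name__ == "__main__":
    # Both runs sit under results/, so they resolve to the same index directory.
    print(default_rsem_ref_dir("results/preprocessPE"))
    print(default_rsem_ref_dir("results/refguidedPE"))
```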

Use a distinct --rsem-ref-dir for each genome build and annotation pair. Do not reuse an index generated from a different FASTA or GTF/GFF file. The CLI stores SHA-256 hashes in ref/rsemRef/.seqworkflow-reference.json and stops before running Snakemake if a later run tries to reuse the directory with incompatible inputs.
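
The idea behind this guard can be sketched in a few lines of Python: hash the reference inputs, record the hashes next to the index on first use, and refuse to continue when a later run supplies different files. This is an illustration of the technique, not the CLI's actual code. The manifest name `reference-hashes.json` and its keys are hypothetical and deliberately differ from the real `.seqworkflow-reference.json`, whose schema is not described here.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large genome FASTA is never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_reference(ref_dir, ref_genome, gtf_file, manifest_name="reference-hashes.json"):
    """Record input hashes on first use; raise if a later run passes different inputs."""
    current = {"ref_genome": sha256_of(ref_genome), "gtf_file": sha256_of(gtf_file)}
    manifest = Path(ref_dir) / manifest_name
    if manifest.exists():
        stored = json.loads(manifest.read_text())
        if stored != current:
            raise ValueError(f"{ref_dir} was built from different reference inputs")
    else:
        manifest.parent.mkdir(parents=True, exist_ok=True)
        manifest.write_text(json.dumps(current, indent=2))
    return current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "genome.fa").write_text(">chr1\nACGT\n")
        (tmp / "annotation.gtf").write_text("# toy annotation\n")
        check_reference(tmp / "rsemRef", tmp / "genome.fa", tmp / "annotation.gtf")
        # Changing the annotation simulates reusing the index with the wrong GTF.
        (tmp / "annotation.gtf").write_text("# edited annotation\n")
        try:
            check_reference(tmp / "rsemRef", tmp / "genome.fa", tmp / "annotation.gtf")
        except ValueError as err:
            print("refused:", err)
```

Hashing file contents rather than comparing paths or timestamps means a renamed copy of the same genome still matches, while an edited file at the same path does not.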

Containers

The project has one unified container recipe, defined by these files:

containers/seqworkflows.def
containers/seqworkflows.Dockerfile
env/seqworkflows.yml
env/seqworkflows-linux-64.lock.yml

env/seqworkflows.yml is the readable package recipe. env/seqworkflows-linux-64.lock.yml is the pinned Linux container lockfile exported from a successful build and used by the Docker/Apptainer recipes.

Apptainer

Build:

apptainer build seqworkflows.sif containers/seqworkflows.def

Run:

seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
  --ref-genome genome.fa \
  --gtf-file annotation.gtf \
  --runtime apptainer \
  --container-image seqworkflows.sif

OrbStack / Docker

Build with OrbStack or Docker running:

scripts/build_orbstack.sh

Run the workflow inside the image:

docker run --rm -it \
  -v "$PWD:/work" \
  -w /work \
  ghcr.io/natmurad/seqworkflows:1.0.0 \
  bin/seqworkflow preprocessPE R1.fastq.gz R2.fastq.gz results/preprocessPE \
    --ref-genome genome.fa \
    --gtf-file annotation.gtf \
    --runtime orbstack

The build script uses docker if it is available in PATH. If not, it uses OrbStack's bundled Docker CLI at /Applications/OrbStack.app/Contents/MacOS/xbin/docker. The image is built as linux/amd64 because Bioconda has broader package support there than on macOS ARM.
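
The lookup order described above (docker from PATH first, then OrbStack's bundled CLI) can be expressed as a short Python sketch. The real logic lives in scripts/build_orbstack.sh and may differ in detail. `find_docker` and its injectable `which` and `is_executable` parameters are hypothetical and exist only to make the fallback easy to exercise.

```python
import os
import shutil

# Path to OrbStack's bundled Docker CLI, as given in the text above.
ORBSTACK_DOCKER = "/Applications/OrbStack.app/Contents/MacOS/xbin/docker"


def find_docker(which=shutil.which, is_executable=lambda p: os.access(p, os.X_OK)):
    """Prefer docker from PATH, then OrbStack's bundled CLI, else return None."""
    found = which("docker")
    if found:
        return found
    if is_executable(ORBSTACK_DOCKER):
        return ORBSTACK_DOCKER
    return None


if __name__ == "__main__":
    print(find_docker() or "no Docker CLI found")
```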

Versioned images are published to GHCR by .github/workflows/publish-container.yml when a release tag such as v1.0.0 is pushed.

In OrbStack/Docker mode, Snakemake runs inside the already-started container. In Apptainer mode, Snakemake uses the configured container image for each rule.

SignalP and Cell Ranger are not bundled in the unified image because they usually require separate licensing or vendor downloads.

Main Tools

FastQC
Trimmomatic
fastp
MultiQC
STAR
RSEM
samtools
Trinity
Bowtie2
BLAST+
CD-HIT
BUSCO
TransDecoder
HMMER
Trinotate
DESeq2 / GOseq

Tests

Run the lightweight CLI tests:

python3 tests/test_seqworkflow_cli.py

These tests validate the command-line interface, generated configs, input symlinks, and per-mode wiring without requiring Snakemake. Full workflow validation still requires running --dry-run or real jobs in an environment with Snakemake and the workflow tools available.

Advanced Usage

The generated config file is written to:

OUTDIR/config/<mode>.yaml

For the QC mode, qcPE is the public CLI name and the legacy Snakefile/config name remains qC_PE.
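
When scripting around a finished run, the naming note above means the QC config may appear under either name. The sketch below lists both candidates instead of guessing. Which file name the CLI actually writes for `qcPE` is an assumption to verify in your own OUTDIR, and `candidate_config_paths` is a hypothetical helper.

```python
from pathlib import Path

# Per the note above, only qcPE has a differing legacy name.
LEGACY_NAMES = {"qcPE": "qC_PE"}


def candidate_config_paths(outdir, mode):
    """Return possible OUTDIR/config/<mode>.yaml paths, public name first."""
    names = [mode]
    if mode in LEGACY_NAMES:
        names.append(LEGACY_NAMES[mode])
    return [Path(outdir) / "config" / f"{name}.yaml" for name in names]


if __name__ == "__main__":
    for path in candidate_config_paths("results/qc", "qcPE"):
        print(path, "(exists)" if path.exists() else "(missing)")
```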

You can still run Snakemake manually with a custom config:

SEQWORKFLOWS_CONFIG=config/my_dataset.yaml snakemake -s preprocessPE -j 4

Or with Apptainer through Snakemake:

SEQWORKFLOWS_CONFIG=config/my_dataset.yaml snakemake -s preprocessPE -j 4 --sdm apptainer
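
The same manual invocation can be driven from Python, for example from a notebook or a batch script. This is a sketch under the assumption that `snakemake` is installed and the working directory is the repository root. `snakemake_invocation` and `run_snakemake` are hypothetical helpers that only reproduce the two commands shown above.

```python
import os
import subprocess


def snakemake_invocation(config, snakefile="preprocessPE", jobs=4, apptainer=False):
    """Return (argv, env) mirroring the manual commands shown above."""
    argv = ["snakemake", "-s", snakefile, "-j", str(jobs)]
    if apptainer:
        argv += ["--sdm", "apptainer"]
    # Copy the current environment and add the config variable for the child process.
    env = dict(os.environ, SEQWORKFLOWS_CONFIG=str(config))
    return argv, env


def run_snakemake(config, **options):
    """Execute Snakemake and raise if it exits with a non-zero status."""
    argv, env = snakemake_invocation(config, **options)
    subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    # Print the equivalent shell command rather than running it.
    argv, env = snakemake_invocation("config/my_dataset.yaml", apptainer=True)
    print("SEQWORKFLOWS_CONFIG=" + env["SEQWORKFLOWS_CONFIG"], " ".join(argv))
```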

Detailed paired-end preprocessing documentation is available in docs/preprocessPE.md. Container build notes are in containers/README.md. Recent project changes are summarized in CHANGELOG.md.

Development

Workflows are composed from rule files under:

rules/PE/
rules/SE/
rules/BOTH/
rules/SRA/
rules/SINGLECELL/

New workflows can be built by composing existing rules in a top-level Snakefile.
