Stackgen is a developer-first CLI tool to bootstrap production-ready data engineering projects in seconds.
Build batch, streaming, or full data pipelines with a single command.
- ⚡ Generate complete data engineering project scaffolds
- 🧠 Supports Batch, Streaming, and Full pipelines
- 🐳 Built-in Docker-based infrastructure setup
- 🔄 Includes Spark, Kafka, and Airflow integrations
- 🎯 Interactive CLI with clean UX
- 🧩 Template-based architecture (Jinja2)
- Apache Spark (batch jobs)
- Apache Airflow (orchestration)
- Apache Kafka (event streaming)
- Spark Structured Streaming
- Batch + Streaming combined
pip install stackgen-cli
stackgen init my-project
? Select pipeline type:
❯ Batch Pipeline (Spark + Airflow)
Streaming Pipeline (Kafka + Spark)
Full Pipeline (Spark + Airflow + Kafka)
cd my-project
docker-compose up
my-project/
├── airflow/ # DAGs
├── spark/jobs/ # Batch & streaming jobs
├── kafka/ # Kafka producer (if streaming)
├── config/ # Config files
├── docker-compose.yml
└── requirements.txt
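
The layout above varies with the selected pipeline type (for example, `kafka/` only appears for streaming setups). Here is a minimal sketch of how that selection could map to directories. The names `COMPONENTS`, `COMPONENT_DIRS`, and `scaffold_dirs` are hypothetical and not taken from the Stackgen source, and the assumption that `airflow/` is skipped for streaming-only projects is an inference from the prompt options, not confirmed behavior.

```python
import tempfile
from pathlib import Path

# Components per pipeline type, as listed in the CLI prompt.
COMPONENTS = {
    "batch": {"spark", "airflow"},
    "streaming": {"kafka", "spark"},
    "full": {"spark", "airflow", "kafka"},
}

# Assumed mapping from a component to the directory it owns.
COMPONENT_DIRS = {
    "airflow": "airflow",
    "spark": "spark/jobs",
    "kafka": "kafka",
}


def scaffold_dirs(root: Path, pipeline_type: str) -> list[str]:
    """Create the directories for a pipeline type; return their relative paths."""
    if pipeline_type not in COMPONENTS:
        raise ValueError(f"unknown pipeline type: {pipeline_type!r}")
    rel_dirs = sorted(COMPONENT_DIRS[c] for c in COMPONENTS[pipeline_type])
    rel_dirs.append("config")  # config/ is created for every pipeline type
    for rel in rel_dirs:
        (root / rel).mkdir(parents=True, exist_ok=True)
    return rel_dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold_dirs(Path(tmp) / "my-project", "streaming"))
```

Keeping the component-to-directory mapping in plain data means a new component only needs a new dictionary entry, not a change to the scaffolding function.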
- Python
- Apache Spark
- Apache Kafka
- Apache Airflow
- Docker
- Jinja2 (templating)
- CLI collects user input (pipeline type)
- Generator builds project structure
- Jinja templates render files dynamically
- Docker setup enables instant execution
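
The rendering step in that flow can be sketched as follows. To stay runnable without third-party packages, this example uses the standard library's `string.Template` as a stand-in for Jinja2, so it shows the idea and not Stackgen's actual templates. The names `COMPOSE_TEMPLATE`, `SERVICES`, and `render_compose`, and the shape of the generated file, are assumptions made for illustration.

```python
from string import Template

# Stand-in for a Jinja2 template; Stackgen itself renders with Jinja2.
COMPOSE_TEMPLATE = Template(
    "# Generated for $project_name ($pipeline_type pipeline)\n"
    "services:\n"
    "$services"
)

# Services per pipeline type, following the CLI prompt options.
SERVICES = {
    "batch": ["spark", "airflow"],
    "streaming": ["kafka", "spark"],
    "full": ["spark", "airflow", "kafka"],
}


def render_compose(project_name: str, pipeline_type: str) -> str:
    """Render a placeholder docker-compose file from the user's choices."""
    if pipeline_type not in SERVICES:
        raise ValueError(f"unknown pipeline type: {pipeline_type!r}")
    # Each service gets an empty mapping; real templates would fill in
    # images, ports, and volumes.
    services = "".join(f"  {name}: {{}}\n" for name in SERVICES[pipeline_type])
    return COMPOSE_TEMPLATE.substitute(
        project_name=project_name,
        pipeline_type=pipeline_type,
        services=services,
    )


if __name__ == "__main__":
    print(render_compose("my-project", "streaming"))
```

With Jinja2, the per-service loop would typically live inside the template itself (a `{% for %}` block) instead of being assembled in Python first.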
- Feature toggles (Kafka / Airflow selection)
- Config-driven generation (YAML support)
- Cloud integrations (S3, Snowflake, BigQuery)
- Plugin system for custom stacks
- CI/CD pipeline templates
Contributions are welcome! Feel free to open issues or submit PRs.
Stackgen aims to become a go-to CLI tool for data engineers to quickly bootstrap scalable and production-ready data platforms.
Minhaz Alam
If you like this project, give it a ⭐ on GitHub!