databricks · mwojtyczka · Jul 2, 2026 · Jul 2, 2026
diff --git a/contrib/dbt_factory/.gitignore b/contrib/dbt_factory/.gitignore
@@ -0,0 +1,13 @@
+.venv/
+__pycache__/
+*.pyc
+.databricks/
+logs/
+dbt_packages/
+dbt_profiles/.user.yml
+uv.lock
+
+# dbt build artifacts, but keep the committed manifest that resources/__init__.py reads at
+# deploy time. Regenerate it with `make manifest`.
+target/*
+!target/manifest.json
diff --git a/contrib/dbt_factory/Makefile b/contrib/dbt_factory/Makefile
@@ -0,0 +1,27 @@
+.PHONY: setup deps manifest validate deploy run test
+
+# Install dependencies into the .venv used by the bundle (databricks.yml -> python.venv_path).
+setup:
+	uv sync --dev
+
+# Install dbt package dependencies declared in packages.yml / dependencies.yml (if any).
+deps:
+	uv run dbt deps
+
+# Regenerate the dbt manifest that resources/__init__.py reads at deploy time.
+# `dbt parse` does not connect to a warehouse; it only reads the project files.
+manifest: deps
+	uv run dbt parse --profiles-dir dbt_profiles
+
+validate:
+	databricks bundle validate
+
+# Regenerate the manifest and deploy the generated job to the dev target.
+deploy: manifest
+	databricks bundle deploy --target dev
+
+run:
+	databricks bundle run dbt_factory_job
+
+test:
+	uv run pytest tests
diff --git a/contrib/dbt_factory/NOTICE b/contrib/dbt_factory/NOTICE
@@ -0,0 +1,33 @@
+This project includes code adapted from the "databricks-dbt-factory" library.
+
+    Source:       https://github.com/mwojtyczka/databricks-dbt-factory
+    Full history: https://github.com/mwojtyczka/databricks-dbt-factory/commits/main
+    Adapted from: commit e767a9d865581226e4f144fb17b7a822df1ea1f4 (v0.2.1)
+    Location:     src/databricks_dbt_factory/
+
+The code under src/databricks_dbt_factory/ originates from that repository (reformatted to
+this repository's style; otherwise functionally unchanged) and is provided under the MIT
+license reproduced below, which this NOTICE preserves for attribution. All other files are
+part of the bundle-examples repository and are covered by that repository's license.
+
+--------------------------------------------------------------------------------
+
+MIT License
+
+Copyright (c) 2024-present mwojtyczka <wojtyczka.marcin@gmail.com>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this
+software and associated documentation files (the "Software"), to deal in the Software
+without restriction, including without limitation the rights to use, copy, modify, merge,
+publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons
+to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or
+substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
+INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE
+FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
diff --git a/contrib/dbt_factory/README.md b/contrib/dbt_factory/README.md
@@ -0,0 +1,212 @@
+# dbt_factory
+
+This example runs a [dbt](https://docs.getdbt.com/) project on Databricks as a
+**Databricks Workflow with one task per dbt object** (model, seed, snapshot, test) instead of
+running the whole project as a single opaque task.
+
+It does this by combining two pieces:
+
+* **[databricks-dbt-factory](https://github.com/mwojtyczka/databricks-dbt-factory)** — a small
+  library that reads a dbt `manifest.json` and expands it into Databricks job tasks, wiring up
+  the dependencies between them. Its source is included under `src/databricks_dbt_factory/`
+  (see [`NOTICE`](NOTICE) for attribution and license).
+* **[PyDABs](https://docs.databricks.com/dev-tools/bundles/python)** — the Databricks Asset
+  Bundle Python resources hook. At `databricks bundle deploy` time the Databricks CLI calls
+  `load_resources` in [`resources/__init__.py`](resources/__init__.py), which runs the factory
+  against the manifest and returns the generated job.
+
+The result: **no per-model job YAML is checked in**. The task graph is generated on the fly from
+the dbt manifest each time you deploy.
+
+## Why one task per dbt object?
+
+By default, dbt's integration with Databricks Workflows treats the whole project as a single
+task — a black box. Expanding it into one task per object gives:
+
+* **Faster execution** — independent models run in parallel, and the notebook task type keeps
+  dbt's dependencies pre-cached in the serverless environment, avoiding a cold start on every task.
+* **Visibility & simplified troubleshooting** — pinpoint and fix issues at the model level right
+  in the Databricks Workflows UI.
+* **Enhanced logging & notifications** — per-task logs and precise, model-level error alerts.
+* **Improved retriability** — retry only the failed model tasks without rerunning the whole project.
+* **Seamless testing** — dbt data tests run as their own tasks right after each model finishes,
+  for faster validation and feedback.
+
+This example uses **serverless compute** and the **notebook task type** (each task triggers dbt
+through a small runner notebook using the `dbtRunner` Python API) for the fastest task start
+times. See the [databricks-dbt-factory README](https://github.com/mwojtyczka/databricks-dbt-factory#benefits)
+for more.
+
+## How it works
+
+The [`dbt-factory` template](../templates/dbt-factory) scaffolds a self-contained project.
+From then on, each `databricks bundle deploy` regenerates the Workflow from your current dbt
+manifest — add or remove a model and the task graph follows on the next deploy, with no per-model
+YAML to maintain.
+
+```mermaid
+flowchart TD
+    subgraph setup["One-time setup"]
+      T["dbt-factory bundle template"] -->|databricks bundle init| B["Scaffolded project:<br/>dbt project + PyDABs hook + factory code"]
+      X["Existing dbt project<br/>(optional)"] -.->|move models/seeds/... into src/| B
+    end
+    subgraph deploy["Every deploy"]
+      C["make manifest<br/>(dbt parse)"] --> D["target/manifest.json"]
+      D --> E["databricks bundle deploy"]
+      E --> F["PyDABs load_resources reads the<br/>manifest and generates the job"]
+    end
+    subgraph runtime["At run time — serverless"]
+      G["Databricks Workflow:<br/>one task per model / seed / snapshot / test"] --> H["Each task triggers dbt<br/>via the runner notebook"]
+      H --> I[("SQL warehouse")]
+    end
+    B --> C
+    F --> G
+
+    classDef optional stroke:#999,stroke-dasharray:5 4,color:#888;
+    class X optional;
+```
+
+## Project structure
+
+```
+dbt_factory/
+├── databricks.yml              # Bundle definition; wires up the PyDABs `load_resources` hook
+├── dbt_project.yml             # dbt project (models under src/models, etc.)
+├── dbt_profiles/profiles.yml   # dbt profiles for the deployed job (dev / prod targets)
+├── profile_template.yml        # prompts for `dbt init` (local development)
+├── resources/__init__.py       # PyDABs glue: manifest -> generated job (the only integration code)
+├── src/
+│   ├── models/                 # your dbt models (example: orders_raw, orders_daily)
+│   └── databricks_dbt_factory/ # vendored factory library (functionally unchanged; see NOTICE)
+├── target/manifest.json        # committed dbt manifest, read at deploy time (regenerate with `make manifest`)
+├── tests/                      # tests for the vendored factory + the PyDABs integration
+├── pyproject.toml              # dependencies (installed into .venv via `uv sync`)
+└── Makefile                    # convenience targets: setup, manifest, validate, deploy, run, test
+```
+
+## Setup
+
+1. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/databricks-cli.html)
+   and the [uv](https://docs.astral.sh/uv/) package manager.
+
+2. Authenticate to your Databricks workspace:
+   ```
+   $ databricks configure
+   ```
+
+3. Install dependencies into the `.venv` the bundle uses:
+   ```
+   $ make setup      # == uv sync --dev
+   ```
+
+4. Edit `dbt_profiles/profiles.yml` and set your SQL warehouse `http_path`, `catalog`, and
+   `schema`. Set the workspace host in `databricks.yml`; for prod, also set `root_path` and the permissions.
+
+## The dbt manifest
+
+`resources/__init__.py` reads `target/manifest.json` at deploy time to build the task graph. A
+manifest is committed so the bundle deploys out of the box. **After you change your models,
+regenerate it:**
+
+```
+$ make manifest      # == uv run dbt deps && uv run dbt parse --profiles-dir dbt_profiles
+```
+
+`dbt parse` only reads your project files; it does not connect to a warehouse. The manifest
+location is configurable — point at a different file via the `DBT_MANIFEST_PATH` environment
+variable or by editing `MANIFEST_PATH` in `resources/__init__.py`.
+
+> **Large projects with many parallel tasks.** At runtime each task runs dbt from the shared
+> project directory and writes dbt's artifacts (`target/`, `logs/`) there, which can contend
+> under high parallelism. To avoid this, generate a `target/partial_parse.msgpack` (a local
+> `dbt parse` produces it next to the manifest) and ship it with the bundle — it's `.gitignore`d
+> by default, so force-add it (`git add -f target/partial_parse.msgpack`). The runner notebook
+> then routes each task's artifacts to a private temp dir and skips re-parsing. See the
+> databricks-dbt-factory README, "Faster parsing on large projects".
+
+## Deploy and run
+
+```
+$ databricks bundle deploy --target dev      # or: make deploy (also regenerates the manifest)
+$ databricks bundle run dbt_factory_job       # or: make run
+```
+
+Open the run URL the CLI prints to watch the generated per-model task graph execute. Deploying
+in `dev` mode prefixes resources with `[dev your_name]` and pauses the daily schedule; deploy to
+`prod` with `--target prod`.
+
+## Configuring the generated job
+
+A few knobs are exposed as constants at the top of `resources/__init__.py`:
+
+* `BUNDLE_TESTS` — when `True`, single-model tests are bundled into one `dbt test` task per
+  resource (fewer task startups; faster for test-heavy projects). Default `False` (one task per
+  test node, for maximum per-test visibility).
+* `ENVIRONMENT_KEY` — the serverless environment key (default `Default`).
+* `EXTRA_DBT_COMMAND_OPTIONS` — extra options appended to every generated dbt command.
+
+The dbt target, warehouse, catalog, and schema are configured in `dbt_profiles/profiles.yml`
+and selected per bundle target via `--target ${bundle.target}`.
+
+## Migrating an existing dbt project
+
+Bring your own dbt project by **generating a fresh project from the template and moving your dbt
+files into it.** You don't touch dependencies, the vendored factory, or any paths — the generated
+project already ships all of that.
+
+1. Generate a new project (or copy this `dbt_factory` example):
+
+   ```
+   $ databricks bundle init https://github.com/databricks/bundle-examples --template-dir contrib/templates/dbt-factory
+   ```
+
+2. Remove the starter models and copy your dbt sources into the matching `src/` subdirectories:
+
+   ```
+   $ rm -r src/models/example
+   # Copy whichever of these your project has (skip the ones you don't use):
+   $ cp -R /path/to/your/dbt/models/*     src/models/
+   $ cp -R /path/to/your/dbt/seeds/*      src/seeds/
+   $ cp -R /path/to/your/dbt/snapshots/*  src/snapshots/
+   $ cp -R /path/to/your/dbt/macros/*     src/macros/
+   $ cp -R /path/to/your/dbt/tests/*      src/tests/
+   ```
+
+   The generated `dbt_project.yml` already points `model-paths`, `seed-paths`, etc. at these
+   `src/` folders, so your files are picked up as-is. Merge any model/seed configuration from your
+   own `dbt_project.yml` into the generated one (keep the generated `name`/`profile`), and remove
+   the leftover `models: dbt_factory: example:` block that referenced the deleted starter models —
+   otherwise `dbt parse` warns that those config paths don't apply to any resource. If you use dbt
+   packages, copy your `packages.yml` to the project root too.
+
+3. Point `dbt_profiles/profiles.yml` at your warehouse (`http_path`, `catalog`, `schema`). Leave
+   the `host`/`token` lines as they are — the runner notebook sets those at runtime.
+
+4. Generate the manifest and deploy:
+
+   ```
+   $ make setup
+   $ make manifest      # dbt parse -> target/manifest.json
+   $ databricks bundle deploy --target dev
+   ```
+
+That's the whole migration: no dependency wrangling and no path edits, because your project keeps
+the generated layout (dbt project at the bundle root, factory under `src/`). If you'd rather keep
+your project's existing directory structure instead of `src/`, edit the `*-paths` in
+`dbt_project.yml` to point at your folders — nothing else changes.
+
+## Tests
+
+```
+$ make test      # == uv run pytest tests
+```
+
+This runs the vendored factory's own test suite (checking that the vendored core is intact) plus an
+offline test that exercises the PyDABs integration against the committed manifest — no workspace
+required.
+
+## Local development with dbt
+
+You can still develop the dbt project locally with the dbt CLI. Initialize your own profile with
+`dbt init` (see `profile_template.yml`), then use `dbt run`, `dbt test`, etc. as usual. See the
+[`dbt_sql`](../../dbt_sql) example for a more detailed local-dbt walkthrough.
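The README's "How it works" and "The dbt manifest" sections describe expanding `target/manifest.json` into one task per dbt object. As a rough illustration only, the sketch below shows one way that expansion could look using plain dictionaries. It is not the code in `resources/__init__.py` or in `databricks_dbt_factory`, neither of which appears in this diff. The names `resolve_manifest_path`, `load_manifest`, `task_key`, `manifest_to_tasks`, and `DEFAULT_MANIFEST_PATH` are hypothetical, and the task-key scheme is an assumption. The sketch relies only on the manifest fields `nodes`, `resource_type`, and `depends_on.nodes`, plus the `DBT_MANIFEST_PATH` override named in the README.

```python
import json
import os
from pathlib import Path

# Hypothetical default; the README names DBT_MANIFEST_PATH as the override.
DEFAULT_MANIFEST_PATH = "target/manifest.json"

# dbt object types that the README says become their own task.
RUNNABLE_TYPES = {"model", "seed", "snapshot", "test"}


def resolve_manifest_path() -> Path:
    """Return the manifest path, preferring the DBT_MANIFEST_PATH override."""
    return Path(os.environ.get("DBT_MANIFEST_PATH", DEFAULT_MANIFEST_PATH))


def load_manifest() -> dict:
    """Read and parse the manifest produced by `dbt parse`."""
    return json.loads(resolve_manifest_path().read_text())


def task_key(unique_id: str) -> str:
    """Assumed scheme: turn 'model.pkg.name' into a key without dots."""
    return unique_id.replace(".", "_")


def manifest_to_tasks(manifest: dict) -> list[dict]:
    """Build one task per runnable node, keeping edges between runnable nodes."""
    nodes = {
        uid: node
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") in RUNNABLE_TYPES
    }
    tasks = []
    for uid in sorted(nodes):
        parents = nodes[uid].get("depends_on", {}).get("nodes", [])
        tasks.append(
            {
                "task_key": task_key(uid),
                # Parents that are not runnable nodes (for example sources) are dropped.
                "depends_on": [{"task_key": task_key(p)} for p in parents if p in nodes],
            }
        )
    return tasks


# In-memory stand-in for a manifest, using the example model names from the README.
example_manifest = {
    "nodes": {
        "model.dbt_factory.orders_raw": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
        "model.dbt_factory.orders_daily": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.dbt_factory.orders_raw"]},
        },
    }
}
tasks = manifest_to_tasks(example_manifest)
```

Tasks are emitted in sorted `unique_id` order so repeated deploys of the same manifest yield the same list; the dependency edges, not the list order, determine execution order.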
diff --git a/contrib/dbt_factory/databricks.yml b/contrib/dbt_factory/databricks.yml
@@ -0,0 +1,39 @@
+# This is a Databricks Asset Bundle definition for dbt_factory.
+# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
+#
+# The job for this bundle is NOT defined in YAML. Instead it is generated at deploy time
+# from the dbt manifest by resources/__init__.py (see the `python.resources` hook below),
+# producing one Databricks task per dbt object (model / seed / snapshot / test).
+bundle:
+  name: dbt_factory
+  uuid: 19ecc815-cff0-449c-91c1-e68239d49ccb
+
+# PyDABs: the Databricks CLI calls `load_resources` during `bundle deploy` to build resources
+# defined in Python. See resources/__init__.py.
+python:
+  venv_path: .venv
+  resources:
+    - "resources:load_resources"
+
+# Deployment targets. The dbt target is selected via `--target ${bundle.target}`, so these
+# names match the dbt outputs in dbt_profiles/profiles.yml.
+targets:
+  dev:
+    # The default target uses 'mode: development' to create a development copy.
+    # - Deployed resources get prefixed with '[dev my_user_name]'
+    # - Any job schedules and triggers are paused by default.
+    # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
+    mode: development
+    default: true
+    workspace:
+      host: https://company.databricks.com
+
+  prod:
+    mode: production
+    workspace:
+      host: https://company.databricks.com
+      # We explicitly deploy to /Workspace/Users/user@company.com to make sure we only have a single copy.
+      root_path: /Workspace/Users/user@company.com/.bundle/${bundle.name}/${bundle.target}
+    permissions:
+      - user_name: user@company.com
+        level: CAN_MANAGE
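The comments in `databricks.yml` say the dbt target is selected via `--target ${bundle.target}`. A minimal sketch of how a per-task dbt argument list might be assembled is shown below. `build_dbt_args` and `DBT_COMMANDS` are hypothetical names, not part of the vendored library, and the choice of one dbt subcommand per resource type is an assumption. The flags `--select`, `--target`, and `--profiles-dir` are standard dbt CLI options.

```python
# Assumed mapping from dbt resource type to the dbt subcommand that builds it.
DBT_COMMANDS = {
    "model": "run",
    "seed": "seed",
    "snapshot": "snapshot",
    "test": "test",
}


def build_dbt_args(
    resource_type: str,
    name: str,
    target: str,
    profiles_dir: str = "dbt_profiles",
) -> list[str]:
    """Return the dbt arguments for a single dbt object."""
    command = DBT_COMMANDS[resource_type]  # raises KeyError for unsupported types
    return [
        command,
        "--select", name,
        "--target", target,
        "--profiles-dir", profiles_dir,
    ]


# The bundle would substitute ${bundle.target} with "dev" or "prod" at deploy time.
model_args = build_dbt_args("model", "orders_daily", "dev")
seed_args = build_dbt_args("seed", "countries", "${bundle.target}")
```

Keeping the arguments as a list rather than a single string avoids shell quoting issues if a selector ever contains spaces.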
diff --git a/contrib/dbt_factory/dbt_profiles/profiles.yml b/contrib/dbt_factory/dbt_profiles/profiles.yml
@@ -0,0 +1,37 @@
+
+# This file defines dbt profiles for deployed dbt jobs.
+# The generated Databricks job selects the target via `--target ${bundle.target}`,
+# so the dbt target names below (dev / prod) match the bundle targets in databricks.yml.
+dbt_factory:
+  target: dev # default target
+  outputs:
+
+    # Doing local development with the dbt CLI?
+    # Then you should create your own profile in your .dbt/profiles.yml using 'dbt init'
+    # (See README.md)
+
+    # The default target when deployed with the Databricks CLI
+    dev:
+      type: databricks
+      method: http
+      catalog: catalog
+      schema: default
+
+      http_path: /sql/1.0/warehouses/abcdef1234567890
+
+      # The workspace host / token are provided by the runner notebook at runtime
+      # (see src/databricks_dbt_factory/notebook/run_dbt_command.py).
+      host: "{{ env_var('DBT_HOST', '') }}"
+      token: "{{ env_var('DBT_ACCESS_TOKEN', '') }}"
+
+    # The production target when deployed with the Databricks CLI
+    prod:
+      type: databricks
+      method: http
+      catalog: catalog
+      schema: default
+
+      http_path: /sql/1.0/warehouses/abcdef1234567890
+
+      host: "{{ env_var('DBT_HOST', '') }}"
+      token: "{{ env_var('DBT_ACCESS_TOKEN', '') }}"
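Both outputs in `profiles.yml` read `host` and `token` from the `DBT_HOST` and `DBT_ACCESS_TOKEN` environment variables, falling back to an empty string. The runner notebook that sets them is not part of this diff, so the following is only a sketch of one way a caller could export those variables around a dbt invocation and restore the previous state afterwards. The `dbt_credentials` helper is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def dbt_credentials(host: str, token: str):
    """Temporarily export the variables that profiles.yml reads via env_var()."""
    new_values = {"DBT_HOST": host, "DBT_ACCESS_TOKEN": token}
    previous = {key: os.environ.get(key) for key in new_values}
    os.environ.update(new_values)
    try:
        yield
    finally:
        # Restore whatever was set before, or remove the variable if it was unset.
        for key, old_value in previous.items():
            if old_value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value


with dbt_credentials("https://company.databricks.com", "placeholder-token"):
    # A dbt call would go here, for example through dbt's programmatic entry point:
    #   from dbt.cli.main import dbtRunner
    #   dbtRunner().invoke(["run", "--target", "dev", "--profiles-dir", "dbt_profiles"])
    host_seen_by_dbt = os.environ["DBT_HOST"]
```

Restoring the environment in `finally` keeps the token from leaking into later code in the same process even when the dbt call raises.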
diff --git a/contrib/dbt_factory/dbt_project.yml b/contrib/dbt_factory/dbt_project.yml
@@ -0,0 +1,28 @@
+name: 'dbt_factory'
+version: '1.0.0'
+config-version: 2
+
+# This setting configures which "profile" dbt uses for this project.
+profile: 'dbt_factory'
+
+# These configurations specify where dbt should look for different types of files.
+# Everything dbt-related lives under src/ so the project can also hold non-dbt resources
+# (such as the vendored databricks_dbt_factory library under src/databricks_dbt_factory).
+model-paths: ["src/models"]
+analysis-paths: ["src/analyses"]
+test-paths: ["src/tests"]
+seed-paths: ["src/seeds"]
+macro-paths: ["src/macros"]
+snapshot-paths: ["src/snapshots"]
+
+clean-targets: # directories to be removed by `dbt clean`
+  - "target"
+  - "dbt_packages"
+
+# In this example config, we tell dbt to build all models in the example/
+# directory as views by default. These settings can be overridden in the
+# individual model files using the `{{ config(...) }}` macro.
+models:
+  dbt_factory:
+    example:
+      +materialized: view