Lumilake

Requires Python 3.12+.

Lumilake is a data analytics engine for agentic workflows. It accepts workflow specs (native graph JSON, YAML, or n8n JSON), optimizes the runtime graph with HALO, and dispatches tasks through FlowMesh.

What Lumilake Provides

  • Workflow parsing for native graph specs, YAML workflows, and n8n exports.
  • HALO scheduling for multi-step AI and data workflows.
  • A FastAPI server for job submission, status, cancellation, results, workers, and traces.
  • A CLI and Python SDK for local deployment and server API access.
  • Data access routed exclusively through lumid-data-app — DataRetrievalOp in sql, s3, and agent modes all dispatch via LUMID_DATA_URL.
  • Shared hook integration through lumid-hooks, plus Lumilake-owned optimizer plugins.

Install

From PyPI:

pip install "lumilake[cli]"

From a source checkout:

uv sync --all-packages --all-extras --all-groups

The PyPI lumilake distribution is a code-free metapackage; install one of the extras below to get a working set. The server runtime is published as a Docker image only and is intentionally not on PyPI.

Extra    Includes
sdk      Python SDK HTTP clients (lumilake-sdk → module lumilake)
cli      lumilake command line interface plus deploy lifecycle (lumilake-cli + lumilake-deploy)
deploy   Local Docker / FlowMesh deployment helpers (lumilake-deploy)
hook     Resource-kind helpers for shared hook integrations (lumilake-hook)
all      Everything above.
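
To see which of these distributions ended up in the current environment, a standard-library check along the following lines may help. This is a minimal sketch: the distribution names are the ones listed in the table above, while the helper name installed_versions is hypothetical and not part of Lumilake.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names as listed in this README's extras table.
DISTRIBUTIONS = [
    "lumilake",
    "lumilake-sdk",
    "lumilake-cli",
    "lumilake-deploy",
    "lumilake-hook",
]


def installed_versions(names: Iterable[str] = DISTRIBUTIONS) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Calling installed_versions() after pip install "lumilake[cli]" should report versions for the CLI and deploy distributions if the extras resolve as the table describes; a None entry means that distribution is not installed.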

Quick Start

The server runs as the published Docker image. lumilake deploy reads its env files from --project-dir (or the current working directory). Either point at a deployment directory with -C / --project-dir, or cd to it first.

mkdir -p ~/lumilake-deploy
lumilake deploy -C ~/lumilake-deploy init --flowmesh   # ~/lumilake-deploy/.env + .env.flowmesh
$EDITOR ~/lumilake-deploy/.env                          # fill in LUMID_DATA_URL / S3_*_PREFIX / model keys
lumilake deploy -C ~/lumilake-deploy pull               # fetch ghcr.io/mlsys-io/lumilake_server:<tag>
lumilake deploy -C ~/lumilake-deploy up                 # bring the stack up via docker compose
lumilake deploy -C ~/lumilake-deploy purge dev --dry-run # preview cleanup of one local image tag

LUMILAKE_DEPLOY_DIR=~/lumilake-deploy is an equivalent override. The deployment directory only needs to hold your .env files (and any local state docker compose creates) — the compose file and server image are resolved from the installed lumilake-deploy package and GHCR. The server listens on http://127.0.0.1:9000 by default — open /docs for the API browser.
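
When scripting around deploy up, it can be useful to wait until the server actually answers before submitting a job. The sketch below polls the /docs page mentioned above using only the standard library. It assumes /docs returns HTTP 200 once the server is ready; the helper name wait_for_server and its parameters are hypothetical and not part of Lumilake.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(
    base_url: str = "http://127.0.0.1:9000",
    path: str = "/docs",
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll base_url + path until it answers HTTP 200 or timeout_s elapses."""
    url = base_url.rstrip("/") + path
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx reply: keep waiting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A script might call wait_for_server() right after lumilake deploy up and abort with a pointer to lumilake deploy logs server if it returns False.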

Note: a real workflow run requires a reachable lumid-data-app instance — LUMID_DATA_URL (and optionally LUMID_DATA_TOKEN) must be set, since every DataRetrievalOp (sql, s3, and agent modes) routes through it. See docs/ENV.md for the env contract and docs/E2E_DEMO.md for the full three-step demo flow (bring up lumid-data-app → load demo data → run a workflow).
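
A quick pre-flight check of these two variables can catch a missing setting before a job fails at its first DataRetrievalOp. The following is a minimal sketch, not Lumilake's own validation: the function name check_data_env is hypothetical, and the http/https scheme test is an assumption about what a valid LUMID_DATA_URL looks like. docs/ENV.md remains the authoritative contract.

```python
from collections.abc import Mapping
from urllib.parse import urlparse


def check_data_env(env: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (errors, notes) for the lumid-data-app settings found in env."""
    errors: list[str] = []
    notes: list[str] = []

    url = env.get("LUMID_DATA_URL", "").strip()
    if not url:
        errors.append("LUMID_DATA_URL is not set")
    else:
        parsed = urlparse(url)
        # Assumption: the endpoint is reached over http or https.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"LUMID_DATA_URL does not look like an http(s) URL: {url!r}")

    # The token is optional, so its absence is only a note.
    if not env.get("LUMID_DATA_TOKEN", "").strip():
        notes.append("LUMID_DATA_TOKEN is not set (fine for unauthenticated deployments)")

    return errors, notes
```

Passing os.environ, or a dict parsed from the deployment's .env file, gives a list of blocking errors and a list of informational notes.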

Hello world

The repo ships a hello-world.yaml template (FormatOp → LambdaOp → LLMChatOp) that is the smallest copy-paste starting point for a Lumilake YAML workflow. Submit it once the stack is up and LUMID_DATA_URL points at a reachable lumid-data-app instance.

# From a source checkout:
uv run lumilake job submit examples/templates/yaml/hello-world.yaml \
    --format yaml --input 'Name=world' --output-prefix demo/hello-world

# From a PyPI install (download the template alongside lumilake):
curl -O https://raw.githubusercontent.com/mlsys-io/lumilake_OSS/main/examples/templates/yaml/hello-world.yaml
lumilake job submit hello-world.yaml \
    --format yaml --input 'Name=world' --output-prefix demo/hello-world
lumilake job watch <job_id>
lumilake job result <job_id>

The template uses Qwen/Qwen3-8B, which is the bundled text-demo model. You do not need to pre-populate or inspect cached_models before the first run; it can be empty until after a worker serves a job. Only edit config.model if your FlowMesh stack is configured for a different model or the job fails with a missing-model / worker-placement error.

Real workflows

Submit and inspect a workflow. From a source checkout the example workflow file is at examples/templates/yaml/trading-agent.yaml; PyPI installs do not ship the templates, so pass an absolute path to a workflow file you have locally:

# From a source checkout:
uv run lumilake job submit examples/templates/yaml/trading-agent.yaml \
    --format yaml --input 'Stock=NVDA,AAPL,MSFT' --output-prefix demo/trading-agent

# From a PyPI install (lumilake on PATH; supply your own workflow file):
lumilake job submit /path/to/your/workflow.yaml \
    --format yaml --input 'Stock=NVDA,AAPL,MSFT' --output-prefix demo/trading-agent

lumilake job list
lumilake job watch <job_id>

lumilake deploy up writes ~/.lumilake/config.toml so subsequent calls find the local server automatically. For remote / hosted servers, set LUMILAKE_BASE_URL instead.

See docs/E2E_DEMO.md for a full reproduction using the bundled demo workflows and dataset.

Data Access

All DataRetrievalOp modes (sql, s3, and agent) route through lumid-data-app via the type: lumid FlowMesh connector. LUMID_DATA_URL (and, for authenticated deployments, LUMID_DATA_TOKEN) gate every retrieval. S3_DATA_PREFIX is a logical blob-key prefix in lumid-data-app's store used as the base for S3-input key expansion.

Job records and runtime artifacts are written under S3_ARCHIVE_PREFIX, also resolved against lumid-data-app's blob store via the same LUMID_DATA_URL / LUMID_DATA_TOKEN.

Deployment

The examples below assume LUMILAKE_DEPLOY_DIR=~/lumilake-deploy is set (or that you pass -C ~/lumilake-deploy explicitly). From a source checkout, prefix the commands with uv run; with a PyPI install, invoke lumilake directly.

Generate .env from the bundled template:

lumilake deploy init

Generate both Lumilake and bundled FlowMesh env files:

lumilake deploy init --flowmesh

If another FlowMesh stack is already running on the same host, check ports before deploy up. Common co-tenant FlowMesh defaults are HTTP 8000, gRPC 50051, Redis control 6379, and Redis telemetry 6380. The bundled stack reads SERVER_HTTP_PORT, SERVER_GRPC_PORT, REDIS_CONTROL_PORT, and REDIS_TELEMETRY_PORT from .env.flowmesh; change them to free ports and keep LUMILAKE_RUNTIME_ORCHESTRATOR_URL in .env aligned with SERVER_HTTP_PORT.
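
The port check described above can be scripted. The sketch below reads KEY=VALUE pairs from env-file text, tests whether a TCP port is already taken on the local host, and verifies that the port in LUMILAKE_RUNTIME_ORCHESTRATOR_URL matches SERVER_HTTP_PORT. All function names are hypothetical, the parser handles only simple KEY=VALUE lines, and falling back to 8000 when SERVER_HTTP_PORT is unset is an assumption based on the FlowMesh defaults quoted above.

```python
import socket
from urllib.parse import urlparse

# Variables read from .env.flowmesh, with the co-tenant defaults quoted above.
FLOWMESH_PORT_DEFAULTS = {
    "SERVER_HTTP_PORT": 8000,
    "SERVER_GRPC_PORT": 50051,
    "REDIS_CONTROL_PORT": 6379,
    "REDIS_TELEMETRY_PORT": 6380,
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def orchestrator_matches(lumilake_env: dict[str, str], flowmesh_env: dict[str, str]) -> bool:
    """True if the orchestrator URL's explicit port equals SERVER_HTTP_PORT."""
    http_port = int(flowmesh_env.get("SERVER_HTTP_PORT", FLOWMESH_PORT_DEFAULTS["SERVER_HTTP_PORT"]))
    url = lumilake_env.get("LUMILAKE_RUNTIME_ORCHESTRATOR_URL", "")
    # A URL without an explicit port yields None and counts as a mismatch.
    return urlparse(url).port == http_port
```

Before deploy up, a script could call port_in_use for each configured port and orchestrator_matches on the two parsed files, then print whichever variables need to change.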

Common deployment commands:

lumilake deploy doctor
lumilake deploy pull         # or `build` to compile from source
lumilake deploy up
lumilake deploy status
lumilake deploy logs server --tail 200
lumilake deploy restart server
lumilake deploy down
lumilake deploy clean

Use deploy down to stop services while keeping data volumes (non-destructive). Use deploy clean or deploy reset only when you want to remove local stack state; both delete every Lumilake-managed volume, and reset prompts for confirmation by default (--yes skips the prompt).
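
For automation, the non-destructive part of this lifecycle can be driven from Python with subprocess. The sketch below builds the same command lines shown in this README and runs doctor, pull, up, and status in that order. The function names are hypothetical, and it assumes -C is accepted by every deploy subcommand, which this README shows only for init, pull, up, and purge. The destructive clean and reset commands are deliberately left out.

```python
import subprocess
from pathlib import Path

# Subcommands taken from the "Common deployment commands" list above.
BRING_UP_STEPS = [["doctor"], ["pull"], ["up"], ["status"]]


def deploy_argv(project_dir, step):
    """Build the argv for one `lumilake deploy -C <dir> <step...>` call."""
    return ["lumilake", "deploy", "-C", str(Path(project_dir).expanduser()), *step]


def bring_up(project_dir, run=subprocess.run):
    """Run each bring-up step in order, stopping at the first failure."""
    for step in BRING_UP_STEPS:
        # check=True raises CalledProcessError on a non-zero exit status.
        run(deploy_argv(project_dir, step), check=True)
```

The run parameter exists so the sequence can be exercised with a stub in tests; in normal use bring_up("~/lumilake-deploy") shells out to the installed lumilake CLI.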

Python SDK

from lumilake import LumilakeClient

with LumilakeClient(base_url="http://127.0.0.1:9000") as client:
    print(client.health())
    print(client.jobs.list())

Install the SDK extra for HTTP clients:

pip install "lumilake[sdk]"

Install deploy support as well if you want client.deploy.* methods:

pip install "lumilake[sdk,deploy]"

See docs/SDK.md for the SDK resource map.
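
To point the same script at either a local or a remote server, the base URL can be taken from the environment and passed to the client explicitly. This is a minimal sketch: resolve_base_url and print_status are hypothetical helpers, and this README does not say whether the SDK reads LUMILAKE_BASE_URL on its own, so the value is passed in by hand. Only the client calls shown above (health and jobs.list) are used.

```python
import os

# Local default documented in Quick Start.
DEFAULT_BASE_URL = "http://127.0.0.1:9000"


def resolve_base_url(env=None) -> str:
    """Prefer LUMILAKE_BASE_URL when set, else fall back to the local default."""
    env = os.environ if env is None else env
    return env.get("LUMILAKE_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def print_status(base_url: str) -> None:
    """Print server health and the job list via the SDK."""
    # Imported lazily so this module loads even without lumilake[sdk] installed.
    from lumilake import LumilakeClient

    with LumilakeClient(base_url=base_url) as client:
        print(client.health())
        print(client.jobs.list())
```

With the SDK extra installed, print_status(resolve_base_url()) targets the local stack unless LUMILAKE_BASE_URL is exported.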

Documentation

  • docs/ENV.md - environment variables and data-plane modes.
  • docs/CLI.md - command groups and common CLI usage.
  • docs/WORKFLOWS.md - workflow input formats and YAML structure.
  • docs/OPS.md - built-in operation classes.
  • docs/SDK.md - sync and async Python client usage.
  • docs/API.md - server route overview and response shape.
  • docs/ARCHITECTURE.md - module layout and runtime flow.
  • docs/PLUGINS.md - shared hooks and Lumilake plugin model.
  • docs/CODE_STYLE.md - coding rules for contributors and agents.

Plugins

Lumilake wires shared hook protocols from lumid-hooks for identity, permissions, resource registration, submission guards, and usage sinks. Optimizer registration remains Lumilake-specific.

A minimal in-memory plugin is available under examples/plugins/simple_plugin/.

Repository Layout

.
├── src/lumilake_server/       # server runtime — image-only, not on PyPI
├── packages/sdk/              # `lumilake-sdk` → module `lumilake` (Client, envs, log)
├── packages/cli/              # `lumilake-cli` → `lumilake_cli` (Typer entry point)
├── packages/deploy/           # `lumilake-deploy` — packaged compose + .env.example assets
├── packages/hook/             # `lumilake-hook` → `lumilake_hook` (resource-kind helpers)
├── examples/                  # workflow templates and sample plugins
├── tests/                     # pytest suite
├── scripts/                   # CI and developer helpers
├── Dockerfile                 # builds ghcr.io/mlsys-io/lumilake_server
├── .env.example -> packages/deploy/.../assets/.env.example   # symlink for editors
├── uv.lock
└── pyproject.toml             # metapackage (`lumilake`) with [sdk]/[cli]/[deploy]/[hook]/[all] extras

Development

uv sync --group lint --group test --extra cli
uv run pre-commit install --install-hooks -t pre-commit -t prepare-commit-msg -t commit-msg
uv run pre-commit run --all-files
uv run pytest tests/

After changing dependencies, run:

uv lock

See CONTRIBUTING.md for PR title format, CI workflows, DCO sign-off, dependency guidance, and local testing notes.

License

Apache-2.0. See LICENSE.
