Skip to content

dn00/ai-workflow-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ai-workflow-engine

Deterministic AI Workflow Automation

A production-shaped AI workflow automation system that converts unstructured requests into safe, replayable, auditable actions. The core thesis: LLMs propose, deterministic code decides — LLM output is treated as untrusted input, gated by validation, policy rules, and human review before any side effects execute. Supports multiple workflow types including access requests, invoice processing, and FHIR R4 prior authorization.

Quick Start

Prerequisites: Python >= 3.11, uv, Claude Code CLI (optional, for real LLM calls)

# Install dependencies
make install

# Start the demo server (uses Claude CLI for LLM extraction)
make demo

# Or run with mock LLM (no Claude CLI needed)
uvicorn app.main:app --host 127.0.0.1 --port 8000

The server starts at http://127.0.0.1:8000. Open http://127.0.0.1:8000/ui/intake for the web UI, or http://127.0.0.1:8000/docs for the interactive API docs.

How It Works

A plain-text access request flows through a deterministic pipeline:

User submits natural language request
    |
    v
LLM extracts structured proposal (stored as receipt before parsing)
    |
    v
Validate: required fields, system allowlist, forbidden systems
    |
    +--> validation failure --> terminal (no effects)
    |
    v
Policy gate: approve / review_required / reject
    |
    +--> rejected --> terminal (no effects)
    |
    +--> review_required --> pause for human --> approve or reject
    |
    +--> approved --> execute simulated effect --> completed

Every step emits an event to an append-only log. The current state is derived by folding events through a reducer — enabling deterministic replay and audit.

Demo Scenarios

Access Request

Scenario Input Outcome
Auto-approve Single low-risk system (e.g. Confluence), manager present Policy approves, effect simulated
Human review High urgency or known system (e.g. AWS) Pauses at review_required, resumes after human decision
Rejection Forbidden system (e.g. production_db) Blocked at validation, no effects
Replay Any completed run Reconstructs from events, verifies projection matches

Prior Authorization (FHIR R4)

Scenario Input Outcome
Routine imaging MRI knee, known payer, valid ICD-10/CPT, conservative treatment documented Auto-approved, FHIR Claim + ClaimResponse generated
High-cost surgery Total knee arthroplasty, medical necessity documented review_required → clinical reviewer decides
Emergent bypass STEMI, emergent cardiac catheterization Auto-approved regardless of other factors
Missing necessity Surgery request, no prior treatments documented review_required (missing_medical_necessity)
Invalid codes Malformed ICD-10 or CPT codes Rejected at validation

Sample clinical notes are in data/prior_auth/. The LLM extracts structured clinical facts (ICD-10 diagnoses, CPT procedures, medical necessity justification) from unstructured referral text, then deterministic validation and policy rules decide the authorization outcome.

See docs/demo-script.md for step-by-step API and UI walkthroughs.

Architecture

Four layers with strict downward dependencies:

Layer Responsibility Key modules
4 — API / UX HTTP interface FastAPI REST (6 endpoints), Jinja2 web UI
3 — Runtime Orchestration LocalRunner, LLM + effect adapters
2 — Domain Pure logic Parser, validator, normalizer, policy engine, reducer
1 — Persistence Storage SQLite, abstract repositories, append-only events

Layer 2 is pure — no I/O, no side effects. Layer 3 orchestrates I/O around Layer 2 calls. Layer 4 is a thin shell. All interfaces are abstract, so any layer can be swapped (e.g. Postgres, real IAM effects, Temporal runner).

See docs/architecture.md for the full design including state machine, event model, data flow, and extension seams.

API

Method Path Purpose
POST /runs/ Create a new workflow run
GET /runs/{id} Get run summary with projection
GET /runs/{id}/events List events for a run
POST /runs/{id}/review Submit review decision
POST /runs/{id}/replay Replay a completed run
GET /runs/{id}/bundle Export replay bundle

Testing

make test     # uv run --extra dev pytest -q
make eval     # deterministic workflow golden-case evals
make lint     # uv run --extra dev ruff check + format check
make format   # uv run --extra dev ruff format + fix

The current verified baseline is 624 passed, 1 warning using make test. The current eval baseline is 17/17 passed using make eval.

Project Structure

app/
  main.py             # FastAPI app factory + lifespan DI
  api/                # REST endpoints + Pydantic schemas
  web/                # Jinja2 web UI routes
  templates/          # HTML templates
  core/               # Shared kernel: models, enums, reducer, replay, runners
  workflows/          # Workflow modules (access_request, invoice_intake, invoice_exception, prior_auth)
  effects/            # Effect adapters (simulated)
  llm/                # LLM adapters (mock, CLI via claude -p)
  retrieval/          # Document loading, chunking, retrieval, prompt context
  db/                 # SQLite persistence + abstract repositories
tests/
  unit/               # Domain logic tests (parser, validator, policy, reducer)
  integration/        # API + web integration tests (full HTTP lifecycle)
scripts/
  call-claude.py      # Subprocess wrapper for claude -p
  export_bundle.py    # CLI bundle export

Bundle Export

Export a replay bundle for any completed run:

make export-bundle RUN_ID=<run-id>

About

An extensible AI workflow system that turns unstructured access requests into validated, reviewable, and auditable workflows. LLM outputs are treated as untrusted input and gated through schema validation, policy checks, audit logging, and human review before execution.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors