Microsoft + NVIDIA Hybrid Multi-Agent Workflow

A production-grade reference implementation of a hybrid multi-agent workflow that combines Microsoft Foundry Agent Service with NVIDIA NIM microservices running on Azure Container Apps (ACA) serverless GPUs.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        User / Demo Runner                       │
│                       (run_demo.py)                             │
└──────────────┬──────────────────────────────────────────────────┘
               │  "Extract line items and flag anomalies"
               ▼
┌──────────────────────────────┐    W3C traceparent    ┌──────────────────────────────┐
│  Coordinator / Orchestrator  │ ──── POST /parse ───▶ │  GPU Parse Specialist Agent  │
│                              │                       │                              │
│  • OpenAI Responses API      │                       │  • FastAPI on ACA + GPU      │
│  • Tool: parse_invoice       │                       │  • Nemotron Parse NIM        │
│  • Microsoft Foundry Agent   │ ◀── JSON response ─── │  • Normalize + Anomaly       │
│  • Audit logging             │                       │  • Audit logging             │
└──────────────────────────────┘                       └──────────────────────────────┘
               │                                                    │
               └────────────────┬───────────────────────────────────┘
                                ▼
                    ┌───────────────────────┐
                    │   OpenTelemetry OTLP  │
                    │   (Jaeger / Aspire /  │
                    │    Azure Monitor)     │
                    └───────────────────────┘

Demo Scenario

User prompt: "Here's a PDF URL. Extract the line items into JSON and tell me if anything looks off."

The orchestrator delegates structured extraction to the GPU parse specialist, which:

1. Fetches the PDF.
2. Extracts invoice data via NVIDIA Nemotron Parse NIM (or a mock extractor when `NIM_ENDPOINT` is blank).
3. Normalizes the extracted data.
4. Runs anomaly detection (subtotal mismatch, price outliers, missing fields).
5. Returns structured JSON with any warnings.
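The steps above do not say what "normalizes" involves. One plausible reading, sketched under assumptions: the extractor (NIM or mock) returns line items as dicts whose numbers may arrive as strings such as `"$1,200.00"`, and normalization coerces them to `Decimal` and derives a missing `amount` as `quantity × unit_price`. The `unit_price` and `amount` field names match the anomaly rules further down; `description`, `quantity`, `LineItem`, and the helper functions are hypothetical, not taken from `parser_service/normalizer.py`.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal
    amount: Decimal


def to_decimal(raw: object) -> Optional[Decimal]:
    """Coerce values like '$1,200.00', '3', or 3 to Decimal; None if unparseable."""
    if raw is None:
        return None
    text = str(raw).strip().replace("$", "").replace(",", "")
    if not text:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def normalize_line_item(raw: dict) -> LineItem:
    """Hypothetical normalizer for one extracted line item."""
    quantity = to_decimal(raw.get("quantity"))
    if quantity is None:
        quantity = Decimal("1")  # assumption: a missing quantity means one unit
    unit_price = to_decimal(raw.get("unit_price"))
    if unit_price is None:
        unit_price = Decimal("0")  # assumption: leave price checks to the anomaly rules
    amount = to_decimal(raw.get("amount"))
    if amount is None:
        amount = quantity * unit_price  # derive the amount when the extractor omits it
    return LineItem(
        description=str(raw.get("description") or "").strip(),
        quantity=quantity,
        unit_price=unit_price,
        amount=amount,
    )
```

Using `Decimal` rather than `float` keeps currency sums exact, which matters for an exact-match check like the subtotal rule.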

Quick Start

Prerequisites

- Python 3.10+
- pip

1. Install Dependencies

pip install -r requirements.txt

2. Generate Sample Invoices

python sample_data/generate_sample_invoice_pdf.py

3. Serve Sample PDFs (in a separate terminal)

cd sample_data && python -m http.server 8000

4. Run the Demo

python run_demo.py

This runs in direct mode, which is deterministic and requires no OpenAI API key.
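Because direct mode needs no LLM, it presumably calls the parser service itself. A standard-library sketch of building that `POST /parse` request, reading `PARSER_URL` and `PARSER_API_KEY` with the defaults from the configuration table. The JSON body field `pdf_url`, the `X-API-Key` header name, and `build_parse_request` are assumptions; check `orchestrator/` for the real names.

```python
import json
import os
import urllib.request
from typing import Optional


def build_parse_request(
    pdf_url: str,
    parser_url: Optional[str] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the orchestrator -> parser POST /parse request."""
    base = (parser_url or os.environ.get("PARSER_URL", "http://localhost:8001")).rstrip("/")
    key = api_key or os.environ.get("PARSER_API_KEY", "demo-api-key-change-me")
    body = json.dumps({"pdf_url": pdf_url}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        f"{base}/parse",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": key,  # hypothetical header name
        },
    )
```

Sending it is then `urllib.request.urlopen(req, timeout=120)`. A generous timeout is a sensible default here, since a GPU app that has scaled to zero on ACA can take a while to cold-start.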

5. Run with OpenAI Agent (optional)

cp .env.example .env
# Edit .env and set OPENAI_API_KEY
python run_demo.py --mode agent
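In agent mode the coordinator exposes `parse_invoice` as a tool (see the architecture diagram). A sketch of what the tool definition and tool-call handling might look like with the OpenAI Responses API's function-calling items (`function_call` in, `function_call_output` back). The `pdf_url` parameter and `handle_tool_calls` are assumptions, not the repo's code; the sketch works on plain dicts so it runs without the SDK or an API key.

```python
import json
from typing import Any, Callable

# Function tool in the Responses API shape: type/name/description/parameters at the top level.
PARSE_INVOICE_TOOL = {
    "type": "function",
    "name": "parse_invoice",
    "description": "Extract invoice line items from a PDF URL and flag anomalies.",
    "parameters": {
        "type": "object",
        "properties": {
            "pdf_url": {"type": "string", "description": "HTTP(S) URL of the invoice PDF."},
        },
        "required": ["pdf_url"],
        "additionalProperties": False,
    },
}


def handle_tool_calls(
    output_items: list[dict], tools: dict[str, Callable[..., Any]]
) -> list[dict]:
    """Run each function_call item and build the function_call_output items to send back."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip message/reasoning items
        args = json.loads(item["arguments"])  # arguments arrive as a JSON string
        result = tools[item["name"]](**args)
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(result),
        })
    return results
```

With the official `openai` SDK you would pass `tools=[PARSE_INVOICE_TOOL]` to `client.responses.create(...)`, feed `[item.model_dump() for item in response.output]` to `handle_tool_calls`, and return the results via another `client.responses.create(...)` call with `previous_response_id=response.id` and `input=results`.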

Configuration

All settings are read from environment variables (see `.env.example`):

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key (agent mode only) | (none) |
| `OPENAI_MODEL` | Model to use | `gpt-4o` |
| `PARSER_URL` | Parser service URL | `http://localhost:8001` |
| `PARSER_API_KEY` | API key for parser auth | `demo-api-key-change-me` |
| `NIM_ENDPOINT` | NVIDIA NIM endpoint (blank = mock mode) | (none) |
| `NIM_MODEL_ID` | NIM model identifier | `nvidia/nemotron-parse` |
| `NIM_API_KEY` | NIM API key | (none) |
| `ACA_GPU_SKU` | GPU SKU for ACA deployment | `A10` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint | (none) |
| `LOG_LEVEL` | Logging level | `INFO` |
| `DEMO_PDF_URL` | PDF URL for the demo | `http://localhost:8000/sample_invoice_anomaly.pdf` |
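Both services have their own `config.py`. A sketch of how the variables and defaults in the table could be loaded into one typed object; `Settings` and `load_settings` are illustrative names, not the repo's actual code, and `ACA_GPU_SKU` is left out because it only affects deployment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    openai_model: str
    parser_url: str
    parser_api_key: str
    nim_endpoint: Optional[str]
    nim_model_id: str
    nim_api_key: Optional[str]
    otel_endpoint: Optional[str]
    log_level: str
    demo_pdf_url: str

    @property
    def nim_mock_mode(self) -> bool:
        # A blank NIM_ENDPOINT means "use the mock extractor".
        return not self.nim_endpoint


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    def get(name: str, default: Optional[str] = None) -> Optional[str]:
        # Treat unset and whitespace-only values the same way.
        value = (env.get(name) or "").strip()
        return value or default

    return Settings(
        openai_api_key=get("OPENAI_API_KEY"),
        openai_model=get("OPENAI_MODEL", "gpt-4o"),
        parser_url=get("PARSER_URL", "http://localhost:8001"),
        parser_api_key=get("PARSER_API_KEY", "demo-api-key-change-me"),
        nim_endpoint=get("NIM_ENDPOINT"),
        nim_model_id=get("NIM_MODEL_ID", "nvidia/nemotron-parse"),
        nim_api_key=get("NIM_API_KEY"),
        otel_endpoint=get("OTEL_EXPORTER_OTLP_ENDPOINT"),
        log_level=get("LOG_LEVEL", "INFO"),
        demo_pdf_url=get("DEMO_PDF_URL", "http://localhost:8000/sample_invoice_anomaly.pdf"),
    )
```

Passing the mapping in explicitly, rather than reading `os.environ` inside, keeps the loader easy to test.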

Project Structure

├── run_demo.py                    # One-command demo entry point
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment template
├── Dockerfile                     # Parser service container
├── orchestrator/
│   ├── agent.py                   # Coordinator agent (OpenAI Responses API)
│   ├── config.py                  # Environment configuration
│   ├── telemetry.py               # OpenTelemetry setup
│   └── audit.py                   # Structured audit logging
├── parser_service/
│   ├── main.py                    # FastAPI application
│   ├── config.py                  # Environment configuration
│   ├── models.py                  # Pydantic data models
│   ├── nim_client.py              # Nemotron Parse NIM client + mock
│   ├── normalizer.py              # Data normalization
│   ├── anomaly.py                 # Anomaly detection rules
│   ├── telemetry.py               # OpenTelemetry setup
│   └── audit.py                   # Structured audit logging
├── sample_data/
│   └── generate_sample_invoice_pdf.py  # Deterministic PDF generator
├── deploy/
│   └── aca_deploy.sh              # Azure Container Apps deployment
└── docs/
    ├── deploying_models_foundry.md # Azure AI Foundry model deployment guide
    ├── deploying_to_aca.md         # Full ACA deployment guide
    ├── demo_script.md              # 2-minute conference talk track
    └── troubleshooting.md          # Common failures

Deployment Guides

| Guide | Description |
|---|---|
| [Deploying Models with Azure AI Foundry](docs/deploying_models_foundry.md) | Set up GPT-4o and Nemotron Parse models |
| [Deploying to Azure Container Apps](docs/deploying_to_aca.md) | Full ACA deployment with GPU support, CI/CD, and monitoring |

ACA Deployment

Build and Push

# Log in to Azure Container Registry (ACR)
az acr login --name <your-acr>

# Build
docker build -t <your-acr>.azurecr.io/gpu-parse-agent:latest .

# Push
docker push <your-acr>.azurecr.io/gpu-parse-agent:latest

Deploy with Script

chmod +x deploy/aca_deploy.sh
./deploy/aca_deploy.sh

Manual GPU Configuration

# Add GPU workload profile
az containerapp env workload-profile add \
  --name multi-agent-env \
  --resource-group rg-multi-agent-demo \
  --workload-profile-name gpu-profile \
  --workload-profile-type NC24-A100 \
  --min-nodes 0 --max-nodes 1

# Assign app to GPU profile
az containerapp update \
  --name gpu-parse-agent \
  --resource-group rg-multi-agent-demo \
  --workload-profile-name gpu-profile

Observability

- Tracing: the W3C `traceparent` header is propagated end to end.
- Correlation: an `X-Request-Id` header is carried on all requests.
- OTLP export: set `OTEL_EXPORTER_OTLP_ENDPOINT` to your collector.
- Spans: `orchestrator.handle_request` → `orchestrator.call_parser` → `parser.handle_parse` → `parser.fetch_pdf` → `parser.call_nim` → `parser.normalize` → `parser.anomaly_checks`

Anomaly Rules

| # | Rule | Code |
|---|---|---|
| 1 | `subtotal ≠ sum(line_items.amount)` | `SUBTOTAL_MISMATCH` |
| 2 | `unit_price > 5× median price` | `PRICE_OUTLIER` |
| 3 | Missing vendor, date, or total | `MISSING_FIELDS` |
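The table states the rules but not their tolerances or edge cases. A minimal sketch of the three checks over a plain-dict invoice, assuming `Decimal`-compatible amounts, a one-cent tolerance for rule 1, and a strict `>` comparison for rule 2. `AnomalyWarning` and `check_anomalies` are illustrative names, not necessarily what `parser_service/anomaly.py` uses.

```python
from dataclasses import dataclass
from decimal import Decimal
from statistics import median


@dataclass(frozen=True)
class AnomalyWarning:
    code: str
    message: str


def _dec(value) -> Decimal:
    # str() first so floats such as 0.1 do not carry binary rounding noise.
    return Decimal(str(value))


def check_anomalies(
    invoice: dict,
    tolerance: Decimal = Decimal("0.01"),
    outlier_factor: Decimal = Decimal("5"),
) -> list[AnomalyWarning]:
    warnings: list[AnomalyWarning] = []
    items = invoice.get("line_items") or []

    # Rule 1: subtotal must equal the sum of line-item amounts (within tolerance).
    subtotal = invoice.get("subtotal")
    if subtotal is not None and items:
        line_sum = sum((_dec(item.get("amount") or 0) for item in items), Decimal("0"))
        if abs(_dec(subtotal) - line_sum) > tolerance:
            warnings.append(AnomalyWarning(
                "SUBTOTAL_MISMATCH", f"subtotal {subtotal} != line sum {line_sum}"))

    # Rule 2: flag any unit price above outlier_factor x the median unit price.
    prices = [_dec(item["unit_price"]) for item in items if item.get("unit_price") is not None]
    if prices:
        med = median(prices)
        for price in prices:
            if med > 0 and price > outlier_factor * med:
                warnings.append(AnomalyWarning(
                    "PRICE_OUTLIER", f"unit_price {price} > {outlier_factor}x median {med}"))

    # Rule 3: vendor, date, and total are required.
    missing = [name for name in ("vendor", "date", "total") if invoice.get(name) in (None, "")]
    if missing:
        warnings.append(AnomalyWarning("MISSING_FIELDS", "missing: " + ", ".join(missing)))
    return warnings
```

Using the median rather than the mean keeps a single extreme price from inflating the baseline it is compared against.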

License

MIT
