SHIPIT Agent

A clean, powerful, open-source Python runtime for building tool-using AI agents.

One consistent API over every major LLM provider — with tools, skills, memory, MCP, a rule-based permission layer, prompt caching, deep multi-agent orchestration, RAG, and structured streaming events.

📖 Documentation · 📦 PyPI · Quick start · Changelog · Security

What is SHIPIT Agent?

SHIPIT Agent is a small, explicit runtime for building production agents in Python. You bring an LLM; the runtime gives you the loop around it — tool calling, retries, streaming, memory, sessions, permissions, and cost tracking — plus a deep library of batteries (40+ built-in tools, 17 SaaS connectors, RAG, multi-agent orchestration, browser automation).

It is provider-agnostic by design: the same agent code runs on OpenAI, Anthropic, AWS Bedrock, Google Vertex/Gemini, Groq, Together, Ollama, or any of 100+ models through LiteLLM. Swap the model in one line — nothing else changes.

from shipit_agent import Agent
from shipit_agent.llms import build_llm_from_env

agent = Agent.with_builtins(llm=build_llm_from_env())   # any provider
print(agent.run("Find every TODO in this repo and summarize them.").output)

The only hard dependency is pydantic. Everything else (a provider SDK, Playwright, a vector store) is an optional extra you install when you need it. Python 3.11+ · MIT · 1850+ tests.

Highlights

🤖 The Agent — one runtime: tool calling, retries, parallel tools, context compaction, and a final-answer guarantee. Agent.with_builtins() ships the full tool catalogue.
🔌 Any LLM — OpenAI · Anthropic · Bedrock · Vertex · Gemini · Groq · Together · Ollama · OpenRouter · 100+ via LiteLLM. Native adapters where it matters, one interface everywhere.
🛡️ Control plane — a fast, rule-based permission engine (allow/deny/ask), plan mode (read-only research before acting), and hooks that can block or rewrite any tool call.
⚡ Prompt caching — cross-provider cache-read accounting (Anthropic/Bedrock/Vertex cache_control + OpenAI automatic caching) so repeated calls bill at a fraction of the cost.
🧰 Tools & connectors — 40+ built-in tools (bash, SQL, files, web search, code execution, vision, PDF…) and 17 SaaS connectors (GitHub, Slack, Gmail, Jira, Salesforce, Stripe…).
🔗 MCP — connect Model Context Protocol servers over stdio, HTTP, or a persistent subprocess.
🧠 Deep agents — GoalAgent, ReflectiveAgent, Supervisor/Worker, ShipCrew, and the create_deep_agent() factory for autonomous, multi-step, multi-agent work.
📚 Super RAG — hybrid vector + BM25 search with auto-cited sources and pluggable backends (Chroma, Qdrant, pgvector).
🚀 Autopilot — long-running autonomous loops with a critic, artifacts, fan-out, and a scheduler.
🖥️ Computer use — drive a real browser via screenshots + a vision model (works in Jupyter).
📊 Production-ready — sessions, memory consolidation, structured output with validation-retry, streaming events (+ SSE/WebSocket packets), tracing (file/OTel/LangSmith), and budgets.

Installation

Requirements: Python 3.11+ (3.11 – 3.14 supported). The only hard dependency is pydantic; provider SDKs and heavier features are opt-in extras.

From PyPI (recommended)

pip install shipit-agent

Optional extras

Install only what you need — each extra pulls in the relevant third-party packages:

| Extra | Installs | For |
| --- | --- | --- |
| `openai` | `openai` | OpenAI / OpenAI-compatible |
| `anthropic` | `anthropic` | native Anthropic (Claude) |
| `bedrock` | `boto3` | AWS Bedrock |
| `google` | `google-generativeai` | Gemini |
| `groq` / `together` / `ollama` | provider SDK | Groq / Together / Ollama |
| `litellm` | `litellm` | 100+ models via one interface |
| `playwright` | `playwright` | browser automation / computer use |
| `pdf` | `pypdf` | the PDF tool |
| `sql` | `sqlalchemy` | the SQL tool (add your own driver) |
| `rag-chroma` / `rag-qdrant` / `rag-pgvector` | vector store | RAG backends |
| `rag-openai` / `rag-cohere` / `rag-sentence-transformers` | embedder | RAG embeddings |
| `otel` / `langsmith` | exporters | tracing |
| `all` | everything | kitchen sink |

pip install "shipit-agent[anthropic]"        # one provider
pip install "shipit-agent[anthropic,playwright,rag-chroma]"   # combine
pip install "shipit-agent[all]"              # everything

Browser automation / computer use also needs the Chromium binary:
pip install "shipit-agent[playwright]" && playwright install chromium

From source (development)

git clone https://github.com/shipiit/shipit_agent.git
cd shipit_agent
pip install -e ".[dev]"     # editable install with test/docs tooling
pytest -q                   # 1850+ tests
ruff check .

Alternatives: pip install . (non-editable), pip install -r requirements.txt, or poetry install.

Verify

import shipit_agent
print(shipit_agent.__version__)

Notebook tip: if imports look out of date, your kernel may be using an older globally installed copy. Run pip install -U shipit-agent (or pip install -e . from the repo) in the kernel's environment.

Environment setup

The fastest way to choose a model is environment variables — copy .env.example to .env and fill in what you use:

# Pick the provider; build_llm_from_env() reads these:
SHIPIT_LLM_PROVIDER=bedrock            # openai | anthropic | bedrock | vertex | gemini | groq | together | ollama | litellm

# …then the provider's own credentials, e.g.:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AWS_REGION_NAME=us-east-1             # Bedrock uses your AWS region / profile (no key)
SHIPIT_BEDROCK_MODEL=bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
GEMINI_API_KEY=...
GROQ_API_KEY=...
SHIPIT_LITELLM_MODEL=openrouter/openai/gpt-4o-mini    # for the litellm provider

from shipit_agent import Agent
from shipit_agent.llms import build_llm_from_env

agent = Agent.with_builtins(llm=build_llm_from_env())   # reads SHIPIT_LLM_PROVIDER + creds
print(agent.run("Hello, who are you?").output)

Run diagnostics any time with agent.doctor() to check provider config, credentials, and tools.

Use any LLM provider

The agent never cares which model it talks to. Configure once via env, or instantiate an adapter directly:

from shipit_agent.llms import (
    build_llm_from_env, OpenAIChatLLM, AnthropicChatLLM,
    BedrockChatLLM, GeminiChatLLM, GroqChatLLM, LiteLLMChatLLM,
)

# Each assignment below is an alternative; keep the one you need.
llm = build_llm_from_env("bedrock")                       # env-driven (prod)
llm = OpenAIChatLLM(model="gpt-4o")                       # native OpenAI
llm = AnthropicChatLLM(model="claude-opus-4-1")           # native Anthropic
llm = BedrockChatLLM(model="bedrock/us.meta.llama4-maverick-17b-instruct-v1:0")
llm = GeminiChatLLM(model="gemini/gemini-2.0-flash")
llm = GroqChatLLM(model="groq/llama-3.3-70b-versatile")
llm = LiteLLMChatLLM(model="together_ai/meta-llama/Llama-3.1-70B-Instruct-Turbo")

| Provider | Adapter | Env (`SHIPIT_LLM_PROVIDER=`) | Auth |
| --- | --- | --- | --- |
| OpenAI | `OpenAIChatLLM` | `openai` | `OPENAI_API_KEY` |
| Anthropic | `AnthropicChatLLM` | `anthropic` | `ANTHROPIC_API_KEY` |
| AWS Bedrock | `BedrockChatLLM` | `bedrock` | AWS region / profile |
| Google Vertex | `VertexAIChatLLM` | `vertex` | service-account JSON |
| Gemini | `GeminiChatLLM` | `gemini` | `GEMINI_API_KEY` |
| Groq | `GroqChatLLM` | `groq` | `GROQ_API_KEY` |
| Together | `TogetherChatLLM` | `together` | `TOGETHERAI_API_KEY` |
| Ollama (local) | `OllamaChatLLM` | `ollama` | none |
| LiteLLM / OpenRouter | `LiteLLMChatLLM` / `LiteLLMProxyChatLLM` | `litellm` | per provider |
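
As a rough illustration of how the provider and auth columns fit together, the following stdlib-only sketch checks that the selected provider has its API key set. `check_provider_env` and `REQUIRED_KEY` are hypothetical names invented for this example and are not part of shipit_agent; for a real check, use agent.doctor().

```python
import os

# Provider -> required credential variable, copied from the table above.
# Providers that do not authenticate with a single API key map to None.
REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "bedrock": None,
    "vertex": None,
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "together": "TOGETHERAI_API_KEY",
    "ollama": None,
    "litellm": None,
}


def check_provider_env(env=None):
    """Return a list of problems; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    provider = env.get("SHIPIT_LLM_PROVIDER", "").strip().lower()
    if not provider:
        return ["SHIPIT_LLM_PROVIDER is not set"]
    if provider not in REQUIRED_KEY:
        return [f"unknown provider {provider!r}"]
    key = REQUIRED_KEY[provider]
    if key and not env.get(key):
        return [f"{key} is required for provider {provider!r}"]
    return []


if __name__ == "__main__":
    for problem in check_provider_env():
        print("config problem:", problem)
```

The sketch only covers key-based providers; region, profile, and service-account checks are left out because their exact variables depend on your cloud setup.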

Core building blocks

Custom tools

Wrap any Python callable — the agent reads its signature and calls it when useful:

from shipit_agent import Agent, FunctionTool

def get_weather(city: str) -> str:
    """Current weather for a city."""
    return f"{city}: 22°C, clear"

agent = Agent.with_builtins(llm=llm, tools=[FunctionTool.from_callable(get_weather)])
agent.run("What's the weather in Tokyo — umbrella?")
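
To make "reads its signature" concrete, here is a minimal stdlib sketch of how a callable's parameters and docstring can be turned into a JSON-Schema-style tool description. It is not how FunctionTool is implemented; `describe_callable`, the type mapping, and the extra `days` parameter are assumptions made for illustration.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_callable(fn):
    """Build a tool description from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unrecognized parameters fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, days: int = 1) -> str:
    """Current weather for a city."""
    return f"{city}: 22 C, clear"


schema = describe_callable(get_weather)
print(schema)
```

Parameters with defaults are treated as optional, which is why only `city` ends up in `required`.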

Skills

Reusable behaviour templates that shape how the agent thinks and which tools it reaches for:

agent = Agent.with_builtins(
    llm=llm,
    skills=["code-workflow-assistant", "database-architect"],
    auto_use_skills=True,      # also match skills from the prompt
)

Sessions & memory

session = agent.chat_session(session_id="user-42")
session.send("My name is Ada. I build compilers.")
session.send("What was my name again?")          # → remembers across turns

Persist across processes with FileSessionStore, and distill conversations into durable facts with MemoryConsolidator.
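
The idea of persisting a session across processes can be sketched with one JSON file per session id. `JsonSessionStore` below is a hypothetical stand-in written for this example; FileSessionStore's real interface and on-disk format may differ.

```python
import json
import tempfile
from pathlib import Path


class JsonSessionStore:
    """Hypothetical store: one JSON file of messages per session id."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id):
        return self.root / f"{session_id}.json"

    def load(self, session_id):
        """Return the stored message list, or an empty list for a new session."""
        path = self._path(session_id)
        return json.loads(path.read_text(encoding="utf-8")) if path.exists() else []

    def append(self, session_id, role, content):
        """Add one message and rewrite the session file."""
        messages = self.load(session_id)
        messages.append({"role": role, "content": content})
        self._path(session_id).write_text(json.dumps(messages), encoding="utf-8")


root = tempfile.mkdtemp()
JsonSessionStore(root).append("user-42", "user", "My name is Ada. I build compilers.")

# A second instance stands in for a new process reading the same directory.
history = JsonSessionStore(root).load("user-42")
print(history)
```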

Structured output

from pydantic import BaseModel

class Ticket(BaseModel):
    title: str
    priority: str
    tags: list[str]

result = agent.run("Triage: 'login broken on Safari'", output_schema=Ticket)
print(result.parsed)            # validated; auto-retries inside the same conversation
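
The validation-retry behaviour mentioned in the comment above can be pictured as a loop that feeds the validation error back into the same conversation. The sketch below uses only the standard library and a fake model; `run_with_validation_retry`, `validate_ticket`, and `fake_generate` are hypothetical names, and the runtime's actual retry logic may work differently.

```python
import json


def run_with_validation_retry(generate, validate, prompt, max_attempts=3):
    """Call generate(messages), validate the reply, and re-ask with the error on failure."""
    messages = [prompt]
    for _ in range(max_attempts):
        reply = generate(messages)
        try:
            return validate(json.loads(reply))
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the failed reply and the error in the same conversation.
            messages += [reply, f"Invalid output ({exc}). Reply with corrected JSON only."]
    raise ValueError(f"no valid output after {max_attempts} attempts")


def validate_ticket(data):
    """Stand-in for a pydantic model: check required fields and their types."""
    if not isinstance(data["title"], str) or not isinstance(data["priority"], str):
        raise TypeError("title and priority must be strings")
    if not all(isinstance(tag, str) for tag in data["tags"]):
        raise TypeError("tags must be a list of strings")
    return data


# A fake model that fails once, then succeeds, to exercise the retry path.
replies = iter([
    '{"title": "login broken"}',
    '{"title": "login broken", "priority": "high", "tags": ["safari"]}',
])
calls = []


def fake_generate(messages):
    calls.append(list(messages))
    return next(replies)


ticket = run_with_validation_retry(fake_generate, validate_ticket, "Triage: 'login broken on Safari'")
print(ticket)
```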

Streaming

for event in agent.stream("Write a haiku about shipping code"):
    if event.type == "text_delta":
        print(event.payload["chunk"], end="", flush=True)
    elif event.type == "tool_called":
        print("→", event.payload["name"])

Events also serialize to ready-made SSE / WebSocket packets for web UIs.
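
For readers wiring events into a web UI by hand, a Server-Sent Events frame is an `event:` line, a `data:` line, and a blank line. The sketch below shows that wire format with a hypothetical `to_sse` helper; it is not the library's own serializer, whose packet layout may differ.

```python
import json


def to_sse(event_type, payload):
    """Format one event as an SSE frame: event field, data field, blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event_type}\ndata: {data}\n\n"


frame = to_sse("text_delta", {"chunk": "Ship it"})
print(frame, end="")
```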

The control plane

A Claude Code-style safety layer — rule-based, no extra LLM call.

from shipit_agent import Agent, PermissionEngine

agent = Agent.with_builtins(
    llm=llm,
    permissions=PermissionEngine(
        deny=["bash", "*_delete"],   # never run these
        ask=["sql"],                 # require approval
        allow=["read*", "grep*"],    # always fine
    ),
)

# Read-only "plan mode" — research and propose, take no action:
plan = agent.plan("Migrate the billing schema to multi-tenant.").output

Modes: default, acceptEdits, plan, bypass.
permission_callback(name, args) for programmatic human-in-the-loop approval.
Blocking hooks — before_tool hooks can deny a call or rewrite its arguments; on_user_prompt can redact prompts:

@hooks.on_before_tool
def guard(name, args):
    if name == "bash" and "rm -rf" in args.get("command", ""):
        return {"decision": "deny", "reason": "destructive command"}
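
Because the permission rules are plain name patterns, evaluating them needs no model call. The sketch below shows one way such matching could work with glob patterns from the standard library. The precedence order (deny, then ask, then allow, then a default of "ask") and the `decide` function are assumptions of this example, not documented behaviour of PermissionEngine.

```python
from fnmatch import fnmatchcase


def decide(tool_name, deny=(), ask=(), allow=(), default="ask"):
    """Return 'deny', 'ask', or 'allow' for a tool name using glob patterns.

    Precedence (deny > ask > allow > default) is an assumption of this sketch.
    """
    for decision, patterns in (("deny", deny), ("ask", ask), ("allow", allow)):
        if any(fnmatchcase(tool_name, pattern) for pattern in patterns):
            return decision
    return default


# The same patterns as the PermissionEngine example above.
rules = dict(deny=["bash", "*_delete"], ask=["sql"], allow=["read*", "grep*"])

for name in ("bash", "file_delete", "sql", "read_file", "web_search"):
    print(name, "->", decide(name, **rules))
```

Checking deny first means a tool that matches both a deny and an allow pattern is still blocked, which is the conservative choice for a safety layer.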

Performance: prompt caching

The runtime rebuilds the same system prompt + tool schemas each turn — the ideal cacheable prefix.

from shipit_agent.llms import AnthropicChatLLM
llm = AnthropicChatLLM(model="claude-opus-4-1", prompt_caching=True)   # default on for Claude

cache_control breakpoints are placed on tools + system prompt; responses surface cache_read_input_tokens / cache_creation_input_tokens, which flow into CostTracker. Caching spans Anthropic, Bedrock, Vertex (cache_control) and OpenAI (automatic). On Anthropic, cache reads bill at roughly 10% of the base input price; the discount on other providers varies by model.
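
A quick worked example shows why a cached prefix matters. The function below is a hypothetical helper, the token counts and the $3.00 per million input tokens price are made up for illustration, and the 10% read ratio is the figure quoted above. It ignores the one-time cost of writing the cache.

```python
def input_cost(fresh_tokens, cached_tokens, price_per_mtok, cache_read_ratio=0.10):
    """Input cost in dollars when part of the prompt is served from cache.

    price_per_mtok and cache_read_ratio are illustrative inputs, not price-list values.
    """
    per_token = price_per_mtok / 1_000_000
    return fresh_tokens * per_token + cached_tokens * per_token * cache_read_ratio


# Hypothetical turn: a 20,000-token prefix (system prompt + tool schemas)
# plus 500 fresh tokens, at an assumed $3.00 per million input tokens.
uncached = input_cost(20_500, 0, 3.00)
cached = input_cost(500, 20_000, 3.00)

print(f"uncached: ${uncached:.4f}  cached: ${cached:.4f}")
```

Under these assumptions the cached turn costs about an eighth of the uncached one, since nearly all of the prompt is the repeated prefix.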

Deep agents & orchestration

from shipit_agent import create_deep_agent, Goal

# Autonomous goal decomposition with a planner / explorer / coder / verifier loop:
agent = create_deep_agent(llm=llm, tools=[...])
result = agent.run(Goal(objective="Build and test a REST API for todos"))

GoalAgent (decompose → execute), ReflectiveAgent (self-improve to a quality bar), Supervisor + Worker (hierarchical), ShipCrew (role-based crews), AdaptiveAgent, and PersistentAgent (checkpoint + resume) are all first-class.

Super RAG

from shipit_agent import RAG, Agent
from shipit_agent.rag.embedder import HashingEmbedder

rag = RAG.default(embedder=HashingEmbedder())
rag.index_text("Payments run on Stripe; refunds settle in 5–7 days.", source="ops.md")

agent = Agent.with_builtins(llm=llm, rag=rag)     # retrieves, then answers with cited sources
print(agent.run("How long do refunds take?").rag_sources)

Hybrid vector + BM25 ranking, a document chunker, multiple embedders/rerankers, and pluggable backends (Chroma, Qdrant, pgvector).
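
Hybrid search has to merge two differently scaled rankings. The README does not say how the runtime combines them, so the sketch below uses reciprocal rank fusion, one common technique that needs only rank positions. The function name, the constant `k=60`, and the document ids are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["ops.md", "billing.md", "faq.md"]   # hypothetical vector-search order
bm25_hits = ["faq.md", "ops.md", "refunds.md"]     # hypothetical BM25 order

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that appear near the top of both lists rise above documents that only one retriever found.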

Autopilot, computer use & connectors

# Drive a real browser (works in Jupyter):
from shipit_agent.computer_use import ComputerUseAgent, PlaywrightBrowserSession

with PlaywrightBrowserSession.launch(headless=True) as browser:
    ComputerUseAgent(llm=claude_llm, browser=browser,
                     goal="Find the iPhone 15 Pro price on apple.com").run()

Autopilot runs long, unattended jobs with a critic, artifacts, fan-out, and a scheduler. 17 SaaS connectors — GitHub, GitLab, Slack, Gmail, Google Drive/Sheets/Calendar, Jira, Linear, Notion, Confluence, HubSpot, Salesforce, Stripe, Zendesk, Figma, LinkedIn — share a credential store with built-in OAuth helpers.

MCP

from shipit_agent import Agent
from shipit_agent.mcp import RemoteMCPServer

agent = Agent.with_builtins(llm=llm, mcps=[RemoteMCPServer(name="fs", transport=...)])

Connect MCP servers over stdio, HTTP, or a persistent subprocess — their tools join the agent's tool set automatically.
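
Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages; over stdio each message is one line of JSON. The sketch below builds `tools/list` and `tools/call` requests with the standard library only. `mcp_request` is a hypothetical helper, and the `read_file` tool and its `path` argument are invented for the example.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def mcp_request(method, params=None):
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message) + "\n"


list_line = mcp_request("tools/list")
call_line = mcp_request("tools/call", {"name": "read_file", "arguments": {"path": "README.md"}})

print(list_line, end="")
print(call_line, end="")
```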

Observability & cost

Tracing — FileTraceStore, OpenTelemetry, and LangSmith exporters.
Cost & budgets — CostTracker prices every call from a model table; Budget enforces a ceiling (and flags unknown-model pricing instead of silently billing $0).
Verifier network — an optional cheap LLM that vetoes hallucinated tool calls and detects stalling, complementing the rule-based permission engine.
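
The cost and budget behaviour described above (price each call from a table, enforce a ceiling, flag unknown models rather than billing $0) can be sketched as follows. `SimpleCostTracker`, `BudgetExceeded`, the model names, and the prices are all hypothetical; CostTracker and Budget in the library have their own interfaces.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recorded spend passes the configured ceiling."""


class SimpleCostTracker:
    """Hypothetical tracker: prices calls from a table and enforces a ceiling."""

    def __init__(self, prices_per_mtok, ceiling_usd):
        self.prices = prices_per_mtok   # model -> (input, output) price per million tokens
        self.ceiling = ceiling_usd
        self.spent = 0.0
        self.unpriced = []              # models seen without a price entry

    def record(self, model, input_tokens, output_tokens):
        if model not in self.prices:
            # Flag the call instead of silently treating it as free.
            self.unpriced.append(model)
            return
        in_price, out_price = self.prices[model]
        self.spent += (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        if self.spent > self.ceiling:
            raise BudgetExceeded(f"spent ${self.spent:.4f} of ${self.ceiling:.4f}")


tracker = SimpleCostTracker({"model-a": (3.0, 15.0)}, ceiling_usd=0.05)
tracker.record("model-a", 10_000, 1_000)   # 0.03 input + 0.015 output
tracker.record("model-b", 10_000, 1_000)   # unknown model: flagged, not billed

print(f"spent=${tracker.spent:.3f} unpriced={tracker.unpriced}")
```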

Examples & notebooks

examples/ — runnable scripts (basic agent, custom tools, parallel tools, cost budgets, multi-turn memory, async runtime, secure tools, the verifier guard, and more).
notebooks/ — 60+ Jupyter notebooks covering agents, streaming, MCP, connectors, deep agents, RAG, skills, autopilot, the control plane, prompt caching, and the memory tool.

python examples/run_multi_tool_agent.py

Documentation

🌐 Full documentation site — searchable, with guides for every subsystem.
📓 CHANGELOG.md · release-notes/ — per-release detail.
🧰 TOOLS.md — the built-in tool catalogue.
🔐 SECURITY.md · ⚖️ LICENSE.md (MIT).

Contributing

Issues and PRs are welcome. Install the dev extras, keep the suite green, and run the linter:

pip install -e ".[dev]"
pytest -q
ruff check .

See CONTRIBUTING.md for the full guide.

Built with Love. Powered by your choice of AI models.
Ship it fast. Ship it right.
