A clean, powerful, open-source Python runtime for building tool-using AI agents.
One consistent API over every major LLM provider — with tools, skills, memory, MCP, a rule-based permission layer, prompt caching, deep multi-agent orchestration, RAG, and structured streaming events.
SHIPIT Agent is a small, explicit runtime for building production agents in Python. You bring an LLM; the runtime gives you the loop around it — tool calling, retries, streaming, memory, sessions, permissions, and cost tracking — plus a deep library of batteries (40+ built-in tools, 17 SaaS connectors, RAG, multi-agent orchestration, browser automation).
It is provider-agnostic by design: the same agent code runs on OpenAI, Anthropic, AWS Bedrock, Google Vertex/Gemini, Groq, Together, Ollama, or any of 100+ models through LiteLLM. Swap the model in one line — nothing else changes.
from shipit_agent import Agent
from shipit_agent.llms import build_llm_from_env
agent = Agent.with_builtins(llm=build_llm_from_env()) # any provider
print(agent.run("Find every TODO in this repo and summarize them.").output)

The only hard dependency is `pydantic`. Everything else (a provider SDK, Playwright, a vector store) is an optional extra you install when you need it. Python 3.11+ · MIT · 1850+ tests.
- 🤖 The Agent: one runtime with tool calling, retries, parallel tools, context compaction, and a final-answer guarantee. `Agent.with_builtins()` ships the full tool catalogue.
- 🔌 Any LLM: OpenAI · Anthropic · Bedrock · Vertex · Gemini · Groq · Together · Ollama · OpenRouter · 100+ via LiteLLM. Native adapters where it matters, one interface everywhere.
- 🛡️ Control plane: a fast, rule-based permission engine (allow/deny/ask), plan mode (read-only research before acting), and hooks that can block or rewrite any tool call.
- ⚡ Prompt caching: cross-provider cache-read accounting (Anthropic/Bedrock/Vertex `cache_control` + OpenAI automatic caching) so repeated calls bill at a fraction of the cost.
- 🧰 Tools & connectors: 40+ built-in tools (bash, SQL, files, web search, code execution, vision, PDF…) and 17 SaaS connectors (GitHub, Slack, Gmail, Jira, Salesforce, Stripe…).
- 🔗 MCP: connect Model Context Protocol servers over stdio, HTTP, or a persistent subprocess.
- 🧠 Deep agents: `GoalAgent`, `ReflectiveAgent`, `Supervisor`/`Worker`, `ShipCrew`, and the `create_deep_agent()` factory for autonomous, multi-step, multi-agent work.
- 📚 Super RAG: hybrid vector + BM25 search with auto-cited sources and pluggable backends (Chroma, Qdrant, pgvector).
- 🚀 Autopilot: long-running autonomous loops with a critic, artifacts, fan-out, and a scheduler.
- 🖥️ Computer use: drive a real browser via screenshots + a vision model (works in Jupyter).
- 📊 Production-ready: sessions, memory consolidation, structured output with validation-retry, streaming events (+ SSE/WebSocket packets), tracing (file/OTel/LangSmith), and budgets.
Requirements: Python 3.11+ (3.11 – 3.14 supported). The only hard dependency is
pydantic; provider SDKs and heavier features are opt-in extras.
pip install shipit-agent

Install only what you need. Each extra pulls in the relevant third-party packages:
| Extra | Installs | For |
|---|---|---|
| `openai` | `openai` | OpenAI / OpenAI-compatible |
| `anthropic` | `anthropic` | native Anthropic (Claude) |
| `bedrock` | `boto3` | AWS Bedrock |
| `google` | `google-generativeai` | Gemini |
| `groq` / `together` / `ollama` | provider SDK | Groq / Together / Ollama |
| `litellm` | `litellm` | 100+ models via one interface |
| `playwright` | `playwright` | browser automation / computer use |
| `pdf` | `pypdf` | the PDF tool |
| `sql` | `sqlalchemy` | the SQL tool (add your own driver) |
| `rag-chroma` / `rag-qdrant` / `rag-pgvector` | vector store | RAG backends |
| `rag-openai` / `rag-cohere` / `rag-sentence-transformers` | embedder | RAG embeddings |
| `otel` / `langsmith` | exporters | tracing |
| `all` | everything | kitchen sink |
pip install "shipit-agent[anthropic]" # one provider
pip install "shipit-agent[anthropic,playwright,rag-chroma]" # combine
pip install "shipit-agent[all]" # everything

Browser automation / computer use also needs the Chromium binary:
pip install "shipit-agent[playwright]" && playwright install chromium
git clone https://github.com/shipiit/shipit_agent.git
cd shipit_agent
pip install -e ".[dev]" # editable install with test/docs tooling
pytest -q # 1850+ tests
ruff check .

Alternatives: `pip install .` (non-editable), `pip install -r requirements.txt`, or `poetry install`.
import shipit_agent
print(shipit_agent.__version__)

Notebook tip: if imports look out of date, your kernel may be using an older globally installed copy. Run `pip install -U shipit-agent` (or `pip install -e .` from the repo) in the kernel's environment.
The fastest way to choose a model is environment variables — copy
.env.example to .env and fill in what you use:
# Pick the provider; build_llm_from_env() reads these:
SHIPIT_LLM_PROVIDER=bedrock # openai | anthropic | bedrock | vertex | gemini | groq | together | ollama | litellm
# …then the provider's own credentials, e.g.:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AWS_REGION_NAME=us-east-1 # Bedrock uses your AWS region / profile (no key)
SHIPIT_BEDROCK_MODEL=bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
GEMINI_API_KEY=...
GROQ_API_KEY=...
SHIPIT_LITELLM_MODEL=openrouter/openai/gpt-4o-mini # for the litellm provider

from shipit_agent import Agent
from shipit_agent.llms import build_llm_from_env
agent = Agent.with_builtins(llm=build_llm_from_env()) # reads SHIPIT_LLM_PROVIDER + creds
print(agent.run("Hello, who are you?").output)

Run diagnostics any time with `agent.doctor()` to check provider config, credentials, and tools.
The agent never cares which model it talks to. Configure once via env, or instantiate an adapter directly:
from shipit_agent.llms import (
build_llm_from_env, OpenAIChatLLM, AnthropicChatLLM,
BedrockChatLLM, GeminiChatLLM, GroqChatLLM, LiteLLMChatLLM,
)
llm = build_llm_from_env("bedrock") # env-driven (prod)
llm = OpenAIChatLLM(model="gpt-4o") # native OpenAI
llm = AnthropicChatLLM(model="claude-opus-4-1") # native Anthropic
llm = BedrockChatLLM(model="bedrock/us.meta.llama4-maverick-17b-instruct-v1:0")
llm = GeminiChatLLM(model="gemini/gemini-2.0-flash")
llm = GroqChatLLM(model="groq/llama-3.3-70b-versatile")
llm = LiteLLMChatLLM(model="together_ai/meta-llama/Llama-3.1-70B-Instruct-Turbo")

| Provider | Adapter | Env (`SHIPIT_LLM_PROVIDER=`) | Auth |
|---|---|---|---|
| OpenAI | `OpenAIChatLLM` | `openai` | `OPENAI_API_KEY` |
| Anthropic | `AnthropicChatLLM` | `anthropic` | `ANTHROPIC_API_KEY` |
| AWS Bedrock | `BedrockChatLLM` | `bedrock` | AWS region / profile |
| Google Vertex | `VertexAIChatLLM` | `vertex` | service-account JSON |
| Gemini | `GeminiChatLLM` | `gemini` | `GEMINI_API_KEY` |
| Groq | `GroqChatLLM` | `groq` | `GROQ_API_KEY` |
| Together | `TogetherChatLLM` | `together` | `TOGETHERAI_API_KEY` |
| Ollama (local) | `OllamaChatLLM` | `ollama` | none |
| LiteLLM / OpenRouter | `LiteLLMChatLLM` / `LiteLLMProxyChatLLM` | `litellm` | per provider |
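The Auth column maps naturally onto a preflight check. The sketch below is a hypothetical helper (`missing_credentials` is not part of shipit_agent) that reports which environment variables a chosen provider still needs, using only the variable names listed in the table. Providers whose auth is not a single variable (Bedrock, Vertex, LiteLLM) are deliberately left out.

```python
import os

# Env var each provider needs, taken from the Auth column of the table.
# Assumption: this mapping is illustrative, not shipit_agent's own config.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "groq": ["GROQ_API_KEY"],
    "together": ["TOGETHERAI_API_KEY"],
    "ollama": [],  # local server, no key
}


def missing_credentials(provider, env=None):
    """Return the env vars that `provider` needs but `env` does not set."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_ENV:
        raise ValueError(f"no single-variable auth known for {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


# Passing an explicit dict keeps the demo independent of the real environment.
print(missing_credentials("openai", {}))
print(missing_credentials("groq", {"GROQ_API_KEY": "example"}))
```

Running a check like this before constructing the agent turns a missing key into an immediate, readable error instead of a failed first request.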
Wrap any Python callable — the agent reads its signature and calls it when useful:
from shipit_agent import Agent, FunctionTool
def get_weather(city: str) -> str:
    """Current weather for a city."""
    return f"{city}: 22°C, clear"

agent = Agent.with_builtins(llm=llm, tools=[FunctionTool.from_callable(get_weather)])
agent.run("What's the weather in Tokyo? Do I need an umbrella?")

Reusable behaviour templates that shape how the agent thinks and which tools it reaches for:
agent = Agent.with_builtins(
    llm=llm,
    skills=["code-workflow-assistant", "database-architect"],
    auto_use_skills=True,  # also match skills from the prompt
)

session = agent.chat_session(session_id="user-42")
session.send("My name is Ada. I build compilers.")
session.send("What was my name again?")  # remembers across turns

Persist across processes with `FileSessionStore`, and distill conversations into durable facts with `MemoryConsolidator`.
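To illustrate what persisting a session across processes involves, here is a minimal stdlib sketch. `JsonSessionStore` is a hypothetical stand-in, not the `FileSessionStore` API: it keeps one JSON file per session id and rewrites it on every new message.

```python
import json
import tempfile
from pathlib import Path


class JsonSessionStore:
    """Hypothetical stand-in for a file-backed session store."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id):
        return self.root / f"{session_id}.json"

    def load(self, session_id):
        """Return the stored message list, or [] for a new session."""
        path = self._path(session_id)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def append(self, session_id, role, content):
        """Add one message and write the whole history back to disk."""
        messages = self.load(session_id)
        messages.append({"role": role, "content": content})
        self._path(session_id).write_text(json.dumps(messages), encoding="utf-8")
        return messages


store_dir = tempfile.mkdtemp()
JsonSessionStore(store_dir).append("user-42", "user", "My name is Ada.")

# A second store object (think: a new process) sees the same history.
history = JsonSessionStore(store_dir).load("user-42")
print(history[0]["content"])
```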
from pydantic import BaseModel
class Ticket(BaseModel):
    title: str
    priority: str
    tags: list[str]

result = agent.run("Triage: 'login broken on Safari'", output_schema=Ticket)
print(result.parsed)  # validated; auto-retries inside the same conversation

for event in agent.stream("Write a haiku about shipping code"):
    if event.type == "text_delta":
        print(event.payload["chunk"], end="", flush=True)
    elif event.type == "tool_called":
        print("→", event.payload["name"])

Events also serialize to ready-made SSE / WebSocket packets for web UIs.
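For a web UI, each event can be framed as a Server-Sent Events message: an `event:` line, a `data:` line, and a terminating blank line. The sketch below assumes events expose a `type` and a `payload` as in the streaming loop; `to_sse` is a hypothetical helper, not the library's own serializer.

```python
import json


def to_sse(event_type, payload):
    """Frame one event as a Server-Sent Events message."""
    # json.dumps escapes newlines, so the payload stays on one `data:` line.
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event_type}\ndata: {data}\n\n"


frame = to_sse("text_delta", {"chunk": "Ship it"})
print(frame, end="")
```

On the browser side, an `EventSource` listener registered for `text_delta` would receive the JSON string and can append `chunk` to the page.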
A Claude Code-style safety layer — rule-based, no extra LLM call.
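Rule-based here means a tool name is matched against patterns rather than judged by a model. A minimal sketch of that idea with stdlib glob matching follows. The `decide` function and its precedence (deny, then ask, then allow, with unmatched tools falling back to ask) are assumptions for illustration, not the documented behaviour of `PermissionEngine`.

```python
from fnmatch import fnmatchcase


def decide(tool_name, deny=(), ask=(), allow=(), default="ask"):
    """Return 'deny', 'ask', or 'allow' for a tool name.

    Assumed precedence: deny beats ask, ask beats allow.
    """
    for decision, patterns in (("deny", deny), ("ask", ask), ("allow", allow)):
        if any(fnmatchcase(tool_name, pattern) for pattern in patterns):
            return decision
    return default


rules = {"deny": ["bash", "*_delete"], "ask": ["sql"], "allow": ["read*", "grep*"]}
for name in ("file_delete", "sql", "read_file", "web_search"):
    print(name, "->", decide(name, **rules))
```

Because the check is a handful of string matches, it adds no latency or token cost to a tool call.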
from shipit_agent import Agent, PermissionEngine
agent = Agent.with_builtins(
    llm=llm,
    permissions=PermissionEngine(
        deny=["bash", "*_delete"],  # never run these
        ask=["sql"],                # require approval
        allow=["read*", "grep*"],   # always fine
    ),
)

# Read-only "plan mode": research and propose, take no action.
plan = agent.plan("Migrate the billing schema to multi-tenant.").output

- Modes: `default`, `acceptEdits`, `plan`, `bypass`.
- `permission_callback(name, args)` for programmatic human-in-the-loop approval.
- Blocking hooks: `before_tool` hooks can deny a call or rewrite its arguments; `on_user_prompt` can redact prompts:
@hooks.on_before_tool
def guard(name, args):
    if name == "bash" and "rm -rf" in args.get("command", ""):
        return {"decision": "deny", "reason": "destructive command"}

The runtime rebuilds the same system prompt and tool schemas each turn, which makes them the ideal cacheable prefix.
from shipit_agent.llms import AnthropicChatLLM
llm = AnthropicChatLLM("claude-opus-4-1", prompt_caching=True)  # default on for Claude

`cache_control` breakpoints are placed on tools + system prompt; responses surface `cache_read_input_tokens` / `cache_creation_input_tokens`, which flow into `CostTracker`. Caching spans Anthropic, Bedrock, Vertex (`cache_control`) and OpenAI (automatic); cache reads bill at roughly 10% of the normal input-token price.
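A quick worked example of what that discount means. The per-token price below is a made-up round number, the 10% read multiplier is the approximate figure quoted in this section, and cache-write surcharges are ignored to keep the arithmetic simple.

```python
def input_cost(uncached_tokens, cache_read_tokens, price_per_mtok, read_multiplier=0.10):
    """Input-side cost in dollars when part of the prompt is served from cache."""
    full_price = uncached_tokens * price_per_mtok / 1_000_000
    cached_price = cache_read_tokens * price_per_mtok * read_multiplier / 1_000_000
    return full_price + cached_price


PRICE = 3.00  # hypothetical dollars per million input tokens

# A 10,000-token prompt where 9,000 tokens (tools + system prompt) hit the cache.
without_cache = input_cost(10_000, 0, PRICE)
with_cache = input_cost(1_000, 9_000, PRICE)
print(f"no cache: ${without_cache:.4f}  cached: ${with_cache:.4f}")
```

Under these assumptions the cached turn costs about a fifth of the uncached one, and the saving repeats on every turn that reuses the prefix.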
from shipit_agent import create_deep_agent, Goal
# Autonomous goal decomposition with a planner / explorer / coder / verifier loop:
agent = create_deep_agent(llm=llm, tools=[...])
result = agent.run(Goal(objective="Build and test a REST API for todos"))

`GoalAgent` (decompose → execute), `ReflectiveAgent` (self-improve to a quality bar), `Supervisor` + `Worker` (hierarchical), `ShipCrew` (role-based crews), `AdaptiveAgent`, and `PersistentAgent` (checkpoint + resume) are all first-class.
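The "self-improve to a quality bar" pattern reduces to a small loop: draft, score, and revise until the score clears a threshold or the round budget runs out. The sketch uses plain callables in place of LLM calls; `reflect` and its parameters are hypothetical and say nothing about how `ReflectiveAgent` is implemented.

```python
def reflect(draft, score, revise, threshold=0.8, max_rounds=3):
    """Revise `draft` until `score(draft) >= threshold` or rounds run out.

    Returns (final_draft, final_score, rounds_used).
    """
    current = draft
    current_score = score(current)
    rounds = 0
    while current_score < threshold and rounds < max_rounds:
        current = revise(current, current_score)
        current_score = score(current)
        rounds += 1
    return current, current_score, rounds


# Toy stand-ins for the critic and the reviser: the "quality" of an answer
# is the fraction of required keywords it mentions.
REQUIRED = ("endpoint", "tests", "docs")


def keyword_score(text):
    return sum(word in text for word in REQUIRED) / len(REQUIRED)


def add_missing_keyword(text, _score):
    missing = next(word for word in REQUIRED if word not in text)
    return f"{text} {missing}"


final, quality, rounds = reflect("endpoint", keyword_score, add_missing_keyword)
print(final, quality, rounds)
```

The `max_rounds` cap matters as much as the threshold: without it, a critic that never approves would loop forever.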
from shipit_agent import RAG, Agent
from shipit_agent.rag.embedder import HashingEmbedder
rag = RAG.default(embedder=HashingEmbedder())
rag.index_text("Payments run on Stripe; refunds settle in 5–7 days.", source="ops.md")
agent = Agent.with_builtins(llm=llm, rag=rag) # retrieves, then answers with cited sources
print(agent.run("How long do refunds take?").rag_sources)

The RAG subsystem provides hybrid vector + BM25 ranking, a document chunker, multiple embedders/rerankers, and pluggable backends (Chroma, Qdrant, pgvector).
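To make "hybrid vector + BM25" concrete, the sketch below ranks documents twice, once with Okapi BM25 and once with cosine similarity over vectors, then merges the two rankings with Reciprocal Rank Fusion. The fusion method is my choice for illustration; the source does not say how the library combines scores. The term-count vectors are a toy stand-in for a real embedder.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(doc) for doc in docs]
    avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in counts:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = counts[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return scores


def term_vector(text):
    """Toy 'embedding': sparse term counts (a real embedder would go here)."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ranks(scores):
    """Map document index -> 1-based rank, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc: position for position, doc in enumerate(order, start=1)}


def hybrid_search(query, docs, k=60):
    """Fuse the BM25 and vector rankings with Reciprocal Rank Fusion."""
    lexical = ranks(bm25_scores(query, docs))
    query_vec = term_vector(query)
    semantic = ranks([cosine(query_vec, term_vector(doc)) for doc in docs])
    fused = {i: 1 / (k + lexical[i]) + 1 / (k + semantic[i]) for i in range(len(docs))}
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "Payments run on Stripe; refunds settle in 5-7 days.",
    "Deploys go out every weekday at noon.",
    "On-call rotates weekly between the platform teams.",
]
best = hybrid_search("how long do refunds take", docs)[0]
print(docs[best])
```

Fusing ranks rather than raw scores sidesteps the fact that BM25 and cosine similarity live on different scales.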
# Drive a real browser (works in Jupyter):
from shipit_agent.computer_use import ComputerUseAgent, PlaywrightBrowserSession
with PlaywrightBrowserSession.launch(headless=True) as browser:
    ComputerUseAgent(
        llm=claude_llm,
        browser=browser,
        goal="Find the iPhone 15 Pro price on apple.com",
    ).run()

Autopilot runs long, unattended jobs with a critic, artifacts, fan-out, and a scheduler.

17 SaaS connectors (GitHub, GitLab, Slack, Gmail, Google Drive/Sheets/Calendar, Jira, Linear, Notion, Confluence, HubSpot, Salesforce, Stripe, Zendesk, Figma, LinkedIn) share a credential store with built-in OAuth helpers.
from shipit_agent.mcp import RemoteMCPServer
agent = Agent.with_builtins(llm=llm, mcps=[RemoteMCPServer(name="fs", transport=...)])

Connect MCP servers over stdio, HTTP, or a persistent subprocess; their tools join the agent's tool set automatically.
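On the wire, MCP is JSON-RPC 2.0: after an `initialize` handshake, a client lists a server's tools with `tools/list` and invokes one with `tools/call`. The sketch below only builds those two request messages as they would be written to a stdio server (one JSON object per line). The tool name and arguments are hypothetical, and nothing here describes how `RemoteMCPServer` works internally.

```python
import itertools
import json

_ids = itertools.count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict with a fresh integer id."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def encode_line(message):
    """Serialize for a stdio transport: one JSON object per line."""
    return json.dumps(message) + "\n"


list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "README.md"}},  # hypothetical tool
)
print(encode_line(list_tools), end="")
print(encode_line(call_tool), end="")
```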
- Tracing: `FileTraceStore`, OpenTelemetry, and LangSmith exporters.
- Cost & budgets: `CostTracker` prices every call from a model table; `Budget` enforces a ceiling (and flags unknown-model pricing instead of silently billing $0).
- Verifier network: an optional cheap LLM that vetoes hallucinated tool calls and detects stalling, complementing the rule-based permission engine.
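The two behaviours named in the cost bullet, enforcing a ceiling and refusing to price an unknown model at $0, can be sketched in a few lines. `SpendGuard`, the price table, and the model names are all hypothetical; the real `CostTracker` and `Budget` classes may work differently.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the ceiling."""


class UnknownModelPrice(LookupError):
    """Raised when a model has no entry in the price table."""


# Hypothetical prices in dollars per million tokens: (input, output).
PRICES = {"small-model": (0.50, 1.50), "large-model": (3.00, 15.00)}


class SpendGuard:
    """Track spend per call and stop once a dollar ceiling is crossed."""

    def __init__(self, max_dollars):
        self.max_dollars = max_dollars
        self.spent = 0.0

    def record(self, model, input_tokens, output_tokens):
        if model not in PRICES:
            # Fail loudly rather than silently counting the call as free.
            raise UnknownModelPrice(model)
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        if self.spent + cost > self.max_dollars:
            raise BudgetExceeded(f"{self.spent + cost:.4f} > {self.max_dollars:.4f}")
        self.spent += cost
        return cost


guard = SpendGuard(max_dollars=0.05)
first_call = guard.record("large-model", 10_000, 1_000)
print(f"spent so far: ${guard.spent:.3f}")
```

A second identical call would raise `BudgetExceeded`, and any model missing from the table raises `UnknownModelPrice` before a cost is recorded.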
- `examples/`: runnable scripts (basic agent, custom tools, parallel tools, cost budgets, multi-turn memory, async runtime, secure tools, the verifier guard, and more).
- `notebooks/`: 60+ Jupyter notebooks covering agents, streaming, MCP, connectors, deep agents, RAG, skills, autopilot, the control plane, prompt caching, and the memory tool.

python examples/run_multi_tool_agent.py

- 🌐 Full documentation site: searchable, with guides for every subsystem.
- 📓 CHANGELOG.md · release-notes/: per-release detail.
- 🧰 TOOLS.md: the built-in tool catalogue.
- 🔐 SECURITY.md · ⚖️ LICENSE.md (MIT).
Issues and PRs are welcome. Install the dev extras, keep the suite green, and run the linter:
pip install -e ".[dev]"
pytest -q
ruff check .

See CONTRIBUTING.md for the full guide.
Built with Love. Powered by your choice of AI models.
Ship it fast. Ship it right.