Skip to content

shipiit/shipit_agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SHIPIT Agent — production-grade Python agent runtime

SHIPIT

SHIPIT Agent

A clean, powerful, open-source Python runtime for building tool-using AI agents.

One consistent API over every major LLM provider — with tools, skills, memory, MCP, a rule-based permission layer, prompt caching, deep multi-agent orchestration, RAG, and structured streaming events.

📖 Documentation · 📦 PyPI · Quick start · Changelog · Security

PyPI Python versions Downloads License Docs

Anthropic Bedrock OpenAI Gemini Vertex AI Groq Together Ollama LiteLLM


What is SHIPIT Agent?

SHIPIT Agent is a small, explicit runtime for building production agents in Python. You bring an LLM; the runtime gives you the loop around it — tool calling, retries, streaming, memory, sessions, permissions, and cost tracking — plus a deep library of batteries (40+ built-in tools, 17 SaaS connectors, RAG, multi-agent orchestration, browser automation).

It is provider-agnostic by design: the same agent code runs on OpenAI, Anthropic, AWS Bedrock, Google Vertex/Gemini, Groq, Together, Ollama, or any of 100+ models through LiteLLM. Swap the model in one line — nothing else changes.

from shipit_agent import Agent
from shipit_agent.llms import build_llm_from_env

agent = Agent.with_builtins(llm=build_llm_from_env())   # any provider
print(agent.run("Find every TODO in this repo and summarize them.").output)

The only hard dependency is pydantic. Everything else (a provider SDK, Playwright, a vector store) is an optional extra you install when you need it. Python 3.11+ · MIT · 1850+ tests.


Highlights

  • 🤖 The Agent — one runtime: tool calling, retries, parallel tools, context compaction, and a final-answer guarantee. Agent.with_builtins() ships the full tool catalogue.
  • 🔌 Any LLM — OpenAI · Anthropic · Bedrock · Vertex · Gemini · Groq · Together · Ollama · OpenRouter · 100+ via LiteLLM. Native adapters where it matters, one interface everywhere.
  • 🛡️ Control plane — a fast, rule-based permission engine (allow/deny/ask), plan mode (read-only research before acting), and hooks that can block or rewrite any tool call.
  • ⚡ Prompt caching — cross-provider cache-read accounting (Anthropic/Bedrock/Vertex cache_control + OpenAI automatic caching) so repeated calls bill at a fraction of the cost.
  • 🧰 Tools & connectors — 40+ built-in tools (bash, SQL, files, web search, code execution, vision, PDF…) and 17 SaaS connectors (GitHub, Slack, Gmail, Jira, Salesforce, Stripe…).
  • 🔗 MCP — connect Model Context Protocol servers over stdio, HTTP, or a persistent subprocess.
  • 🧠 Deep agentsGoalAgent, ReflectiveAgent, Supervisor/Worker, ShipCrew, and the create_deep_agent() factory for autonomous, multi-step, multi-agent work.
  • 📚 Super RAG — hybrid vector + BM25 search with auto-cited sources and pluggable backends (Chroma, Qdrant, pgvector).
  • 🚀 Autopilot — long-running autonomous loops with a critic, artifacts, fan-out, and a scheduler.
  • 🖥️ Computer use — drive a real browser via screenshots + a vision model (works in Jupyter).
  • 📊 Production-ready — sessions, memory consolidation, structured output with validation-retry, streaming events (+ SSE/WebSocket packets), tracing (file/OTel/LangSmith), and budgets.

Installation

Requirements: Python 3.11+ (3.11 – 3.14 supported). The only hard dependency is pydantic; provider SDKs and heavier features are opt-in extras.

From PyPI (recommended)

pip install shipit-agent

Optional extras

Install only what you need — each extra pulls in the relevant third-party packages:

Extra Installs For
openai openai OpenAI / OpenAI-compatible
anthropic anthropic native Anthropic (Claude)
bedrock boto3 AWS Bedrock
google google-generativeai Gemini
groq / together / ollama provider SDK Groq / Together / Ollama
litellm litellm 100+ models via one interface
playwright playwright browser automation / computer use
pdf pypdf the PDF tool
sql sqlalchemy the SQL tool (add your own driver)
rag-chroma / rag-qdrant / rag-pgvector vector store RAG backends
rag-openai / rag-cohere / rag-sentence-transformers embedder RAG embeddings
otel / langsmith exporters tracing
all everything kitchen sink
pip install "shipit-agent[anthropic]"        # one provider
pip install "shipit-agent[anthropic,playwright,rag-chroma]"   # combine
pip install "shipit-agent[all]"              # everything

Browser automation / computer use also needs the Chromium binary:

pip install "shipit-agent[playwright]" && playwright install chromium

From source (development)

git clone https://github.com/shipiit/shipit_agent.git
cd shipit_agent
pip install -e ".[dev]"     # editable install with test/docs tooling
pytest -q                   # 1850+ tests
ruff check .

Alternatives: pip install . (non-editable), pip install -r requirements.txt, or poetry install.

Verify

import shipit_agent
print(shipit_agent.__version__)

Notebook tip: if imports look out of date, your kernel may be using an older globally installed copy. Run pip install -U shipit-agent (or pip install -e . from the repo) in the kernel's environment.


Environment setup

The fastest way to choose a model is environment variables — copy .env.example to .env and fill in what you use:

# Pick the provider; build_llm_from_env() reads these:
SHIPIT_LLM_PROVIDER=bedrock            # openai | anthropic | bedrock | vertex | gemini | groq | together | ollama | litellm

# …then the provider's own credentials, e.g.:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AWS_REGION_NAME=us-east-1             # Bedrock uses your AWS region / profile (no key)
SHIPIT_BEDROCK_MODEL=bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
GEMINI_API_KEY=...
GROQ_API_KEY=...
SHIPIT_LITELLM_MODEL=openrouter/openai/gpt-4o-mini    # for the litellm provider
from shipit_agent import Agent
from shipit_agent.llms import build_llm_from_env

agent = Agent.with_builtins(llm=build_llm_from_env())   # reads SHIPIT_LLM_PROVIDER + creds
print(agent.run("Hello, who are you?").output)

Run diagnostics any time with agent.doctor() to check provider config, credentials, and tools.


Use any LLM provider

The agent never cares which model it talks to. Configure once via env, or instantiate an adapter directly:

from shipit_agent.llms import (
    build_llm_from_env, OpenAIChatLLM, AnthropicChatLLM,
    BedrockChatLLM, GeminiChatLLM, GroqChatLLM, LiteLLMChatLLM,
)

llm = build_llm_from_env("bedrock")                       # env-driven (prod)
llm = OpenAIChatLLM(model="gpt-4o")                       # native OpenAI
llm = AnthropicChatLLM(model="claude-opus-4-1")           # native Anthropic
llm = BedrockChatLLM(model="bedrock/us.meta.llama4-maverick-17b-instruct-v1:0")
llm = GeminiChatLLM(model="gemini/gemini-2.0-flash")
llm = GroqChatLLM(model="groq/llama-3.3-70b-versatile")
llm = LiteLLMChatLLM(model="together_ai/meta-llama/Llama-3.1-70B-Instruct-Turbo")
Provider Adapter Env (SHIPIT_LLM_PROVIDER=) Auth
OpenAI OpenAIChatLLM openai OPENAI_API_KEY
Anthropic AnthropicChatLLM anthropic ANTHROPIC_API_KEY
AWS Bedrock BedrockChatLLM bedrock AWS region / profile
Google Vertex VertexAIChatLLM vertex service-account JSON
Gemini GeminiChatLLM gemini GEMINI_API_KEY
Groq GroqChatLLM groq GROQ_API_KEY
Together TogetherChatLLM together TOGETHERAI_API_KEY
Ollama (local) OllamaChatLLM ollama
LiteLLM / OpenRouter LiteLLMChatLLM / LiteLLMProxyChatLLM litellm per provider

Core building blocks

Custom tools

Wrap any Python callable — the agent reads its signature and calls it when useful:

from shipit_agent import Agent, FunctionTool

def get_weather(city: str) -> str:
    """Current weather for a city."""
    return f"{city}: 22°C, clear"

agent = Agent.with_builtins(llm=llm, tools=[FunctionTool.from_callable(get_weather)])
agent.run("What's the weather in Tokyo — umbrella?")

Skills

Reusable behaviour templates that shape how the agent thinks and which tools it reaches for:

agent = Agent.with_builtins(
    llm=llm,
    skills=["code-workflow-assistant", "database-architect"],
    auto_use_skills=True,      # also match skills from the prompt
)

Sessions & memory

session = agent.chat_session(session_id="user-42")
session.send("My name is Ada. I build compilers.")
session.send("What was my name again?")          # → remembers across turns

Persist across processes with FileSessionStore, and distill conversations into durable facts with MemoryConsolidator.

Structured output

from pydantic import BaseModel

class Ticket(BaseModel):
    title: str; priority: str; tags: list[str]

result = agent.run("Triage: 'login broken on Safari'", output_schema=Ticket)
print(result.parsed)            # validated; auto-retries inside the same conversation

Streaming

for event in agent.stream("Write a haiku about shipping code"):
    if event.type == "text_delta":
        print(event.payload["chunk"], end="", flush=True)
    elif event.type == "tool_called":
        print("→", event.payload["name"])

Events also serialize to ready-made SSE / WebSocket packets for web UIs.


The control plane

A Claude Code-style safety layer — rule-based, no extra LLM call.

from shipit_agent import Agent, PermissionEngine

agent = Agent.with_builtins(
    llm=llm,
    permissions=PermissionEngine(
        deny=["bash", "*_delete"],   # never run these
        ask=["sql"],                 # require approval
        allow=["read*", "grep*"],    # always fine
    ),
)

# Read-only "plan mode" — research and propose, take no action:
plan = agent.plan("Migrate the billing schema to multi-tenant.").output
  • Modes: default, acceptEdits, plan, bypass.
  • permission_callback(name, args) for programmatic human-in-the-loop approval.
  • Blocking hooksbefore_tool hooks can deny a call or rewrite its arguments; on_user_prompt can redact prompts:
@hooks.on_before_tool
def guard(name, args):
    if name == "bash" and "rm -rf" in args.get("command", ""):
        return {"decision": "deny", "reason": "destructive command"}

Performance: prompt caching

The runtime rebuilds the same system prompt + tool schemas each turn — the ideal cacheable prefix.

from shipit_agent.llms import AnthropicChatLLM
llm = AnthropicChatLLM("claude-opus-4-1", prompt_caching=True)   # default on for Claude

cache_control breakpoints are placed on tools + system prompt; responses surface cache_read_input_tokens / cache_creation_input_tokens, which flow into CostTracker. Caching spans Anthropic, Bedrock, Vertex (cache_control) and OpenAI (automatic) — cache reads bill at ~10% of input.


Deep agents & orchestration

from shipit_agent import create_deep_agent, Goal

# Autonomous goal decomposition with a planner / explorer / coder / verifier loop:
agent = create_deep_agent(llm=llm, tools=[...])
result = agent.run(Goal(objective="Build and test a REST API for todos"))

GoalAgent (decompose → execute), ReflectiveAgent (self-improve to a quality bar), Supervisor + Worker (hierarchical), ShipCrew (role-based crews), AdaptiveAgent, and PersistentAgent (checkpoint + resume) are all first-class.

Super RAG

from shipit_agent import RAG, Agent
from shipit_agent.rag.embedder import HashingEmbedder

rag = RAG.default(embedder=HashingEmbedder())
rag.index_text("Payments run on Stripe; refunds settle in 5–7 days.", source="ops.md")

agent = Agent.with_builtins(llm=llm, rag=rag)     # retrieves, then answers with cited sources
print(agent.run("How long do refunds take?").rag_sources)

Hybrid vector + BM25 ranking, a document chunker, multiple embedders/rerankers, and pluggable backends (Chroma, Qdrant, pgvector).

Autopilot, computer use & connectors

# Drive a real browser (works in Jupyter):
from shipit_agent.computer_use import ComputerUseAgent, PlaywrightBrowserSession

with PlaywrightBrowserSession.launch(headless=True) as browser:
    ComputerUseAgent(llm=claude_llm, browser=browser,
                     goal="Find the iPhone 15 Pro price on apple.com").run()

Autopilot runs long, unattended jobs with a critic, artifacts, fan-out, and a scheduler. 17 SaaS connectors — GitHub, GitLab, Slack, Gmail, Google Drive/Sheets/Calendar, Jira, Linear, Notion, Confluence, HubSpot, Salesforce, Stripe, Zendesk, Figma, LinkedIn — share a credential store with built-in OAuth helpers.

MCP

from shipit_agent.mcp import RemoteMCPServer

agent = Agent.with_builtins(llm=llm, mcps=[RemoteMCPServer(name="fs", transport=...)])

Connect MCP servers over stdio, HTTP, or a persistent subprocess — their tools join the agent's tool set automatically.


Observability & cost

  • TracingFileTraceStore, OpenTelemetry, and LangSmith exporters.
  • Cost & budgetsCostTracker prices every call from a model table; Budget enforces a ceiling (and flags unknown-model pricing instead of silently billing $0).
  • Verifier network — an optional cheap LLM that vetoes hallucinated tool calls and detects stalling, complementing the rule-based permission engine.

Examples & notebooks

  • examples/ — runnable scripts (basic agent, custom tools, parallel tools, cost budgets, multi-turn memory, async runtime, secure tools, the verifier guard, and more).
  • notebooks/ — 60+ Jupyter notebooks covering agents, streaming, MCP, connectors, deep agents, RAG, skills, autopilot, the control plane, prompt caching, and the memory tool.
python examples/run_multi_tool_agent.py

Documentation

Contributing

Issues and PRs are welcome. Install the dev extras, keep the suite green, and run the linter:

pip install -e ".[dev]"
pytest -q
ruff check .

See CONTRIBUTING.md for the full guide.


SHIPIT
Built with Love. Powered by your choice of AI models.
Ship it fast. Ship it right.

About

Powerful Python agent runtime with tools, MCP, Hooks, Skills, Rag, memory, sessions, reasoning, and streaming packets.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors