AutOffload is a hybrid local-cloud agent task delegator and hardware diagnostics runner built on Clean Architecture principles. It lets cloud-based developer agents (such as Antigravity or Claude Code) offload multi-turn, iterative coding tasks, including syntax fixes, test cycles, and unit test generation, to a local Ollama model. This reduces cloud token consumption and speeds up minor refactoring loops.
To run local coding models with comfortable generation speeds (~30–60 tokens/sec), the model must fit entirely within your GPU VRAM:
- 12GB VRAM (sweet spot, e.g., RTX 3060, RTX 4070): recommended models are `qwen2.5-coder:7b` (extremely fast, low footprint) or `qwen2.5-coder:14b` (stronger reasoning, tight fit).
- 8GB VRAM (minimum GPU): recommended models are `qwen2.5-coder:7b` or `deepseek-r1:8b`.
- No GPU (CPU-only fallback): the tool falls back to CPU execution. Running models from system RAM is very slow (~2 tokens/sec) and is not recommended for iterative agent loops.
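The tiers above amount to a lookup from available VRAM to a suggested model. The following is a minimal sketch using a hypothetical `recommendModel` helper, which takes VRAM in GB or `null` when no GPU is detected. How it handles cards between tiers (e.g. 10GB) or below the 8GB minimum is an assumption, not AutOffload's documented behavior.

```typescript
// Hypothetical helper mirroring the VRAM tiers listed above.
// Thresholds for in-between or sub-minimum VRAM are assumptions.
type Recommendation = {
  tier: "gpu-12gb" | "gpu-8gb" | "cpu-fallback";
  models: string[];
  note: string;
};

function recommendModel(vramGb: number | null): Recommendation {
  if (vramGb !== null && vramGb >= 12) {
    return {
      tier: "gpu-12gb",
      models: ["qwen2.5-coder:7b", "qwen2.5-coder:14b"],
      note: "7b is fastest; 14b reasons better but is a tight fit",
    };
  }
  if (vramGb !== null && vramGb >= 8) {
    return {
      tier: "gpu-8gb",
      models: ["qwen2.5-coder:7b", "deepseek-r1:8b"],
      note: "minimum GPU tier",
    };
  }
  // No GPU, or less VRAM than the 8GB minimum: assume CPU execution,
  // which the tiers above do not recommend for iterative loops.
  return {
    tier: "cpu-fallback",
    models: [],
    note: "runs from system RAM (~2 tokens/sec); not recommended for agent loops",
  };
}
```

For example, `recommendModel(12).models[0]` is `"qwen2.5-coder:7b"`.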
- Node.js: v22.15.0 or higher.
- Ollama: client and service installed and listening (typically on `http://localhost:11434`).
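You can confirm both prerequisites from a script. The following is a minimal sketch built on two hypothetical helpers: `meetsMinimumVersion` and `ollamaIsReachable`. The version check compares plain `major.minor.patch` strings only, with no pre-release tags, which is enough for `process.versions.node`. The reachability check assumes Node 18+ global `fetch` and relies on Ollama answering plain HTTP requests on its base URL.

```typescript
// Hypothetical helper: true if `current` >= `minimum` for plain x.y.z versions.
// Pre-release/build suffixes (e.g. "-rc.1") are not handled.
function meetsMinimumVersion(current: string, minimum: string): boolean {
  const parse = (v: string): number[] =>
    v.replace(/^v/, "").split(".").map((part) => Number.parseInt(part, 10) || 0);
  const a = parse(current);
  const b = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y; // first differing component decides
  }
  return true; // versions are equal
}

// Hypothetical helper: true if something answers with a 2xx status on the Ollama base URL.
async function ollamaIsReachable(baseUrl = "http://localhost:11434"): Promise<boolean> {
  try {
    const res = await fetch(baseUrl);
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, etc.
  }
}
```

In a Node script, `meetsMinimumVersion(process.versions.node, "22.15.0")` checks the running interpreter.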
Execute these steps in sequence to install the CLI tool and register the global agent skill.
```
git clone <repository-url>
cd AutOffload
npm install
npm run build
```

Link the package globally so the `autoffload` command is available in any terminal session:

```
npm link
```

Verify the installation by running:

```
autoffload specs
```

Register the custom skill so that your Antigravity agent knows how and when to call this tool:

```
npm run install-skill
```

This copies the global skill definition directly to your local `.gemini` settings:

```
C:\Users\<Username>\.gemini\config\skills\autoffload\SKILL.md
```
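The `install-skill` script's implementation is not shown here. The following is a minimal sketch of what such a copy step could look like. It assumes the skill file lives at some repository path passed in as `sourceFile`, and it targets the Windows path above, so it uses `path.win32`. The helpers `skillDestination` and `installSkill` are hypothetical names.

```typescript
import * as path from "node:path";
import * as os from "node:os";
import { mkdir, copyFile } from "node:fs/promises";

// Hypothetical helper: builds the Windows destination path shown above.
function skillDestination(homeDir: string): string {
  return path.win32.join(homeDir, ".gemini", "config", "skills", "autoffload", "SKILL.md");
}

// Hypothetical install step: create the folder tree, then copy SKILL.md into it.
// `sourceFile` is an assumed location inside the repository.
async function installSkill(sourceFile: string, homeDir: string = os.homedir()): Promise<string> {
  const destination = skillDestination(homeDir);
  await mkdir(path.win32.dirname(destination), { recursive: true });
  await copyFile(sourceFile, destination);
  return destination;
}
```

Creating the directory with `recursive: true` makes the step safe to rerun when `.gemini` already exists.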
Make sure the local Ollama instance is running and that the recommended coding model has been pulled.
- Start Ollama: ensure the Ollama app or system service is active.
- Pull the recommended model:

```
ollama pull qwen2.5-coder:7b
```
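To confirm the pull succeeded, run `ollama list` or query Ollama's `GET /api/tags` endpoint, which lists locally available models. The following is a minimal sketch. `hasModel` and `isModelPulled` are hypothetical helpers, not part of AutOffload. Only the `name` field of each listed model is modeled, and the other response fields are ignored.

```typescript
// Relevant part of the GET /api/tags response; other fields are omitted.
interface TagsResponse {
  models: { name: string }[];
}

// True if `model` is among the pulled models. A bare name such as "llama3"
// is treated as "llama3:latest", Ollama's default tag.
function hasModel(tags: TagsResponse, model: string): boolean {
  const wanted = model.includes(":") ? model : `${model}:latest`;
  return tags.models.some((m) => m.name === wanted);
}

// Queries a running Ollama instance (requires Node 18+ global fetch).
async function isModelPulled(model: string, baseUrl = "http://localhost:11434"): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return hasModel((await res.json()) as TagsResponse, model);
}
```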
You can create an `autoffload.config.json` file in the root of your target project workspace to override settings:

```json
{
  "ollamaUrl": "http://localhost:11434",
  "defaultModel": "qwen2.5-coder:7b",
  "maxRetries": 3
}
```

- `ollamaUrl`: the HTTP API URL where your Ollama service is listening.
- `defaultModel`: the model used when no `-m` override is passed on the command line.
- `maxRetries`: the number of self-correction (code-compilation) loops the agent runs before declaring failure.
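The following is a minimal sketch of how a loader could merge this file over defaults and then apply the `-m`/`-r` CLI overrides. Several details are assumptions, not AutOffload's documented loader: the names `mergeConfig` and `resolveRunSettings`, accepting partial files, and falling back to the default when a key has the wrong type.

```typescript
interface AutOffloadConfig {
  ollamaUrl: string;
  defaultModel: string;
  maxRetries: number;
}

// Defaults copied from the example file above (assumed to match the real defaults).
const DEFAULTS: AutOffloadConfig = {
  ollamaUrl: "http://localhost:11434",
  defaultModel: "qwen2.5-coder:7b",
  maxRetries: 3,
};

// Merges a possibly partial autoffload.config.json over the defaults;
// keys with the wrong type are ignored.
function mergeConfig(rawJson: string | null): AutOffloadConfig {
  if (rawJson === null) return { ...DEFAULTS }; // no config file in the workspace
  const parsed: unknown = JSON.parse(rawJson);
  if (typeof parsed !== "object" || parsed === null) return { ...DEFAULTS };
  const p = parsed as Partial<Record<keyof AutOffloadConfig, unknown>>;
  return {
    ollamaUrl: typeof p.ollamaUrl === "string" ? p.ollamaUrl : DEFAULTS.ollamaUrl,
    defaultModel: typeof p.defaultModel === "string" ? p.defaultModel : DEFAULTS.defaultModel,
    maxRetries:
      Number.isInteger(p.maxRetries) && (p.maxRetries as number) >= 0
        ? (p.maxRetries as number)
        : DEFAULTS.maxRetries,
  };
}

// CLI flags (-m, -r) take precedence over the config file.
function resolveRunSettings(config: AutOffloadConfig, cli: { model?: string; retries?: number }) {
  return {
    model: cli.model ?? config.defaultModel,
    retries: cli.retries ?? config.maxRetries,
  };
}
```

Using `??` rather than `||` keeps an explicit `-r 0` from being replaced by the config value.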
```
autoffload specs
```

Examines your CPU, total RAM, and GPU VRAM, then outputs a compatibility report and suggests the best model for your hardware.
```
autoffload run \
  -t "Fix spelling error 'rturn' to 'return' in the add function" \
  -f "test_workspace/calculator.ts" \
  -c "npx tsc --noEmit test_workspace/calculator.ts"
```

The `\` line continuations are for POSIX shells such as bash. In PowerShell, use a backtick (`` ` ``) instead or put the command on one line.

- `-t, --task`: detailed instructions for the coding task to execute.
- `-f, --files`: comma-separated list of target files (the local model reads them and writes changes back).
- `-c, --test`: optional. The validation test command. If it returns a non-zero exit code, its error output (e.g. compiler errors) is fed back to the model, which corrects the code in a loop.
- `-m, --model`: optional. Overrides the target model.
- `-r, --retries`: optional. Overrides the maximum number of self-correction attempts.
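The `run` flags map onto a small options object. The following is a minimal sketch assuming a hand-rolled parser; the real CLI may use a library. `parseRunArgs` and `RunOptions` are hypothetical names. Treating `-t` and `-f` as required is inferred from the fact that only `-c`, `-m`, and `-r` are marked optional.

```typescript
interface RunOptions {
  task: string;
  files: string[];
  test?: string;
  model?: string;
  retries?: number;
}

// Hypothetical parser for the flags above. `argv` holds only the arguments
// after `run`; the shell has already split and unquoted them.
function parseRunArgs(argv: string[]): RunOptions {
  const opts: Partial<RunOptions> = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (value === undefined) throw new Error(`Missing value for ${flag}`);
    switch (flag) {
      case "-t":
      case "--task":
        opts.task = value;
        break;
      case "-f":
      case "--files": // comma-separated list; whitespace around names is dropped
        opts.files = value.split(",").map((f) => f.trim()).filter((f) => f.length > 0);
        break;
      case "-c":
      case "--test":
        opts.test = value;
        break;
      case "-m":
      case "--model":
        opts.model = value;
        break;
      case "-r":
      case "--retries":
        if (!/^\d+$/.test(value)) throw new Error(`Invalid --retries: ${value}`);
        opts.retries = Number(value);
        break;
      default:
        throw new Error(`Unknown flag: ${flag}`);
    }
  }
  if (!opts.task) throw new Error("--task is required");
  if (!opts.files || opts.files.length === 0) throw new Error("--files is required");
  return { ...opts, task: opts.task, files: opts.files };
}
```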
AutOffload is structured to isolate core business rules from infrastructure implementations:
```
src/
├── domain/                # Core Models & Contracts (Zero dependencies)
│   ├── entities/          # SystemSpecs definitions and recommended rules
│   └── ports/             # Interfaces for FileSystem, ProcessExecutor, LLMProvider
│
├── application/           # Use Cases
│   └── use-cases/         # GetSpecsUseCase, RunTaskUseCase (Self-correction logic)
│
└── infrastructure/        # Adapters (Concrete implementations)
    ├── cli/               # CLI entry flag parsing and stdout streaming
    ├── config/            # JSON config loader
    ├── spec-providers/    # Windows PowerShell specs extraction
    ├── llm-providers/     # Ollama REST client (HTTP JSON-lines parser)
    ├── executors/         # Subprocess runner (child_process)
    └── file-system/       # Node fs/promises file reader/writer
```
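Because the ports keep the use case free of Node, HTTP, and subprocess details, the self-correction loop can be exercised with in-memory fakes. The following is a minimal sketch of that loop. Several details are assumptions, not AutOffload's actual contracts: the port method names, the `Port` suffixes, the prompt format, and the single-file simplification (the real `-f` accepts a list). Here `maxRetries` is read as "corrections after the first attempt", so the model is called at most `maxRetries + 1` times.

```typescript
// Hypothetical port shapes for domain/ports (method names are assumptions).
interface FileSystemPort {
  read(path: string): Promise<string>;
  write(path: string, content: string): Promise<void>;
}
interface ProcessExecutorPort {
  run(command: string): Promise<{ exitCode: number; output: string }>;
}
interface LlmProviderPort {
  rewrite(prompt: string): Promise<string>; // returns the full new file content
}

// Sketch of the self-correction loop: rewrite, validate, feed errors back.
class RunTaskUseCase {
  private readonly fs: FileSystemPort;
  private readonly exec: ProcessExecutorPort;
  private readonly llm: LlmProviderPort;

  constructor(fs: FileSystemPort, exec: ProcessExecutorPort, llm: LlmProviderPort) {
    this.fs = fs;
    this.exec = exec;
    this.llm = llm;
  }

  async execute(task: string, file: string, testCommand: string | undefined, maxRetries: number): Promise<boolean> {
    let feedback = "";
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      const source = await this.fs.read(file);
      const prompt =
        `Task: ${task}\nFile ${file}:\n${source}` +
        (feedback ? `\nPrevious attempt failed:\n${feedback}` : "");
      await this.fs.write(file, await this.llm.rewrite(prompt));
      if (testCommand === undefined) return true; // nothing to validate against
      const result = await this.exec.run(testCommand);
      if (result.exitCode === 0) return true;
      feedback = result.output; // error output drives the next attempt
    }
    return false; // retries exhausted
  }
}
```

Swapping the fakes for the Ollama client, `child_process` runner, and `fs/promises` adapters would not change `RunTaskUseCase` itself, which is the point of the port boundary.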