
feat: add Ollama provider support with streaming inference #162

Open
independenter wants to merge 1 commit into nat:main from independenter:feature/add-ollama-provider

Description

This PR adds support for the Ollama LLM provider, so users can run local models through Ollama with streamed responses.

Changes Made

  • ✅ Implemented the __ollama_text_generation__ and __ollama_chat_generation__ methods in [server/lib/inference/__init__.py](server/lib/inference/__init__.py)
  • ✅ Added api_url field to ProviderDetails and Provider entities in [server/lib/entities.py](server/lib/entities.py) for custom API endpoints
  • ✅ Registered Ollama routes in [server/app.py](server/app.py)
  • ✅ Updated [server/models.json](server/models.json) with Ollama model configurations (e.g., gemma2, llama3)
  • ✅ Added documentation in Chinese: docs/添加大模型提供商指南.md ("Guide to Adding LLM Providers")

Features

  • 🔄 Full streaming support for real-time token generation
  • 🎯 Support for both text generation and chat modes
  • ⚙️ Configurable API URL (default: http://localhost:11434)
  • 📊 Comprehensive parameter support (temperature, top_p, top_k, etc.)
  • ❌ Graceful cancellation support
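
The streaming and cancellation behavior listed above could look roughly like the sketch below. It assumes Ollama's newline-delimited JSON streaming format, where each line from /api/generate carries a "response" fragment (or "message.content" for /api/chat) and the last line has "done": true. The helper name iter_ollama_tokens and the cancel_event parameter are hypothetical and are not taken from this PR's code.

```python
import json
import threading
from typing import Iterable, Iterator, Optional


def iter_ollama_tokens(
    lines: Iterable[bytes],
    cancel_event: Optional[threading.Event] = None,
) -> Iterator[str]:
    """Yield text fragments from an Ollama NDJSON stream (hypothetical helper)."""
    for raw in lines:
        # Stop reading as soon as the caller requests cancellation.
        if cancel_event is not None and cancel_event.is_set():
            break
        # Skip blank keep-alive lines.
        if not raw.strip():
            continue
        chunk = json.loads(raw)
        # /api/generate uses "response"; /api/chat nests text under "message".
        text = chunk.get("response") or chunk.get("message", {}).get("content", "")
        if text:
            yield text
        # The final chunk is marked with "done": true.
        if chunk.get("done"):
            break
```

With the requests library, the lines could come from requests.post(url, json=payload, stream=True).iter_lines(); setting the event from another thread would end the loop at the next chunk.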

How to Test

  1. Install Ollama: https://ollama.com/
  2. Pull a model: ollama pull llama3 or ollama pull gemma2
  3. Ensure Ollama is running (default: http://localhost:11434)
  4. Start the OpenPlayground server
  5. Select an Ollama model from the dropdown
  6. Start chatting and verify streaming responses work correctly
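
Steps 2 and 3 can also be checked from a script before starting the server. The sketch below assumes Ollama's /api/tags endpoint, which returns the locally pulled models as {"models": [{"name": ...}, ...]}. The function names and the timeout default are illustrative and are not part of this PR.

```python
import json
import urllib.request
from typing import List

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def parse_model_names(tags_payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(api_url: str = DEFAULT_OLLAMA_URL, timeout: float = 5.0) -> List[str]:
    """Query a running Ollama instance for its pulled models.

    Raises urllib.error.URLError if Ollama is not reachable at api_url.
    """
    url = api_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling list_local_models() after ollama pull llama3 should return a list that includes a llama3 entry. A connection error suggests Ollama is not running on the expected URL.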

Configuration Example

Add to [server/models.json](server/models.json):

{
  "ollama": {
    "models": {
      "llama3": {
        "enabled": true,
        "status": "ready",
        "parameters": {
          "temperature": 0.7,
          "topP": 0.9,
          "topK": 40,
          "maximumLength": 512
        }
      }
    },
    "requiresAPIKey": false,
    "remoteInference": true,
    "apiURL": "http://localhost:11434"
  }
}
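
As a rough illustration of how the camelCase parameters in this configuration might be translated into an Ollama request, the sketch below builds a payload for /api/generate. The mapping is an assumption: temperature, top_p, top_k and num_predict are real Ollama options, but pairing maximumLength with num_predict is a guess at the intent, and build_generate_payload is a hypothetical helper rather than code from this PR.

```python
from typing import Any, Dict

# Assumed mapping from models.json parameter names to Ollama option names.
PARAM_MAP = {
    "temperature": "temperature",
    "topP": "top_p",
    "topK": "top_k",
    "maximumLength": "num_predict",  # assumption: cap on generated tokens
}


def build_generate_payload(
    model: str,
    prompt: str,
    parameters: Dict[str, Any],
    stream: bool = True,
) -> Dict[str, Any]:
    """Build a JSON body for POST {apiURL}/api/generate (hypothetical helper)."""
    # Unknown parameter names are dropped rather than forwarded.
    options = {PARAM_MAP[key]: value for key, value in parameters.items() if key in PARAM_MAP}
    return {"model": model, "prompt": prompt, "stream": stream, "options": options}
```

Dropping unrecognized keys keeps provider-specific settings from other backends out of the request body.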

- Add ollama dependency handling in lib/inference/__init__.py
- Implement __ollama_text_generation__ and __ollama_chat_generation__ methods
- Add api_url field to ProviderDetails and Provider entities
- Register Ollama routes in server/app.py
- Update models.json with Ollama model configurations (gemma4:e4b)
- Add comprehensive documentation in docs/添加大模型提供商指南.md
- Support both text generation and chat generation with streaming responses