ML Model Serving Engine

A lightweight engine for serving machine learning models via a REST API.

Features

Load ML models saved in standard formats (joblib, pickle, onnx, pytorch, Tensorflow Savedmodel)
Added a /predict_multimodal endpoint accepting image + text payloads, routing to a VLA model (e.g. LLaVA, OpenVLA) for grounded predictions
a multimodal retrieval pipeline — embed uploaded images/text into a vector store and retrieve relevant context before inference
Added a /finetune endpoint that accepts a small labelled dataset and runs LoRA/QLoRA fine-tuning on a loaded base model in-place, then hot-swaps the adapter weights without server restart
Support loading models with merged or separate LoRA adapter files (.safetensors) alongside the base model, configurable via YAML
Added a /predict_video endpoint that accepts a sequence of frames and runs temporal inference (useful for action recognition or latent diffusion pipelines)
Support loading rectified flow / diffusion model checkpoints (.safetensors, diffusers format) as a first-class model format in model_loader.py
Added INT8/INT4 quantisation support via bitsandbytes or torch.ao.quantization at model load time, configurable with a quantization key in the YAML config
Implemented dynamic batching with configurable max_wait_ms and max_batch_size to maximise GPU throughput under concurrent requests Add a /benchmark endpoint that runs a warmup + timed inference loop and reports latency percentiles (p50/p95/p99) and throughput
Observability / MLOps support
- Expose a /metrics endpoint (Prometheus-compatible) tracking inference latency, batch sizes, model load time, and GPU memory utilisation via pynvml
Added model versioning support — load multiple model variants and route traffic between them via a config-driven A/B split or shadow mode

Quick Start

Installation

Clone the repository:

git clone https://github.com/yourusername/ml-serving-engine.git
cd ml-serving-engine

Install dependencies:
```
pip install -r requirements.txt
```

Running the Engine

With default configuration (looks for model.joblib in current directory):
```
python run_engine.py
```

With a configuration file:

python run_engine.py --config config/default_config.yaml

With command-line arguments:

python run_engine.py --model_path ./models/my_model.joblib --host 0.0.0.0 --port 8000

Using the API

Once the engine is running, you can make predictions via the API:

curl -X POST "http://localhost:5000/predict" \
  -H "Content-Type: application/json" \
  -d '{"data": [5.1, 3.5, 1.4, 0.2]}'

Response:

{
  "prediction": [0]
}

You can also visit the auto-generated API documentation at: http://localhost:5000/docs

Configuration Options

Option	Description	Default
`model_path`	Path to the model file	`./model.joblib`
`model_format`	Format of the model file (auto-detect if not specified)	`null`
`host`	Host address to bind the server	`127.0.0.1`
`port`	Port to bind the server	`5000`
`log_level`	Logging level	`INFO`

API Endpoints

GET / - Root endpoint (health check)
GET /health - Health check endpoint
POST /predict - Make predictions using the loaded model

Example YAML Configuration

model_path: "./models/classifier.joblib"
model_format: "joblib"

host: "0.0.0.0"
port: 8000

log_level: "INFO"

Creating a Sample Model

Here's a simple example of creating and saving a model for use with this engine:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import joblib

iris = load_iris()
X, y = iris.data, iris.target

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

joblib.dump(model, "model.joblib")

Phases

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
__pycache__		__pycache__
config		config
plugin		plugin
src		src
tests		tests
.gitignore		.gitignore
DOCKERFILE		DOCKERFILE
README.md		README.md
create_sample_model.py		create_sample_model.py
ml_engine_feature_plan.svg		ml_engine_feature_plan.svg
model.joblib		model.joblib
requirement.txt		requirement.txt
run_engine.py		run_engine.py
tmp_test_write.txt		tmp_test_write.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ML Model Serving Engine

Features

Quick Start

Installation

Running the Engine

Using the API

Configuration Options

API Endpoints

Example YAML Configuration

Creating a Sample Model

Phases

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ML Model Serving Engine

Features

Quick Start

Installation

Running the Engine

Using the API

Configuration Options

API Endpoints

Example YAML Configuration

Creating a Sample Model

Phases

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages