A visual image search system that retrieves and ranks images similar to a query image using CLIP embeddings and FAISS vector search.
- CLIP Embeddings: Uses OpenAI's CLIP model to encode each image as a 512-dimensional embedding
- FAISS Vector Search: Fast exact nearest neighbor search with IndexFlatL2
- Abstract Storage: Pluggable storage backend (local filesystem, extensible to S3)
- Reranking Pipeline: Score normalization, filtering, and diversity controls
- Offline Evaluation: Comprehensive metrics (Recall@K, Precision@K, mAP, MRR)
- TDD Design: Full test coverage with unit and integration tests
flowchart TB
subgraph IndexingPipeline["Indexing Pipeline"]
direction TB
IP1["Image Input"] --> IP2["Object Storage"]
IP2 --> IP3["Image Preprocessor"]
IP3 --> IP4["CLIP Embedding Service"]
IP4 --> IP5["FAISS Index Table"]
end
subgraph PredictionPipeline["Prediction Pipeline"]
direction TB
PP1["Query Image"] --> PP2["Image Preprocessor"]
PP2 --> PP3["CLIP Embedding Service"]
PP3 --> PP4["Nearest Neighbor Search"]
PP4 --> PP5["Reranker"]
PP5 --> PP6["Search Results"]
end
subgraph Storage["Storage Layer"]
direction LR
S1[("Local Storage")]
S2[("FAISS Index")]
S3["Metadata Store"]
end
IP2 --> S1
IP5 --> S2
IP5 --> S3
PP4 --> S2
PP4 --> S3
| Pipeline | Components | Description |
|---|---|---|
| Indexing | Storage → Preprocessor → CLIP → FAISS | Processes images and builds searchable index |
| Prediction | Preprocessor → CLIP → NN Search → Reranker | Handles queries and returns ranked results |
- Python 3.10+
- 4GB+ RAM (for CLIP model)
# Clone repository
git clone <repository-url>
cd VisualSearch
# Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # Unix/macOS
# Install dependencies
pip install -r requirements.txt
pip install -e .
from PIL import Image
from visual_search.indexing.indexing_service import IndexingService
from visual_search.indexing.storage.local_storage import LocalStorage
from visual_search.indexing.index_table import VectorIndex
from visual_search.prediction.embedding_service import EmbeddingService
# Initialize components
storage = LocalStorage(root_path="./data/images")
embedding_service = EmbeddingService() # Loads CLIP model
index = VectorIndex(dimension=512)
indexing_service = IndexingService(
storage=storage,
embedding_service=embedding_service,
index=index,
)
# Index an image
image = Image.open("photo.jpg")
indexing_service.index_image(image, "photo_001")
# Save index for later use
indexing_service.save_index("./data/index")
from visual_search.prediction.nearest_neighbor import NearestNeighborSearch
from visual_search.prediction.reranking import Reranker
# Create search service
search_service = NearestNeighborSearch(index=index)
reranker = Reranker()
# Search by image ID
results = search_service.search_by_id(
image_id="photo_001",
query_id="search_001",
k=10,
exclude_query=True,
)
# Apply reranking
reranked = reranker.rerank(
results=results,
normalize=True, # Convert L2 distance to similarity
min_score=0.3, # Filter low-quality matches
top_k=5,
)
# Display results
for result in reranked.results:
print(f"{result.rank}. {result.image_id}: {result.score:.4f}")
from visual_search.evaluation.evaluator import evaluate_model
# Create evaluation dataset (JSON format)
# {
# "queries": [
# {"query_id": "q1", "ground_truth": ["img1", "img2", "img3"]},
# {"query_id": "q2", "ground_truth": ["img4", "img5"]}
# ]
# }
result = evaluate_model(
search_service=search_service,
dataset_path="evaluation_data.json",
k_values=[1, 5, 10, 20],
output_path="results.json",
)
print(result.summary())
# Evaluation Results
# =================
# Queries: 100
# Recall@1: 0.450
# Recall@5: 0.720
# mAP: 0.520
# MRR: 0.610
VisualSearch/
├── src/visual_search/
│ ├── models/ # Data models (Pydantic)
│ │ ├── image.py # ImageMetadata
│ │ ├── embedding.py # EmbeddingVector
│ │ └── search_result.py # SearchResult, SearchResults
│ ├── indexing/
│ │ ├── storage/ # Storage backends
│ │ │ ├── base.py # StorageBackend ABC
│ │ │ └── local_storage.py
│ │ ├── index_table.py # VectorIndex (FAISS wrapper)
│ │ └── indexing_service.py
│ ├── prediction/
│ │ ├── preprocessing.py # Image preprocessing
│ │ ├── embedding_service.py # CLIP embeddings
│ │ ├── nearest_neighbor.py # NN search
│ │ └── reranking.py # Result reranking
│ └── evaluation/
│ ├── metrics.py # Recall, Precision, mAP, MRR
│ └── evaluator.py # ModelEvaluator
├── tests/
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
├── instructions/ # Documentation
│ ├── getting-started.md
│ ├── architecture.md
│ ├── api-reference.md
│ ├── configuration.md
│ ├── evaluation.md
│ └── testing.md
├── requirements.txt
├── requirements-dev.txt
└── pyproject.toml
- Model: clip-ViT-B-32 via sentence-transformers
- Embedding Dimension: 512
- Input Size: 224×224 pixels
- Normalization: CLIP-specific mean/std values
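The normalization step can be sketched in plain Python. The mean and standard deviation below are the per-channel constants published with OpenAI's CLIP preprocessing. The helper name `normalize_pixel` is hypothetical and not part of this repository, and the project's `preprocessing.py` may implement the step differently.

```python
# Per-channel RGB statistics used by OpenAI's CLIP preprocessing.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    Hypothetical helper, for illustration only.
    """
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, CLIP_MEAN, CLIP_STD)
    )


if __name__ == "__main__":
    # Black and white pixels map to the extremes of each channel's range.
    print(normalize_pixel((0, 0, 0)))
    print(normalize_pixel((255, 255, 255)))
```

In practice the same arithmetic is applied to every pixel of the resized 224×224 image, usually as a single vectorized tensor operation.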
- Type: IndexFlatL2 (exact L2 distance)
- Metric: Euclidean distance
- Memory: ~2KB per image (512 × 4 bytes)
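To make the flat-index behavior concrete, here is a minimal pure-Python sketch of what an exact L2 search computes: the query is compared against every stored vector and the k smallest distances are returned. It is a stand-in for illustration, not the FAISS API or this project's `VectorIndex`. The names `l2_search` and `index_memory_bytes` are hypothetical. The sketch returns squared distances, which is also what `IndexFlatL2` reports.

```python
import heapq


def l2_search(vectors, query, k):
    """Return the k (squared_distance, position) pairs closest to query."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), pos)
        for pos, vec in enumerate(vectors)
    )
    return heapq.nsmallest(k, scored)


def index_memory_bytes(num_vectors, dimension=512, bytes_per_float=4):
    """Raw vector storage for a flat float32 index (excludes metadata)."""
    return num_vectors * dimension * bytes_per_float


if __name__ == "__main__":
    stored = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
    # The vector at position 1 is nearest to the query, then position 0.
    print(l2_search(stored, [0.9, 0.1], k=2))
    # One million images need about 2 GB for the vectors alone.
    print(index_memory_bytes(1_000_000))
```

Because every query scans the full index, search cost grows linearly with the number of indexed images. That is the trade-off for exact results.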
| Metric | Description |
|---|---|
| Recall@K | Fraction of relevant items in top-K results |
| Precision@K | Fraction of top-K that are relevant |
| mAP | Mean Average Precision (ranking quality) |
| MRR | Mean Reciprocal Rank (first relevant position) |
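The per-query quantities behind the table above can be sketched as follows. This is a minimal illustration using common textbook definitions, not the code in `metrics.py`; the function names are hypothetical, and conventions vary (for example, whether average precision divides by the number of relevant items or by the number retrieved). mAP and MRR are the means of `average_precision` and `reciprocal_rank` over all queries.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    relevant = set(relevant)
    if k <= 0:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked, relevant):
    """Mean of precision values taken at each rank where a hit occurs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    relevant = set(relevant)
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    ranked = ["img9", "img1", "img7", "img2"]
    truth = ["img1", "img2", "img3"]
    print(recall_at_k(ranked, truth, 4), precision_at_k(ranked, truth, 4))
    print(average_precision(ranked, truth), reciprocal_rank(ranked, truth))
```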
# All tests
pytest
# With coverage
pytest --cov=src/visual_search --cov-report=html
# Unit tests only
pytest tests/unit/
# Integration tests
pytest tests/integration/
See the instructions folder for detailed documentation:
- Getting Started - Installation and quick start
- Architecture - System design overview
- API Reference - Complete API documentation
- Configuration - Configuration options
- Evaluation - Model evaluation guide
- Testing - Testing practices
MIT License