bkahan/sphereQL

sphereQL

Project high-dimensional embeddings onto a 3D sphere for fast semantic search, spatial queries, category-aware exploration, and interactive visualization.

sphereQL maps vectors from any embedding model (OpenAI, Cohere, sentence-transformers, etc.) onto spherical coordinates via one of four projection families — linear PCA, kernel PCA with a Gaussian (RBF) kernel, Laplacian eigenmap over a k-NN similarity graph, or random projection — then indexes them with shell/sector partitioning for fast nearest-neighbor lookups. A Category Enrichment Layer computes inter-category relationships, classifies bridges (Genuine / OverlapArtifact / Weak), and builds inner spheres for high-resolution within-category search. sphereQL auto-tunes its pipeline per corpus against a scalar QualityMetric; a meta-model recalls winning configs from past tuner runs when a new corpus arrives. Callable from Rust, Python, or the browser via WASM.
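
To make the projection step concrete, here is a minimal sketch of the simplest of the four families, random projection, followed by normalization onto the unit sphere and conversion to spherical angles. It is an illustration under stated assumptions, not sphereQL's implementation: the helper names (`random_projection_matrix`, `project_to_sphere`, `to_spherical`), the Gaussian matrix, and the angle conventions are all hypothetical choices made for this example.

```python
import math
import random

def random_projection_matrix(dim, seed=0):
    """Build a 3 x dim matrix of Gaussian entries (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]

def project_to_sphere(vec, matrix):
    """Project a vector to 3D, then scale it to unit length."""
    xyz = [sum(w * v for w, v in zip(row, vec)) for row in matrix]
    norm = math.sqrt(sum(c * c for c in xyz))
    if norm == 0.0:
        raise ValueError("vector projects to the origin")
    return tuple(c / norm for c in xyz)

def to_spherical(xyz):
    """Unit (x, y, z) -> (theta, phi): azimuth and polar angle, in radians."""
    x, y, z = xyz
    theta = math.atan2(y, x) % (2 * math.pi)   # wrap azimuth to be non-negative
    phi = math.acos(max(-1.0, min(1.0, z)))    # clamp against rounding error
    return theta, phi

embeddings = [
    [0.1, 0.9, 0.3, 0.0],
    [0.2, 0.8, 0.4, 0.1],
    [0.9, 0.1, 0.0, 0.5],
    [0.8, 0.2, 0.1, 0.4],
]
matrix = random_projection_matrix(dim=4, seed=42)
points = [project_to_sphere(e, matrix) for e in embeddings]
angles = [to_spherical(p) for p in points]
```

A random projection needs no fitting, which is why it makes a compact example; PCA, kernel PCA, and Laplacian eigenmaps replace the random matrix with directions learned from the corpus.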

How sphereQL compares to vector databases

sphereQL is not a drop-in replacement for FAISS, Qdrant, or pgvector. Those systems return the most similar vectors. sphereQL returns the most similar vectors plus a 3D coordinate system you can visualize, navigate, and reason about geometrically.

| Capability | FAISS | Qdrant / Milvus / Weaviate | pgvector | sphereQL |
| --- | --- | --- | --- | --- |
| ANN nearest-neighbor | yes | yes | yes | yes (via 3D sphere index) |
| Hosted service | no | yes | yes (Postgres) | no (embed as library) |
| Filtering on metadata | manual | yes | yes (SQL) | yes |
| 3D positions for visualization | no | no | no | yes |
| Spatial queries (cone, cap, shell, wedge) | no | no | no | yes |
| Category bridges + inner spheres | no | no | no | yes |
| Auto-tuned per corpus | no | no | no | yes |
| Returns full-d cosine accuracy | yes | yes | yes | only via hybrid re-rank |

Use a vector DB when you need scale-out k-NN over millions of vectors, durable hosted storage, or multi-tenant access control. Use sphereQL when you need 3D layouts for visualization, spatial queries beyond k-NN, category-aware exploration, or an in-process embedded library that runs in the browser via WASM.
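
As an illustration of what a spatial query beyond k-NN means, the sketch below implements a cone query over points on the unit sphere: it returns every point whose angular distance from an axis is at most a given half-angle. The function names and the brute-force scan are assumptions for this example; sphereQL's own query types and index-backed implementation may differ.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def angular_distance(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

def cone_query(points, axis, half_angle):
    """Indices of unit points within half_angle radians of the axis."""
    axis = normalize(axis)
    return [i for i, p in enumerate(points)
            if angular_distance(p, axis) <= half_angle]

raw = [(1, 0, 0), (0.9, 0.1, 0), (0, 1, 0), (0, 0, 1), (-1, 0, 0)]
points = [normalize(p) for p in raw]
hits = cone_query(points, axis=(1, 0, 0), half_angle=math.radians(15))
# hits == [0, 1]: only the first two points lie within 15 degrees of the x-axis
```

Unlike k-NN, the result size is not fixed: a cone returns everything inside a region, which is what makes it useful for exploring a neighborhood on the sphere.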

The two compose: sphereQL ships Pinecone and Qdrant backends in sphereql-vectordb so you can keep authoritative vectors in a hosted store and use sphereQL for projection, layout, and category enrichment.
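
The hybrid re-rank idea from the comparison table can be sketched as two stages: over-fetch candidates using the cheap low-dimensional view, then re-order them by cosine similarity on the original vectors. This is a sketch under assumptions, not the sphereql-vectordb API: `toy_project` (which just keeps the first three coordinates) stands in for a real projection, and `hybrid_search` and `overfetch` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def toy_project(v):
    """Stand-in projection (assumption): keep the first three coordinates."""
    return v[:3]

def hybrid_search(query, embeddings, k, overfetch=3):
    """Coarse 3D candidate pass, then full-dimensional cosine re-rank."""
    q3 = toy_project(query)
    coarse = sorted(
        range(len(embeddings)),
        key=lambda i: cosine(q3, toy_project(embeddings[i])),
        reverse=True,
    )
    candidates = coarse[: k * overfetch]          # over-fetch to limit recall loss
    reranked = sorted(
        candidates,
        key=lambda i: cosine(query, embeddings[i]),
        reverse=True,
    )
    return reranked[:k]

embeddings = [
    [0.1, 0.9, 0.3, 0.0],
    [0.2, 0.8, 0.4, 0.1],
    [0.9, 0.1, 0.0, 0.5],
    [0.8, 0.2, 0.1, 0.4],
]
top = hybrid_search([0.15, 0.85, 0.35, 0.05], embeddings, k=2)
```

The over-fetch factor trades speed for accuracy: a larger candidate pool makes it less likely that a true neighbor is lost in the lossy 3D pass, at the cost of more full-dimensional comparisons.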

Documentation

Full documentation lives under docs/.

Install

# Cargo.toml
[dependencies]
sphereql = { version = "0.2.0-alpha", features = ["full"] }

# Python
pip install sphereql

See architecture.md for feature-flag details.

Rust — minimal example

use sphereql::embed::*;

// 1. Build a pipeline from categorized embeddings.
let input = PipelineInput {
    categories: vec![
        "science".into(), "science".into(),
        "cooking".into(), "cooking".into(),
    ],
    embeddings: vec![
        vec![0.1, 0.9, 0.3, 0.0],
        vec![0.2, 0.8, 0.4, 0.1],
        vec![0.9, 0.1, 0.0, 0.5],
        vec![0.8, 0.2, 0.1, 0.4],
    ],
};
let pipeline = SphereQLPipeline::new(input).unwrap();

// 2. Query nearest neighbors.
let query = PipelineQuery { embedding: vec![0.15, 0.85, 0.35, 0.05] };
let results = pipeline.query(SphereQLQuery::Nearest { k: 3 }, &query);

See the Rust quickstart for spatial indexing, the layout engine, GraphQL, and the full embedding pipeline. auto-tuning.md covers the PipelineConfig + auto_tune + MetaModel workflow end-to-end.

Python — minimal example

import sphereql

categories = ["science", "science", "cooking", "cooking"]
embeddings = [
    [0.1, 0.9, 0.3, 0.0],
    [0.2, 0.8, 0.4, 0.1],
    [0.9, 0.1, 0.0, 0.5],
    [0.8, 0.2, 0.1, 0.4],
]

pipeline = sphereql.Pipeline(categories, embeddings)
results = pipeline.nearest([0.15, 0.85, 0.35, 0.05], k=3)

# Interactive 3D visualization in your browser
sphereql.visualize(categories, embeddings, title="My Embeddings")

The Python bindings cover the full Rust surface — PCA, Kernel PCA, Laplacian eigenmap, auto_tune, MetaModel, FeedbackAggregator, and the category enrichment layer. Type stubs (.pyi) are auto-generated via pyo3-stub-gen. See the Python quickstart for semantic search, 3D visualization, vector database bridges, and the core type surface.

WASM — minimal example

cd sphereql-wasm && wasm-pack build --target web

import init, { Pipeline } from './pkg/sphereql_wasm.js';
await init();

const pipeline = new Pipeline(JSON.stringify({
  categories: ["science", "cooking"],
  embeddings: [[0.1, 0.9, 0.3], [0.9, 0.1, 0.0]],
}));
const results = pipeline.nearest(JSON.stringify([0.15, 0.85, 0.35]), 1);

The WASM bindings have the same coverage as the Python bindings. Every pipeline, category, and metalearning method returns typed values via tsify: wasm-pack build emits a .d.ts with named interfaces, so no JSON.parse is required on results. See the WASM quickstart for category enrichment in the browser.

Workspace layout

| Crate | Role |
| --- | --- |
| sphereql | Umbrella crate with feature flags for selective imports. |
| sphereql-core | Spherical math: points, conversions, distance metrics, region types. |
| sphereql-index | Spatial indexing with shell + sector partitioning. |
| sphereql-layout | Layout engines (Fibonacci spiral, k-means, force-directed). |
| sphereql-embed | Projections, query pipeline, Category Enrichment Layer, metalearning framework. |
| sphereql-graphql | async-graphql schema: spatial queries (cone/shell/band/wedge), the full category enrichment surface, subscriptions, and a pluggable TextEmbedder trait for text query input. |
| sphereql-vectordb | Vector store bridge (InMemory, Qdrant, Pinecone) with hybrid search. |
| sphereql-python | Python bindings via PyO3/maturin. |
| sphereql-wasm | WASM bindings via wasm-bindgen. |
| sphereql-corpus | Shared example corpora: 775-concept built-in (31 academic domains) and 300-concept stress corpus, plus bulk-ingested parquet corpora from DBpedia (500K) and Wikidata (50K). |

Full dependency graph and crate-by-crate description in architecture.md.
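
For readers unfamiliar with shell + sector partitioning, the following sketch shows the general idea: quantize the radius into shells and the two angles into sectors, then use the resulting triple as a bucket key so that nearby points land in the same or adjacent buckets. The bin counts, the `bucket_key` and `build_index` names, and the uniform angular bins are assumptions for illustration; sphereql-index may partition differently.

```python
import math
from collections import defaultdict

def bucket_key(r, theta, phi, n_shells=4, n_theta=8, n_phi=4, r_max=1.0):
    """Map (radius, azimuth, polar angle) to a (shell, sector, band) key."""
    shell = min(int(r / r_max * n_shells), n_shells - 1)
    sector = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
    band = min(int(phi / math.pi * n_phi), n_phi - 1)
    return shell, sector, band

def build_index(points):
    """Group point indices by bucket key."""
    index = defaultdict(list)
    for i, (r, theta, phi) in enumerate(points):
        index[bucket_key(r, theta, phi)].append(i)
    return index

# (r, theta, phi) triples; the first two are close together on purpose.
points = [
    (0.90, 0.1, 1.50),
    (0.95, 0.2, 1.55),
    (0.20, 3.0, 0.50),
]
index = build_index(points)
# index[(3, 0, 1)] == [0, 1] and index[(0, 3, 0)] == [2]
```

A nearest-neighbor lookup then only needs to scan the query's bucket and its neighbors rather than the whole corpus.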

Project status

sphereQL is at v0.2.0-alpha. The core API is functional and covered by 450+ tests, but may change before 1.0. Known limitations and roadmap are in project-status.md.

Binding parity is protected by a drift check (scripts/check-drift) — new public items in sphereql-embed / sphereql-layout must either have a Python/WASM binding or an allowlist entry with a reason in .bindings-ignore.toml.
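
The rule the drift check enforces can be sketched as a set comparison. This is not the actual scripts/check-drift; the `find_drift` function, the item names prefixed with `hypothetical_`, and the allowlist shape (item name mapped to a reason string) are assumptions made for this example.

```python
def find_drift(public_items, bound_items, allowlist):
    """Return public items with neither a binding nor a reasoned allowlist entry."""
    missing = []
    for item in sorted(public_items):
        if item in bound_items:
            continue                      # already exposed in the bindings
        if allowlist.get(item, "").strip():
            continue                      # explicitly exempted, with a reason
        missing.append(item)
    return missing

public_items = {
    "auto_tune",
    "MetaModel",
    "hypothetical_internal_helper",
    "hypothetical_new_fn",
}
bound_items = {"auto_tune", "MetaModel"}
allowlist = {"hypothetical_internal_helper": "not meaningful outside Rust"}

drift = find_drift(public_items, bound_items, allowlist)
# drift == ["hypothetical_new_fn"]; a CI job would fail when this list is non-empty
```

Requiring a non-empty reason means an allowlist entry documents why an item is unbound instead of silently hiding it.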

Contributing

  1. Fork the repo and create a feature branch.
  2. Run cargo test --workspace --all-features and cargo clippy --workspace --all-features --all-targets.
  3. For Python changes, run cd sphereql-python && maturin develop && pytest -v.
  4. Open a PR against main.

The codebase uses Rust 2024 edition. All CI checks must pass before merge. See testing.md for the full pipeline.

License

MIT
