Library for Top-k Approximate String Matching, autocomplete and spell checking.
The library was mostly inspired by
- http://www.chokkan.org/software/simstring/
- http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/viewFile/1939/2234
- http://nlp.stanford.edu/IR-book/
- http://bazhenov.me/blog/2012/08/04/autocomplete.html
- http://www.aclweb.org/anthology/C10-1096
The library is organized into sub-packages under pkg/. Below are concrete examples for the most common use cases.
Find the top-K most similar strings from a dictionary:

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/suggest-go/suggest/pkg/metric"
        "github.com/suggest-go/suggest/pkg/store"
        "github.com/suggest-go/suggest/pkg/suggest"
    )

    func main() {
        ctx := context.Background()

        // Load a dictionary from disk
        source, err := store.OpenStoreFromFile(ctx, "cars.txt")
        if err != nil {
            log.Fatalf("open store: %v", err)
        }
        defer source.Close()

        // Configure the suggester: Jaro-Winkler distance, top-5 results
        config := suggest.Config{
            Source:        source,
            Metric:        metric.NewJaroWinkler(),
            SuggestAmount: 5,
        }
        suggester, err := suggest.New(config)
        if err != nil {
            log.Fatalf("create suggester: %v", err)
        }

        // Query
        results, err := suggester.Suggest(ctx, "teslla model 3")
        if err != nil {
            log.Fatalf("suggest: %v", err)
        }
        for _, r := range results {
            fmt.Printf(" %s (score=%.3f)\n", r.Value, r.Score)
        }
    }

Detect and correct misspelled words based on a language model:

    package main

    import (
        "context"
        "fmt"

        "github.com/suggest-go/suggest/pkg/spellchecker"
    )

    func main() {
        ctx := context.Background()

        // Initialize from a pre-built language model directory
        sc, err := spellchecker.New(ctx, "path/to/lm-folder")
        if err != nil {
            panic(err)
        }
        defer sc.Close()

        // Check a single word, asking for up to 5 suggestions
        suggestions, err := sc.Suggest(ctx, "recieve", 5)
        if err != nil {
            panic(err)
        }
        for _, s := range suggestions {
            fmt.Printf(" %s\n", s.Value)
        }
        // Output: receive
    }

Implement your own similarity metric by satisfying the `metric.Metric` interface:

    import "github.com/suggest-go/suggest/pkg/metric"

    type MyMetric struct{}

    // Compile-time check that *MyMetric satisfies metric.Metric.
    var _ metric.Metric = (*MyMetric)(nil)

    func (m *MyMetric) Compare(a, b string) float64 {
        // Return 1.0 for identical, 0.0 for unrelated
        // ... your custom comparison here ...
        return 0.0
    }

    func (m *MyMetric) IsSimilar(a, b string, threshold float64) bool {
        return m.Compare(a, b) >= threshold
    }

Then plug it into `suggest.Config{Metric: &MyMetric{}}`.
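
As one possible concrete metric, the sketch below scores two strings by the Jaccard similarity of their character bigrams. The `Metric` interface is redeclared locally, mirroring the two methods described above, so that the example compiles without the library; the real interface in `pkg/metric` may differ. `BigramJaccard` and `bigrams` are illustrative names, not part of suggest.

```go
package main

import "fmt"

// Metric mirrors the two-method shape described above. It is declared here
// only to keep the sketch self-contained (assumption, not the library type).
type Metric interface {
	Compare(a, b string) float64
	IsSimilar(a, b string, threshold float64) bool
}

// BigramJaccard is a hypothetical metric based on character-bigram overlap.
type BigramJaccard struct{}

// bigrams returns the set of adjacent rune pairs in s.
func bigrams(s string) map[string]struct{} {
	r := []rune(s)
	set := make(map[string]struct{})
	for i := 0; i+1 < len(r); i++ {
		set[string(r[i:i+2])] = struct{}{}
	}
	return set
}

// Compare returns |A ∩ B| / |A ∪ B| over the bigram sets of a and b:
// 1.0 for identical strings, 0.0 when no bigram is shared.
func (BigramJaccard) Compare(a, b string) float64 {
	if a == b {
		return 1.0
	}
	x, y := bigrams(a), bigrams(b)
	if len(x) == 0 || len(y) == 0 {
		return 0.0
	}
	inter := 0
	for g := range x {
		if _, ok := y[g]; ok {
			inter++
		}
	}
	return float64(inter) / float64(len(x)+len(y)-inter)
}

// IsSimilar reports whether the score reaches the threshold.
func (m BigramJaccard) IsSimilar(a, b string, threshold float64) bool {
	return m.Compare(a, b) >= threshold
}

func main() {
	var m Metric = BigramJaccard{}
	fmt.Printf("%.3f\n", m.Compare("tesla", "teslla")) // 0.800
	fmt.Println(m.IsSimilar("tesla", "volvo", 0.5))    // false
}
```

A value receiver is used here because the type carries no state; a pointer receiver, as in `MyMetric` above, works equally well.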
The package ships with a built-in HTTP server. See cmd/suggest/service.go for an example. Quick start:
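
    package main

    import (
        "log"
        "net/http"

        // Aliased to avoid a name clash with the standard net/http package.
        // Go only allows importing internal/ packages from within the same
        // module, so this code has to live inside the suggest repository.
        suggesthttp "github.com/suggest-go/suggest/internal/http"
    )

    func main() {
        handler, err := suggesthttp.NewHandler("path/to/config.json")
        if err != nil {
            log.Fatal(err)
        }

        http.HandleFunc("/suggest", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

`suggest` is configured via a JSON file. Minimal example: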

    {
      "name": "my-suggester",
      "source": {
        "type": "file",
        "path": "data/items.txt"
      },
      "metric": "jaro-winkler",
      "suggest_amount": 5,
      "min_score": 0.5
    }

The schema is documented in `pkg/suggest/config.go`.
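
To illustrate how a file like this could be read, here is a minimal sketch that uses only `encoding/json`. The `Config` and `Source` structs and the `ParseConfig` helper are hypothetical and mirror only the fields in the example above. The validation rules are assumptions as well; the actual schema in `pkg/suggest/config.go` may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
)

// Source and Config are hypothetical types that mirror the JSON example above.
type Source struct {
	Type string `json:"type"`
	Path string `json:"path"`
}

type Config struct {
	Name          string  `json:"name"`
	Source        Source  `json:"source"`
	Metric        string  `json:"metric"`
	SuggestAmount int     `json:"suggest_amount"`
	MinScore      float64 `json:"min_score"`
}

// ParseConfig decodes the JSON document and applies a few sanity checks.
// The checks are assumptions made for this sketch.
func ParseConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	if c.Source.Path == "" {
		return Config{}, errors.New("config: source.path is required")
	}
	if c.SuggestAmount <= 0 {
		return Config{}, errors.New("config: suggest_amount must be positive")
	}
	if c.MinScore < 0 || c.MinScore > 1 {
		return Config{}, errors.New("config: min_score must be within [0, 1]")
	}
	return c, nil
}

func main() {
	raw := []byte(`{
	  "name": "my-suggester",
	  "source": {"type": "file", "path": "data/items.txt"},
	  "metric": "jaro-winkler",
	  "suggest_amount": 5,
	  "min_score": 0.5
	}`)

	cfg, err := ParseConfig(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: top %d from %s using %s (min score %.1f)\n",
		cfg.Name, cfg.SuggestAmount, cfg.Source.Path, cfg.Metric, cfg.MinScore)
}
```

Validating right after decoding surfaces a broken configuration at startup rather than on the first query.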
| Sub-package | Purpose |
|---|---|
| `pkg/suggest` | Core suggester engine (Top-K retrieval) |
| `pkg/spellchecker` | Context-aware spellchecking with language models |
| `pkg/store` | Storage backends (in-memory, file-based) |
| `pkg/metric` | Distance metrics (Jaro-Winkler, Levenshtein, Cosine) |
| `pkg/dictionary` | Dictionary loaders (plain text, gzip) |
| `pkg/index` | Inverted index for fast lookup |
| `pkg/mph` | Minimal perfect hashing |
| `pkg/vgram` | Variable-length n-grams |
| `pkg/lm` | Language model integration (KenLM) |
| `pkg/merger` | Result merging & deduplication |
| `pkg/compression` | Compact storage formats |
| `pkg/utils` | Shared helpers |

- Use `store.Memory` for small dictionaries (<100k entries); it is the fastest option.
- Use `store.File` for large dictionaries to save RAM.
- For spellchecking, use the pre-built `lm-folder` shipped with the language model.
- For autocomplete at scale, batch queries with `SuggestBatch(ctx, queries)`.

See the documentation with examples, the demo, and the API documentation.
The demo shows an approximate string search in a vehicle dictionary with more than 2k model names.
You can also run it locally:

    $ make build
    $ ./build/suggest eval -c pkg/suggest/testdata/config.json -d cars -s 0.5 -k 5

or by using Docker:

    $ make build-docker
    $ docker run -p 8080:8080 -v $(pwd)/pkg/suggest/testdata:/data/testdata suggest /data/build/suggest service-run -c /data/testdata/config.json

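
To give a feel for what a top-k approximate search over a dictionary involves, here is a self-contained sketch. It is not the library's implementation: it assumes character trigrams, Jaccard similarity, a plain in-memory inverted index, and a min-heap that holds the best k candidates. All type and function names (`Index`, `TopK`, `Candidate`, `trigrams`) are illustrative.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
	"strings"
)

// Candidate pairs a dictionary entry with its similarity to the query.
type Candidate struct {
	Value string
	Score float64
}

// minHeap keeps the weakest of the current top-k candidates at the root.
type minHeap []Candidate

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(Candidate)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// trigrams returns the set of character 3-grams of the lowercased string,
// padded with '$' so that the first and last characters get their own grams.
func trigrams(s string) map[string]struct{} {
	r := []rune("$" + strings.ToLower(s) + "$")
	set := make(map[string]struct{})
	for i := 0; i+3 <= len(r); i++ {
		set[string(r[i:i+3])] = struct{}{}
	}
	return set
}

// Index is an inverted index from trigram to the ids of entries containing it.
type Index struct {
	words    []string
	sizes    []int // number of distinct trigrams per entry
	postings map[string][]int
}

func NewIndex(words []string) *Index {
	ix := &Index{words: words, sizes: make([]int, len(words)), postings: map[string][]int{}}
	for id, w := range words {
		grams := trigrams(w)
		ix.sizes[id] = len(grams)
		for g := range grams {
			ix.postings[g] = append(ix.postings[g], id)
		}
	}
	return ix
}

// TopK returns at most k entries with Jaccard similarity >= minScore, best first.
func (ix *Index) TopK(query string, k int, minScore float64) []Candidate {
	if k <= 0 {
		return nil
	}
	q := trigrams(query)
	shared := make(map[int]int) // entry id -> trigrams shared with the query
	for g := range q {
		for _, id := range ix.postings[g] {
			shared[id]++
		}
	}
	h := &minHeap{}
	for id, n := range shared {
		score := float64(n) / float64(len(q)+ix.sizes[id]-n)
		switch {
		case score < minScore:
			// below the similarity threshold, skip
		case h.Len() < k:
			heap.Push(h, Candidate{Value: ix.words[id], Score: score})
		case score > (*h)[0].Score:
			(*h)[0] = Candidate{Value: ix.words[id], Score: score} // evict the weakest
			heap.Fix(h, 0)
		}
	}
	out := append([]Candidate(nil), *h...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	ix := NewIndex([]string{"Tesla Model 3", "Tesla Model S", "Toyota Corolla", "Volvo XC90"})
	for _, c := range ix.TopK("teslla model 3", 5, 0.5) {
		fmt.Printf("%s (score=%.3f)\n", c.Value, c.Score)
	}
	// Prints:
	// Tesla Model 3 (score=0.800)
	// Tesla Model S (score=0.588)
}
```

Only entries that share at least one trigram with the query are ever scored, which is what makes an inverted index attractive for large dictionaries. When several entries tie exactly at the k-th score, which of them is kept depends on map iteration order in this sketch.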
The spellchecker recognizes a misspelled word based on the context of the surrounding words. To run the spellchecker demo, do the following:
- Download an English language model built on the Blog Authorship Corpus
- Extract the downloaded language model and run:

      $ make build
      $ ./build/spellchecker eval -c lm-folder/config.json

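
The idea of using surrounding words to pick a correction can be shown with a toy sketch. It is not the library's spellchecker: it assumes a tiny corpus, raw bigram counts in place of a real language model, and Levenshtein distance to generate candidates. `Model`, `Train`, and `Correct` are hypothetical names, and the scoring weights are arbitrary.

```go
package main

import (
	"fmt"
	"strings"
)

// levenshtein returns the edit distance between a and b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			best := prev[j-1] // substitution (or match)
			if ra[i-1] != rb[j-1] {
				best++
			}
			if d := prev[j] + 1; d < best { // deletion
				best = d
			}
			if d := cur[j-1] + 1; d < best { // insertion
				best = d
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(rb)]
}

// Model holds word and word-pair counts collected from a corpus.
type Model struct {
	unigrams map[string]int
	bigrams  map[[2]string]int
}

// Train counts words and adjacent word pairs, one sentence per line.
func Train(corpus string) *Model {
	m := &Model{unigrams: map[string]int{}, bigrams: map[[2]string]int{}}
	for _, line := range strings.Split(corpus, "\n") {
		words := strings.Fields(strings.ToLower(line))
		for i, w := range words {
			m.unigrams[w]++
			if i > 0 {
				m.bigrams[[2]string{words[i-1], w}]++
			}
		}
	}
	return m
}

// Correct returns the known word within maxDist edits of word that fits best
// between prev and next. The input is returned unchanged if nothing qualifies.
func (m *Model) Correct(prev, word, next string, maxDist int) string {
	best, bestScore := word, -1
	for cand, freq := range m.unigrams {
		if levenshtein(word, cand) > maxDist {
			continue
		}
		// Context counts dominate; word frequency only breaks near-ties.
		score := 10*(m.bigrams[[2]string{prev, cand}]+m.bigrams[[2]string{cand, next}]) + freq
		if score > bestScore || (score == bestScore && cand < best) {
			best, bestScore = cand, score
		}
	}
	return best
}

func main() {
	m := Train("a piece of cake\nworld peace now\na piece of paper\npeace and quiet")
	fmt.Println(m.Correct("a", "peice", "of", 2))      // piece
	fmt.Println(m.Correct("world", "peice", "now", 2)) // peace
}
```

The same misspelling resolves to different words depending on its neighbours, which is the behaviour the demo is meant to show with a full language model.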
When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

