B2B Lead Deduplication · CSV/XLSX/TXT Parser · Email Pattern Intelligence · Cross-File Duplicate Finder · Relationship Graph · Zero External Dependencies · Self-Hosted · Privacy-First
I'm currently working in B2B sales, and every single day I deal with the same frustrating problem:
Multiple Excel sheets. Multiple CSV exports. Multiple formats. Same contacts scattered everywhere.
- CRM exports emails as first.last@company.com
- LinkedIn exports them as firstlast@company.com
- Marketing lists use f.last@company.com
- Some sheets have the email in column B, others in column F, others don't even have a header row
I was spending hours every week manually cross-referencing spreadsheets, copy-pasting emails into Ctrl+F, trying to figure out:
- "Did I already reach out to this person?"
- "Is this the same John Doe from the other list?"
- "How many duplicates am I wasting outreach on?"
The manual process was destroying my productivity and causing real errors: duplicate outreach, missed leads, embarrassing double-emails to the same prospect.
So I built MultipChecker. Upload all your sheets, and it instantly:
- Finds exact duplicates across all files
- Detects same-person-different-format emails (smart pattern matching)
- Shows you a visual relationship graph of your entire contact network
- Groups contacts by domain, company, and LinkedIn profile
One upload. Zero manual work. No more spreadsheet hell.
- Upload & Parse: instant stats on records and emails
- 5-Tier Email Search: Exact, Username, Domain, Pattern, Similar matches
- Record Detail Drawer: full email analysis, pattern detection, record data
- Domain Grouping: all contacts organized by email domain
- Relationship Graph: interactive visualization of email, domain, username & company connections
| Category | Capability |
|---|---|
| File Parsing | Upload & parse .csv, .xlsx, .txt with concurrent multi-file processing |
| Email Search | 5-tier matching: Exact → Username → Domain → Pattern → Levenshtein similarity |
| Duplicate Detection | Cross-file & intra-file duplicate identification (unique / duplicate / cross-file) |
| Domain Intelligence | Domain grouping with SLD-aware matching (e.g. acme.com ≈ acme.co.uk) |
| Company Search | Auto-detects company/organization columns and enables fuzzy company search |
| LinkedIn Search | Auto-detects LinkedIn profile URLs from headers and raw cell data |
| Relationship Graph | Interactive canvas graph with email, domain, username, and company nodes; BFS chain tracing |
| Multi-Tenant | Session-isolated in-memory stores via X-Session-ID header |
| Single Binary | Embedded web UI via Go embed, so no external static files are needed |
| TLS Encryption | Optional HTTPS mode via TLS_CERT + TLS_KEY environment variables |
| Privacy-First | All data stays in memory; nothing written to disk, no telemetry, no tracking |
| Docker Ready | Multi-stage Dockerfile for minimal production images (~15 MB) |
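
The SLD-aware matching mentioned in the table (acme.com ≈ acme.co.uk) could work roughly like this. This is only a sketch: the names `sld`, `sameSLD`, and `twoPartSuffixes` are hypothetical, the suffix list is a tiny hand-picked sample, and MultipChecker's real logic may differ. A production version would consult the full Public Suffix List instead.

```go
package main

import "strings"

// twoPartSuffixes is a small illustrative sample, NOT a complete list.
var twoPartSuffixes = map[string]bool{
	"co.uk": true, "com.au": true, "co.in": true, "co.jp": true,
}

// sld returns the second-level label of a domain, e.g. "acme" for
// both "acme.com" and "mail.acme.co.uk".
func sld(domain string) string {
	labels := strings.Split(strings.ToLower(strings.TrimSpace(domain)), ".")
	n := len(labels)
	if n < 2 {
		return labels[0]
	}
	// Skip over a two-part suffix such as "co.uk" when one is present.
	if n >= 3 && twoPartSuffixes[labels[n-2]+"."+labels[n-1]] {
		return labels[n-3]
	}
	return labels[n-2]
}

// sameSLD reports whether two domains share a non-empty second-level label.
func sameSLD(a, b string) bool {
	sa, sb := sld(a), sld(b)
	return sa != "" && sa == sb
}
```

With this sketch, `sameSLD("acme.com", "acme.co.uk")` is true, while `sameSLD("acme.com", "gmail.com")` is false.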
# Clone the repo
git clone https://github.com/CODECZERO/MultipChecker.git
cd MultipChecker
# Install dependencies
go mod tidy
# Run the server
go run .

Open your browser → http://localhost:8080
# Build and run
docker build -t multipchecker .
docker run -p 8080:8080 multipchecker

# Build a standalone binary
go build -o multipchecker .
# Run it
./multipchecker

All configuration is done via environment variables. Create a .env file (see .env.example):
# Server port (default: 8080)
PORT=8080
# Bind address (default: 0.0.0.0)
HOST=0.0.0.0
# TLS/HTTPS (leave empty for plain HTTP)
TLS_CERT=./cert.pem
TLS_KEY=./key.pem

Or pass directly:
# Custom port
PORT=3000 go run .
# Via flag
go run . -port 3000
# With TLS encryption
TLS_CERT=cert.pem TLS_KEY=key.pem go run .

Generate a self-signed certificate:
openssl req -x509 -newkey rsa:4096 \
-keyout key.pem -out cert.pem \
-days 365 -nodes \
-subj "/CN=localhost"

Then set the env vars:
TLS_CERT=cert.pem TLS_KEY=key.pem go run .
# → MultipChecker running (HTTPS) → https://localhost:8080

MultipChecker is a local-first tool. Here's exactly what travels between your browser and the server:
| Direction | Endpoint | Data Sent |
|---|---|---|
| Browser → Server | POST /api/upload | Your raw CSV/XLSX/TXT file contents (multipart form) |
| Browser → Server | POST /api/search/* | Your search query (email, text, domain, company name, LinkedIn URL) |
| Server → Browser | All responses | Parsed records, matched emails, field data, graph nodes/edges |
| Browser → Server | All requests | X-Session-ID header (client-generated UUID for session isolation) |
| Layer | Protection |
|---|---|
| In Transit | Optional TLS/HTTPS encryption (set TLS_CERT + TLS_KEY) |
| At Rest | No data is ever written to disk; everything is held in memory only |
| Session Isolation | Each browser session gets its own isolated data store |
| CORS | Permissive Access-Control-Allow-Origin: * (configurable for production) |
| Upload Limits | 200 MB max request size, 50 MB multipart buffer |
| No Telemetry | Zero analytics, no phone-home, no tracking; fully offline capable |
| No Auth Required | Runs on your local machine; no accounts, no cloud dependencies |
- ⚠️ No authentication: designed for local/internal use. Don't expose to the public internet without a reverse proxy + auth.
- No persistent storage: server restart = clean slate. Your data is never saved.
- No external requests: the server makes zero outbound network calls.
If you're on a corporate network that blocks custom ports:
# Run on port 80 (may require sudo)
sudo PORT=80 go run .
# Or use port 443 with TLS
sudo PORT=443 TLS_CERT=cert.pem TLS_KEY=key.pem go run .

If MultipChecker runs on a remote server and you need to access it through a restricted network:
# From your local machine: forward local:8080 to remote:8080
ssh -L 8080:localhost:8080 user@your-server.com
# Then open http://localhost:8080 on your local machine

Expose your local instance with a public HTTPS URL:
# Install ngrok: https://ngrok.com
ngrok http 8080
# → https://abc123.ngrok-free.app (share this URL)

# Install cloudflared
cloudflared tunnel --url http://localhost:8080

# Map to any external port
docker run -p 443:8080 multipchecker
# Bind to specific interface
docker run -p 127.0.0.1:8080:8080 multipchecker

server {
    listen 443 ssl;
    server_name checker.yourdomain.com;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        client_max_body_size 200M;
    }
}

All endpoints return JSON. The search endpoints accept JSON bodies, and the upload endpoint accepts multipart form data. Use the X-Session-ID header for session isolation.
POST /api/upload
Content-Type: multipart/form-data
Body: files[] = your-file.csv, contacts.xlsx, ...
Response: [
{ "filename": "leads.csv", "records": 150, "emails": 142, "skipped_rows": 3, "warnings": [] }
]
POST /api/search/email     → {"email": "john@acme.com"}       → SearchResult (5-tier matches)
POST /api/search/text      → {"query": "John Doe"}            → Record[]
POST /api/search/domain    → {"domain": "acme.com"}           → Record[]
POST /api/search/company   → {"company": "Acme Corp"}         → Record[]
POST /api/search/linkedin  → {"url": "linkedin.com/in/john"}  → Record[]
GET    /api/duplicates     → DuplicateGroup[] (emails appearing in 2+ records)
GET    /api/domains        → { "acme.com": Record[], "gmail.com": Record[], ... }
GET    /api/record/{id}    → Record (single record by ID)
GET    /api/graph          → { nodes: GraphNode[], edges: GraphEdge[] }
GET    /api/stats          → { fileCount, recordCount, emailCount, dupCount }
DELETE /api/clear          → Wipes all data in current session
| Method | Endpoint | Body | Response | Description |
|---|---|---|---|---|
| POST | /api/upload | multipart/form-data files[] | UploadResult[] | Upload & parse files |
| POST | /api/search/email | {"email":"..."} | SearchResult | 5-tier email intelligence |
| POST | /api/search/text | {"query":"..."} | Record[] | Full-text search |
| POST | /api/search/domain | {"domain":"..."} | Record[] | Domain search |
| POST | /api/search/company | {"company":"..."} | Record[] | Company search |
| POST | /api/search/linkedin | {"url":"..."} | Record[] | LinkedIn search |
| GET | /api/duplicates | - | DuplicateGroup[] | Duplicate groups |
| GET | /api/domains | - | map[domain]Record[] | Domain grouping |
| GET | /api/record/{id} | - | Record | Single record |
| GET | /api/graph | - | GraphData | Relationship graph |
| GET | /api/stats | - | StoreStats | Aggregate stats |
| DELETE | /api/clear | - | {"ok":true} | Clear session data |
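
If you want to call the API from Go rather than from the browser, a request for the email search endpoint might be built like this. It's a minimal sketch based only on the endpoint table above: the helper name `newEmailSearchRequest` is hypothetical, and the base URL and session ID are whatever your client chooses.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// newEmailSearchRequest builds a POST /api/search/email request.
// sessionID is a client-generated identifier sent in the X-Session-ID
// header so the server can pick the matching session store.
func newEmailSearchRequest(baseURL, sessionID, email string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"email": email})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/search/email", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Session-ID", sessionID)
	return req, nil
}
```

Send the request with `http.DefaultClient.Do(req)` and decode the response body as JSON. The exact shape of SearchResult isn't spelled out in this README, so decoding into a generic `map[string]any` is a safe starting point.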
MultipChecker/
├── main.go           # HTTP server, routing, CORS, TLS, upload handler, .env loader
├── go.mod            # Go module definition
├── go.sum            # Dependency checksums
├── .env              # Local env config (git-ignored)
├── .env.example      # Env var template (committed)
├── .gitignore        # Git ignore rules
├── Dockerfile        # Multi-stage Docker build
├── LICENSE           # MIT License
├── README.md         # This file
│
├── email/
│   └── email.go      # Email parser: decomposition, pattern detection, Levenshtein distance
│
├── parser/
│   ├── csv.go        # CSV parser with smart header detection & email extraction
│   ├── xlsx.go       # XLSX parser via excelize
│   └── txt.go        # Plain text / tab-delimited parser
│
├── store/
│   └── store.go      # In-memory store: indexes, search, graph builder, session manager
│
└── static/
    └── index.html    # Embedded single-page web UI (dark theme, interactive graph)
┌─────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│     Browser     │ TLS  │   HTTP Server    │      │   StoreManager   │
│  (index.html)   │─────▶│    (main.go)     │─────▶│  (per-session)   │
│                 │◀─────│   CORS + TLS     │◀─────│                  │
└─────────────────┘      └────────┬─────────┘      └────────┬─────────┘
                                  │                         │
                    ┌─────────────▼──────────┐   ┌──────────▼─────────┐
                    │        Parsers         │   │       Store        │
                    │    CSV · XLSX · TXT    │   │  Indexes:          │
                    │  (concurrent upload)   │   │   • emailIdx       │
                    └─────────────┬──────────┘   │   • domainIdx      │
                                  │              │   • companyIdx     │
                    ┌─────────────▼──────────┐   │   • linkedinIdx    │
                    │      Email Parser      │   │   • records (map)  │
                    │   Pattern Detection    │   └────────────────────┘
                    │  Levenshtein Distance  │
                    └────────────────────────┘
Data Flow:
- Upload: files are parsed concurrently by format-specific parsers, and emails are auto-extracted
- Index: records are indexed across 4 dimensions (email, domain, company, LinkedIn)
- Search: a 5-tier cascade, exact → username → domain → pattern → Levenshtein (edit distance ≤ 2)
- Graph: the relationship graph is built from the indexes, linking email, domain, username, and company nodes
- Session: each X-Session-ID gets an isolated store, so the server is multi-user safe
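
The session step above can be pictured as a map from session ID to store, guarded by a mutex. This is a simplified sketch, not the code in store/store.go: `StoreManager` is the name used in the architecture diagram, but the `Store` fields and the method names `Get`, `Add`, `Count`, and `Clear` are assumptions made for illustration, and the real store keeps several indexes rather than a plain slice.

```go
package main

import "sync"

// Store holds one session's data (reduced here to a list of emails).
type Store struct {
	mu     sync.RWMutex
	emails []string
}

// Add appends an email to this session's store.
func (s *Store) Add(email string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.emails = append(s.emails, email)
}

// Count returns how many emails this session holds.
func (s *Store) Count() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.emails)
}

// StoreManager maps each X-Session-ID value to its own Store.
type StoreManager struct {
	mu     sync.Mutex
	stores map[string]*Store
}

func NewStoreManager() *StoreManager {
	return &StoreManager{stores: make(map[string]*Store)}
}

// Get returns the store for a session, creating it on first use.
func (m *StoreManager) Get(sessionID string) *Store {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.stores[sessionID]
	if !ok {
		s = &Store{}
		m.stores[sessionID] = s
	}
	return s
}

// Clear drops everything held for one session.
func (m *StoreManager) Clear(sessionID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.stores, sessionID)
}
```

Because every lookup goes through `Get(sessionID)`, data uploaded under one session ID is never visible to another, and nothing survives a restart since it all lives in process memory.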
MultipChecker's email parser detects these patterns automatically:
| Pattern | Example | Description |
|---|---|---|
| first.last | john.doe@acme.com | Dot-separated first and last name |
| first_last | john_doe@acme.com | Underscore-separated |
| initial.last | j.doe@acme.com | Single initial + last name |
| firstlast | johndoesmith@acme.com | Concatenated (8+ chars) |
| name+number | john42@acme.com | Name with trailing digits |
| username | jdoe@acme.com | Single-word username |
| custom | anything else | Unclassified pattern |
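
To make the table concrete, here is one way such a classifier could be written. Treat it as a rough sketch: the function name `detectPattern`, the regular expressions, and the order of the checks are my assumptions derived from the table's descriptions, not a copy of email/email.go.

```go
package main

import (
	"regexp"
	"strings"
)

// Assumed rules, one per table row. Only lowercase ASCII letters are
// treated as name characters in this sketch.
var (
	reInitialLast    = regexp.MustCompile(`^[a-z]\.[a-z]+$`)
	reFirstDotLast   = regexp.MustCompile(`^[a-z]{2,}\.[a-z]+$`)
	reFirstUnderLast = regexp.MustCompile(`^[a-z]+_[a-z]+$`)
	reNameNumber     = regexp.MustCompile(`^[a-z]+[0-9]+$`)
	reLetters        = regexp.MustCompile(`^[a-z]+$`)
)

// detectPattern classifies the local part of an email address.
func detectPattern(email string) string {
	local, _, ok := strings.Cut(strings.ToLower(strings.TrimSpace(email)), "@")
	if !ok || local == "" {
		return "custom"
	}
	switch {
	case reInitialLast.MatchString(local):
		return "initial.last"
	case reFirstDotLast.MatchString(local):
		return "first.last"
	case reFirstUnderLast.MatchString(local):
		return "first_last"
	case reNameNumber.MatchString(local):
		return "name+number"
	case reLetters.MatchString(local) && len(local) >= 8:
		// Heuristic from the table: long single words are assumed
		// to be a concatenated first and last name.
		return "firstlast"
	case reLetters.MatchString(local):
		return "username"
	default:
		return "custom"
	}
}
```

Run against the examples in the table, this returns the pattern named in each row. The 8-character cutoff is only a heuristic, so a long single-word username would also be labeled firstlast.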
Query: "john.doe@acme.com"
│
├─ 1. EXACT    → john.doe@acme.com        (direct match)
├─ 2. USERNAME → john.doe@otherdomain.com (same local part)
├─ 3. DOMAIN   → jane.smith@acme.com      (same domain + SLD cross-match)
├─ 4. PATTERN  → mike.jones@acme.com      (same pattern "first.last" on same SLD)
└─ 5. SIMILAR  → john.deo@acme.com        (Levenshtein distance ≤ 2)
This is what makes MultipChecker actually useful for B2B: it catches the duplicates that a simple Ctrl+F never will.
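
The fifth tier relies on Levenshtein edit distance. Here is a standard two-row implementation as a sketch; the names `levenshtein`, `min3`, and `isSimilar` are illustrative, and this README doesn't say whether MultipChecker compares full addresses or only the local part.

```go
package main

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, counting
// single-rune insertions, deletions, and substitutions.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // the finished row becomes prev
	}
	return prev[len(rb)]
}

// isSimilar applies the "distance ≤ 2" threshold from the tier list.
func isSimilar(a, b string) bool {
	return levenshtein(a, b) <= 2
}
```

For the example in the tier list, `levenshtein("john.doe", "john.deo")` is 2 (two substitutions), so the typo'd address lands in the SIMILAR tier.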
- Language: Go 1.22+
- XLSX Parsing: excelize/v2
- Frontend: Vanilla HTML/CSS/JS (embedded via go:embed), dark theme with Inter + JetBrains Mono
- Graph Engine: Canvas-based with Barnes-Hut quadtree simulation
- Containerization: Docker (Alpine-based multi-stage build)
- TLS: Native Go crypto/tls (no external dependency)
- Fork the repo
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Export results as CSV/XLSX
- Email validation (MX record check)
- Persistent SQLite storage option
- Authentication/user management
- REST API rate limiting
- Webhook notifications for duplicate alerts
This project is licensed under the MIT License; see the LICENSE file for details.
b2b sales tool · email deduplication · CSV parser · XLSX parser · duplicate detection · email finder · lead management · sales operations · contact deduplication · email intelligence · data cleaning · golang · self-hosted · open source · CRM tool · sales automation · email verification · cross-reference tool · spreadsheet analysis · lead enrichment · data quality · email pattern detection · fuzzy matching · levenshtein distance · relationship graph · network analysis · contact management · sales productivity · outreach tool · prospecting tool
Built by CODECZERO · ⭐ Star this repo if it saves you time!