CODECZERO/MultipChecker

MultipChecker

Multi-Format Email Intelligence, Duplicate Detection & Cross-File Search Tool for B2B Sales Teams

License: MIT · Go · Docker · Self-Hosted · TLS

B2B Lead Deduplication · CSV/XLSX/TXT Parser · Email Pattern Intelligence · Cross-File Duplicate Finder · Relationship Graph · Zero Cloud Dependencies · Self-Hosted · Privacy-First


💡 Why I Built This

I'm currently working in B2B sales, and every single day I deal with the same frustrating problem:

Multiple Excel sheets. Multiple CSV exports. Multiple formats. Same contacts scattered everywhere.

  • CRM exports emails as first.last@company.com
  • LinkedIn exports them as firstlast@company.com
  • Marketing lists use f.last@company.com
  • Some sheets have the email in column B, others in column F, others don't even have a header row

I was spending hours every week manually cross-referencing spreadsheets, copy-pasting emails into Ctrl+F, trying to figure out:

  • "Did I already reach out to this person?"
  • "Is this the same John Doe from the other list?"
  • "How many duplicates am I wasting outreach on?"

The manual process was destroying my productivity and causing real errors: duplicate outreach, missed leads, and embarrassing double-emails to the same prospect.

So I built MultipChecker. Upload all your sheets, and it instantly:

  • Finds exact duplicates across all files
  • Detects same-person-different-format emails (smart pattern matching)
  • Shows you a visual relationship graph of your entire contact network
  • Groups contacts by domain, company, and LinkedIn profile

One upload. Zero manual work. No more spreadsheet hell.


📸 Screenshots

  • Upload & Parse: instant stats on records and emails
  • 5-Tier Email Search: exact, username, domain, pattern, and similar matches
  • Record Detail Drawer: full email analysis, pattern detection, and record data
  • Domain Grouping: all contacts organized by email domain
  • Relationship Graph: interactive visualization of email, domain, username, and company connections


✨ Features

Category Capability
File Parsing Upload & parse .csv, .xlsx, .txt with concurrent multi-file processing
Email Search 5-tier matching: Exact β†’ Username β†’ Domain β†’ Pattern β†’ Levenshtein similarity
Duplicate Detection Cross-file & intra-file duplicate identification (unique / duplicate / cross-file)
Domain Intelligence Domain grouping with SLD-aware matching (e.g. acme.com ↔ acme.co.uk)
Company Search Auto-detects company/organization columns and enables fuzzy company search
LinkedIn Search Auto-detects LinkedIn profile URLs from headers and raw cell data
Relationship Graph Interactive canvas graph β€” nodes: email, domain, username, company; BFS chain tracing
Multi-Tenant Session-isolated in-memory stores via X-Session-ID header
Single Binary Embedded web UI via Go embed β€” no external static files needed
TLS Encryption Optional HTTPS mode via TLS_CERT + TLS_KEY environment variables
Privacy-First All data stays in memory β€” nothing written to disk, no telemetry, no tracking
Docker Ready Multi-stage Dockerfile for minimal production images (~15 MB)

🚀 How to Run

Prerequisites

  • Go 1.22+ (for Options 1 and 3)
  • Docker (for Option 2)
  • Git (to clone the repository)

Option 1: Run Locally

# Clone the repo
git clone https://github.com/CODECZERO/MultipChecker.git
cd MultipChecker

# Install dependencies
go mod tidy

# Run the server
go run .

Open your browser → http://localhost:8080

Option 2: Docker

# Build and run
docker build -t multipchecker .
docker run -p 8080:8080 multipchecker

Option 3: Build Binary

# Build a standalone binary
go build -o multipchecker .

# Run it
./multipchecker

Custom Configuration

Configuration is done through environment variables (a -port flag is also available, as shown below). Create a .env file (see .env.example):

# Server port (default: 8080)
PORT=8080

# Bind address (default: 0.0.0.0)
HOST=0.0.0.0

# TLS/HTTPS (leave empty for plain HTTP)
TLS_CERT=./cert.pem
TLS_KEY=./key.pem

Or pass directly:

# Custom port
PORT=3000 go run .

# Via flag
go run . -port 3000

# With TLS encryption
TLS_CERT=cert.pem TLS_KEY=key.pem go run .

Enable HTTPS Encryption

Generate a self-signed certificate:

openssl req -x509 -newkey rsa:4096 \
  -keyout key.pem -out cert.pem \
  -days 365 -nodes \
  -subj "/CN=localhost"

Then set the env vars:

TLS_CERT=cert.pem TLS_KEY=key.pem go run .
# → 🔒 MultipChecker running (HTTPS) → https://localhost:8080
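The configuration rules above (PORT and HOST defaults, HTTPS only when both TLS_CERT and TLS_KEY are set) can be sketched in Go as follows. This is a hypothetical illustration, not the contents of main.go: the `config` type and the `loadConfig` and `serve` helpers are invented names, the `-port` flag and `.env` loading are left out, and rejecting a half-configured TLS pair is a design choice of this sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"os"
)

// config mirrors the documented environment variables (hypothetical type).
type config struct {
	Host, Port      string
	TLSCert, TLSKey string
}

// loadConfig reads settings through getenv so it can be tested without
// touching the real process environment.
func loadConfig(getenv func(string) string) (config, error) {
	c := config{
		Host:    getenv("HOST"),
		Port:    getenv("PORT"),
		TLSCert: getenv("TLS_CERT"),
		TLSKey:  getenv("TLS_KEY"),
	}
	if c.Host == "" {
		c.Host = "0.0.0.0" // documented default
	}
	if c.Port == "" {
		c.Port = "8080" // documented default
	}
	// Sketch-only choice: one TLS variable without the other is an error
	// instead of a silent fallback to plain HTTP.
	if (c.TLSCert == "") != (c.TLSKey == "") {
		return c, errors.New("set both TLS_CERT and TLS_KEY, or neither")
	}
	return c, nil
}

func (c config) useTLS() bool { return c.TLSCert != "" && c.TLSKey != "" }

func (c config) addr() string { return net.JoinHostPort(c.Host, c.Port) }

// serve starts plain HTTP or HTTPS depending on the configuration.
func serve(c config, h http.Handler) error {
	if c.useTLS() {
		return http.ListenAndServeTLS(c.addr(), c.TLSCert, c.TLSKey, h)
	}
	return http.ListenAndServe(c.addr(), h)
}

func main() {
	c, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("would listen on %s (TLS: %v)\n", c.addr(), c.useTLS())
	// A real server would call serve(c, mux) here; omitted so the demo exits.
}
```

Passing `os.Getenv` at runtime and a map-backed lookup in tests keeps the configuration logic free of global state.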

🔒 Data Encryption & Security

What Data Crosses the Wire

MultipChecker is a local-first tool. Here's exactly what travels between your browser and the server:

| Direction | Endpoint | Data Sent |
|---|---|---|
| Browser → Server | POST /api/upload | Your raw CSV/XLSX/TXT file contents (multipart form) |
| Browser → Server | POST /api/search/* | Your search query (email, text, domain, company name, LinkedIn URL) |
| Server → Browser | All responses | Parsed records, matched emails, field data, graph nodes/edges |
| Browser → Server | All requests | X-Session-ID header (client-generated UUID for session isolation) |
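The X-Session-ID value is a client-generated UUID. A non-browser client, such as a Go script calling the API, could create an equivalent random (version 4) UUID with the standard library alone. `newSessionID` is a hypothetical helper, and the sketch assumes the server accepts any UUID-formatted string as a session ID.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newSessionID returns a random RFC 4122 version-4 UUID string.
func newSessionID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4 in the high nibble of byte 6
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant bits in byte 8
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newSessionID()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("X-Session-ID:", id)
}
```

Reusing one ID for every request keeps a client inside the same isolated store; a fresh ID starts from an empty one.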

Security Model

| Layer | Protection |
|---|---|
| In Transit | Optional TLS/HTTPS encryption (set TLS_CERT + TLS_KEY) |
| At Rest | No data is ever written to disk; everything is in-memory only |
| Session Isolation | Each browser session gets its own isolated data store |
| CORS | Permissive Access-Control-Allow-Origin: * (configurable for production) |
| Upload Limits | 200 MB max request size, 50 MB multipart buffer |
| No Telemetry | Zero analytics, no phone-home, no tracking; fully offline capable |
| No Auth Required | Runs on your local machine: no accounts, no cloud dependencies |

Threat Model

  • ⚠️ No authentication: designed for local/internal use. Don't expose it to the public internet without a reverse proxy and authentication in front of it.
  • ✅ No persistent storage: a server restart means a clean slate. Your data is never saved.
  • ✅ No external requests: the server makes zero outbound network calls.

🌐 Network Bypass & Remote Access

Behind Corporate Proxy/Firewall

If you're on a corporate network that blocks custom ports:

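# Run on port 80 (ports below 1024 usually require root).
# Build first: under sudo, the Go toolchain is often not on root's PATH.
go build -o multipchecker .
sudo PORT=80 ./multipchecker

# Or use port 443 with TLS
sudo PORT=443 TLS_CERT=cert.pem TLS_KEY=key.pem ./multipchecker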

SSH Tunnel (Access from Anywhere)

If MultipChecker runs on a remote server and you need to access it through a restricted network:

# From your local machine: forward local:8080 to remote:8080
ssh -L 8080:localhost:8080 user@your-server.com

# Then open http://localhost:8080 on your local machine

ngrok (Quick Public URL)

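Expose your local instance with a public HTTPS URL. Anyone with the URL can reach it, and MultipChecker has no built-in authentication (see Threat Model above), so share it carefully: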

# Install ngrok: https://ngrok.com
ngrok http 8080

# → https://abc123.ngrok-free.app (share this URL)

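Cloudflare Tunnel

# Install cloudflared, then start a quick tunnel (random trycloudflare.com URL).
# Quick tunnels are meant for testing; use a named tunnel for production.
cloudflared tunnel --url http://localhost:8080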

Docker Port Forwarding

# Map to any external port
docker run -p 443:8080 multipchecker

# Bind to specific interface
docker run -p 127.0.0.1:8080:8080 multipchecker

Reverse Proxy (Nginx)

server {
    listen 443 ssl;
    server_name checker.yourdomain.com;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        client_max_body_size 200M;
    }
}

📡 API Reference & Data Flow

All endpoints return JSON, and request bodies are JSON except for the multipart upload. Send the X-Session-ID header with every request for session isolation.

Upload

POST /api/upload
Content-Type: multipart/form-data
Body: files[] = your-file.csv, contacts.xlsx, ...

Response: [
  { "filename": "leads.csv", "records": 150, "emails": 142, "skipped_rows": 3, "warnings": [] }
]

Search

POST /api/search/email    → {"email": "john@acme.com"}     → SearchResult (5-tier matches)
POST /api/search/text     → {"query": "John Doe"}          → Record[]
POST /api/search/domain   → {"domain": "acme.com"}         → Record[]
POST /api/search/company  → {"company": "Acme Corp"}       → Record[]
POST /api/search/linkedin → {"url": "linkedin.com/in/john"} → Record[]

Data Views

GET /api/duplicates  → DuplicateGroup[] (emails appearing in 2+ records)
GET /api/domains     → { "acme.com": Record[], "gmail.com": Record[], ... }
GET /api/record/{id} → Record (single record by ID)
GET /api/graph       → { nodes: GraphNode[], edges: GraphEdge[] }
GET /api/stats       → { fileCount, recordCount, emailCount, dupCount }

Session Management

DELETE /api/clear    → Wipes all data in current session

Full Endpoint Table

| Method | Endpoint | Body | Response | Description |
|---|---|---|---|---|
| POST | /api/upload | multipart/form-data files[] | UploadResult[] | Upload & parse files |
| POST | /api/search/email | {"email":"..."} | SearchResult | 5-tier email intelligence |
| POST | /api/search/text | {"query":"..."} | Record[] | Full-text search |
| POST | /api/search/domain | {"domain":"..."} | Record[] | Domain search |
| POST | /api/search/company | {"company":"..."} | Record[] | Company search |
| POST | /api/search/linkedin | {"url":"..."} | Record[] | LinkedIn search |
| GET | /api/duplicates | (none) | DuplicateGroup[] | Duplicate groups |
| GET | /api/domains | (none) | map[domain]Record[] | Domain grouping |
| GET | /api/record/{id} | (none) | Record | Single record |
| GET | /api/graph | (none) | GraphData | Relationship graph |
| GET | /api/stats | (none) | StoreStats | Aggregate stats |
| DELETE | /api/clear | (none) | {"ok":true} | Clear session data |

📁 Project Structure

MultipChecker/
├── main.go              # HTTP server, routing, CORS, TLS, upload handler, .env loader
├── go.mod               # Go module definition
├── go.sum               # Dependency checksums
├── .env                 # Local env config (git-ignored)
├── .env.example         # Env var template (committed)
├── .gitignore           # Git ignore rules
├── Dockerfile           # Multi-stage Docker build
├── LICENSE              # MIT License
├── README.md            # This file
│
├── email/
│   └── email.go         # Email parser: decomposition, pattern detection, Levenshtein distance
│
├── parser/
│   ├── csv.go           # CSV parser with smart header detection & email extraction
│   ├── xlsx.go          # XLSX parser via excelize
│   └── txt.go           # Plain text / tab-delimited parser
│
├── store/
│   └── store.go         # In-memory store: indexes, search, graph builder, session manager
│
└── static/
    └── index.html       # Embedded single-page web UI (dark theme, interactive graph)

🏗️ Architecture

┌─────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│     Browser     │ TLS  │   HTTP Server    │      │  StoreManager    │
│   (index.html)  │─────▶│   (main.go)      │─────▶│  (per-session)   │
│                 │◀─────│   CORS + TLS     │◀─────│                  │
└─────────────────┘      └────────┬─────────┘      └────────┬─────────┘
                                  │                         │
                    ┌─────────────▼──────────┐    ┌─────────▼──────────┐
                    │       Parsers          │    │      Store         │
                    │  CSV · XLSX · TXT      │    │   Indexes:         │
                    │  (concurrent upload)   │    │   • emailIdx       │
                    └─────────────┬──────────┘    │   • domainIdx      │
                                  │               │   • companyIdx     │
                    ┌─────────────▼──────────┐    │   • linkedinIdx    │
                    │    Email Parser        │    │   • records (map)  │
                    │  Pattern Detection     │    └────────────────────┘
                    │  Levenshtein Distance  │
                    └────────────────────────┘
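The StoreManager box hands each session its own store. A minimal sketch of that idea, assuming a mutex-guarded map keyed by the X-Session-ID value; the `storeManager` and `store` types are hypothetical stand-ins, not the contents of store.go.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a placeholder for one session's data (records and indexes).
// It has no locking of its own; a real store would need it for concurrent requests.
type store struct {
	records []string
}

// storeManager hands out one isolated store per session ID.
type storeManager struct {
	mu       sync.Mutex
	sessions map[string]*store
}

func newStoreManager() *storeManager {
	return &storeManager{sessions: make(map[string]*store)}
}

// get returns the session's store, creating it on first use.
func (m *storeManager) get(sessionID string) *store {
	m.mu.Lock()
	defer m.mu.Unlock()
	s, ok := m.sessions[sessionID]
	if !ok {
		s = &store{}
		m.sessions[sessionID] = s
	}
	return s
}

// clear drops a session's store, roughly what DELETE /api/clear is described as doing.
func (m *storeManager) clear(sessionID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.sessions, sessionID)
}

func main() {
	m := newStoreManager()
	alice := m.get("alice")
	alice.records = append(alice.records, "john.doe@acme.com")
	fmt.Println(len(m.get("alice").records), len(m.get("bob").records)) // 1 0
}
```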

Data Flow:

  1. Upload → Files parsed concurrently by format-specific parsers → emails auto-extracted
  2. Index → Records indexed across 4 dimensions: email, domain, company, LinkedIn
  3. Search → 5-tier cascade: exact → username → domain → pattern → Levenshtein (edit distance ≤ 2)
  4. Graph → Relationship graph built from indexes (email ↔ domain ↔ username ↔ company)
  5. Session → Each X-Session-ID gets an isolated store (multi-user safe)
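The Index step, together with the unique / duplicate / cross-file labels in the Features table, suggests an email index that maps each normalized address to every record containing it. The sketch below is illustrative only: the `record` type is hypothetical, trimming plus lower-casing is an assumed normalization rule, "duplicate" is read as repeated within one file, and the real store also indexes domain, company, and LinkedIn.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// record is a hypothetical minimal row: the source file and one email.
type record struct {
	ID    int
	File  string
	Email string
}

// emailIndex maps a normalized email to every record that contains it.
type emailIndex map[string][]record

// normalize is an assumed rule: trim whitespace and lower-case.
func normalize(email string) string { return strings.ToLower(strings.TrimSpace(email)) }

func buildIndex(recs []record) emailIndex {
	idx := emailIndex{}
	for _, r := range recs {
		if key := normalize(r.Email); key != "" {
			idx[key] = append(idx[key], r)
		}
	}
	return idx
}

// status labels an email as unique, duplicate (repeated within one file),
// or cross-file (found in two or more files).
func (idx emailIndex) status(email string) string {
	recs := idx[normalize(email)]
	switch len(recs) {
	case 0:
		return "not found"
	case 1:
		return "unique"
	}
	for _, r := range recs[1:] {
		if r.File != recs[0].File {
			return "cross-file"
		}
	}
	return "duplicate"
}

// duplicateGroups lists emails that appear in two or more records, sorted.
func (idx emailIndex) duplicateGroups() []string {
	var out []string
	for email, recs := range idx {
		if len(recs) >= 2 {
			out = append(out, email)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := buildIndex([]record{
		{1, "crm.csv", "John.Doe@acme.com"},
		{2, "linkedin.xlsx", "john.doe@acme.com"},
		{3, "crm.csv", "jane@acme.com"},
		{4, "crm.csv", "jane@acme.com"},
		{5, "crm.csv", "mike@acme.com"},
	})
	for _, e := range []string{"john.doe@acme.com", "jane@acme.com", "mike@acme.com"} {
		fmt.Println(e, idx.status(e))
	}
	fmt.Println(idx.duplicateGroups())
}
```

Because every lookup goes through the same `normalize` call, the CRM's `John.Doe@acme.com` and the LinkedIn export's `john.doe@acme.com` land in one group.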

🔍 Email Intelligence Engine

MultipChecker's email parser detects these patterns automatically:

| Pattern | Example | Description |
|---|---|---|
| first.last | john.doe@acme.com | Dot-separated first and last name |
| first_last | john_doe@acme.com | Underscore-separated |
| initial.last | j.doe@acme.com | Single initial + last name |
| firstlast | johndoesmith@acme.com | Concatenated (8+ chars) |
| name+number | john42@acme.com | Name with trailing digits |
| username | jdoe@acme.com | Single-word username |
| custom | anything else | Unclassified pattern |
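The pattern table can be read as a small set of rules over the local part (the text before @). The classifier below is a hypothetical reconstruction that reproduces every example in the table; the rule order and the letters-only checks are assumptions, and email.go may draw the boundaries differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	lettersOnly   = regexp.MustCompile(`^[a-z]+$`)
	nameAndNumber = regexp.MustCompile(`^[a-z]+[0-9]+$`)
)

// classify guesses the naming pattern of an address's local part.
func classify(email string) string {
	local := strings.ToLower(email)
	if at := strings.LastIndex(local, "@"); at >= 0 {
		local = local[:at]
	}
	if parts := strings.Split(local, "."); len(parts) == 2 &&
		lettersOnly.MatchString(parts[0]) && lettersOnly.MatchString(parts[1]) {
		if len(parts[0]) == 1 {
			return "initial.last"
		}
		return "first.last"
	}
	if parts := strings.Split(local, "_"); len(parts) == 2 &&
		lettersOnly.MatchString(parts[0]) && lettersOnly.MatchString(parts[1]) {
		return "first_last"
	}
	switch {
	case nameAndNumber.MatchString(local):
		return "name+number"
	case lettersOnly.MatchString(local) && len(local) >= 8:
		return "firstlast" // the table's 8+ character threshold
	case lettersOnly.MatchString(local):
		return "username"
	}
	return "custom"
}

func main() {
	for _, e := range []string{"john.doe@acme.com", "j.doe@acme.com", "john_doe@acme.com",
		"johndoesmith@acme.com", "john42@acme.com", "jdoe@acme.com", "sales-team@acme.com"} {
		fmt.Printf("%-22s %s\n", e, classify(e))
	}
}
```

Checking `initial.last` before `first.last` matters: both share the dot-separated shape, and only the one-letter first part tells them apart.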

The 5-Tier Search Cascade

Query: "john.doe@acme.com"
│
├─ 1. EXACT       → john.doe@acme.com (direct match)
├─ 2. USERNAME    → john.doe@otherdomain.com (same local part)
├─ 3. DOMAIN      → jane.smith@acme.com (same domain + SLD cross-match)
├─ 4. PATTERN     → mike.jones@acme.com (same pattern "first.last" on same SLD)
└─ 5. SIMILAR     → john.deo@acme.com (Levenshtein distance ≤ 2)

This is what makes MultipChecker actually useful for B2B: it catches the duplicates that simple Ctrl+F never will.


🛠️ Tech Stack

  • Language: Go 1.22+
  • XLSX Parsing: excelize/v2
  • Frontend: Vanilla HTML/CSS/JS (embedded via go:embed), dark theme with Inter + JetBrains Mono
  • Graph Engine: Canvas-based with Barnes-Hut quadtree simulation
  • Containerization: Docker (Alpine-based multi-stage build)
  • TLS: Native Go crypto/tls (no external dependency)

🀝 Contributing

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Ideas for Contribution

  • Export results as CSV/XLSX
  • Email validation (MX record check)
  • Persistent SQLite storage option
  • Authentication/user management
  • REST API rate limiting
  • Webhook notifications for duplicate alerts

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


🏷️ Keywords

b2b sales tool Β· email deduplication Β· CSV parser Β· XLSX parser Β· duplicate detection Β· email finder Β· lead management Β· sales operations Β· contact deduplication Β· email intelligence Β· data cleaning Β· golang Β· self-hosted Β· open source Β· CRM tool Β· sales automation Β· email verification Β· cross-reference tool Β· spreadsheet analysis Β· lead enrichment Β· data quality Β· email pattern detection Β· fuzzy matching Β· levenshtein distance Β· relationship graph Β· network analysis Β· contact management Β· sales productivity Β· outreach tool Β· prospecting tool


Built by CODECZERO · ⭐ Star this repo if it saves you time!