waybackwhen

A multi-source passive URL enumerator that aggregates historical endpoints from the Wayback Machine, Common Crawl, AlienVault OTX, URLScan, VirusTotal, and more — all in one run.

What It Does

Runs five passive tools against a domain and merges the results into a single deduplicated output file. Useful for surface mapping, endpoint discovery, and parameter hunting without touching the target directly.

Sources covered:

| Tool | Sources |
| --- | --- |
| waybackurls | Wayback Machine |
| gau | Wayback Machine, Common Crawl, AlienVault OTX, URLScan |
| waymore | Wayback Machine, URLScan, VirusTotal, Common Crawl |
| urlfinder | Wayback Machine, Common Crawl |
| paramspider | Wayback Machine (parameter-focused) |

Tool parallelism: gau and urlfinder run in parallel (different backends). waybackurls, waymore, and paramspider run sequentially to avoid hammering the Wayback CDX API simultaneously.
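The scheduling described above can be sketched roughly as follows. This is an illustration, not the script's actual code: `enumerate_domain` and the `Runner` callables are hypothetical names, and whether the sequential archive tools overlap in time with gau/urlfinder is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

# Hypothetical runner type: takes a domain, returns the URLs one tool found.
Runner = Callable[[str], List[str]]


def enumerate_domain(domain: str,
                     parallel_tools: Dict[str, Runner],
                     archive_tools: Dict[str, Runner]) -> Set[str]:
    """Run parallel_tools concurrently and archive_tools one after another."""
    urls: Set[str] = set()
    workers = max(1, len(parallel_tools))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tools with independent backends (e.g. gau, urlfinder) start together.
        futures = [pool.submit(run, domain) for run in parallel_tools.values()]
        # Archive-backed tools run in sequence, so only one of them is
        # querying the Wayback CDX API at any moment.
        for run in archive_tools.values():
            urls.update(run(domain))
        # Collect the parallel results once the sequential lane is done.
        for future in futures:
            urls.update(future.result())
    return urls
```

In practice each runner would wrap a subprocess call to the corresponding tool; the merged set is what ends up deduplicated in the output.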

Rate-limit resilience: every tool runs through a retry/backoff wrapper, domains that come back empty are requeued (a common symptom of being throttled), and an optional archive-lane semaphore caps how many domains hit the Wayback/archive tools at once even when overall concurrency is high. See Rate limiting & resilience.

Installation

Dependencies

Install the Go tools:

go install github.com/tomnomnom/waybackurls@latest
go install github.com/lc/gau/v2/cmd/gau@latest
go install github.com/projectdiscovery/urlfinder/cmd/urlfinder@latest

Install waymore:

git clone https://github.com/xnl-h4ck3r/waymore.git ~/tools/waymore
pip install -r ~/tools/waymore/requirements.txt

Install paramspider:

pip install paramspider

Install tldextract (used by the default apex extraction; without it, waybackwhen falls back to a two-label split):

pip install tldextract

Install waybackwhen

git clone https://github.com/DFC302/waybackwhen.git
cd waybackwhen
chmod +x waybackwhen
sudo cp waybackwhen /usr/local/bin/   # optional: add to PATH

Verify your setup

waybackwhen --check

This prints a status table showing which tools are installed and where:

 waybackwhen — tool check
 TOOL            STATUS     PATH
 -----------------------------------------------
  waybackurls    [OK]       /home/user/go/bin/waybackurls
  gau            [OK]       /home/user/go/bin/gau
  urlfinder      [OK]       /home/user/go/bin/urlfinder
  paramspider    [OK]       /usr/local/bin/paramspider
  waymore        [OK]       /home/user/tools/waymore/waymore.py
  python3        [OK]       /usr/bin/python3
  tldextract     [OK]       /usr/lib/python3/dist-packages/tldextract

Missing tools are skipped silently at runtime: the script does not crash if a tool is absent, but that tool contributes no results.
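A check like this can be approximated with a PATH lookup. The sketch below is an assumption about how such a table could be built, not the script's implementation: `check_tools` and `format_status` are hypothetical names, the `[MISSING]` label is invented for illustration, and tools that live outside PATH (such as a cloned waymore.py) would need a separate file check.

```python
import shutil
from typing import Dict, Iterable, Optional


def check_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None when not on PATH."""
    return {name: shutil.which(name) for name in names}


def format_status(status: Dict[str, Optional[str]]) -> str:
    """Render a simple TOOL / STATUS / PATH table."""
    lines = [f" {'TOOL':<15} {'STATUS':<10} PATH"]
    for name, path in status.items():
        # "[MISSING]" is an illustrative label, not taken from the tool.
        state = "[OK]" if path else "[MISSING]"
        lines.append(f"  {name:<14} {state:<10} {path or '-'}")
    return "\n".join(lines)


if __name__ == "__main__":
    tools = ["waybackurls", "gau", "urlfinder", "paramspider", "python3"]
    print(format_status(check_tools(tools)))
```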

Usage

waybackwhen [options] [domain]

Options

| Flag | Long form | Description |
| --- | --- | --- |
| `-e` | `--exclude` | Filter out static assets (images, fonts, CSS, JS libraries, etc.) from all results |
| `-x` | `--exact` | Disable automatic apex extraction and scan the literal input domain (e.g. only `api.foo.com`, not `foo.com`) |
| `-c` | `--check` | Check which tools are installed and where, then exit |
| `-s TOOLS` | `--skip TOOLS` | Comma-separated list of tools to skip (e.g. `waymore` or `gau,waybackurls`) |
| `-f FILE` | `--file FILE` | Read domains from a file (one per line) |
| `-p N` | `--parallel N` | Number of domains to process concurrently (default: 1) |
| `-l FILE` | `--log FILE` | Write a timestamped run log to FILE (`.log` extension added if missing) |
| | `--stdout` | Print merged URLs to stdout instead of writing `.wbw` files (status/progress goes to stderr, so output stays pipe-clean) |
| `-r N` | `--retries N` | Per-tool retries on a hard failure (non-zero exit), with backoff between attempts (default: 2) |
| `-q N` | `--requeue N` | Whole-domain retries when a pass returns zero URLs, with backoff (default: 1). Catches silent throttling, where a tool exits cleanly but returns nothing |
| `-b N` | `--backoff N` | Base backoff in seconds; the delay for attempt n is `N × n` (default: 5) |
| `-a N` | `--archive-slots N` | Max domains allowed to hit the archive lane (`waybackurls`/`waymore`/`paramspider`) at once. `0` = unlimited (default). Clamped to `--parallel`. Lets you fan out wide with `-p` without stampeding the Wayback API |

Valid tool names for --skip: waybackurls, gau, urlfinder, waymore, paramspider

All numeric flags accept non-negative integers; --parallel must be at least 1.

Input methods

Single domain:

waybackwhen example.com

From file:

waybackwhen -f domains.txt

From stdin:

cat domains.txt | waybackwhen

Examples

Basic run on a single domain:

waybackwhen example.com

Run on a subdomain (apex is extracted automatically by default):

waybackwhen api.example.com
# Strips to example.com and runs all tools against it.
# Prints: [*] Apex extracted: api.example.com → example.com

Scan the literal subdomain only (no apex strip):

waybackwhen --exact api.example.com
# Scans api.example.com directly, does NOT widen to example.com.

Exclude static assets (cleaner output for endpoint hunting):

waybackwhen --exclude example.com
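For intuition, a static-asset filter of this kind can be as simple as a suffix check on the URL path. The sketch below is illustrative only: `drop_static` is a hypothetical name and the extension list is an assumption, so the tool's real list may differ.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# Assumed extension list for illustration; not taken from waybackwhen.
STATIC_EXTENSIONS = (
    ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico",
    ".woff", ".woff2", ".ttf", ".eot", ".css",
)


def drop_static(urls: Iterable[str]) -> List[str]:
    """Keep only URLs whose path does not end in a static-asset extension."""
    kept = []
    for url in urls:
        # Compare the path only, so query strings like ?v=2 do not hide
        # the extension; lower-case it to match .PNG, .Css, and so on.
        path = urlsplit(url).path.lower()
        if not path.endswith(STATIC_EXTENSIONS):
            kept.append(url)
    return kept
```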

Multiple domains in parallel with logging:

waybackwhen -f domains.txt -p 5 -l run.log

Skip a slow tool for a faster run:

waybackwhen --skip waymore example.com

Skip multiple tools:

waybackwhen --skip gau,waybackurls example.com

Full combination — exclude, skip, parallel, log (apex is the default):

waybackwhen -f subdomains.txt --exclude --skip waymore -p 3 -l hunt.log

Pipe URLs straight into another tool (stdout mode):

waybackwhen --stdout example.com | httpx -silent

Fan out wide but stay gentle on the Wayback API:

# 20 domains in flight, but only 4 hitting the archive tools at any moment
waybackwhen -f domains.txt -p 20 -a 4

Recommended rate-limit-safe profile for large lists:

waybackwhen -f domains.txt -p 20 -a 4 -q 1 -b 8 -l run.log

-p 20 keeps 20 domains in flight (gau/urlfinder run at full width) while -a 4 lets only 4 touch the archive tools at once; -q 1 retries a domain that came back empty (a common throttling symptom) and -b 8 sets the backoff base.
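The interaction between -p and -a can be pictured as a worker pool plus a counting semaphore. The following is a minimal sketch under stated assumptions: `run_all`, `fast_lane`, and `archive_lane` are hypothetical names, and for simplicity each job runs its fast lane before its archive lane rather than overlapping them.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set, Tuple


def run_all(domains: Iterable[str],
            fast_lane: Callable[[str], List[str]],
            archive_lane: Callable[[str], List[str]],
            parallel: int = 20,
            archive_slots: int = 4) -> Dict[str, Set[str]]:
    """Process domains with `parallel` workers, gating the archive lane."""
    # 0 means unlimited; any other value is clamped to the worker count.
    slots = parallel if archive_slots == 0 else min(archive_slots, parallel)
    gate = threading.BoundedSemaphore(slots)

    def job(domain: str) -> Tuple[str, Set[str]]:
        urls = set(fast_lane(domain))          # not gated: runs at full width
        with gate:                             # at most `slots` jobs in here
            urls |= set(archive_lane(domain))
        return domain, urls

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return dict(pool.map(job, domains))
```

With `parallel=20` and `archive_slots=4`, twenty jobs can be in flight while no more than four are inside the gated section at once.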

Running on large lists

One thing to know before pointing this at a big file: apexes are not de-duplicated. In the default apex mode, api.foo.com, www.foo.com, and foo.com are three separate jobs that all collapse to foo.com — three full enumerations of the same apex (and three times the archive API calls, all writing to the same foo_com.wbw). On a large subdomain list this multiplies your rate-limit exposure and works against -a. Two ways to avoid it:

Collapse to unique apexes first (each apex runs once):

# requires tldextract (already a dependency)
python3 -c 'import sys,tldextract
for l in sys.stdin:
    l=l.strip()
    if not l: continue
    e=tldextract.extract(l)
    print(getattr(e,"top_domain_under_public_suffix",None) or e.registered_domain)' \
  < subdomains.txt | sort -u > apexes.txt
waybackwhen -f apexes.txt -p 20 -a 4 -q 1 -b 8 -l run.log

Or scan each host literally (no apex collapse, per-subdomain output):

waybackwhen -f subdomains.txt --exact -p 20 -a 4 -q 1 -b 8 -l run.log

For very large lists, also consider --skip waymore for a faster (if slightly less thorough) run, since waymore is the slowest and most rate-limited source.

Output

Each domain produces a .wbw file in the current directory named after the domain (dots replaced with underscores):

example_com.wbw
api_example_com.wbw

Files contain one URL per line, sorted and deduplicated. If a domain returns zero results the output file is deleted automatically.

Example output:

https://example.com/api/v1/users
https://example.com/login?redirect=/dashboard
https://example.com/search?q=FUZZ
https://example.com/wp-login.php
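The naming and merge rules described in this section can be sketched as below. The function names are hypothetical and the code is an approximation of the described behavior, not the script itself.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def output_name(domain: str) -> str:
    """example.com -> example_com.wbw (dots replaced with underscores)."""
    return domain.replace(".", "_") + ".wbw"


def merge_urls(*tool_outputs: Iterable[str]) -> List[str]:
    """Merge per-tool URL lists into one sorted, deduplicated list."""
    seen = set()
    for output in tool_outputs:
        for line in output:
            line = line.strip()
            if line:                      # ignore blank lines
                seen.add(line)
    return sorted(seen)


def write_output(domain: str, urls: List[str],
                 directory: str = ".") -> Optional[Path]:
    """Write one URL per line, or remove the file when nothing was found."""
    path = Path(directory) / output_name(domain)
    if not urls:
        path.unlink(missing_ok=True)      # requires Python 3.8+
        return None
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return path
```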

Rate limiting & resilience

Running many domains in parallel (-p) used to mean up to that many waybackurls + waymore processes all hammering web.archive.org at once, with no recovery if any of them got throttled. Three layers address that:

Per-tool retry with backoff (-r, default 2). Each tool runs through a wrapper that re-runs it on a hard failure (non-zero exit), waiting backoff × attempt seconds between tries. This catches crashes and network errors.
Whole-domain requeue on empty (-q, default 1). The archive tools usually exit 0 even when they were rate-limited, so retry-on-error alone misses silent throttling. When a domain's merged result is empty, the entire tool battery is re-run after a backoff. A genuinely empty domain just costs one extra (bounded) backoff cycle before its output is dropped.
Archive-lane semaphore (-a, default unlimited). A counting semaphore caps how many domains hit the archive tools (waybackurls/waymore/paramspider) simultaneously, while gau and urlfinder keep running at full -p. This is the cleanest way to prevent throttling in the first place: e.g. -p 20 -a 4 fans out to 20 domains but lets only 4 touch the archives at a time. The value is clamped to --parallel.

Tune the backoff base with -b (seconds). Set -q 0 to disable requeuing, or -a 0 for the old unlimited behavior.
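The first two layers can be sketched as two small loops. This is a sketch under assumptions rather than the script's code: `run_with_retry` and `run_domain` are hypothetical names, the tool is modeled as a callable returning `(exit_code, urls)`, and the requeue delay is assumed to follow the same `backoff × attempt` rule as the per-tool retry.

```python
import time
from typing import Callable, List, Tuple


def run_with_retry(tool: Callable[[], Tuple[int, List[str]]],
                   retries: int = 2,
                   backoff: int = 5,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Re-run a tool on non-zero exit, waiting backoff * attempt seconds."""
    for attempt in range(1, retries + 2):     # one initial try plus retries
        code, urls = tool()
        if code == 0:
            return urls
        if attempt <= retries:
            sleep(backoff * attempt)
    return []                                 # every attempt failed


def run_domain(battery: Callable[[], List[str]],
               requeue: int = 1,
               backoff: int = 5,
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Re-run the whole tool battery when a pass returns zero URLs."""
    for attempt in range(1, requeue + 2):
        urls = battery()
        if urls:
            return urls
        if attempt <= requeue:
            sleep(backoff * attempt)          # possible silent throttling
    return []                                 # genuinely empty domain
```

The `sleep` parameter is injectable only so the delays can be observed in tests without actually waiting.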

Notes

Default apex extraction and multi-part TLDs: apex extraction (the default) uses tldextract, which handles multi-part TLDs correctly (api.example.co.uk → example.co.uk). If tldextract is not installed, it falls back to a two-label split, which would reduce that example to co.uk. Use --exact / -x to bypass extraction and scan the literal input.
paramspider output: paramspider replaces parameter values with FUZZ placeholders (e.g., ?id=FUZZ). This is intentional — it surfaces parameter-bearing endpoints cleanly.
waymore config: waymore returns significantly more results when configured with API keys for URLScan and VirusTotal. See waymore's README for setup.
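To see why the tldextract fallback mentioned in the apex note matters, here is a minimal sketch of a two-label split. `two_label_apex` is a hypothetical name and the fallback's exact implementation is an assumption; the point is only that keeping the last two labels is wrong for multi-part public suffixes.

```python
def two_label_apex(host: str) -> str:
    """Naive apex guess: keep only the last two dot-separated labels."""
    labels = host.strip().strip(".").lower().split(".")
    return ".".join(labels[-2:])


if __name__ == "__main__":
    # Fine for single-label TLDs ...
    print(two_label_apex("api.example.com"))
    # ... but a multi-part suffix such as co.uk is mistaken for the apex.
    print(two_label_apex("api.example.co.uk"))
```

When tldextract is unavailable and the input uses such suffixes, passing --exact avoids the split altogether.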
