Config-driven EVM log indexer in Rust. Indexes only the contracts and events
you declare in config.toml, never scanning every block, to keep paid RPC usage
(Alchemy/QuickNode compute units & credits) low. Logs are fetched with filtered
eth_getLogs, decoded against their ABI at runtime (no codegen), and written as
structured, typed rows into Postgres.

| Goal | How |
|---|---|
| Minimal RPC spend | Filtered eth_getLogs over wide adaptive ranges; timestamps fetched once & cached; tx data piggybacks the block fetch; WebSocket at the tip, not polling. Every call metered through a per-provider cost model into a spend ledger with a monthly free-quota guard. |
| Fast, same cost | Two-level parallel pipeline (concurrent getLogs ranges × concurrent block/receipt RPCs per range) saturates the rate budget instead of idling. Same call count & $spent as serial — concurrency changes ordering, not volume. |
| Pay RPC once | Raw logs persisted before decode → re-decode / resync (new event, bug fix, added column) replays from disk with zero new RPC. |
| Reorg-safe | Block hashes tracked per row; reorgs are cheap point deletes + re-index within the confirmation window. |
| Swappable seams | RPC source, storage read layer, and query protocol all behind traits. Default: Alchemy/QuickNode + Postgres + GraphQL. |
| Multi-chain | One deployment, chain_id everywhere, one ingest task per chain. |
| Observable | Optional Prometheus /metrics endpoint (RPC calls, spend, queue depth, decode/query latency, build_info) — enable with [indexer] metrics_listen. |
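
The reorg row can be made concrete. Below is a minimal sketch of locating the fork point. It assumes two inputs: the stored block hashes, available as a height → hash map, and a `canonical` lookup that returns the provider's current hash for a height. `find_fork_point` and both input shapes are hypothetical, not the crate's API. Rows at and above the returned height would be the point deletes, and re-indexing would restart from there.

```rust
use std::collections::BTreeMap;

/// Lowest height inside the confirmation window whose stored hash no longer
/// matches the canonical chain, i.e. where deletes + re-index would start.
/// `stored`: height -> block hash recorded alongside indexed rows (hypothetical shape).
/// `canonical`: the provider's current hash for a height (hypothetical lookup).
fn find_fork_point<F>(
    stored: &BTreeMap<u64, String>,
    tip: u64,
    confirmations: u64,
    canonical: F,
) -> Option<u64>
where
    F: Fn(u64) -> Option<String>,
{
    // Only heights within the unfinalized window are checked.
    let window_start = tip.saturating_sub(confirmations);
    stored
        .range(window_start..=tip)
        .find(|(height, hash)| canonical(**height).as_deref() != Some(hash.as_str()))
        .map(|(height, _)| *height)
}
```

Scanning in ascending order returns the *first* diverging height, so everything after it is treated as orphaned even if a later hash happens to match.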
Full design rationale: ARCHITECTURE.md. Code layout: CODEBASE.md.
Three services + a Postgres-backed queue. The ingestor fetches filtered logs and enqueues pointers; decode workers read the raw logs, decode them, and upsert typed rows; the query service serves GraphQL over those rows.

```mermaid
flowchart LR
cfg[config.toml<br/>chains · contracts · ABIs · events]
rpc[(Alchemy / QuickNode<br/>JSON-RPC + WS)]
pg[(Postgres)]
cfg -.loaded by all.-> ING & DEC & API
subgraph svc[services]
ING[INGESTOR<br/>per chain<br/>filtered eth_getLogs<br/>ws subscribe]
DEC[DECODER + WRITER<br/>ABI decode<br/>N workers]
API[QUERY API<br/>GraphQL R/O]
end
rpc -->|logs · blocks · receipts<br/>matched only| ING
ING -->|raw_* written first| pg
ING -->|work item pointer| DEC
DEC -->|read raw| pg
DEC -->|typed upserts| pg
pg --> API
```

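
To show what the decode step involves for a single event, the sketch below hand-decodes an ERC-20 `Transfer(address,address,uint256)` log. The indexed `from`/`to` are the low 20 bytes of topics 1 and 2, and `value` is the 32-byte data word. The real decoder is driven by the ABI at runtime rather than hard-coded. `RawLog`, `Transfer` and `decode_transfer` are hypothetical names, and `value` is narrowed to `u128` only to keep the sketch std-only.

```rust
/// keccak256("Transfer(address,address,uint256)"), the event's topic0.
const TRANSFER_TOPIC0: &str =
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

/// Hypothetical shape of a persisted raw log (hex strings as eth_getLogs returns them).
struct RawLog {
    topics: Vec<String>,
    data: String,
}

#[derive(Debug, PartialEq)]
struct Transfer {
    from: String,
    to: String,
    value: u128,
}

/// Parse one 32-byte hex word (optional 0x prefix) into bytes.
fn word(hex: &str) -> Option<[u8; 32]> {
    let hex = hex.strip_prefix("0x").unwrap_or(hex);
    if hex.len() != 64 || !hex.is_ascii() {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// An indexed address is the low 20 bytes of its 32-byte topic.
fn address(topic: &str) -> Option<String> {
    let w = word(topic)?;
    let hex: String = w[12..].iter().map(|b| format!("{b:02x}")).collect();
    Some(format!("0x{hex}"))
}

fn decode_transfer(log: &RawLog) -> Option<Transfer> {
    // ERC-721 Transfer has the same topic0 but 4 topics (tokenId indexed).
    if log.topics.len() != 3 || !log.topics[0].eq_ignore_ascii_case(TRANSFER_TOPIC0) {
        return None;
    }
    let data = word(&log.data)?;
    // uint256 is big-endian; this sketch only accepts values that fit in u128.
    if data[..16].iter().any(|&b| b != 0) {
        return None;
    }
    let mut low = [0u8; 16];
    low.copy_from_slice(&data[16..]);
    Some(Transfer {
        from: address(&log.topics[1])?,
        to: address(&log.topics[2])?,
        value: u128::from_be_bytes(low),
    })
}
```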
Fetch path (the cost-saver) — diagram
Both levels of the diagram run concurrently — range_concurrency getLogs ranges in
flight, and aux_concurrency block/receipt RPCs per range — all gated by the same
PlanProfile token bucket so throughput fills the rate budget without exceeding it.
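
The shared gate described above behaves like a classic token bucket. Below is a minimal std-only sketch. It assumes one bucket per ceiling: `max_rps` charges 1 per request, and `max_cu_per_sec` charges each call's CU weight. It also assumes a burst size of one second's worth of tokens and an explicit clock, so the behaviour is deterministic. `TokenBucket` is a hypothetical name, not the indexer's `PlanProfile` type.

```rust
use std::time::{Duration, Instant};

struct TokenBucket {
    capacity: f64,       // burst size in tokens (one second's worth here)
    refill_per_sec: f64, // e.g. max_rps, or max_cu_per_sec for a CU bucket
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(rate_per_sec: f64, now: Instant) -> Self {
        Self { capacity: rate_per_sec, refill_per_sec: rate_per_sec, tokens: rate_per_sec, last: now }
    }

    /// Take `cost` tokens if available; otherwise return how long to wait.
    /// `cost` must not exceed `capacity`, or the call can never succeed.
    fn try_acquire(&mut self, cost: f64, now: Instant) -> Result<(), Duration> {
        // Refill for the time elapsed since the last call, capped at capacity.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= cost {
            self.tokens -= cost;
            Ok(())
        } else {
            let deficit = cost - self.tokens;
            Err(Duration::from_secs_f64(deficit / self.refill_per_sec))
        }
    }
}
```

Every concurrent task draws from the same bucket. Raising `range_concurrency` or `aux_concurrency` therefore changes how quickly tokens are consumed, not how many calls are made.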

```mermaid
flowchart TD
A[one filter per chain<br/>address: all contracts<br/>topics: union of event sigs] --> B[eth_getLogs<br/>provider-sized range<br/>× range_concurrency in flight]
B -->|result-cap hit| C[split range in half,<br/>retry each side → 1 block]
B --> E[distinct matched blocks]
E --> F[batched eth_getBlockByNumber<br/>full=true → timestamp + txs<br/>× aux_concurrency]
F --> G[(blocks / transactions<br/>cached, deduped per chain)]
E --> H[eth_getTransactionReceipt<br/>batched, deduped<br/>× aux_concurrency]
H --> G
B & F & H -.rps/CU-gated.-> T[PlanProfile token bucket]
F -.metered.-> M[CostModel → SpendLedger<br/>$spent · monthly quota guard]
```

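
The result-cap branch can be written as a recursive bisection over an inclusive block range. The sketch assumes the getLogs call is injected as a closure that reports a cap hit as an error. `fetch_range` and `CapHit` are hypothetical, and a `Vec<u64>` stands in for the fetched logs. A single block that still exceeds the cap is returned as an error, since it cannot be split further.

```rust
/// Error the (hypothetical) getLogs call returns when the result cap is hit.
#[derive(Debug, PartialEq)]
struct CapHit {
    from: u64,
    to: u64,
}

/// Fetch logs for the inclusive range [from, to], halving on cap hits.
fn fetch_range<F>(from: u64, to: u64, get_logs: &mut F) -> Result<Vec<u64>, CapHit>
where
    F: FnMut(u64, u64) -> Result<Vec<u64>, CapHit>,
{
    match get_logs(from, to) {
        Ok(logs) => Ok(logs),
        // A one-block range cannot be split any further.
        Err(e) if from == to => Err(e),
        Err(_) => {
            let mid = from + (to - from) / 2;
            let mut left = fetch_range(from, mid, get_logs)?;
            left.extend(fetch_range(mid + 1, to, get_logs)?);
            Ok(left)
        }
    }
}
```

The left half is fetched before the right, so results come back in block order without a sort.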
Queue & decode (parallel across chains, serial within a chain) — diagram

```mermaid
sequenceDiagram
participant I as Ingestor (chain N)
participant Q as work_queue (Postgres)
participant W as Decode Worker
participant DB as Event tables
I->>Q: raw insert + enqueue (same tx)
W->>Q: pull_any() — lease oldest, no in-flight for its chain
Q-->>W: WorkItem {from, to, kind}
W->>DB: decode raw → upsert typed rows (idempotent PK)
W->>Q: ack
Note over Q: FOR UPDATE SKIP LOCKED<br/>competing consumers, at-least-once
```

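
The `pull_any()` rule is what keeps decode serial within a chain but parallel across chains: lease the oldest item whose chain has nothing in flight. Below is an in-memory model of just that selection rule. It assumes items carry a monotonically increasing `id` for enqueue order. In the real queue the rule runs as a Postgres query under `FOR UPDATE SKIP LOCKED`, which the sketch does not model. The `id` field, the omission of `kind`, and `pick_next` are all hypothetical.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct WorkItem {
    id: u64, // enqueue order (hypothetical)
    chain_id: u64,
    from: u64,
    to: u64,
}

/// Oldest pending item whose chain has no item currently leased.
fn pick_next<'a>(pending: &'a [WorkItem], in_flight_chains: &HashSet<u64>) -> Option<&'a WorkItem> {
    pending
        .iter()
        .filter(|item| !in_flight_chains.contains(&item.chain_id))
        .min_by_key(|item| item.id)
}
```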

```sh
cp .env.example .env                  # fill ALCHEMY_HTTP / ALCHEMY_WS
cp config.toml.example config.toml
just pg-up                            # throwaway local Postgres on :55432
just migrate                          # ABI → DDL: create event tables
just backfill 19000000 19010000       # index a fixed range on chain 1 (default)
just backfill 19000000 19010000 8453  # …or pass a chain id explicitly
just run 4                            # or: supervisor — ingest all chains + 4 decode workers
just query                            # GraphQL at :8080
```

Run `just` with no args to list every recipe.

| Recipe | What it does |
|---|---|
| `just migrate` / `just migrate-dry` | Apply / preview ABI→DDL schema diff |
| `just backfill <from> <to>` | Index a height range (one chain) |
| `just resync <from> <to>` | Re-decode from `raw_*`, zero RPC |
| `just follow` | Track the tip via WS, resume on restart |
| `just run <workers>` | Supervisor: ingest all chains + in-proc decode pool |
| `just decode <workers>` | Standalone decode-worker pool (scale-out) |
| `just query` | Start the GraphQL server |

Docker Compose mirrors these as profiles: `just up indexer query`,
`just scale-decode 4`, `just logs`.
config.toml is the single source of truth — it drives schema (migrate) and
every runtime service. Secrets are ${ENV} placeholders, never inline. Copy
config.toml.example to start.

```toml
[indexer]
log_level = "info" # tracing filter; RUST_LOG overrides
batch_size = 500 # decoder write batch into Postgres
range_concurrency = 4 # getLogs ranges in flight at once (backfill pipeline)
aux_concurrency = 8 # concurrent block/receipt RPCs per range (rps-gated)
tip_interval_secs = 6 # tip poll cadence (run/follow); CLI --interval overrides
# metrics_listen = "0.0.0.0:9090" # Prometheus /metrics endpoint; omit = exporter off

[database]
url = "${DATABASE_URL}"
max_conns = 16

[queue]
kind = "postgres" # work_queue table, FOR UPDATE SKIP LOCKED, polled
poll_ms = 50 # worker poll interval while items are flowing
poll_idle_ms = 1000 # slower poll once the queue drains; returns to poll_ms on new work

[query]
api = "graphql"
listen = "0.0.0.0:8080"
expose = "finalized" # finalized | provisional — read visibility
cache_ttl_ms = 1000 # in-process read-cache TTL; 0 disables caching

[[chains]]
id = 1
name = "ethereum"
kind = "evm" # chain family (selects the adapter); "evm" is the only kind today
confirmations = 12 # unfinalized window for reorg safety

[chains.source]
kind = "alchemy" # provider impl: alchemy | quicknode | generic_rpc | free_node
http = "${ALCHEMY_HTTP}"
ws = "${ALCHEMY_WS}" # optional; present → WS tip subscription, absent → poll fallback

[chains.source.limits] # PlanProfile — provider caps; omit the block for defaults
max_rps = 8 # request-rate ceiling (token bucket)
max_cu_per_sec = 330 # compute-unit/sec ceiling (Alchemy)
max_batch = 100 # max JSON-RPC batch size the plan accepts
max_getlogs_blocks = 10 # getLogs range seed/ceiling (free tier is small)
max_getlogs_results = 10_000 # result-count cap → triggers range halving
monthly_quota_cu = 300_000_000 # free-quota guard in the spend ledger

[[chains.contracts]]
address = "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48" # USDC
abi = "abis/erc20.json"
events = ["Transfer", "Approval"] # subset of ABI; omit = all events
start_block = 19_000_000 # earliest block to scan — REQUIRED on ≥1 contract
# functions = ["transfer"] # decode calldata of these fns into columns; omit = none
# table = "usdc_transfer" # table-name override (default evt_<contract>_<event>)
```

Add more `[[chains]]` blocks for multi-chain; each can use a different provider
and plan. `source.kind` picks the LogSource/CostModel impl;
`[chains.source.limits]` sets the plan caps that the client self-tunes against.
Field reference — full table of every non-obvious knob
Only fields whose meaning isn't obvious from the example are listed; anything not shown takes the default above.
[indexer]

| Field | Default | Meaning |
|---|---|---|
| `batch_size` | `500` | Rows per decoder upsert batch into Postgres. |
| `range_concurrency` | `4` | getLogs ranges fetched concurrently during backfill. Higher values fill the rate budget faster; they do not change call count or $spent. |
| `aux_concurrency` | `8` | Concurrent getBlockByNumber/getTransactionReceipt calls per range. The dominant wall-clock win at ~1 matched tx/block; rps-gated, so volume is unchanged. |
| `tip_interval_secs` | `6` | Poll cadence at the tip for `run`/`follow` when WS is unavailable. CLI `--interval` overrides. |
| `metrics_listen` | (unset) | Address for the Prometheus `/metrics` scrape endpoint (e.g. `0.0.0.0:9090`). Omit to disable the exporter. Honoured by every `indexer` command and `indexer-query`. |

[queue]

| Field | Default | Meaning |
|---|---|---|
| `poll_ms` | `50` | Worker poll interval while items keep coming. |
| `poll_idle_ms` | `1000` | Slower interval that workers back off to when a pull returns nothing; they snap back to `poll_ms` once work resumes. |

[query]

| Field | Default | Meaning |
|---|---|---|
| `expose` | `finalized` | Read watermark. `finalized` hides rows inside the `confirmations` window (reorg-unsafe); `provisional` exposes them. |
| `cache_ttl_ms` | `1000` | TTL of the in-process read cache over the query path. `0` disables caching entirely. |
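
The `expose` watermark reduces to an upper height bound on reads. A sketch, assuming "inside the confirmations window" means the last `confirmations` blocks up to and including the tip; the indexer's exact boundary may differ by one. `Expose` and `max_visible_height` are hypothetical names.

```rust
#[derive(Clone, Copy)]
enum Expose {
    Finalized,
    Provisional,
}

/// Highest block height a read may return, given the indexed tip.
/// None means nothing is visible yet.
fn max_visible_height(tip: u64, confirmations: u64, expose: Expose) -> Option<u64> {
    match expose {
        Expose::Provisional => Some(tip),
        // Heights tip-confirmations+1 ..= tip are still reorgable and hidden.
        Expose::Finalized => tip.checked_sub(confirmations),
    }
}
```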

[[chains]]

| Field | Default | Meaning |
|---|---|---|
| `kind` | `evm` | Chain family; selects the chain adapter. `evm` is the only kind built today. Distinct from `source.kind` (the RPC provider). |
| `confirmations` | `12` | Depth of the unfinalized window. Reorgs are acted on (and finalized reads hidden) only within this many blocks of the tip. |

There is no chain-level start_block — the scan start is derived from the
contracts (see below).
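
That derivation is a minimum over the contracts' `start_block` values, with a validation error when none is set. The sketch uses hypothetical `Contract` and `ConfigError` types that carry only the fields the rule needs.

```rust
#[derive(Debug, PartialEq)]
enum ConfigError {
    NoStartBlock { chain_id: u64 },
}

struct Contract {
    address: &'static str,
    start_block: Option<u64>,
}

/// The chain's scan start: the minimum start_block across its contracts.
fn scan_start(chain_id: u64, contracts: &[Contract]) -> Result<u64, ConfigError> {
    contracts
        .iter()
        .filter_map(|c| c.start_block)
        .min()
        .ok_or(ConfigError::NoStartBlock { chain_id })
}
```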
[chains.source] / [chains.source.limits] — the PlanProfile. Omit the
limits block to take built-in defaults; otherwise set your plan's real caps so
the client self-tunes without tripping provider 429s.

| Field | Meaning |
|---|---|
| `kind` | Provider impl: `alchemy`, `quicknode`, `generic_rpc`, `free_node`. Picks both the RPC client and its CostModel. |
| `ws` | WS endpoint. Present → tip tracked via `eth_subscribe`; absent → poll fallback (`tip_interval_secs`). |
| `max_rps` / `max_cu_per_sec` | Token-bucket ceilings on request rate and Alchemy compute units/sec. |
| `max_batch` | Largest JSON-RPC batch the plan accepts (block/receipt lookups are packed up to this size). |
| `max_getlogs_blocks` | Seed and ceiling for the getLogs block range; the range shrinks only on result-cap hits. Free tiers are small (e.g. 10); paid tiers allow 2000+. |
| `max_getlogs_results` | Result-count cap. A page that hits it makes the range halve and retry, down to a single block. |
| `monthly_quota_cu` | Free monthly CU/credit allotment. Feeds the SpendLedger guard that slows or stops spending before the allotment is used up. |
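
The quota guard can be read as a running CU total checked against `monthly_quota_cu` before each call. The sketch assumes two illustrative thresholds. Spending slows from 90% of the allotment, and any call that would push the total past 100% is refused. The real SpendLedger's thresholds, per-method pricing and monthly reset are not described here. `Ledger`, `Guard` and `admit` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Guard {
    Proceed,
    SlowDown,
    Stop,
}

struct Ledger {
    monthly_quota_cu: u64,
    spent_cu: u64,
}

impl Ledger {
    /// Check a call's CU cost before it is made; record it only if admitted.
    fn admit(&mut self, cu: u64) -> Guard {
        let next = self.spent_cu.saturating_add(cu);
        if next > self.monthly_quota_cu {
            return Guard::Stop; // refuse rather than overshoot the allotment
        }
        self.spent_cu = next;
        // Illustrative 90% threshold, computed in integers: next/quota >= 9/10.
        if next.saturating_mul(10) >= self.monthly_quota_cu.saturating_mul(9) {
            Guard::SlowDown
        } else {
            Guard::Proceed
        }
    }
}
```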

[[chains.contracts]]

| Field | Required | Meaning |
|---|---|---|
| `address` | yes | Contract address (checksummed or lowercase). All addresses on a chain are unioned into one getLogs filter. |
| `abi` | yes | Path to the ABI JSON, relative to `config.toml`. Only `event` entries (plus `function` entries, if `functions` is set) are read. |
| `events` | no | ABI event names to index, exact match. Omit to index all events in the ABI. Each event → one table. |
| `functions` | no | ABI function names whose tx calldata to decode into columns. Omit or leave empty to decode no calldata. |
| `start_block` | ≥1 per chain | Earliest block to scan for this contract. The chain's scan start is the minimum across its contracts; config validation fails if no contract sets one. |
| `table` | no | Override the generated table name (default `evt_<contract>_<event>`). |

Preparing config & ABIs — step-by-step from zero
Steps from zero to an indexable config:

1. Copy the template.

   ```sh
   cp config.toml.example config.toml
   mkdir -p abis
   ```

2. Get the contract ABI as JSON and drop it in `abis/`. Sources:

   ```sh
   # from Etherscan (verified contracts) — needs an API key
   curl -s "https://api.etherscan.io/api?module=contract&action=getabi&address=0xA0b8...&apikey=$ETHERSCAN_KEY" \
     | jq -r '.result' > abis/erc20.json

   # or from a local Foundry/forge build artifact
   jq '.abi' out/ERC20.sol/ERC20.json > abis/erc20.json
   ```

   The file must be the ABI array (or an object with an `.abi` field). Only `event` entries are used, plus `function` entries when `functions` is set; constructors and other entries are ignored. One ABI can be reused across many contracts (e.g. one `erc20.json` for every ERC-20).

3. Declare the chain: `id`, `name`, `confirmations`, and a `[chains.source]` with `kind` + `${ENV}` endpoints. Set `[chains.source.limits]` to your provider plan's real caps (see the cost model for your `kind`). There is no chain-level `start_block`; it comes from the contracts.

4. Declare each contract under `[[chains.contracts]]`:

   ```toml
   [[chains.contracts]]
   address = "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48" # checksummed or lowercase
   abi = "abis/erc20.json"               # path relative to config.toml
   events = ["Transfer", "Approval"]     # by ABI event name; omit = all events
   start_block = 19_500_000              # earliest block to scan; required on ≥1 contract per chain
   # table = "usdc_transfer"             # override evt_<contract>_<event>
   ```

   - `events` names must match the ABI exactly. Each event → one table `evt_<contract>_<event>` (or your `table` override).
   - The getLogs filter unions all contract addresses + event topic0s per chain → one call covers every contract.

5. Validate: the migrator parses every ABI and computes the schema diff.

   ```sh
   just migrate-dry   # parse ABIs + print DDL plan, apply nothing
   just migrate       # create the event tables
   ```

   A bad ABI path, malformed JSON, or an `events` name absent from the ABI fails here, before any RPC is spent.

Adding a new event or contract later: edit `config.toml`, add or extend the ABI, run `just migrate`, then `just resync <from> <to>` to backfill the new columns from `raw_*` with zero new RPC.

Copy .env.example to .env. Compose reads it, and config.toml ${VAR}
placeholders resolve from it too.

| Var | Purpose |
|---|---|
| `DATABASE_URL` | Postgres DSN used by `database.url` |
| `POSTGRES_USER` / `POSTGRES_PASSWORD` / `POSTGRES_DB` / `POSTGRES_PORT` | Compose-provisioned DB creds |
| `ALCHEMY_HTTP` / `ALCHEMY_WS` | RPC source endpoints referenced by `config.toml` |
| `RUST_LOG` | Tracing filter (e.g. `info`, `debug`) |
| `CHAIN_ID` | Default chain for CLI recipes |
| `QUERY_PORT` | GraphQL server port |
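
Resolving `${VAR}` placeholders in `config.toml` is a plain string substitution. Below is a std-only sketch. It assumes `${NAME}` is the only syntax, with no defaults or escapes, and that an unset variable is an error rather than an empty string. The lookup is injected, so it can be backed by `std::env::var` or by a test map. `expand_placeholders` is hypothetical.

```rust
/// Replace every `${NAME}` in `input` using `lookup`; unset names are an error.
fn expand_placeholders<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated placeholder in {input:?}"))?;
        let name = &after[..end];
        let value = lookup(name).ok_or_else(|| format!("environment variable {name} is not set"))?;
        out.push_str(&value);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Against the real environment this would be called as `expand_placeholders(raw, |k| std::env::var(k).ok())`. Failing on unset names surfaces a missing secret at startup rather than as a confusing connection error later.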
just query (or the query compose profile) serves a read-only GraphQL API on
:8080, with a playground at the same address. The schema is built at runtime
from the configured ABIs — one typed object + query field per event/transaction
table, so column names and types match your contracts exactly.
Each queryable table exposes a field with these arguments:

| Arg | Type | Meaning |
|---|---|---|
| `first` | `Int` | Page size (default 100). |
| `after` | `String` | Opaque keyset cursor from a prior `endCursor`. |
| `orderBy` | `ASC` / `DESC` | Sort on block position `(height, log_index)`. Default `ASC`. |
| `chainId` | `Int` | Restrict to one chain. |
| `fromHeight` / `toHeight` | `Int` | Inclusive block-height bounds. |
| `where` | `[FilterInput!]` | Arbitrary column predicates: `{column, value, op}` where `op` ∈ `EQ`, `NEQ`, `GT`, `GTE`, `LT`, `LTE`. |
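
Keyset pagination on `(height, log_index)` means the cursor only has to encode the last row's position. The sketch assumes a `height:log_index` string, hex-encoded so that it looks opaque; the indexer's actual cursor format is not documented here. `encode_cursor`, `decode_cursor` and `page_after` are hypothetical. Only the `ASC` case is shown. `DESC` would take rows strictly before the cursor instead.

```rust
/// Position of a row in block order: (height, log_index).
type Pos = (u64, u32);

fn encode_cursor(pos: Pos) -> String {
    format!("{}:{}", pos.0, pos.1)
        .bytes()
        .map(|b| format!("{b:02x}"))
        .collect()
}

fn decode_cursor(cursor: &str) -> Option<Pos> {
    if cursor.len() % 2 != 0 || !cursor.is_ascii() {
        return None;
    }
    let bytes: Option<Vec<u8>> = (0..cursor.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&cursor[i..i + 2], 16).ok())
        .collect();
    let text = String::from_utf8(bytes?).ok()?;
    let (h, l) = text.split_once(':')?;
    Some((h.parse().ok()?, l.parse().ok()?))
}

/// ASC page: rows strictly after the cursor, at most `first` of them.
/// Assumes `rows` is already sorted by (height, log_index), as an ORDER BY would return.
fn page_after(rows: &[Pos], after: Option<Pos>, first: usize) -> (Vec<Pos>, Option<String>, bool) {
    let start = after.map_or(0, |c| rows.partition_point(|&r| r <= c));
    let page: Vec<Pos> = rows[start..].iter().copied().take(first).collect();
    let has_next = rows.len() > start + page.len();
    let end_cursor = page.last().map(|&p| encode_cursor(p));
    (page, end_cursor, has_next)
}
```

Unlike `OFFSET`, a keyset cursor stays stable when new rows land at the tip between page requests.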
The field returns a <table>_connection: nodes (typed rows), endCursor,
hasNext, and a lazy totalCount (an extra count(*), only when selected). On
event tables, nodes also exposes nested transaction and receipt objects
(joined on (chain_id, tx_hash)) — fetched only when selected.

```graphql
{
  evt_usdc_transfer(
    first: 50
    orderBy: DESC
    chainId: 1
    fromHeight: 19000000
    where: [{ column: "from", value: "0x0000000000000000000000000000000000000000", op: EQ }]
  ) {
    totalCount
    hasNext
    endCursor
    nodes {
      from
      to
      value
      transaction { hash gas_price }
      receipt { status gas_used }
    }
  }
}
```

Reads are gated at the `[query].expose` watermark (`finalized` hides rows inside the
confirmations window) and cached for `[query].cache_ttl_ms`.
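
A TTL read cache of this kind can be as simple as a map from query key to (inserted-at, result). The sketch is std-only and uses an explicit clock. As in the config, `ttl_ms = 0` disables caching. `ReadCache` and `get_or_load` are hypothetical, and eviction of stale entries is left out.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct ReadCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> ReadCache<V> {
    fn new(ttl_ms: u64) -> Self {
        Self { ttl: Duration::from_millis(ttl_ms), entries: HashMap::new() }
    }

    /// Return the cached value if younger than the TTL; otherwise run `load` and cache it.
    fn get_or_load(&mut self, key: &str, now: Instant, load: impl FnOnce() -> V) -> V {
        if self.ttl.is_zero() {
            return load(); // cache_ttl_ms = 0: always go to the database
        }
        if let Some((at, v)) = self.entries.get(key) {
            if now.saturating_duration_since(*at) < self.ttl {
                return v.clone();
            }
        }
        let v = load();
        self.entries.insert(key.to_string(), (now, v.clone()));
        v
    }
}
```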
Set [indexer] metrics_listen (e.g. 0.0.0.0:9090) to install a Prometheus
exporter serving /metrics. Every indexer subcommand and indexer-query honour
it; omit the field to leave the exporter off. Exposed series include RPC call counts,
metered spend, queue depth, and decode/query latency histograms, plus a build_info
gauge labelled with version and git SHA.
Compose ships the whole stack as profiles — each service starts only when its
profile is named, so you compose exactly the pieces you need. All app containers
share one image (built from the Dockerfile), mount config.toml read-only, and
read .env.

| Profile | Service | What it does |
|---|---|---|
| `db` | `postgres` | Postgres only (`:5432`, `pgdata` volume) |
| `migrate` | `migrate` | One-shot: diff ABIs → apply DDL, then exit |
| `indexer` | `indexer` | Supervisor: per-chain ingest loops + in-process decode pool |
| `decode` | `decode` | Standalone decode-worker pool (scale out) |
| `query` | `query` | GraphQL read API on `:8080` |

postgres is attached to every app profile and gated by a healthcheck, so any app
service waits for the DB to be ready before booting.

```sh
cp .env.example .env                  # fill ALCHEMY_HTTP / ALCHEMY_WS, DB creds
cp config.toml.example config.toml    # mounted read-only into every container
docker compose build                  # or: just docker-build

# 1. apply the schema once (one-shot, exits 0)
docker compose --profile migrate up

# 2. start indexing + the query API
docker compose --profile indexer --profile query up -d
# equivalently: just up indexer query

# tail logs / stop
docker compose logs -f   # just logs
docker compose down      # just down
```

For many or fast-blocktime chains, run the ingest supervisor and a separate,
horizontally-scaled decode pool (competing consumers of the shared `work_queue`):

```sh
docker compose --profile indexer up -d
docker compose --profile decode up -d --scale decode=4   # just scale-decode 4
```

`WORKERS` (default 4) sets the in-process pool size for `indexer`/`decode`.
Compose reads these from .env (see the environment variables
table): POSTGRES_*, DATABASE_URL (overridden to point at the postgres service),
RUST_LOG, QUERY_PORT, plus WORKERS shown above. App containers
get a 45s stop_grace_period so in-flight ranges/decodes drain on SIGTERM.

```sh
just build    # cargo build --workspace
just test     # cargo test --workspace
just clippy   # -D warnings
just fmt      # cargo fmt --all
just ci       # fmt-check + clippy + test (what CI runs)
```