loadgen


Zero-allocation HTTP/1.1 and HTTP/2 load generator for benchmarking web servers.

Custom protocol clients bypass Go's standard net/http for maximum throughput and minimal measurement overhead. Latency is recorded per-worker in sharded recorders with zero contention on the hot path.

Features

  • HTTP/1.1: One persistent TCP connection per worker with pre-formatted request bytes. Auto-reconnect for Connection: close workloads via a round-robin connection pool.
  • HTTP/2: Lock-free stream dispatch over multiplexed connections with pre-encoded HPACK headers, batched WINDOW_UPDATE, and channel-pooled response delivery.
  • HTTPS/TLS: Full TLS support with custom certificates, ALPN negotiation for H2.
  • Zero-allocation hot path: No allocations during request/response cycle.
  • Sharded latency recording: Per-worker histograms merged at completion -- no lock contention.
  • Accurate percentiles: p50, p75, p90, p99, p99.9, p99.99 via reservoir sampling.
  • Rate limiting: Optional MaxRPS for closed-loop benchmarking.
  • Progress callbacks: Real-time OnProgress reporting during benchmarks.
  • Extensible: Exported Client interface for custom protocol handlers (gRPC, QUIC, WebSocket).

Installation

go get github.com/goceleris/loadgen

Quick Start

HTTP/1.1 Benchmark

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/goceleris/loadgen"
)

func main() {
    cfg := loadgen.Config{
        URL:         "http://localhost:8080/",
        Duration:    15 * time.Second,
        Connections: 256,
        Workers:     256,
    }
    b, err := loadgen.New(cfg)
    if err != nil {
        panic(err)
    }
    result, err := b.Run(context.Background())
    if err != nil {
        panic(err)
    }
    fmt.Printf("%.0f req/s, p99=%v\n", result.RequestsPerSec, result.Latency.P99)
}

HTTP/2 Benchmark

cfg := loadgen.Config{
    URL:      "http://localhost:8080/",
    Duration: 15 * time.Second,
    HTTP2:    true,
    HTTP2Options: loadgen.HTTP2Options{
        Connections: 16,
        MaxStreams:  100,
    },
    Workers: 64,
}
b, err := loadgen.New(cfg)
// ...

HTTPS with Custom TLS

cfg := loadgen.Config{
    URL:                "https://api.example.com/health",
    Duration:           30 * time.Second,
    Connections:        128,
    Workers:            128,
    InsecureSkipVerify: true, // for self-signed certs
}

Rate-Limited Benchmark

cfg := loadgen.Config{
    URL:      "http://localhost:8080/",
    Duration: 30 * time.Second,
    Workers:  64,
    MaxRPS:   10000, // cap at 10k req/s
}

Progress Monitoring

cfg := loadgen.Config{
    // ...
    OnProgress: func(elapsed time.Duration, snapshot loadgen.Result) {
        fmt.Printf("\r%s: %d req, %.0f req/s",
            elapsed.Round(time.Second),
            snapshot.Requests,
            snapshot.RequestsPerSec)
    },
}

Custom Protocol Client

type myClient struct { /* ... */ }
func (c *myClient) DoRequest(ctx context.Context, workerID int) (int, error) { /* ... */ }
func (c *myClient) Close() { /* ... */ }

cfg := loadgen.Config{
    Duration: 15 * time.Second,
    Workers:  64,
    Client:   &myClient{},
}

CLI Usage

go install github.com/goceleris/loadgen/cmd/loadgen@latest
loadgen [flags] -url <target>

Flags:
  -url string          Target URL (required)
  -duration duration   Benchmark duration (default 15s)
  -warmup duration     Warmup duration (default 2s)
  -connections int     Number of H1 connections (default 256)
  -workers int         Number of workers (default: connections for H1, NumCPU*4 for H2)
  -method string       HTTP method (default "GET")
  -H string            Custom header "Key: Value" (repeatable)
  -body-file string    Read request body from file
  -close               Send Connection: close (H1 only)
  -h2                  Use HTTP/2 prior-knowledge (h2c/h2)
  -h2c-upgrade         Use HTTP/2 via RFC 7540 §3.2 h2c upgrade handshake
  -mix string          Per-connection protocol mix ratio, e.g. h1:h2:upgrade=4:4:1
  -h2-conns int        H2 connections (default 16)
  -h2-streams int      Max concurrent H2 streams per connection (default 100)
  -insecure            Skip TLS certificate verification
  -max-rps int         Max requests per second (0 = unlimited)

Example:

# H1 benchmark with custom headers
loadgen -url http://localhost:8080/api -duration 30s -connections 512 \
  -H "Authorization: Bearer token" -H "Content-Type: application/json"

# H2 benchmark
loadgen -url http://localhost:8080/ -h2 -h2-conns 16 -h2-streams 200 -duration 30s

# HTTPS with rate limiting
loadgen -url https://api.example.com/health -insecure -max-rps 5000 -duration 60s

-h2c-upgrade (RFC 7540 §3.2)

Starts each connection as HTTP/1.1 carrying Connection: Upgrade, HTTP2-Settings + Upgrade: h2c headers, reads a 101 Switching Protocols response, then switches to HTTP/2 on the same TCP socket. Exercises the cleartext upgrade path that is absent from -h2 (which sends the H2 preface directly, skipping HTTP/1.1 entirely). Mutually exclusive with -h2 and -mix. Only defined over cleartext HTTP — TLS servers negotiate H2 via ALPN.

# Basic h2c upgrade run.
loadgen -url http://localhost:8080/ -h2c-upgrade -duration 30s

# h2c upgrade with many connections + streams (matches browser-style fanout).
loadgen -url http://localhost:8080/ -h2c-upgrade -h2-conns 16 -h2-streams 200 -duration 30s

# Target an endpoint used by the celeris Protocol: Auto + EnableH2Upgrade test.
loadgen -url http://127.0.0.1:9000/api/health -h2c-upgrade -duration 60s -warmup 5s

The final output includes an upgrade block in the JSON result plus a stderr line summarising how many connections completed the handshake: h2c upgrade: X/Y conns upgraded successfully.

-mix — realistic traffic mixtures

Assigns each connection to a protocol by weighted random draw across H1, H2 prior-knowledge, and h2c-upgrade. Connections commit to their chosen protocol for life. Mutually exclusive with -h2 and -h2c-upgrade. Format: h1:h2:upgrade=N:N:N (the h1:h2:upgrade= prefix is optional; a bare N:N:N also parses). Weights must be non-negative integers; 0:0:0 is rejected.

# Third of each protocol — equal fan-out across the dispatcher.
loadgen -url http://localhost:8080/ -mix h1:h2:upgrade=1:1:1 -duration 30s

# Browser-heavy H2 with a trickle of legacy H1 upgrades (44/44/11).
loadgen -url http://localhost:8080/ -mix h1:h2:upgrade=4:4:1 -duration 60s

# Mostly H1 with an occasional H2 client (90/10 split).
loadgen -url http://localhost:8080/ -mix h1:h2:upgrade=9:1:0 -duration 30s

# Fully equivalent to -h2c-upgrade but routed through the mix dispatcher.
loadgen -url http://localhost:8080/ -mix h1:h2:upgrade=0:0:1 -duration 30s

The mix block in the JSON result reports per-protocol connection counts, requests, and errors. The stderr summary prints a matching per-protocol breakdown so you can confirm the server handled each slot correctly.

Output is JSON with RPS, latency percentiles (p50-p99.99), errors, throughput, and timeseries data.

Cluster-bench integration contract

cmd/loadgen is designed to be invoked remotely (over SSH or by an orchestrator like Ansible) on a dedicated load-generation host. The contract is:

| Aspect | Behavior |
|---|---|
| stdout | A single pretty-printed JSON Result object — no progress noise, no log lines. |
| stderr | Human-readable progress (<elapsed> <reqs> <rps>), final mix/upgrade summaries, error traces. |
| exit 0 | Benchmark ran to completion (errors during the run are reported in JSON, not via exit code). |
| exit non-zero | Configuration error (-url missing, mutually exclusive flags, unreadable body-file, etc.). |
| SIGINT / SIGTERM | Cancels the run gracefully — partial Result still printed to stdout, exit 0. |

JSON output schema

The shape comes from loadgen.Result in results.go. Stable fields used by orchestrators:

| Field | Type | Meaning |
|---|---|---|
| Requests | int64 | Total successful requests during the measurement window. |
| Errors | int64 | Total errors during the measurement window. |
| Bytes | int64 | Total response body bytes received. |
| RequestsPerSec | float64 | Requests / Duration.Seconds(). |
| BytesPerSec | float64 | Bytes / Duration.Seconds(). |
| Duration | duration | Wall-clock measurement duration (post-warmup). |
| Latency.{P50,P75,P90,P99,P99_9,P99_99} | duration | Latency percentiles in nanoseconds. |
| Mix (optional) | object | Present when -mix was used. See "-mix — realistic traffic mixtures" above. |
| Upgrade (optional) | object | Present when -h2c-upgrade was used. Records the connection upgrade tally. |

Pre-built binary releases

GitHub Releases ship platform tarballs that orchestrators can fetch directly:

TAG=v1.1.0
OS=linux
ARCH=amd64
curl -fsSL "https://github.com/goceleris/loadgen/releases/download/${TAG}/loadgen_${OS}_${ARCH}.tar.gz" \
  | tar xz -C /tmp/
chmod +x /tmp/loadgen_${OS}_${ARCH}
/tmp/loadgen_${OS}_${ARCH} -url http://target:8080/ -duration 10s

GitHub displays the SHA-256 of each release asset on the release page — gh release download <tag> and the GitHub UI verify checksums automatically, so the workflow doesn't ship a separate .sha256 sidecar.

Reference orchestrator

The celeris cluster bench (see goceleris/celerismage clusterBench + ansible/cluster-bench.yml) uses this contract. It cross-compiles loadgen on the dev machine (or fetches a release tarball), pushes the binary to the loadgen host, executes it, parses the stdout JSON, and fetches stderr+stdout back to the dev box.

Architecture

┌──────────────────────────────────────────────────────┐
│                     Benchmarker                      │
│  ┌──────────┐  ┌──────────┐       ┌──────────┐       │
│  │ Worker 0 │  │ Worker 1 │  ...  │ Worker N │       │
│  └────┬─────┘  └────┬─────┘       └────┬─────┘       │
│       │             │                  │             │
│  ┌────▼─────────────▼──────────────────▼──────┐      │
│  │              Client Interface              │      │
│  │  ┌──────────┐    ┌──────────┐              │      │
│  │  │ H1Client │    │ H2Client │   (custom)   │      │
│  │  └──────────┘    └──────────┘              │      │
│  └────────────────────────────────────────────┘      │
│                                                      │
│  ┌────────────────────────────────────────────┐      │
│  │          ShardedLatencyRecorder            │      │
│  │     [Shard 0] [Shard 1] ... [Shard N]      │      │
│  │       (per-worker, zero contention)        │      │
│  └────────────────────────────────────────────┘      │
└──────────────────────────────────────────────────────┘

H1 Worker Model

Each worker owns a dedicated TCP connection (keep-alive mode) or a round-robin pool of 16 connections (Connection: close mode). Pre-formatted request bytes are written in a single syscall. No synchronization between workers.

H2 Multiplexed Model

N workers share M connections. Each connection has a write goroutine (serializes frame writes, allocates stream IDs) and a read goroutine (dispatches responses to waiting workers via lock-free stream slots). Workers acquire a stream semaphore, submit a write request, and wait on a pooled response channel.

Latency Recording

Each worker writes to its own shard -- no locks, no atomics on the hot path. Request/byte counters use batched local counters flushed to atomics every 256 requests (H1) or 16 requests (H2) to minimize ARM64 memory barrier overhead. Shards are merged once at benchmark completion for percentile calculation.

H2 Flow Control

The H2 client uses aggressive flow control settings tuned for benchmarking throughput rather than production fairness.

Settings

| Parameter | Value | RFC 7540 Default | Rationale |
|---|---|---|---|
| Initial window size | 16 MB | 64 KB | Prevents stalls with large response bodies |
| Max frame size | 64 KB | 16 KB | Balances framing overhead against flow control granularity |
| Header table size | 0 | 4096 | Zero-allocation status code extraction from HPACK headers |

WINDOW_UPDATE Batching

The read loop accumulates consumed bytes via an atomic counter. The write loop flushes a single WINDOW_UPDATE between processing request batches AND on a 1ms ticker when idle. This amortizes frame overhead while preventing flow control stalls.

Dynamic Table Disabled

SETTINGS_HEADER_TABLE_SIZE=0 allows zero-allocation status code extraction from HPACK-encoded headers. With the dynamic table disabled, there is no dynamic table state to track, so status codes can be parsed directly from the encoded header block without maintaining decoder state.

Stream Concurrency

Controlled by a semaphore sized to min(server's MAX_CONCURRENT_STREAMS, configured MaxStreams). Stream slots are allocated at 2x the semaphore size for wrap-around headroom, preventing slot reuse conflicts during high-throughput bursts.

Tuning Guidance

  • CPU-bound: Increase connections. More TCP sockets mean more kernel parallelism across cores.
  • IO-bound: Increase streams. More multiplexed requests per connection saturate network bandwidth.
  • Large responses (>1 MB): Use fewer connections with more streams to relieve flow control pressure. The 16 MB window can sustain ~16 in-flight 1 MB responses per connection before stalling.

Comparison with Other Tools

| Feature | loadgen | wrk | h2load | k6 |
|---|---|---|---|---|
| HTTP/1.1 | Zero-alloc custom client | Lua-scriptable | Limited | Full net/http |
| HTTP/2 | Zero-alloc multiplexed | Not supported | nghttp2-based | Full net/http |
| HTTPS/TLS | ALPN negotiation | Yes | Yes | Yes |
| Scripting | Go (compile-time) | Lua | None | JavaScript |
| Allocations | Zero on hot path | Low | Low | High |
| Rate limiting | Built-in MaxRPS | None | None | Built-in |
| Use case | Raw throughput measurement | H1 benchmarking | H2 benchmarking | Load testing |

  • wrk: H1 only, no H2 support. loadgen supports both H1 and H2 with zero-alloc clients.
  • h2load: H2-focused with limited H1 support. loadgen covers both with equal optimization.
  • k6: Full-featured but higher overhead (JavaScript scripting). loadgen is pure Go, zero-alloc, and designed for maximum throughput measurement.
  • loadgen: Focused on raw throughput measurement with minimal client overhead. Not a general-purpose load testing framework.

License

Apache License 2.0 -- see LICENSE.