BroadLeaf

Structured JSONL Logging

ABSTRACT

The BroadLeaf library is a lightweight structured logging library for Python services and pipelines. Every log call produces a single, self-contained JSON Lines record. No external dependencies. Async drain thread so callers never wait on I/O.

Implemented as local in first dry runs, but scaffolding in place for sinks in the cloudy.

{"ts":"2026-04-09T11:04:22.103Z","level":"INFO","component":"receiver","event":"batch_received","batch_id":"abc123","sample_count":864}
{"ts":"2026-04-09T11:04:22.218Z","level":"INFO","component":"inference","event":"predict","prediction":"shake","confidence":0.71,"latency_ms":12.4,"batch_id":"abc123","weights":"v1.pt"}

Installation

# editable install (from within a uv project)
uv add --editable /path/to/broadleaf

# or as a path dependency in pyproject.toml
[tool.uv.sources]
broadleaf = { path = "/path/to/broadleaf", editable = true }

Dev dependencies (tests only):

uv sync --extra dev

Basic Wiring

from broadleaf import get_logger, configure, Level
from broadleaf.sinks.file_sink import FileSink

# Optional: call once at startup to set sink and minimum level.
# If omitted, defaults are read from LOG_SINK / LOG_LEVEL / LOG_DIR env vars.
configure(sink=FileSink(log_dir="logs"), min_level=Level.INFO)

log = get_logger("receiver")
log.info("batch_received", batch_id="abc123", sample_count=864)
log.warn("queue_pressure", depth=8200, capacity=10000)
log.error("write_failed", path="/data/batches/abc123.json", reason="disk full")

get_logger() can be called before configure() — the sink is lazily initialized on the first record that drains. Subsequent configure() calls propagate immediately to all existing loggers.

Levels

from broadleaf import Level

log.trace("raw_frame", frame_id=7)          # development detail
log.debug("tensor_shape", shape=[1, 9, 128])
log.info("epoch_end", epoch=5, loss=0.31)
log.warn("low_confidence", score=0.42)
log.error("inference_failed", batch_id="xyz")
log.fatal("engine_unrecoverable", reason="OOM")

Levels in order: TRACE(0) < DEBUG(10) < INFO(20) < WARN(30) < ERROR(40) < FATAL(50).

Bound Context

bind() returns a child logger that stamps every subsequent record with fixed fields. Original logger is unchanged.

run_log = get_logger("train").bind(run_id="r42", config_hash="deadbeef")
run_log.info("epoch_end", epoch=5, loss=0.31)
run_log.info("epoch_end", epoch=6, loss=0.28)
# both records carry run_id and config_hash automatically

Useful for request-scoped context, batch IDs, model version tags — anything that should appear on every record within a logical scope without repeating it at every call site.

Sinks

FileSink — daily-rotating JSONL files

from broadleaf.sinks.file_sink import FileSink

configure(sink=FileSink(log_dir="logs"))
# writes: logs/20260409_broadleaf.jsonl
# rolls:  logs/20260409_143022_broadleaf.jsonl  (when file exceeds 50 MB)

Files are named {YYYYMMDD}_broadleaf.jsonl — date-first so directory listings sort chronologically. The broadleaf segment is a stable marker; it can be replaced with a service name to differentiate log streams in future.

StdoutSink — Docker / cloud log collectors

from broadleaf.sinks.stdout_sink import StdoutSink

configure(sink=StdoutSink())
# or: LOG_SINK=stdout (no code change needed)

Writes one JSON line per record to stdout. Compatible with awslogs, gcplogs, Datadog, and any collector that reads container stdout.

Environment Variable Overrides

Variable	Default	Effect
`LOG_SINK`	`file`	`file` or `stdout`
`LOG_DIR`	`logs`	directory for FileSink
`LOG_LEVEL`	`INFO`	minimum level name

Reading Logs

LogReader streams JSONL records without loading full files into memory. All filters are additive (AND).

from broadleaf import LogReader
from datetime import datetime, timedelta, timezone

reader = LogReader("logs/")

# All inference predictions for a specific batch
for rec in reader.query(component="inference", search="abc123"):
    print(rec["prediction"], rec["confidence"])

# Warnings and above from training, last hour
since = datetime.now(timezone.utc) - timedelta(hours=1)
for rec in reader.query(component="train", level="WARN", since=since):
    print(rec)

# Collect into a list — pass limit=None for unbounded
results = list(reader.query(component="receiver", event="batch_received", limit=50))

# Quick tail inspection
recent = reader.tail(n=20, component="inference")

Query Parameters

Parameter	Type	Description
`component`	`str`	Prefix match — `"inference"` matches `"inference"` and `"inference.embed"`
`level`	`str`	Minimum level — `"WARN"` returns WARN, ERROR, FATAL
`event`	`str`	Exact match on the `event` field
`search`	`str`	Case-insensitive substring across the full JSON line
`since`	`datetime`	Records at or after this timestamp
`until`	`datetime`	Records at or before this timestamp
`predicate`	`Callable[[dict], bool]`	Arbitrary filter applied after all named parameters
`limit`	`int`	Stop after N matches (default 500; `None` = all)

Predicate & Single-File Queries

Use predicate for field-level checks the named filters don't cover. It receives the parsed record dict and runs after all other filters, so named filters eliminate rows first.

# Low-confidence predictions from the last hour
since = datetime.now(timezone.utc) - timedelta(hours=1)
for rec in reader.query(
    component="inference",
    since=since,
    predicate=lambda r: r.get("confidence", 1.0) < 0.5,
):
    print(rec["batch_id"], rec["confidence"])

To query a specific file rather than a directory, use LogReader.from_file(). All query() parameters work normally.

reader = LogReader.from_file("logs/20260413_broadleaf.jsonl")
errors = list(reader.query(level="ERROR", limit=None))

Record Schema

Every JSONL record has these top-level fields:

{
  "ts":        "2026-04-09T11:04:22.103456Z",  // ISO-8601 UTC
  "level":     "INFO",
  "component": "receiver",
  "event":     "batch_received",               // machine-readable, snake_case
  "msg":       "",                             // optional human-readable note
  "batch_id":  "abc123",                       // ...arbitrary fields merged in
  "count":     864
}

event is the primary machine-readable identifier. msg is optional free text for human notes. All keyword arguments passed to the log call are merged into the top-level record.

Structured over Formatted

BroadLeaf records are designed to be queried, not read. Each call should be a single self-contained event:

# good — one record, one event, queryable fields
log.info("export_options_available", formats=["coreml", "onnx", "swift_wrapper"])

# avoid — several records for one moment, no queryable structure
log.info("Export options:")
log.info("  - CoreML")
log.info("  - ONNX")

If content is developer-facing output rather than an operational event, use print().

Graceful Shutdown

from broadleaf import shutdown

# Drains queue, flushes, and closes the sink.
# Safe to call at process exit; configure() can restart the engine afterward.
shutdown()

Running Tests

uv sync --extra dev
uv run pytest tests/ -v

Project Layout

broadleaf/
├── src/broadleaf/
│   ├── __init__.py          # public API
│   ├── level.py             # Level(IntEnum)
│   ├── record.py            # LogRecord dataclass
│   ├── logger.py            # Logger, _Engine singleton
│   ├── reader.py            # LogReader streaming query
│   └── sinks/
│       ├── base.py          # Sink ABC
│       ├── file_sink.py     # daily-rotating JSONL
│       └── stdout_sink.py   # Docker / cloud stdout
├── tests/
│   └── test_logger.py
└── pyproject.toml

APPENDIX

This project was designed and purpose built from the ground up as a logger library that supports levels in addition to other features enabled by using the JSON Lines approach.

The objective is to enable a more modern, durable approach to log traces - still enabling fine grain meta data content control on write, and predicate filtering on read.

Initial bring up was related to building out inference pathways in the areas of fine tuning model implementations plus follow up passes and iterations to test and train - including key steps like determinations of weights as downstream model inputs.

Library Name

NOTE: On inception, was interested to call the library Baobab to mix it beyond the usual reference to a log and instead shift into the broader set of tree terms. With baobab, pronunciation questions surfaced right away. The compromise was BroadLeaf, the category of deciduous trees in which the Baobab tree resides. This name fits, as there is a lot of data carried in the unique trunk but the branches and leaf structures are not just interesting but key as well.

/jmw

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src/broadleaf		src/broadleaf
statics		statics
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BroadLeaf

ABSTRACT

Installation

Basic Wiring

Levels

Bound Context

Sinks

FileSink — daily-rotating JSONL files

StdoutSink — Docker / cloud log collectors

Environment Variable Overrides

Reading Logs

Query Parameters

Predicate & Single-File Queries

Record Schema

Structured over Formatted

Graceful Shutdown

Running Tests

Project Layout

APPENDIX

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

BroadLeaf

ABSTRACT

Installation

Basic Wiring

Levels

Bound Context

Sinks

FileSink — daily-rotating JSONL files

StdoutSink — Docker / cloud log collectors

Environment Variable Overrides

Reading Logs

Query Parameters

Predicate & Single-File Queries

Record Schema

Structured over Formatted

Graceful Shutdown

Running Tests

Project Layout

APPENDIX

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages