fix(content-guards): cap markdown validator output to prevent context bloat #335
JacobPEvans wants to merge 1 commit into
… bloat

The validate-markdown.sh PostToolUse hook captured full markdownlint-cli2 stderr/stdout into the blockingError string returned to Claude with no bound. A noisy markdown file (e.g. README with long mermaid blocks under default MD013 line-length) emits hundreds of violations per fire; in one recent session a 15-file scaffold pushed ~500KB of lint output per Write into context and busted the 1M-token session limit (final transcript: 2,082,547 tokens, 85% markdownlint-attributed).

Cap reporting to the first 20 violation lines plus an overflow-count summary. Worst-case per-fire payload ~2KB vs ~500KB before. Hook still exits 2 on failure; blocking behavior unchanged. awk used over head/wc to avoid SIGPIPE concerns under set -euo pipefail.

Smoke test (120-line noisy file, default MD013): exit 2, 26 lines / 1893 bytes total, footer reads "...and 105 more line(s) (capped at 20; rerun markdownlint-cli2 manually for the full report)".

Assisted-by: Claude <noreply@anthropic.com>
Code Review
This pull request introduces a change to content-guards/scripts/validate-markdown.sh to cap the output of markdownlint-cli2 to 20 lines, preventing large outputs from flooding the context window. The reviewer suggested optimizing this logic by combining the line counting, capping, and footer formatting into a single awk invocation to improve performance and avoid spawning multiple processes.
    # Cap validator output so a noisy run can't flood Claude's context window.
    max_lines=20
    total_lines=$(awk 'END{print NR}' <<<"$markdownlint_output")
    if (( total_lines > max_lines )); then
      capped=$(awk -v max="$max_lines" 'NR<=max' <<<"$markdownlint_output")
      capped+=$'\n…and '"$((total_lines - max_lines))"' more line(s) (capped at '"$max_lines"'; rerun markdownlint-cli2 manually for the full report)'
      errors+=("$capped")
    else
      errors+=("$markdownlint_output")
    fi
We can simplify this logic and avoid spawning awk twice (which is relatively expensive in shell scripts) by performing the line counting, capping, and footer formatting in a single awk invocation. This also avoids potential Bash arithmetic syntax errors if awk output is unexpected.
    # Cap validator output so a noisy run can't flood Claude's context window.
    max_lines=20
    if [[ -n "$markdownlint_output" ]]; then
      capped=$(awk -v max="$max_lines" '
        NR <= max { print }
        END {
          if (NR > max) {
            print "…and " (NR - max) " more line(s) (capped at " max "; rerun markdownlint-cli2 manually for the full report)"
          }
        }
      ' <<<"$markdownlint_output")
      errors+=("$capped")
    fi
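For anyone who wants to try the single-pass approach outside the hook, here is a minimal standalone sketch. The function name cap_lines, the shortened footer text, and the synthetic violation lines are assumptions made for illustration; none of them come from validate-markdown.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# cap_lines is a hypothetical helper name, not part of the actual hook.
# Prints at most $1 lines of the text in $2, then an overflow footer
# if any lines were dropped. One awk process does all three jobs.
cap_lines() {
  local max="$1" text="$2"
  awk -v max="$max" '
    NR <= max { print }
    END {
      if (NR > max) {
        printf "…and %d more line(s) (capped at %d)\n", NR - max, max
      }
    }
  ' <<<"$text"
}

# 125 synthetic lines stand in for real markdownlint-cli2 output.
sample=$(printf 'README.md:%d MD013/line-length\n' $(seq 1 125))
capped=$(cap_lines 20 "$sample")

# Show only the footer line of the capped result.
printf '%s\n' "$capped" | tail -n 1
```

With 125 input lines and a cap of 20, the result is 20 violation lines plus one footer line reporting 105 dropped lines, which lines up with the overflow count quoted in the smoke test. Input at or under the cap passes through unchanged.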
Summary
- Cap validate-markdown.sh PostToolUse hook output to first 20 lines + overflow summary.
- Prevent markdownlint-cli2 violation dumps from filling Claude's context window.

Why
The hook currently captures full markdownlint-cli2 stderr/stdout into the blockingError string returned to Claude with no bound. In a recent session scaffolding a new Terraform module, a single noisy file (README with long mermaid blocks under default MD013 line-length) emitted ~500KB of lint output per Write. A 15-file scaffold pushed the session past the 1,000,000-token limit: the final transcript landed at 2,082,547 tokens, 85% markdownlint-attributed. /compact couldn't recover; the session bricked.

Change
Lines 147–157 of content-guards/scripts/validate-markdown.sh:
- Before: errors+=("$markdownlint_output"), unbounded.
- After: cap to the first 20 lines with awk, append "...and N more line(s)" overflow summary if total exceeds the cap. Worst-case payload ~2KB vs ~500KB before.

Hook still exits 2 on failure; blocking behavior unchanged. awk chosen over head/wc to avoid SIGPIPE concerns under set -euo pipefail.

Test plan
- bash -n passes on patched script.
- Smoke test (120-line noisy file, default MD013): exit 2, footer reads …and 105 more line(s) (capped at 20; rerun markdownlint-cli2 manually for the full report).
- release-please drafts content-guards 1.7.0 → 1.7.1 on merge.
- A noisy .md produces bounded hook output in a live Claude Code session.

Sibling validate-readme.py was investigated and confirmed already-bounded (3 fires × ~1.2KB in the same incident); no change needed there.

🤖 Generated with Claude Code
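The SIGPIPE concern around head can be illustrated with a small sketch. This is a guess at the general failure mode, not a reproduction from the hook: the hook feeds awk through a here-string, while this example uses a pipeline, and the noisy function is a hypothetical stand-in for a chatty linter.

```bash
#!/usr/bin/env bash
set -uo pipefail   # -e is left off so both statuses can be captured and printed

# noisy is a hypothetical producer that writes far more than a pipe buffer holds.
noisy() { seq 1 200000; }

# head exits after 20 lines, so the producer's next write hits a closed pipe.
# With pipefail, the pipeline reports the producer's non-zero status
# (typically 141, i.e. 128 + SIGPIPE, on Linux).
noisy 2>/dev/null | head -n 20 >/dev/null
head_status=$?

# awk keeps reading until end of input, so the producer finishes normally.
noisy | awk 'NR <= 20' >/dev/null
awk_status=$?

echo "head pipeline: $head_status, awk pipeline: $awk_status"
```

Under set -e, the non-zero status from the head pipeline would abort the script even though head itself succeeded, which is presumably the kind of surprise that truncating with awk sidesteps.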