fix(doc-collector): two SPA-robustness fixes (recoverable nav error, page-state repopulation)#34

Open
gololdf1sh wants to merge 2 commits into testomatio:main from gololdf1sh:fix/doc-collector-spa-robustness

@gololdf1sh gololdf1sh commented May 20, 2026

Summary

Two independent fixes that came out of running explorbot docs collect against a real-world SPA (Testomat.io beta). Each is in its own commit.

Note: The initial version of this PR included a third fix to ConfigParser.loadEnv (walking parent directories to find .env). It regressed action-result-diff.test.ts on CI in ways the diff alone doesn't fully explain, so it has been dropped from this PR and will be resubmitted separately with a CLI-level bootstrap that leaves library semantics unchanged.

1. fix(explorer): treat "navigating and changing the content" as recoverable

Playwright throws "page.content: Unable to retrieve content because the page is navigating and changing the content" on heavy SPAs whose client router rewrites the DOM mid-action (Ember, React Router, etc.). The existing regex covered only net::ERR_ABORTED, screenshot timeout, and font-wait, so this phrase fell through to FATAL_BROWSER_ERRORS and killed the whole crawl on the first race. It is now part of RECOVERABLE_NAVIGATION_ERRORS, so the explorer retries instead.
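To illustrate the classification described above, here is a minimal sketch. Only the constant name RECOVERABLE_NAVIGATION_ERRORS and the two error phrases come from this PR; the list layout, the helper `isRecoverableNavigationError`, and the `withNavigationRetry` wrapper are hypothetical and may not match how the explorer is actually structured.

```typescript
// Hypothetical sketch, not the actual ExplorBot source.
// The screenshot-timeout and font-wait patterns are omitted because their
// exact wording is not given in this PR.
const RECOVERABLE_NAVIGATION_ERRORS: RegExp[] = [
  /net::ERR_ABORTED/i,
  /navigating and changing the content/i, // phrase added by this fix
];

// Returns true when the error message matches a known recoverable pattern.
function isRecoverableNavigationError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return RECOVERABLE_NAVIGATION_ERRORS.some((pattern) => pattern.test(message));
}

// Assumed retry policy: re-run the action a bounded number of times on
// recoverable errors and rethrow anything else immediately.
async function withNavigationRetry<T>(
  action: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      if (!isRecoverableNavigationError(error)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}
```

With a wrapper of this shape, a call such as `withNavigationRetry(() => page.content())` would retry a mid-navigation race a few times, while unrelated browser failures would still propagate on the first attempt.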

2. fix(doc-collector): repopulate page state when framenavigated stripped it

After navigation, the framenavigated handler overwrites the rich ActionResult (html / links / aria) with a stripped WebPageState carrying only { url, title, statusCode }. doc-collector then reads getCurrentState() and gets a state where state.html is undefined and state.links is an empty array. Two consequences:

  • Documentarian receives empty html → page docs degrade to a near-empty stub.
  • extractNextPaths sees an empty links array → subtree crawl stops at the entry page even when many followable links exist.

Targeted fixes:

  • In the main collect loop: if state.html is falsy, force capturePageState before passing to the AI documenter.
  • In extractNextPaths: if state.links is empty but state.html is present, fall back to extractLinks(state.html).
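The two fallbacks above can be sketched roughly as follows. The `PageState` shape, the `ensureFullState` and `linksFor` helpers, and the regex-based `extractLinks` body are assumptions made for illustration; the real types and the real extractLinks in ExplorBot may look different.

```typescript
// Hypothetical shape; the real WebPageState / ActionResult may differ.
interface PageState {
  url: string;
  title?: string;
  html?: string;
  links?: string[];
}

// Naive href extraction for illustration only. A real implementation would
// more likely use a DOM parser than a regular expression.
function extractLinks(html: string): string[] {
  const pattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const href = match[1];
    // Skip duplicates so each path is queued once.
    if (href && links.indexOf(href) === -1) links.push(href);
  }
  return links;
}

// Collect-loop fallback: re-capture when the state has no html.
async function ensureFullState(
  state: PageState,
  capturePageState: () => Promise<PageState>,
): Promise<PageState> {
  return state.html ? state : capturePageState();
}

// extractNextPaths fallback: prefer state.links, otherwise parse the html.
function linksFor(state: PageState): string[] {
  if (state.links && state.links.length > 0) return state.links;
  return state.html ? extractLinks(state.html) : [];
}
```

Checking `state.html` for falsiness rather than strictly for `undefined` also covers an empty string, which matches the "if state.html is falsy" wording of the fix.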

Repro (combined effect)

Running explorbot docs collect /projects/{slug}/runs/{id} on Testomat.io beta:

| Before | After |
| --- | --- |
| Crash on first action with the "navigating and changing the content" error | Crawl completes |
| When it didn't crash: "Pages documented: 1" | "Pages documented: 2-3" (entry + linked subpages) |

gololdf1sh and others added 2 commits May 20, 2026 15:50
…overable

Playwright throws "page.content: Unable to retrieve content because
the page is navigating and changing the content" on heavy SPAs whose
client-side router rewrites the DOM mid-action (Ember, React Router,
etc.). The explorer was catching only net::ERR_ABORTED /
screenshot-timeout / waiting-for-fonts as recoverable; this new
phrase fell through to FATAL_BROWSER_ERRORS and killed the whole
crawl on the first navigation race.

Add the phrase to RECOVERABLE_NAVIGATION_ERRORS so the explorer
re-queues the action instead of aborting.

Repro: collect docs against a Testomat.io page hosted in beta
(Ember-based SPA). Without the fix, ~30% of pages fail with the
fatal error on the first action. With the fix, those pages
complete normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d it

After a navigation completes, ExplorBot's framenavigated handler
overwrites the full ActionResult (with html/links/aria) with a
stripped-down WebPageState that has only { url, title, statusCode }.
The doc-collector then reads getCurrentState() and gets a state with
state.html === undefined and state.links === [].

Consequences:
- Documentarian receives empty html -> page documentation degrades
  to a near-empty stub.
- extractNextPaths() sees an empty links array -> the subtree crawl
  stops at the entry page even when many followable links exist.

Two targeted fixes:
1. In the main collect loop, if state.html is falsy, force a
   capturePageState (with screenshots if configured). This is cheap
   compared to the AI documentation step that follows.
2. In extractNextPaths, if state.links is empty but state.html is
   present, fall back to extractLinks(state.html) so subtree
   traversal still finds child paths.

Repro: collect against a Testomat.io project page. Before:
"Pages documented: 1". After: full subtree (3-7 pages depending
on the entry).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gololdf1sh gololdf1sh force-pushed the fix/doc-collector-spa-robustness branch from b45fdaa to 5c37bcd Compare May 20, 2026 12:50
@gololdf1sh gololdf1sh changed the title from "fix(doc-collector): three SPA-robustness fixes (env lookup, recoverable nav error, page-state repopulation)" to "fix(doc-collector): two SPA-robustness fixes (recoverable nav error, page-state repopulation)" May 20, 2026
@gololdf1sh gololdf1sh force-pushed the fix/doc-collector-spa-robustness branch from 16857af to 5c37bcd Compare May 20, 2026 12:56