Skip to content

fix(brightdata): fix async Discover API, echo-back fields, and registry ordering#4188

Merged
waleedlatif1 merged 6 commits intostagingfrom
waleedlatif1/worcester-v2
Apr 15, 2026
Merged

fix(brightdata): fix async Discover API, echo-back fields, and registry ordering#4188
waleedlatif1 merged 6 commits intostagingfrom
waleedlatif1/worcester-v2

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

@waleedlatif1 waleedlatif1 commented Apr 15, 2026

Summary

  • Fix Discover API to handle async response pattern (POST returns task_id, poll GET for results) using postProcess polling, matching Firecrawl's crawl pattern
  • Fix echo-back fields (url, query, snapshotId, searchEngine) across 5 tools to use params from transformResponse second argument instead of hardcoded null or unreliable response extraction
  • Alphabetize block registry entry (box before brandfetch/brightdata)
  • Update Agiloft docs bgColor to white

Test plan

  • Verify Discover tool returns results (previously silently returned empty array)
  • Verify scrape_url output includes the scraped URL
  • Verify download_snapshot output includes the snapshot ID from params
  • Verify SERP search output includes query and search engine with params fallback
  • Verify cancel_snapshot output includes snapshot ID from params
  • Verify block registry alphabetical ordering

waleedlatif1 and others added 4 commits April 15, 2026 13:16
transformResponse receives params as its second argument. Use it to
return the original url, query, snapshotId, and searchEngine values
instead of hardcoding null or extracting from response data that may
not contain them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Bright Data Discover API is asynchronous — POST /discover returns
a task_id, and results must be polled via GET /discover?task_id=...
The previous implementation incorrectly treated it as synchronous,
always returning empty results.

Uses postProcess (matching Firecrawl crawl pattern) to poll every 3s
with a 120s timeout until status is "done".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move box before brandfetch/brightdata to maintain alphabetical ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 15, 2026 11:08pm

Request Review

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 15, 2026

PR Summary

Medium Risk
Touches external-tool execution paths and output shaping (including new polling behavior), which could affect runtime timing and downstream workflows relying on Bright Data tool outputs.

Overview
Fixes brightdata_discover to handle Bright Data’s async flow by returning taskId from the initial POST and adding a postProcess polling loop (with logging + timeout) to fetch final results.

Updates multiple Bright Data tools’ transformResponse to use the passed params for echo-back outputs (snapshotId, url, query, searchEngine) instead of hardcoded null or parsing unreliable response fields, and extends the BrightDataDiscoverResponse type to include optional taskId.

Minor cleanup: reorders box in the block registry to maintain alphabetical ordering, changes the Agiloft docs info card color to white, and updates bun.lock metadata.

Reviewed by Cursor Bugbot for commit 9e9bf35. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 15, 2026

Greptile Summary

Fixes the BrightData Discover tool to use the async POST→poll-GET pattern (the API returns a task_id immediately, results arrive via polling), and corrects echo-back fields across five tools that were returning null or reading unreliable response body fields instead of using params. Registry alphabetization and Agiloft docs color are also tidied. The two previously flagged issues (taskId type safety, hardcoded polling timeout) are both resolved in this revision.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style suggestions that don't block correct behavior.

The two prior P1 issues are resolved, the polling implementation follows the established Firecrawl pattern, and the echo-back fixes are clean. Remaining comments (poll error body, content fallbacks) are speculative improvements that don't affect the core correctness of the fix.

apps/sim/tools/brightdata/discover.ts — poll error message and content field fallbacks worth a second look before merging.

Important Files Changed

Filename Overview
apps/sim/tools/brightdata/discover.ts Rewrites Discover tool to use async POST→poll-GET pattern; correctly uses DEFAULT_EXECUTION_TIMEOUT_MS and adds taskId to response type; minor: poll error drops response body, content fallbacks for text/markdown fields removed
apps/sim/tools/brightdata/types.ts Adds optional taskId field to BrightDataDiscoverResponse.output, resolving the unsafe cast noted in the prior review thread
apps/sim/tools/brightdata/cancel_snapshot.ts Fixes snapshotId echo-back to use params.snapshotId instead of unreliable response body extraction; response body is now correctly consumed and discarded
apps/sim/tools/brightdata/serp_search.ts Adds params fallback for query and searchEngine when the response general object is missing; correct priority order (response data first, params second)
apps/sim/blocks/registry.ts Moves box before brandfetch/brightdata to restore alphabetical ordering — straightforward fix

Sequence Diagram

sequenceDiagram
    participant W as Workflow Executor
    participant T as brightDataDiscoverTool
    participant BD as BrightData API

    W->>T: execute(params)
    T->>BD: POST /discover (query, intent, ...)
    BD-->>T: { task_id: "abc123" }
    T->>T: transformResponse → output { taskId, results: [] }

    Note over T: postProcess polling loop
    loop every 3s until MAX_POLL_TIME_MS
        T->>BD: GET /discover?task_id=abc123
        BD-->>T: { status: "pending" }
    end
    T->>BD: GET /discover?task_id=abc123
    BD-->>T: { status: "done", results: [...] }
    T->>T: map results → { url, title, description, relevanceScore, content }
    T-->>W: { success: true, output: { results, query, totalResults } }
Loading

Reviews (3): Last reviewed commit: "fix(brightdata): use platform execution ..." | Re-trigger Greptile

Comment thread apps/sim/tools/brightdata/discover.ts Outdated
Comment thread apps/sim/tools/brightdata/discover.ts
The executor wraps postProcess in try-catch and falls back to the
intermediate transformResponse result on error, which has success: true
with empty results. Throwing errors would silently return empty results.

Match Firecrawl's pattern: return { ...result, success: false, error }
instead of throwing. Also add taskId to BrightDataDiscoverResponse type
to eliminate unsafe casts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Comment thread apps/sim/tools/brightdata/discover.ts Outdated
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit b549ee4. Configure here.

Replace hardcoded 120s timeout with DEFAULT_EXECUTION_TIMEOUT_MS to
match Firecrawl and other async polling tools. Respects platform-
configured limits (300s free, 3000s paid).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 9e9bf35. Configure here.

@waleedlatif1 waleedlatif1 merged commit 6dddc3f into staging Apr 15, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/worcester-v2 branch April 15, 2026 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant