fix(2495): Karen humanizes EDS list output before showing to user#360
fix(2495): Karen humanizes EDS list output before showing to user#360
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Staging verification — PASSDeployed dev build `1.2.271` to staging marketplace, installed on `scen-bench-01` (persona `0i0iTV6ZWU`). Seeded two EDS sources to exercise both happy and error paths:
Raw tool output (the bug shape) Karen received``` Karen's user-facing reply
Pass criteria
Side-effectsNone observed. Karen ran two defensive extra tool calls (`kanban_advanced(op=status_my_tasks)` and `eds_setup(op=help)`) but they didn't pollute the user reply. ~7s reply latency. Screenshots: `flexus-test-agents/screenshots/adhoc/2495-test-{1..5}.png`. Ready to merge. |
Round 2 verification — PASS-with-caveatRe-tested with broader scenarios. Three scenarios run on staging dev install `1.2.271` / persona `0i0iTV6ZWU`:
#2515 regression checkClean — no `Status: support_collection_status()` leaks in any of A/B/D. Note: this branch does NOT contain #2515's prompt rule yet (branched off main pre-#359). When #359 merges, expect a textual conflict in the same KAREN_PERSONALITY insertion point. Caveat (Scenario D)The new prompt rule says "hide all raw IDs". It works for the natural list-EDS reply path (A, B). It loses against the skill's "set knowledge EDS" instruction in multi-step orchestrated flows — model treats IDs as actionable parameters and quotes them for copy-paste. Possible follow-up: tighten rule wording, or co-locate it next to skill SKILL.md instructions that mention EDS IDs. Tracking as scope of follow-up rather than blocking this PR. Pre-merge
Round 2 screenshots: `flexus-test-agents/screenshots/adhoc/2495-r2-{A-persona-page,A-empty-list,A-empty-thread,B-many-sources,D-kb-collection}.png`. |
| but it is not useful for a regular user who asks a question, so do not mention it. | ||
|
|
||
| When EDS tools return raw data (JSON, markdown tables, field names like `eds_id` or `eds_type`, timestamps, | ||
| `eds_scan_problem`), NEVER paste it verbatim. Translate into plain English: one line per source with a friendly name, |
There was a problem hiding this comment.
don't directly mention English, user may be talking in diff language
|
|
||
| When EDS tools return raw data (JSON, markdown tables, field names like `eds_id` or `eds_type`, timestamps, | ||
| `eds_scan_problem`), NEVER paste it verbatim. Translate into plain English: one line per source with a friendly name, | ||
| type ("Google Drive folder", "website crawl", "uploaded file"), last scan time in relative phrasing ("scanned today", |
There was a problem hiding this comment.
last scan time, prompt is maybe too specific here, could be something more generic like don't print timestamps, use relative human readable time
| When EDS tools return raw data (JSON, markdown tables, field names like `eds_id` or `eds_type`, timestamps, | ||
| `eds_scan_problem`), NEVER paste it verbatim. Translate into plain English: one line per source with a friendly name, | ||
| type ("Google Drive folder", "website crawl", "uploaded file"), last scan time in relative phrasing ("scanned today", | ||
| "last scanned 3 days ago"), and any blocker in plain words — hide all raw IDs, hashes, and technical field names. |
There was a problem hiding this comment.
we may not need 4 lines of prompt for this, instruction can be a bit shorter
Fixes Fibery #2495 — Karen was relaying raw EDS tool output (JSON dumps, markdown tables with
eds_id,eds_type,eds_scan_problem, raw timestamps) verbatim to users. Katrin: "lots of unclear letters & symbols, not user-friendly."Diagnosis
flexus_eds_setup(op="list")andlist_eds_str()in flexus backend return technical-style strings:Karen had no instruction telling her to rephrase before showing.
Change
One rule in `KAREN_PERSONALITY` (concatenated into 4 user-facing experts: `default`, `very_limited`, `messages_triage`, `explore` — not `post_conversation`):
Net: +5 / 0 in `karen_prompts.py`, plus VERSION bump.
Tournament context
Tournament was N=2: A (this branch, prompt-only) + B (backend formatter rewrite in `flexus` repo). B's worktree crashed mid-run with an API error and never pushed. Promoted A directly. Backend formatter remains as a candidate follow-up if prompt fix proves insufficient.
Honest doubts
🤖 Single-candidate tournament outcome.