cache: diacritic-insensitive matching, typing, validation by robgruen · Pull Request #2231 · microsoft/TypeAgent

robgruen · 2026-04-22T23:03:41Z

- Diacritic-insensitive matching in the construction cache (handled inside the MatchSet regex to avoid a grammar regression)
- Validate number parameters against the request in explainWorkQueue
- Replace 'as any' with ParamObjectType in validateExplanation

Split from #2210.

Agent-Logs-Url: https://github.com/microsoft/TypeAgent/sessions/ffed37bc-04b5-40e4-a31d-0b28a392e95c
Co-authored-by: robgruen <25374553+robgruen@users.noreply.github.com>

…ectType

- Remove three TODO typing comments
- Use ParamObjectType cast instead of 'as any' in ensureProperties
- Add direct unit tests for getActionProperty and ensureProperties

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Use normalizeParamString (NFD + combining mark removal) in MatchSet construction so that accented variants (e.g. "Beyoncé" vs "Beyonce") match interchangeably. Normalize the request before matching in both Construction.match() and ConstructionCache.match(), using a separate variable so the original cased request is preserved in the result. Update grammar.spec.ts to use normalizeParamString instead of toLowerCase for the lower-case round-trip test. Mark related TODOs done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
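The normalization named in this commit (NFD decomposition followed by combining mark removal) can be sketched as below. This is an illustration only: `stripDiacritics` is a hypothetical stand-in, not the repository's normalizeParamString, and the lower-casing step is an assumption inferred from its use in the lower-case round-trip test.

```typescript
// Hypothetical stand-in for normalizeParamString; the real helper may differ.
function stripDiacritics(text: string): string {
    return (
        text
            // NFD splits "é" into the base letter "e" plus U+0301.
            .normalize("NFD")
            // Drop marks in the Combining Diacritical Marks block.
            .replace(/[\u0300-\u036f]/g, "")
            // Assumption: the helper also lower-cases its input.
            .toLowerCase()
    );
}
```

With this sketch, `stripDiacritics("Beyoncé")` and `stripDiacritics("Beyonce")` both yield "beyonce", which is what lets the two spellings compare equal.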

Extend the explainable-value check to also verify that number parameter values appear literally in the request string, mirroring the existing string check. Update the test that previously documented the broken behavior to assert the correct behavior, and add a passing case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
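A minimal sketch of what a literal value-in-request check covering numbers might look like. The function name, the `ParamValue` type, and the case-insensitive string comparison are all assumptions made for illustration; the actual check in explainWorkQueue may be structured differently.

```typescript
// Hypothetical types and helper; not the code in explainWorkQueue.
type ParamValue = string | number | boolean;

function valueAppearsInRequest(request: string, value: ParamValue): boolean {
    if (typeof value === "string") {
        // Assumption: strings are compared case-insensitively.
        return request.toLowerCase().includes(value.toLowerCase());
    }
    if (typeof value === "number") {
        // Literal check: the digits must appear; "three" does not count as 3.
        return request.includes(value.toString());
    }
    // Other value types are not checked in this sketch.
    return true;
}
```

Note that a plain substring test like this one would also accept 3 for a request containing "30"; a stricter variant could require token boundaries around the digits.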

…ression

The previous approach normalized MatchSet entries and request strings (stripping diacritics on both sides). That broke the grammar matcher tests because the generated grammar uses MatchSet entries as literal tokens and the grammar matcher is only case-insensitive, not diacritic-insensitive, so stripped tokens like "beyonce" could not match input "Beyoncé".

Keep MatchSet.matches as the lowercased original text (unchanged from main) and instead make the MatchSet regex accept both forms per character via a (?:<accented>|<base>) alternation for characters that carry combining marks. Request strings are no longer normalized before matching. The construction cache still gets diacritic-insensitive matching, and the grammar path continues to behave as before.

Also revert the grammar.spec.ts lower-case round-trip assertion back to toLowerCase, since the grammar matcher does not strip diacritics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
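The per-character (?:<accented>|<base>) alternation described in this commit could be built roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the "i" flag is assumed from the case-insensitive matching mentioned above, and the real MatchSet regex construction may differ.

```typescript
// Escape regex metacharacters so each character is matched literally.
function escapeRegExp(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: emit (?:accented|base) for characters with diacritics.
function diacriticInsensitivePattern(entry: string): string {
    let pattern = "";
    for (const ch of Array.from(entry.normalize("NFC"))) {
        // Decompose the character, then strip its combining marks.
        const base = ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
        if (base !== ch && base.length > 0) {
            pattern += `(?:${escapeRegExp(ch)}|${escapeRegExp(base)})`;
        } else {
            pattern += escapeRegExp(ch);
        }
    }
    return pattern;
}

// The request is matched as-is, with no normalization applied to it.
function matchesIgnoringDiacritics(entry: string, request: string): boolean {
    const regex = new RegExp(`^${diacriticInsensitivePattern(entry)}$`, "i");
    return regex.test(request);
}
```

For the entry "beyoncé" the generated pattern is `beyonc(?:é|e)`, so both "Beyoncé" and "Beyonce" match while the stored entry keeps its accent. The sketch only covers precomposed input; a request typed with separate combining marks would need an extra alternative.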

- knowPro/collections: matchesWithMinHitCount condition > 0 → > 1 (optimization, all matches already have hitCount ≥ 1 by design)
- dispatcher/unknownSwitcher: parallelize assistant-selection partitions via Promise.all; extract selectFromPartitions() for testability
- knowledgeProcessor actions/entities: addMultiple concurrency 1 → settings.concurrency
- cache/explainWorkQueue: extend parameter-value-in-request check to cover numbers
- azure-ai-foundry/wikipedia: add optional locale param (default "en") to getPageObject and getPageMarkdown

Each fix is covered by new or updated unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
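To illustrate the unknownSwitcher item above, here is a rough sketch of querying partitions in parallel with Promise.all. Only the name selectFromPartitions comes from the commit message; the signature, the `Selector` type, and the "first partition with a hit wins" rule are assumptions made for the example.

```typescript
// Hypothetical signature; the dispatcher's real types may differ.
type Selector<T> = (partition: T[], request: string) => Promise<T | undefined>;

async function selectFromPartitions<T>(
    partitions: T[][],
    request: string,
    select: Selector<T>,
): Promise<T | undefined> {
    // Start every partition query at once instead of awaiting each in turn.
    const results = await Promise.all(
        partitions.map((partition) => select(partition, request)),
    );
    // Promise.all preserves input order, so the earliest partition wins.
    return results.find((result) => result !== undefined);
}
```

Taking the selector as a parameter is what makes a function like this easy to unit test: a test can pass a stub instead of a model call.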

Agent-Logs-Url: https://github.com/microsoft/TypeAgent/sessions/40c35148-43e0-467e-83d5-276e04185071
Co-authored-by: robgruen <25374553+robgruen@users.noreply.github.com>

Copilot AI and others added 7 commits April 22, 2026 14:01

fix: run prettier on matchPart.ts to fix lint CI failure

a4d2658

Agent-Logs-Url: https://github.com/microsoft/TypeAgent/sessions/ffed37bc-04b5-40e4-a31d-0b28a392e95c
Co-authored-by: robgruen <25374553+robgruen@users.noreply.github.com>

lint

e4ea0e9

robgruen had a problem deploying to development-fork April 22, 2026 23:03 — with GitHub Actions Error

Copilot started work on behalf of robgruen April 22, 2026 23:12

fix: format construction.spec.ts with prettier

96c004c

Agent-Logs-Url: https://github.com/microsoft/TypeAgent/sessions/40c35148-43e0-467e-83d5-276e04185071
Co-authored-by: robgruen <25374553+robgruen@users.noreply.github.com>

Copilot finished work on behalf of robgruen April 22, 2026 23:16

Merge remote-tracking branch 'origin/main' into dev/robgruen/TODO/cache

c679603

robgruen had a problem deploying to development-fork April 22, 2026 23:36 — with GitHub Actions Failure

robgruen temporarily deployed to development-fork April 22, 2026 23:36 — with GitHub Actions Inactive

Merge branch 'main' into dev/robgruen/TODO/cache

649a272

robgruen temporarily deployed to development-fork April 23, 2026 06:15 — with GitHub Actions Inactive

robgruen added this pull request to the merge queue Apr 23, 2026

Merged via the queue into main with commit 2701512 Apr 23, 2026
21 checks passed

robgruen merged 10 commits into main from dev/robgruen/TODO/cache

2 participants
