cache: diacritic-insensitive matching, typing, validation #2231

Merged
robgruen merged 10 commits into main from dev/robgruen/TODO/cache
Apr 23, 2026

@robgruen
Collaborator

  • Diacritic-insensitive matching in construction cache (handled inside MatchSet regex to avoid grammar regression)
  • Validate number parameters against request in explainWorkQueue
  • Replace 'as any' with ParamObjectType in validateExplanation

Split from #2210.

Copilot AI and others added 7 commits April 22, 2026 14:01
…ectType

- Remove three TODO typing comments
- Use ParamObjectType cast instead of 'as any' in ensureProperties
- Add direct unit tests for getActionProperty and ensureProperties

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use normalizeParamString (NFD + combining mark removal) in MatchSet
construction so that accented variants (e.g. "Beyoncé" vs "Beyonce")
match interchangeably. Normalize request before matching in both
Construction.match() and ConstructionCache.match() using a separate
variable so the original cased request is preserved in the result.
Update grammar.spec.ts to use normalizeParamString instead of
toLowerCase for the lower-case roundtrip test. Mark related TODOs done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
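
For readers unfamiliar with the "NFD + combining mark removal" step named in the commit above, here is a minimal sketch. The function name stripDiacritics is hypothetical and is not the repository's normalizeParamString. The lowercasing step and the restriction to the U+0300..U+036F combining block are assumptions made for illustration.

```typescript
// Hypothetical stand-in for a normalizeParamString-style helper.
// Assumption: lowercasing is part of normalization; the real helper may differ.
function stripDiacritics(text: string): string {
    return text
        .normalize("NFD") // decompose "é" into "e" followed by U+0301
        .replace(/[\u0300-\u036f]/g, "") // drop marks in the Combining Diacritical Marks block
        .toLowerCase();
}

// Accented and unaccented spellings collapse to the same key.
console.log(stripDiacritics("Beyoncé")); // "beyonce"
console.log(stripDiacritics("Beyoncé") === stripDiacritics("Beyonce")); // true
```

Normalizing a copy of the request, rather than the request itself, keeps the original casing and accents available for the result, which is the reason the commit message gives for using a separate variable.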
Extend the explainable-value check to also verify that number parameter
values appear literally in the request string, mirroring the existing
string check. Update the test that was previously documenting the broken
behavior to assert the correct behavior, and add a passing case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
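
The check described in the commit above can be sketched roughly as follows. The helper name findUnexplainedParams and the flat parameter shape are hypothetical; the actual explainWorkQueue code and its ParamObjectType are not shown in this PR page, so treat this only as an illustration of "the value must appear literally in the request".

```typescript
type ParamValue = string | number;

// Hypothetical helper: returns the names of parameters whose value does not
// appear literally (case-insensitively) in the request text.
function findUnexplainedParams(
    request: string,
    params: { [name: string]: ParamValue },
): string[] {
    const haystack = request.toLowerCase();
    const missing: string[] = [];
    for (const name of Object.keys(params)) {
        // Numbers are compared by their string form, mirroring the string check.
        const needle = String(params[name]).toLowerCase();
        if (haystack.indexOf(needle) < 0) {
            missing.push(name);
        }
    }
    return missing;
}

console.log(findUnexplainedParams("set volume to 50", { level: 50 })); // []
console.log(findUnexplainedParams("set volume to 50", { level: 70 })); // ["level"]
```

A plain substring test is deliberately loose: a value of 5 would also be found inside "50". Whether the real check guards against that is not stated in the commit message.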
…ression

The previous approach normalized MatchSet entries and request strings
(stripping diacritics on both sides). That broke the grammar matcher
tests because the generated grammar uses MatchSet entries as literal
tokens and the grammar matcher is only case-insensitive — not
diacritic-insensitive — so stripped tokens like "beyonce" could not
match input "Beyoncé".

Keep MatchSet.matches as the lowercased original text (unchanged from
main) and instead make the MatchSet regex accept both forms per
character via a (?:<accented>|<base>) alternation for characters that
carry combining marks. Request strings are no longer normalized before
matching. The construction cache still gets diacritic-insensitive
matching, and the grammar path continues to behave as before.

Also revert the grammar.spec.ts lower-case round-trip assertion back to
toLowerCase since the grammar matcher does not strip diacritics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
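
A rough sketch of the per-character (?:<accented>|<base>) alternation described in the commit above. The function names are hypothetical and the real MatchSet regex construction may differ. The sketch assumes input in composed (NFC) form and only handles marks in the U+0300..U+036F block.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: keeps the original lowercased text, but lets each
// accented character also match its unaccented base character.
function diacriticInsensitivePattern(text: string): string {
    let pattern = "";
    for (const ch of Array.from(text.normalize("NFC").toLowerCase())) {
        const base = ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
        if (base !== ch && base.length > 0) {
            pattern += `(?:${escapeRegExp(ch)}|${escapeRegExp(base)})`;
        } else {
            pattern += escapeRegExp(ch);
        }
    }
    return pattern;
}

const pattern = diacriticInsensitivePattern("Beyoncé");
console.log(pattern); // beyonc(?:é|e)

const re = new RegExp(`^${pattern}$`, "i");
console.log(re.test("Beyoncé")); // true
console.log(re.test("beyonce")); // true
console.log(re.test("Beyonde")); // false
```

Because only the regex changes, the stored match text stays "beyoncé", so a consumer that treats entries as literal, case-insensitive tokens (the grammar path in this PR) still sees the accented form.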
- knowPro/collections: matchesWithMinHitCount condition > 0 → > 1 (optimization,
  all matches already have hitCount ≥ 1 by design)
- dispatcher/unknownSwitcher: parallelize assistant-selection partitions via
  Promise.all; extract selectFromPartitions() for testability
- knowledgeProcessor actions/entities: addMultiple concurrency 1 → settings.concurrency
- cache/explainWorkQueue: extend parameter-value-in-request check to cover numbers
- azure-ai-foundry/wikipedia: add optional locale param (default "en") to
  getPageObject and getPageMarkdown

Each fix is covered by new or updated unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
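
The Promise.all change in the dispatcher/unknownSwitcher item above might look roughly like this. The signature of selectFromPartitions here is an assumption, as are the partition type and the selectOne callback; only the function name and the use of Promise.all come from the commit message.

```typescript
// Hypothetical signature: each partition is checked by selectOne, which
// resolves to a chosen assistant name or undefined if none fits.
async function selectFromPartitions<T>(
    partitions: T[],
    selectOne: (partition: T) => Promise<string | undefined>,
): Promise<string | undefined> {
    // Start every partition's selection at once instead of awaiting each in turn.
    const results = await Promise.all(partitions.map((p) => selectOne(p)));
    // Promise.all preserves input order, so the first hit in partition order wins.
    return results.find((r) => r !== undefined);
}

// Usage with a stub selector standing in for the real model call.
selectFromPartitions([["calendar"], ["music", "email"]], async (names) =>
    names.indexOf("music") >= 0 ? "music" : undefined,
).then((choice) => console.log(choice)); // "music"
```

Compared with a sequential loop that stops at the first hit, this issues every partition's request even when an early one succeeds, trading extra calls for lower latency. Passing the selector in as a parameter is one way to make the function testable without a live model.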
@robgruen robgruen temporarily deployed to development-fork April 23, 2026 06:15 — with GitHub Actions Inactive
@robgruen robgruen added this pull request to the merge queue Apr 23, 2026
Merged via the queue into main with commit 2701512 Apr 23, 2026
21 checks passed
2 participants