fix: voice tracking highlight and mic stall bugs #37

Open
afkehaya wants to merge 1 commit into f:master from afkehaya:fix/voice-tracking-bugs

@afkehaya

Summary

  • Seamless recognition restart: Keep AVAudioEngine alive across SFSpeechRecognitionTask restarts, add pre-emptive 55-second restart timer to beat Apple's ~60s timeout, sync matchStartOffset before each restart, and add contextualStrings for better STT accuracy
  • Fix fuzzy matching false positives: Remove overly permissive contains check (e.g. "and" matching "demand"), fix charLevelMatch skip-both fallback that treated gibberish as matches, fix wordLevelMatch +1 space overcount, fix unicode scalar vs Character count mismatch
  • Add confidence gating: Replace max(charResult, wordResult) with agreement-based selection; require 2-of-3 recent results to agree before committing large forward jumps
  • Improve retry resilience: Distinguish timeout errors (code 1110/216) from real errors. Expected timeouts restart with no retry limit; only genuine failures are subject to backoff
  • Architecture cleanup: Merge two polling timers in observeDismiss() into one, fix retain cycle and double-dismiss race in dismiss(), fix cancelled-task error callback race in restartTask()
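
The "unicode scalar vs Character count mismatch" in the list above is easiest to see with accented text. The snippet below is only an illustration of the underlying Swift behaviour, not the PR's code; the variable names are made up for the example.

```swift
// Illustration only: offsets counted in unicode scalars and offsets
// counted in Characters can disagree for the same visible text.
let composed = "caf\u{00E9}"      // "é" stored as a single scalar
let decomposed = "cafe\u{0301}"   // "e" followed by a combining acute accent

// Character counts agree: Swift groups a base letter and its combining
// marks into one extended grapheme cluster.
let composedChars = composed.count
let decomposedChars = decomposed.count

// Scalar counts differ: the decomposed form carries one extra scalar.
let composedScalars = composed.unicodeScalars.count
let decomposedScalars = decomposed.unicodeScalars.count

print(composedChars, decomposedChars, composedScalars, decomposedScalars)
// prints "4 4 4 5"
```

If one side of a matcher indexes by scalars and the other by Characters, every decomposed accent shifts the computed offset by one, which is consistent with the highlight drift described for French and Spanish text.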

Context

Two user-reported bugs in Word Tracking mode:

  1. Highlight doesn't track at the right speed — jumps erratically, lags, or races ahead of the speaker
  2. Mic stalls out — speech recognition intermittently stops responding after ~60 seconds

Root cause analysis showed that the "mic stalling" was primarily a matching bug masquerading as an audio bug. The mic was working, but matchStartOffset stayed at 0 after each recognition restart, so results from the new session could not advance the highlight past its current position.
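
A minimal sketch of that failure mode, under the assumption that each recognition session reports progress relative to its own start and that the highlight only moves forward. Only matchStartOffset and recognizedCharCount come from this PR; the TrackingState type and its methods are hypothetical.

```swift
/// Hypothetical tracker state, invented for this sketch.
struct TrackingState {
    /// Script offset at which the current recognition session started.
    var matchStartOffset = 0
    /// Furthest script position confirmed so far (drives the highlight).
    var recognizedCharCount = 0

    /// A session reports how many script characters it has matched,
    /// counted from the start of that session.
    mutating func apply(sessionMatchedCount: Int) {
        let absolute = matchStartOffset + sessionMatchedCount
        // The highlight never moves backwards.
        recognizedCharCount = max(recognizedCharCount, absolute)
    }

    /// Sync the offset before a new recognition task begins.
    mutating func prepareForRestart() {
        matchStartOffset = recognizedCharCount
    }
}

// Without the sync, the second session's results land behind the highlight.
var stalled = TrackingState()
stalled.apply(sessionMatchedCount: 120)
stalled.apply(sessionMatchedCount: 30)   // new session, offset still 0

// With the sync, the second session continues from position 120.
var synced = TrackingState()
synced.apply(sessionMatchedCount: 120)
synced.prepareForRestart()
synced.apply(sessionMatchedCount: 30)

print(stalled.recognizedCharCount, synced.recognizedCharCount)
// prints "120 150"
```

In the unsynced case the speaker keeps talking and audio keeps flowing, but the highlight sits at 120 until the new session catches up, which looks like a stalled mic from the outside.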

Test plan

  • Read a multi-paragraph script in Word Tracking mode — highlight should track smoothly without jumps
  • Continue reading past 60 seconds — no visible pause or stall at the timeout boundary
  • Test with accented text (French, Spanish) — highlight should not drift
  • Test mic switching mid-session — should recover gracefully
  • Test pause/resume — highlight should continue from correct position
  • Test Director Mode live-edit — highlight should not jump on text update
  • Verify Classic and Silence-Paused modes still work correctly (no regressions)

🤖 Generated with Claude Code

Addresses two user-reported bugs: (1) highlight not tracking at the right
speed, jumping erratically or lagging behind speech, and (2) mic appearing
to stall out and stop picking up audio after ~60 seconds.

Root causes identified and fixed:

**Seamless recognition restart (P0)**
- Split cleanupRecognition() so AVAudioEngine stays alive across
  SFSpeechRecognitionTask restarts, eliminating audio gaps
- Add pre-emptive 55-second restart timer to beat Apple's ~60s timeout
- Update matchStartOffset to recognizedCharCount before each restart so
  new sessions match from the correct position
- Swap recognition requests under an NSLock so the swap is safe with respect to the audio I/O thread
- Add contextualStrings from remaining source text for better STT accuracy
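
One way the lock-guarded request swap could look is sketched below. This is a sketch under assumptions, not the code in this commit: FakeRequest stands in for the real recognition request type so the example runs without the Speech framework, and RequestSlot is a hypothetical name.

```swift
import Foundation

/// Stand-in for a recognition request that accepts audio buffers.
/// Hypothetical type used only so this sketch is self-contained.
final class FakeRequest {
    let id: Int
    private(set) var buffersReceived = 0
    init(id: Int) { self.id = id }
    func append(_ buffer: [Float]) { buffersReceived += 1 }
}

/// Holds the current request behind an NSLock so the audio tap and the
/// restart logic never touch it at the same time.
final class RequestSlot {
    private let lock = NSLock()
    private var current: FakeRequest?

    /// Called by the restart logic; returns the request that was replaced.
    @discardableResult
    func swap(to newRequest: FakeRequest?) -> FakeRequest? {
        lock.lock()
        defer { lock.unlock() }
        let old = current
        current = newRequest
        return old
    }

    /// Called from the audio tap for every captured buffer.
    func forward(_ buffer: [Float]) {
        lock.lock()
        defer { lock.unlock() }
        current?.append(buffer)
    }
}

let slot = RequestSlot()
let first = FakeRequest(id: 1)
let second = FakeRequest(id: 2)

slot.swap(to: first)
slot.forward([0.0, 0.1])     // goes to the first request
slot.swap(to: second)        // task restart: the engine keeps running
slot.forward([0.2, 0.3])     // goes to the second request
slot.forward([0.4, 0.5])

print(first.buffersReceived, second.buffersReceived)
// prints "1 2"
```

Because the audio engine and its tap are never torn down, the only thing that changes on restart is which request the tap forwards to.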

**Fix fuzzy matching false positives (P1)**
- Remove overly permissive `contains` check from isFuzzyMatch that caused
  "and" to match "demand", "the" to match "other", etc.
- Tighten prefix matching to require minimum 3-char words
- Require exact match for 2-char words (no edit distance tolerance)
- Fix charLevelMatch skip-both fallback: no longer advances
  lastGoodOrigIndex on genuine mismatches (gibberish no longer matches)
- Fix wordLevelMatch +1 space overcount on last matched word
- Fix unicode scalar vs Character count mismatch in charLevelMatch
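
The tightened word-matching rules above could be sketched roughly as follows. The real isFuzzyMatch is not shown in this description, so the signature, the case folding, and the edit-distance tolerance of 1 are assumptions made for illustration.

```swift
/// Levenshtein distance computed over Characters, not unicode scalars.
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1,        // deletion
                             current[j - 1] + 1,     // insertion
                             previous[j - 1] + cost) // substitution
        }
        previous = current
    }
    return previous[t.count]
}

/// Sketch of the stricter rules: no substring ("contains") matching.
func isFuzzyMatch(_ spoken: String, _ expected: String) -> Bool {
    let a = spoken.lowercased(), b = expected.lowercased()
    if a == b { return true }
    // Words of two characters or fewer must match exactly.
    if a.count <= 2 || b.count <= 2 { return false }
    // Prefix matching is allowed only for words of at least 3 characters.
    if a.hasPrefix(b) || b.hasPrefix(a) { return true }
    // Assumed tolerance: at most one edit between the two words.
    return editDistance(a, b) <= 1
}

print(isFuzzyMatch("and", "demand"))      // false: substring, not prefix
print(isFuzzyMatch("the", "other"))       // false
print(isFuzzyMatch("track", "tracking"))  // true: prefix of a 3+ char word
print(isFuzzyMatch("hallo", "hello"))     // true: one substitution
print(isFuzzyMatch("to", "too"))          // false: 2-char word must be exact
```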

**Confidence gating (P2)**
- Replace blind max(charResult, wordResult) with agreement-based selection
- Add sliding window requiring 2-of-3 recent results to agree before
  committing large forward jumps (small steps always pass through)
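
A rough sketch of how such gating might work. The thresholds, the meaning of "agree", and all names here (selectCandidate, JumpGate, smallStep, tolerance) are assumptions for illustration; the commit itself may define them differently.

```swift
/// Agreement-based selection (assumed semantics): if the two matchers land
/// close together, trust the further one; otherwise stay conservative.
func selectCandidate(charResult: Int, wordResult: Int, tolerance: Int) -> Int {
    if abs(charResult - wordResult) <= tolerance {
        return max(charResult, wordResult)
    }
    return min(charResult, wordResult)
}

/// Sliding-window gate: large forward jumps need 2 of the last 3
/// candidates to agree; small steps always pass through.
struct JumpGate {
    let smallStep: Int   // assumed: largest jump accepted without a vote
    let tolerance: Int   // assumed: how close two candidates must be to "agree"
    private var recent: [Int] = []
    private(set) var committed = 0

    init(smallStep: Int = 20, tolerance: Int = 10) {
        self.smallStep = smallStep
        self.tolerance = tolerance
    }

    /// Feed one candidate position; returns the position to highlight.
    mutating func submit(_ candidate: Int) -> Int {
        recent.append(candidate)
        if recent.count > 3 { recent.removeFirst() }

        // Never move backwards.
        guard candidate > committed else { return committed }

        // Small steps are committed immediately.
        if candidate - committed <= smallStep {
            committed = candidate
            return committed
        }

        // Large jump: count recent candidates (including this one) near it.
        let votes = recent.filter { abs($0 - candidate) <= tolerance }.count
        if votes >= 2 { committed = candidate }
        return committed
    }
}

var gate = JumpGate(smallStep: 20, tolerance: 10)
let step1 = gate.submit(15)    // small step, accepted
let step2 = gate.submit(200)   // large jump, only one vote so far
let step3 = gate.submit(205)   // second vote near 200, jump accepted
print(step1, step2, step3)
// prints "15 15 205"
```

A single outlier result therefore cannot drag the highlight far ahead, while ordinary word-by-word progress is never delayed.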

**Retry resilience (P3)**
- Distinguish timeout errors (code 1110/216) from real errors
- No retry limit for expected timeouts; immediate soft restart
- Backoff with retry limit only for genuine errors
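
The policy above might be expressed along these lines. The error codes 1110 and 216 come from this commit; the RetryPolicy type, the exponential backoff schedule, the default limits, and the "example" error domain are all assumptions made for the sketch.

```swift
import Foundation

enum RestartDecision: Equatable {
    case restartImmediately          // expected timeout: soft restart
    case retryAfter(TimeInterval)    // genuine error: back off first
    case giveUp                      // genuine errors exceeded the limit
}

struct RetryPolicy {
    /// Codes treated as expected timeouts (taken from the commit message).
    let timeoutCodes: Set<Int> = [1110, 216]
    let maxRetries: Int
    let baseDelay: TimeInterval
    private(set) var failures = 0

    init(maxRetries: Int = 3, baseDelay: TimeInterval = 0.5) {
        self.maxRetries = maxRetries
        self.baseDelay = baseDelay
    }

    mutating func decide(for error: NSError) -> RestartDecision {
        // Timeouts are expected and never count against the retry limit.
        if timeoutCodes.contains(error.code) { return .restartImmediately }

        failures += 1
        guard failures <= maxRetries else { return .giveUp }
        // Assumed schedule: baseDelay, 2x, 4x, ...
        return .retryAfter(baseDelay * pow(2.0, Double(failures - 1)))
    }

    /// Reset the failure count once recognition is producing results again.
    mutating func recordSuccess() { failures = 0 }
}

var policy = RetryPolicy(maxRetries: 2, baseDelay: 0.5)
let timeout = NSError(domain: "example", code: 1110)
let failure = NSError(domain: "example", code: 1)

let d1 = policy.decide(for: timeout)   // restartImmediately
let d2 = policy.decide(for: failure)   // retryAfter(0.5)
let d3 = policy.decide(for: failure)   // retryAfter(1.0)
let d4 = policy.decide(for: failure)   // giveUp
let d5 = policy.decide(for: timeout)   // still restartImmediately
print(d1, d2, d3, d4, d5)
```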

**Architecture cleanup (P4)**
- Merge two polling timers in observeDismiss() into one
- Fix retain cycle in dismiss() asyncAfter closure
- Add isDismissing guard to prevent double-dismiss
- Fix cancelled-task error callback race in restartTask()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>