Skip to content

Fix macOS silent microphone capture handling#597

Open
basnijholt wants to merge 1 commit into
mainfrom
t3code/f3c48bfb
Open

Fix macOS silent microphone capture handling#597
basnijholt wants to merge 1 commit into
mainfrom
t3code/f3c48bfb

Conversation

@basnijholt

Copy link
Copy Markdown
Owner

Summary

  • detect all-zero live microphone captures and surface a clear macOS microphone-permission error instead of silently returning no transcript
  • add macOS microphone permission checks before hold/toggle recording starts
  • cover silent-capture and microphone-permission presentation behavior with tests

Test Plan

  • /Users/bas.nijholt/.local/bin/uv run pytest -q
  • /Users/bas.nijholt/.local/bin/uv run ruff check agent_cli/services/asr.py agent_cli/agents/transcribe.py tests/test_asr.py tests/agents/test_transcribe.py tests/agents/test_transcribe_recovery.py
  • swift test --package-path macos/AgentCLI
  • pre-commit hooks during git commit

@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds detection and clear error surfacing for macOS microphone permission issues, where silent all-zero audio would previously produce an empty transcript with no user-visible explanation. It also adds upfront permission gating in the Swift menu-bar layer so the system dialog fires before recording starts.

  • Python side: A new SilentAudioCaptureError exception and AudioCaptureStats dataclass track whether captured PCM audio is all-zero; after each Wyoming recording session the stats are checked and the exception propagates up through a new structured error field in TranscriptResult, printed and surfaced as a non-zero exit.
  • Swift side: A new MicrophonePermission.swift module wraps AVCaptureDevice.authorizationStatus behind a testable protocol, and AgentCommandRunner gates both hold and toggle recording paths through ensureMicrophonePermissionForRecording() before starting any command.

Confidence Score: 4/5

Safe to merge; all changes are additive and the new error path is well-isolated from the normal recording flow.

The core detection logic (all-zero PCM + empty transcript → SilentAudioCaptureError) is sound and thoroughly tested. The Swift permission gating is injected via a protocol so it does not change existing behaviour when permissions are already granted. The only items worth revisiting are a dead None-guard on send_task, a class-raise instead of an instance-raise in the helper, a two-call TOCTOU in requestAccessIfNeeded, and a missing notDetermined test case — all minor.

agent_cli/services/asr.py (redundant None guard and class-raise style), macos/AgentCLI/Sources/AgentCLI/MicrophonePermission.swift (double currentStatus() call)

Important Files Changed

Filename Overview
agent_cli/services/asr.py Adds AudioCaptureStats dataclass, SilentAudioCaptureError, and _peak_int16_sample; wires stats tracking into _send_audio and checks after receive completes. send_task is guarded with a redundant None-check that the return type of manage_send_receive_tasks makes unreachable.
agent_cli/agents/transcribe.py Wraps live_transcriber call in try/except for SilentAudioCaptureError and adds error field to TranscriptResult; _transcript_result_error helper safely inspects the dict. Error-exit logic is correctly placed after json_output printing in both recording paths.
macos/AgentCLI/Sources/AgentCLI/MicrophonePermission.swift New file with clean protocol abstraction over AVCaptureDevice; requestAccessIfNeeded calls currentStatus() twice with a narrow TOCTOU window, and the notDetermined presentation message is consistent with the UX flow.
macos/AgentCLI/Sources/AgentCLI/AgentCommandRunner.swift Injects MicrophonePermissionControlling via init and calls ensureMicrophonePermissionForRecording() at both hold and toggle recording entry points. Notification and status message are set before returning false.
macos/AgentCLI/Tests/AgentCLITests/MicrophonePermissionTests.swift Tests authorized and denied states; the notDetermined presentation (canRecord=false, distinct message) is not covered.
tests/test_asr.py Two new tests: one validates AudioCaptureStats byte/peak values from _send_audio, another confirms SilentAudioCaptureError is raised when all-zero capture meets an empty Wyoming transcript.
tests/agents/test_transcribe.py New test drives _async_main through the SilentAudioCaptureError path and asserts the error field and warning log are both populated.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant U as User
    participant ACR as AgentCommandRunner (Swift)
    participant MPC as MicrophonePermissionController
    participant PY as transcribe._async_main (Python)
    participant ASR as asr._transcribe_live_audio_wyoming
    participant SA as _send_audio
    participant WY as Wyoming ASR Server

    U->>ACR: hold/toggle recording
    ACR->>MPC: currentStatus()
    MPC-->>ACR: .notDetermined / .denied / .authorized

    alt not authorized
        ACR->>MPC: requestAccessIfNeeded(completion)
        MPC-->>U: System permission dialog (notDetermined only)
        ACR-->>U: statusMessage + notification
        ACR-->>U: return (recording blocked)
    else authorized
        ACR->>PY: run transcribe command
        PY->>ASR: live_transcriber(...)
        ASR->>SA: _send_audio(stream, ...)
        SA->>SA: capture_stats.observe(chunk) per chunk
        SA->>WY: AudioChunk events
        SA-->>ASR: AudioCaptureStats(byte_count, peak_sample)
        ASR->>WY: AudioStop
        WY-->>ASR: transcript string

        alt "peak_sample==0 AND transcript empty"
            ASR-->>PY: raise SilentAudioCaptureError
            PY-->>PY: "catch => TranscriptResult(error=message)"
            PY-->>U: print error + exit(1)
        else normal
            ASR-->>PY: transcript string
            PY-->>U: transcript output
        end
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant U as User
    participant ACR as AgentCommandRunner (Swift)
    participant MPC as MicrophonePermissionController
    participant PY as transcribe._async_main (Python)
    participant ASR as asr._transcribe_live_audio_wyoming
    participant SA as _send_audio
    participant WY as Wyoming ASR Server

    U->>ACR: hold/toggle recording
    ACR->>MPC: currentStatus()
    MPC-->>ACR: .notDetermined / .denied / .authorized

    alt not authorized
        ACR->>MPC: requestAccessIfNeeded(completion)
        MPC-->>U: System permission dialog (notDetermined only)
        ACR-->>U: statusMessage + notification
        ACR-->>U: return (recording blocked)
    else authorized
        ACR->>PY: run transcribe command
        PY->>ASR: live_transcriber(...)
        ASR->>SA: _send_audio(stream, ...)
        SA->>SA: capture_stats.observe(chunk) per chunk
        SA->>WY: AudioChunk events
        SA-->>ASR: AudioCaptureStats(byte_count, peak_sample)
        ASR->>WY: AudioStop
        WY-->>ASR: transcript string

        alt "peak_sample==0 AND transcript empty"
            ASR-->>PY: raise SilentAudioCaptureError
            PY-->>PY: "catch => TranscriptResult(error=message)"
            PY-->>U: print error + exit(1)
        else normal
            ASR-->>PY: transcript string
            PY-->>U: transcript output
        end
    end
Loading

Reviews (1): Last reviewed commit: "Fix macOS silent microphone capture hand..." | Re-trigger Greptile

Comment thread agent_cli/services/asr.py
Comment on lines +107 to +108
def _raise_silent_audio_capture_error() -> NoReturn:
raise SilentAudioCaptureError

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _raise_silent_audio_capture_error raises the class object rather than an instance (raise SilentAudioCaptureError vs raise SilentAudioCaptureError()). Python 3 instantiates the class with no arguments when it sees raise ExceptionClass, so the DEFAULT_MESSAGE is used correctly and there is no runtime bug. However, raising a class directly is unconventional, surprises some static analysers, and adds an unnecessary call frame to every traceback. Since the function has no other callers and provides no testability benefit, it can be collapsed to a direct raise at the call site.

Suggested change
def _raise_silent_audio_capture_error() -> NoReturn:
raise SilentAudioCaptureError
def _raise_silent_audio_capture_error() -> NoReturn:
raise SilentAudioCaptureError()

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread agent_cli/services/asr.py
return_when=asyncio.ALL_COMPLETED,
)
result = recv_task.result()
capture_stats = send_task.result() if send_task is not None else None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 send_task is never None here. manage_send_receive_tasks is typed as returning tuple[asyncio.Task, asyncio.Task] and always calls asyncio.create_task(send_task_coro) unconditionally, so the guard if send_task is not None is dead code. It could mislead a future reader into thinking there is a code path where send_task can be absent.

Suggested change
capture_stats = send_task.result() if send_task is not None else None
capture_stats = send_task.result()

Comment on lines +51 to +57
func requestAccessIfNeeded(completion: @escaping (Bool) -> Void) {
guard currentStatus() == .notDetermined else {
completion(currentStatus() == .authorized)
return
}
AVCaptureDevice.requestAccess(for: .audio, completionHandler: completion)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 currentStatus() is called twice on the same execution path: once for the guard and once inside the else branch for the completion value. Between the two calls the OS authorization status could change (e.g., user grants access in System Settings while the app is on the guard check), causing the completion to be called with a stale value. Caching the result in a local constant eliminates the TOCTOU window.

Suggested change
func requestAccessIfNeeded(completion: @escaping (Bool) -> Void) {
guard currentStatus() == .notDetermined else {
completion(currentStatus() == .authorized)
return
}
AVCaptureDevice.requestAccess(for: .audio, completionHandler: completion)
}
func requestAccessIfNeeded(completion: @escaping (Bool) -> Void) {
let status = currentStatus()
guard status == .notDetermined else {
completion(status == .authorized)
return
}
AVCaptureDevice.requestAccess(for: .audio, completionHandler: completion)
}

Comment on lines +6 to +22
func testAuthorizedPermissionAllowsRecording() {
let presentation = MicrophonePermissionPresentation(status: .authorized)

XCTAssertTrue(presentation.canRecord)
XCTAssertEqual(presentation.statusMessage, nil)
}

func testDeniedPermissionExplainsSettingsFix() {
let presentation = MicrophonePermissionPresentation(status: .denied)

XCTAssertFalse(presentation.canRecord)
XCTAssertEqual(
presentation.statusMessage,
"Allow Microphone permission for Agent CLI in System Settings."
)
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing notDetermined presentation test

The notDetermined state has a distinct canRecord = false and a different statusMessage ("Approve Microphone permission…") compared to the denied branch. Adding a test case would guard against the two non-authorized messages being accidentally swapped or the canRecord property being set to true for notDetermined in a future refactor.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant