Skip to content

feat(hub): native companion (FCM) push channel + device registry + pairing QR#803

Open
heavygee wants to merge 10 commits into
tiann:mainfrom
heavygee:feat/companion-fcm-push-api
Open

feat(hub): native companion (FCM) push channel + device registry + pairing QR#803
heavygee wants to merge 10 commits into
tiann:mainfrom
heavygee:feat/companion-fcm-push-api

Conversation

@heavygee
Copy link
Copy Markdown
Collaborator

@heavygee heavygee commented Jun 4, 2026

Summary

Adds an opt-in, contract-only path for native companion apps (Android phone + Wear OS) to register for FCM push and receive the same notifications HAPI already sends via Web Push - without changing default behavior for any existing user.

Companion to discussion #746.

The companion app source itself is not in this PR; it lives at heavygee/hapi-companion (currently dogfood, eventual home is wherever the maintainer prefers - happy to transfer / mirror). This PR is just the hub-side surface the app talks to, plus a PWA Settings UI + terminal QR for pairing.

Hub additions

  • POST /api/devices/register + DELETE /api/devices/register - FCM token registration scoped per (namespace, deviceId, platform). platform is phone or wear.
  • hub/src/fcm/ - optional FcmService + FcmNotificationChannel, gated entirely by FCM_SERVICE_ACCOUNT_PATH env. If unset, no FCM code paths run, no behavior changes.
  • Schema migration to v10 adds fcm_devices table (id, namespace, token, platform, deviceId, timestamps, unique on (namespace, deviceId, platform)).
  • shared/messages.ts - extractAssistantPlainText() and extractNotifySummary() helpers used by the FCM channel to build human-readable notification bodies (and parse optional AGENT_NOTIFY_SUMMARY {...} JSON tail for richer summaries when agents emit it).
  • docs/api/native-companion-contract.md - documents the data field schema, REST endpoints, severity values, and env knobs.

PWA additions

  • Settings -> Companion section renders a QR code encoding hapicompanion://bind?hub=<base>&code=<token>. QR is gated behind a Show button so the access token isn't always on screen. Copy-link + textual deeplink are also exposed.
  • Hub terminal startup prints both QRs side by side: existing PWA QR (unchanged) plus a new companion deeplink QR. PWA users ignore the second; native users with the app installed onboard from it via Android intent filter.
  • Adds qrcode to web/ (already a hub dep; bun lockfile diff is just the workspace declaration, no new resolved packages).

Behavior gates / opt-in story

User state What changes
No FCM env, no companion app Nothing. Web Push and SSE work as today.
FCM env set, no companion registered Nothing. FCM channel sees no devices and no-ops.
FCM env set, companion registered for namespace X FCM fires for X. Web Push is suppressed for that namespace (avoids duplicate notifications on phone + watch). Other namespaces are unaffected.

The Web Push suppression is the only existing-user-visible change, and only triggers when an FCM device is registered. Happy to gate it behind an additional config flag (e.g. HAPI_PUSH_PREFER_NATIVE) if you'd prefer Web Push remain literally untouched until the operator opts in twice. Disclosed up front so it's not a surprise.

What this PR is not

  • Not the companion app itself (separate repo).
  • No Firebase service account keys, no Play Store config, no signed APKs - secrets / credentials are entirely the maintainer's call when (if) you adopt this.
  • No mandatory dependency on hapi.run infrastructure - self-hosters provision their own Firebase project; the env-gated FCM_SERVICE_ACCOUNT_PATH points at it. Distribution-tier discussion lives in the companion repo's docs/distribution-spectrum.md.

Test plan

  • bun run typecheck passes (hub, web, shared)
  • bun run test passes - 273 hub tests, 736 web tests, no regressions
  • Migration v10 unit-tested (hub/src/store/migration-v10.test.ts)
  • Device routes unit-tested (hub/src/web/routes/devices.test.ts)
  • FCM channel unit-tested (hub/src/fcm/fcmNotificationChannel.test.ts)
  • Push channel suppression unit-tested (hub/src/push/pushNotificationChannel.test.ts)
  • extractAssistantPlainText + extractNotifySummary unit-tested (shared/src/messages.test.ts)
  • Real-device dogfood: Pixel 10a + Pixel Watch, FCM ready / permission / task notifications observed end-to-end via the companion app for ~3 days
  • Reviewer smoke: with FCM_SERVICE_ACCOUNT_PATH unset, no FCM code paths execute (please confirm at your leisure)

Companion app status

  • Wear OS app: notification tap opens session, Allow/Deny/Reply work via phone bridge, "Open on phone" launches the matching session URL, severity-based notification colors, payload-shrink filter for the 100KB Wearable Data Layer cap.
  • Phone app: FCM relay to watch, deeplink-based pairing, dark Material3 theme.
  • Currently versioned 0.3.8 (phone) / 0.5.14 (wear), dogfood-only, no Play Store listing.

Going forward (out of PR scope, your call)

  • Whether the Play Store binary you ship supports Tier 1 (BYO-Firebase via pairing QR) and/or Tier 2 (relay through hapi.run) for self-hosters - tier doc linked above.
  • Repo transfer / collab arrangement for heavygee/hapi-companion. You're already an admin collaborator; happy to transfer ownership when you're ready and stay on as a committer.

heavygee and others added 4 commits June 4, 2026 17:55
Adds opt-in FCM HTTP v1 notification delivery so a companion mobile/wearable
app can receive permission, ready, and task notifications end-to-end. The
channel is gated entirely on FCM_SERVICE_ACCOUNT_PATH + FCM_PROJECT_ID being
set; operators not running a companion see zero behavior change.

What lands:

- POST/DELETE /api/devices/register — JWT-authed FCM token registry,
  upsert on (namespace, deviceId, platform), platforms `phone` | `wear`.
- Sqlite v9 → v10 migration adds `fcm_devices` (idx on namespace + token).
- FcmService — minimal HTTP v1 client, RS256 service-account JWT via
  jose (dep already in tree), 5-minute access-token cache, 401 retry.
- FcmNotificationChannel — implements NotificationChannel, sends data-only
  FCM (so companion can route to phone+watch surfaces). Body composition
  parses an optional trailing `AGENT_NOTIFY_SUMMARY {json}` line for richer
  ready summaries; truncates plain assistant text to 280 chars otherwise.
  Tags each payload with `severity` (info/warning/success/error) so clients
  can color/categorise the notification.
- PushNotificationChannel gains a NativeFallbackProbe — when a namespace
  has at least one registered FCM device, web-push and SSE in-page toast
  are skipped so the operator does not double-notify on phone+browser.
  Probe is no-op when no FCM device is registered; PWA-only setups
  unchanged. Branch trace gated on HAPI_NOTIFY_DEBUG=1.
- shared/src/messages.ts — `extractAssistantPlainText` (codex + Claude SDK
  shapes) and `extractNotifySummary` (strict end-anchored line parser).
- hub/src/notifications/toolArgs.ts — tool-arg formatters lifted out of
  telegram/sessionView (kept duplicated there in this PR; refactor of
  Telegram is a follow-up).
- docs/api/native-companion-contract.md — payload + endpoints + env vars,
  versioned at contract v1.

Test coverage:

- 260 hub tests pass (incl. 23 new across FCM channel, push dedup,
  v10 migration, devices route).
- 60 shared tests pass (messages parsers).

Notes for reviewers:

- Reference companion implementation lives in a separate Android repo
  (Kotlin, phone APK + Wear OS APK) — this PR is hub-side only.
- No new runtime deps (`jose` and `zod` already declared in hub).

Co-authored-by: Cursor <cursoragent@cursor.com>
…ub-on-phone

Adds a Scope section to the native-companion contract so anyone
implementing it knows the audience: operators running the hub on a
server who want phone/watch as a notification surface, not users
expecting a Termux-bundled hub. Mirrors the framing now in
heavygee/hapi-companion README.

Co-authored-by: Cursor <cursoragent@cursor.com>
Removes the prior framing that referenced a non-existent 'Termux
hub-on-phone' alternative. This contract describes a native client to
the same hub the PWA talks to; it does not change where the hub runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Companion section in Settings renders a QR code encoding the deeplink
hapicompanion://bind?hub=<base>&code=<token>. Scanning it from the HAPI
companion app (Android phone or Wear OS) auto-fills the bind form and
authenticates against this hub - no manual URL/token paste.

QR is gated behind a Show button so the access token doesn't sit visible
on screen by default; a Copy link affordance and the textual deeplink
are also exposed for manual onboarding.

Adds qrcode + @types/qrcode to web/ (already a hub dep, no new resolved
package - just a workspace declaration).

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Web push is suppressed even when no FCM channel is active — nativeFallbackProbe returns true solely from stored device rows, but the FCM notification channel is only registered when resolveFcmConfig() succeeds. If an operator has an fcm_devices row and then runs the hub without FCM_SERVICE_ACCOUNT_PATH/project config, PushNotificationChannel returns early and no FCM channel exists to deliver anything, so ready/permission/task notifications disappear. Evidence hub/src/startHub.ts:206.
    Suggested fix:
    const fcmConfig = resolveFcmConfig()
    const nativeFallbackProbe = (namespace: string): boolean =>
        Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0
    
    const notificationChannels: NotificationChannel[] = [
        new PushNotificationChannel(
            pushService,
            sseManager,
            visibilityTracker,
            config.publicUrl,
            nativeFallbackProbe
        )
    ]

Summary

  • Review mode: initial
  • Found one notification delivery regression in the FCM/web-push fallback interaction.

Testing

  • Not run (automation)

HAPI Bot

Comment thread hub/src/startHub.ts Outdated
// operator gets duplicate notifications: one in the native app, one
// from the PWA service worker. The probe is namespace-scoped so a user
// who has only registered the PWA still gets web-push as today.
const nativeFallbackProbe = (namespace: string): boolean =>
Copy link
Copy Markdown

@github-actions github-actions Bot Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] Web push now defers to native whenever a device row exists, but FcmNotificationChannel is only added later when resolveFcmConfig() succeeds. A stale/registered device row plus missing FCM env makes this channel return before SSE/web-push, while no FCM channel exists to send the notification. Gate the native fallback on active FCM config.

Suggested fix:

const fcmConfig = resolveFcmConfig()
const nativeFallbackProbe = (namespace: string): boolean =>
    Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0

After the existing PWA access QR is rendered on tunnel start, also print
the hapicompanion://bind?hub=...&code=... deeplink and a matching QR.

Same tunnel + token, different scheme: phones with the companion app
installed pick up the deeplink via the manifest intent filter; phones
without it ignore it and fall back to the PWA QR above.

QR rendering failure is non-fatal in both cases - the textual deeplink
above the QR is sufficient for manual paste.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee heavygee force-pushed the feat/companion-fcm-push-api branch from 9179ca9 to ae33e74 Compare June 4, 2026 18:39
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Web push is suppressed when FCM is not configured — nativeFallbackProbe only checks stored device rows, but the FCM channel is registered only when resolveFcmConfig() succeeds. A stale companion row plus a hub started without FCM_SERVICE_ACCOUNT_PATH makes PushNotificationChannel return before SSE/web-push while no FCM channel exists to deliver the notification. Evidence hub/src/startHub.ts:207.
    Suggested fix:
    const fcmConfig = resolveFcmConfig()
    const nativeFallbackProbe = (namespace: string): boolean =>
        Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0
  • [Major] Transient FCM failures delete registered devices — sendToToken() returns false for every non-OK response, but sendToNamespace() treats every false as an invalid token and removes it from fcm_devices. A 429/500/503 from Google or a temporary auth/project problem can permanently unregister the phone/watch, after which web-push also stays suppressed because registration state changed unexpectedly. Evidence hub/src/fcm/fcmService.ts:73.
    Suggested fix:
    type FcmTokenSendResult = 'sent' | 'invalid' | 'failed'
    
    const result = await this.sendToToken(accessToken, device.token, payload, device.platform)
    if (result === 'sent') {
        sent += 1
        return
    }
    failed += 1
    if (result === 'invalid') {
        invalidTokens.push(device.token)
        this.store.fcm.removeDeviceByToken(namespace, device.token)
    }

Summary

  • Review mode: follow-up after new commits
  • Prior bot finding is still present at the current head, and the FCM send path adds a separate device-registration data-loss risk.

Testing

  • Not run (automation). Missing coverage for stale fcm_devices rows when FCM config is absent, and for retaining devices on transient FCM 429/5xx responses.

HAPI Bot

Comment thread hub/src/startHub.ts Outdated
// from the PWA service worker. The probe is namespace-scoped so a user
// who has only registered the PWA still gets web-push as today.
const nativeFallbackProbe = (namespace: string): boolean =>
store.fcm.getDevicesByNamespace(namespace).length > 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] This still suppresses web-push solely from a stored FCM device row, even though the FCM channel is only added after resolveFcmConfig() succeeds below. If the hub is restarted without FCM_SERVICE_ACCOUNT_PATH or project config while a row remains in fcm_devices, this probe makes PushNotificationChannel return early and no FCM channel exists to send the notification.

Suggested fix:

const fcmConfig = resolveFcmConfig()
const nativeFallbackProbe = (namespace: string): boolean =>
    Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0

Comment thread hub/src/fcm/fcmService.ts Outdated
}
failed += 1
invalidTokens.push(device.token)
this.store.fcm.removeDeviceByToken(namespace, device.token)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] sendToToken() returns false for every non-OK FCM response, but this branch removes the token unconditionally. That turns transient Google failures such as 429/500/503 into permanent device unregistration, so a phone/watch can stop receiving notifications until it pairs again. Only delete tokens when FCM explicitly reports an invalid/unregistered token.

Suggested fix:

type FcmTokenSendResult = 'sent' | 'invalid' | 'failed'

const result = await this.sendToToken(accessToken, device.token, payload, device.platform)
if (result === 'sent') {
    sent += 1
    return
}
failed += 1
if (result === 'invalid') {
    invalidTokens.push(device.token)
    this.store.fcm.removeDeviceByToken(namespace, device.token)
}

Two bugs surfaced by the upstream review bot:

1) Web Push silently dropped when FCM is not actually configured.
   The native-fallback probe only checked the device registry; it did
   not check whether resolveFcmConfig() actually succeeded. So an
   operator who previously enabled FCM, registered a phone, then later
   started the hub WITHOUT FCM_SERVICE_ACCOUNT_PATH would see the probe
   return true (devices still in DB) -> Web Push suppressed -> no FCM
   channel registered -> notifications go to /dev/null.

   Fix: extracted the probe construction into buildNativeFallbackProbe()
   which short-circuits to () => false when fcmConfig is missing. Probe
   never even consults the device store in the no-config branch, so
   stale rows can never matter.

2) Transient FCM failures permanently unregistered devices.
   sendToToken() returned a single boolean and sendToNamespace() removed
   any device whose send returned false. A 429 (rate limit), 503
   (server error), 401 (auth glitch), or even an ECONNREFUSED would
   delete the device row, after which the user would need to re-pair to
   get notifications again. The bot caught it; the fix is the obvious
   one.

   Fix: sendToToken() now returns 'sent' | 'invalid' | 'failed'.
   - 'invalid' is reserved for the responses that genuinely indicate a
     dead token: HTTP 404 with UNREGISTERED/NOT_FOUND, and HTTP 400
     with INVALID_ARGUMENT explicitly referencing the token field.
   - Everything else (429, 5xx, 401, 403, network errors) is 'failed'
     and counts toward the failed tally without removing the device.

   sendToNamespace() only calls removeDeviceByToken() on 'invalid'.

Tests: 11 new tests across two new files. fcmService.test.ts covers
all six branches (200, 404 unregistered, 429, 503, 401, network error)
plus a mixed-batch case that proves invalid tokens get removed in the
same call where transient-failure tokens survive. nativeFallbackProbe
.test.ts covers both no-config and configured branches plus the
explicit "no-config never touches the store" guarantee.

Hub test count: 273 -> 284 (all passing).

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Collaborator Author

heavygee commented Jun 5, 2026

Both findings addressed in 078ec8e:

Finding 1 (web-push gated on fcmConfig): extracted the probe construction into buildNativeFallbackProbe(). When fcmConfig is null the function short-circuits to () => false and never consults the device store, so stale rows from a prior FCM-enabled boot can never silently suppress web-push.

Finding 2 (transient FCM errors no longer unregister devices): sendToToken() now returns 'sent' | 'invalid' | 'failed'. Only 'invalid' (HTTP 404 with UNREGISTERED/NOT_FOUND, or 400 INVALID_ARGUMENT against the token field) calls removeDeviceByToken(). 429 / 5xx / 401 / 403 / network errors are now 'failed' - they count toward the fail tally without dropping the registration.

11 new tests across fcmService.test.ts and nativeFallbackProbe.test.ts covering both fixes, including a mixed-batch case that proves invalid tokens get removed in the same call where transient-failure tokens survive. Hub test count 273 → 284, all passing.

Thanks to the bot for catching both - especially the device-unregistration one, which would have been a really mysterious "I have to re-pair every couple of days" bug in production.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Minor] Contract documents the wrong FCM visibility rule — the docs say FCM only sends when web push would fire and the PWA is not visible, but FcmNotificationChannel.deliver() always calls FCM and the new test asserts delivery even with a visible PWA. Companion apps built from this contract can suppress foreground handling and miss notifications. Evidence docs/api/native-companion-contract.md:46.
    Suggested fix:
    Hub sends FCM HTTP v1 whenever a notification event is emitted for a namespace with registered native devices and FCM is configured. Web Push is suppressed for that namespace to avoid duplicate OS notifications.
  • [Minor] Contract advertises an unsupported session-completed event — NotificationHub only calls channels with sendSessionCompletion, and the new FcmNotificationChannel does not implement it, so native companions will never receive the documented session-completed type. Evidence docs/api/native-companion-contract.md:52.
    Suggested fix:
    | `type` | `ready` | `ready`, `permission-request`, `task-notification` |

Summary

  • Review mode: follow-up after new commits
  • Prior bot findings are addressed at the current head. Remaining issues are contract accuracy gaps in the new native companion docs.

Testing

  • Not run (automation): bun is not installed in this runner (bun: command not found).

HAPI Bot

Comment thread docs/api/native-companion-contract.md Outdated

## Outbound push (hub → device)

Hub sends FCM HTTP v1 when Web Push would fire, if FCM is configured and client not visible via SSE (same rule as `PushNotificationChannel`).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MINOR] This visibility rule no longer matches the implementation: FcmNotificationChannel.deliver() always sends FCM, and the test added in this PR asserts FCM delivery even when the PWA is visible. Companion apps built from this contract can suppress foreground handling and miss notifications.

Suggested fix:

Hub sends FCM HTTP v1 whenever a notification event is emitted for a namespace with registered native devices and FCM is configured. Web Push is suppressed for that namespace to avoid duplicate OS notifications.

Comment thread docs/api/native-companion-contract.md Outdated

| Key | Example | Purpose |
|-----|---------|---------|
| `type` | `ready` | `ready`, `permission-request`, `task-notification`, `session-completed` |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MINOR] The contract advertises session-completed, but the new FcmNotificationChannel does not implement sendSessionCompletion, so NotificationHub skips this channel for completed sessions. Native companions following this contract will wait for an event type the hub never sends.

Suggested fix:

| `type` | `ready` | `ready`, `permission-request`, `task-notification` |

…ent type

HAPI Bot review on PR tiann#803 caught two contract-doc accuracy gaps:

1) Visibility rule was wrong. Doc said "FCM fires when Web Push would
   fire AND client not visible via SSE", but FcmNotificationChannel
   ALWAYS fires regardless of PWA visibility (deliberately - native
   companion is the canonical wrist-first surface, and there is a
   passing test asserting this). Companion app implementers reading
   the contract would have built foreground-suppression logic and
   then dropped notifications when the PWA tab was open.

2) Documented `session-completed` event doesn't exist. NotificationHub
   never calls into a 'session-completed' channel method on
   FcmNotificationChannel; the type would never reach a native client.
   Removed from the documented enum, leaving only the three actual
   events: ready, permission-request, task-notification.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Nit] Remove trailing whitespace — git diff --check origin/main...HEAD fails because the new Audience line ends with Markdown hard-break spaces. Evidence docs/api/native-companion-contract.md:3.
    Suggested fix:
    **Audience:** Implementers of native companion apps (Android phone + Wear OS, iOS, etc.) that pair with a hapi hub via FCM.
    **Auth:** Same JWT as the web client (`POST /api/bind``Authorization: Bearer`).

Summary
Review mode: follow-up after new commits
Prior bot findings are addressed at the current head. One whitespace issue remains from the latest full diff review.

Testing
Not run (automation): bun is not installed in this runner (command -v bun returned non-zero). Static check run: git diff --check origin/main...HEAD fails as noted above.

HAPI Bot

Comment thread docs/api/native-companion-contract.md Outdated
@@ -0,0 +1,101 @@
# Native companion API contract (phone + Wear)

**Audience:** Implementers of native companion apps (Android phone + Wear OS, iOS, etc.) that pair with a hapi hub via FCM.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NIT] git diff --check origin/main...HEAD fails because this line ends with trailing spaces.

Suggested fix:

**Audience:** Implementers of native companion apps (Android phone + Wear OS, iOS, etc.) that pair with a hapi hub via FCM.

…h break

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Persist bind token before rendering the pairing QR — CompanionPairing only builds the deeplink from hapi_access_token::<baseUrl> in localStorage, but Telegram binding authenticates via bind() and never stores the CLI token entered on the bind screen (web/src/App.tsx:364, web/src/hooks/useAuth.ts:147). A Telegram user can successfully bind the Mini App, open Settings, and still hit the “signed in via Telegram...” fallback, so the new Settings companion pairing QR is unavailable for a primary entry path. Evidence web/src/components/settings/CompanionPairing.tsx:31.
    Suggested fix:
    // web/src/hooks/useAuth.ts
    const ACCESS_TOKEN_PREFIX = 'hapi_access_token::'
    
    function rememberAccessToken(baseUrl: string, accessToken: string): void {
        try {
            localStorage.setItem(`${ACCESS_TOKEN_PREFIX}${baseUrl}`, accessToken)
        } catch {
            // Ignore storage errors
        }
    }
    
    // inside bind(), after successful client.bind(...)
    rememberAccessToken(baseUrl, accessToken)

Summary
Review mode: follow-up after new commits
Prior whitespace finding is fixed at the current head. One functional issue remains in the Settings companion pairing flow.

Testing
Not run (automation): bun is not installed in this runner (command -v bun returned no path). Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

const [copied, setCopied] = useState(false)
const canvasRef = useRef<HTMLCanvasElement | null>(null)

const accessToken = useMemo(() => readAccessToken(baseUrl), [baseUrl])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] This reads only the access-token value stored by browser login / ?token=..., but the Telegram bind path never writes the CLI token to that key. After a Telegram user enters the CLI token on the bind screen and bind() succeeds, Settings still sees an empty accessToken here and shows the fallback text instead of the pairing QR.

Suggested fix:

// web/src/hooks/useAuth.ts
const ACCESS_TOKEN_PREFIX = 'hapi_access_token::'

function rememberAccessToken(baseUrl: string, accessToken: string): void {
    try {
        localStorage.setItem(`${ACCESS_TOKEN_PREFIX}${baseUrl}`, accessToken)
    } catch {
        // Ignore storage errors
    }
}

// inside bind(), after successful client.bind(...)
rememberAccessToken(baseUrl, accessToken)

…works

The Settings -> Companion pairing QR reads the original CLI access token
from localStorage (hapi_access_token::<baseUrl>) so it can be encoded into
the hapicompanion://bind deeplink. For browser/CLI logins useAuthSource
already persists the token via setAccessToken, but the Telegram Mini App
bind path went through useAuth.bind() which exchanged the typed CLI token
for a JWT and never persisted it. Telegram users therefore always saw the
"signed in via Telegram..." fallback and got no usable QR.

After a successful client.bind() we now mirror useAuthSource's behavior
and write the same accessToken to the same localStorage key, restoring
parity between the two auth paths. No change for browser/CLI users.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Do not suppress Web Push before FCM actually succeeds — the new native fallback probe skips the PWA/SSE path whenever a device row exists, but FCM delivery can still fail transiently or during OAuth token exchange. FcmService.sendToNamespace() explicitly treats 401/429/5xx/network errors as failures while keeping the device registered, and getFcmAccessToken() can throw before any send. In those cases this branch returns before Web Push/SSE runs, so permission/ready/task notifications disappear for the namespace. Evidence hub/src/push/pushNotificationChannel.ts:124, related FCM failure paths hub/src/fcm/fcmService.ts:70 and hub/src/fcm/fcmService.ts:144.
    Suggested fix:
    // Make native delivery own the fallback decision instead of suppressing
    // Web Push from a separate channel based only on registration.
    private async sendNativeFirst(
        session: Session,
        fcmPayload: FcmSendPayload,
        sendWebFallback: () => Promise<void>
    ): Promise<void> {
        const result = await this.fcmService.sendToNamespace(session.namespace, fcmPayload)
            .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] }))
    
        if (result.sent === 0) {
            await sendWebFallback()
        }
    }

Questions

  • None.

Summary

  • Review mode: follow-up after new commits
  • One Major issue remains: native registration can suppress all web notification fallbacks even when FCM does not deliver. Prior Telegram bind-token finding appears fixed in web/src/hooks/useAuth.ts.

Testing

  • Not run (automation): bun is not installed in this runner.
  • Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

}
}

if (this.shouldSkipWebPush(session.namespace)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] This suppresses both the SSE toast path and Web Push based only on a registered native device, before the FCM channel has actually delivered anything. FcmService.sendToNamespace() can fail transiently (401/429/5xx/network) and keeps the device registered, and getFcmAccessToken() can throw before sends start; in all of those cases this branch still returns and the notification is lost for this namespace.

Suggested fix:

// Make native delivery own the fallback decision instead of suppressing
// Web Push from a separate channel based only on registration.
private async sendNativeFirst(
    session: Session,
    fcmPayload: FcmSendPayload,
    sendWebFallback: () => Promise<void>
): Promise<void> {
    const result = await this.fcmService.sendToNamespace(session.namespace, fcmPayload)
        .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] }))

    if (result.sent === 0) {
        await sendWebFallback()
    }
}

The native-fallback probe previously returned true whenever FCM was
configured AND devices were registered, which suppressed web-push for
the namespace. The HAPI Bot correctly pointed out the gap: if the FCM
pipeline silently breaks (expired service-account key, sustained 5xx,
OAuth token-fetch failure, network blackhole) the operator gets nothing
on either channel until they manually intervene.

Approach (deliberate, not the bot's exact suggested fix):

- FcmService now keeps a small rolling window (last 8 outcomes) of send
  attempts and exposes `isHealthy()`. The threshold is 5+/8 failures =
  unhealthy; the buffer starts empty so a freshly-booted hub is
  optimistic ("innocent until proven guilty") and does not double-fire
  on event #1.
- Token-fetch failure (`getFcmAccessToken` throws) now records exactly
  one health-failure (not one per device), short-circuits the send
  loop, and returns a result so `sendToNamespace` no longer leaks the
  exception.
- `invalid` token responses are explicitly excluded from the health
  buffer because they are per-device facts (rotated/uninstalled token),
  not pipeline failures - FCM was reachable, it just rejected one
  stale token.
- `buildNativeFallbackProbe` now optionally accepts the FcmService and
  short-circuits to "let web-push fire" when health is bad, before it
  even queries the device registry. The single-arg call shape is still
  supported for back-compat.

Why not the bot's exact suggestion ("invert: call FCM first, fall back
on result.sent === 0"):
- Couples PushNotificationChannel to FcmService and FcmSendPayload,
  reversing the clean parallel-channel architecture established earlier
  in this PR.
- Treats every transient single-event failure as fallback-worthy, which
  re-opens the duplicate-notification race that the suppression logic
  was added to close (FCM HTTP timeout that delivers later + the web
  push we sent in the meantime = two pings).
- A rolling health window only flips on sustained breakage, which is
  the actual operational scenario the bot is worried about.

The wrist-first design intent ("FCM fires unconditionally, web-push is
suppressed for the same namespace") documented in
docs/api/native-companion-contract.md is preserved on the happy path.
The probe only re-enables web-push when there is concrete evidence the
native pipeline is not delivering.

Tests:
- New FcmService.isHealthy suite covers empty-buffer, threshold flip,
  recovery as failures age out of the window, invalid-token exclusion,
  and network-error path.
- nativeFallbackProbe gains coverage for the unhealthy-but-registered,
  healthy-and-registered, and absent-fcmService (back-compat) cases.
- All 292 hub tests still pass; typecheck clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] FCM health gate still drops first failed native notifications — PushNotificationChannel returns before the FCM channel has attempted delivery, while the new health gate starts healthy and stays healthy until five recorded failures. After a hub restart or credential/OAuth/network outage, the first several permission/ready/task notifications for a namespace with a registered device still skip SSE/Web Push and then fail in FCM, so the operator receives nothing. Evidence hub/src/push/pushNotificationChannel.ts:76, related health threshold hub/src/fcm/fcmService.ts:85.
    Suggested fix:
    private async deliverNativeFirst(
        session: Session,
        webPayload: PushPayload,
        fcmPayload: FcmSendPayload
    ): Promise<void> {
        const result = await this.fcmService
            .sendToNamespace(session.namespace, fcmPayload)
            .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] }))
    
        if (result.sent > 0) return
        await this.deliverWebOrToast(session, webPayload)
    }

Questions

  • None.

Summary

  • Review mode: follow-up after new commits
  • One Major issue remains: the new health probe reduces prolonged notification loss, but it still suppresses web/SSE fallback for the first failed FCM notifications after startup or a fresh outage.

Testing

  • Not run (automation): bun is not installed in this runner.
  • Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

// namespace the watch already covers this notification end-to-end.
// Skip the SSE in-page toast AND the web-push fallback - both are
// redundant surfaces that the operator explicitly asked us to mute.
if (this.shouldSkipWebPush(session.namespace)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] This still returns before the FCM channel has attempted the same notification. The new FcmService.isHealthy() gate starts healthy and remains healthy until five recorded failures, so after a hub restart or fresh credential/OAuth/network outage this branch suppresses SSE/Web Push for the first failed FCM sends and those notifications are lost.

Suggested fix:

private async deliverNativeFirst(
    session: Session,
    webPayload: PushPayload,
    fcmPayload: FcmSendPayload
): Promise<void> {
    const result = await this.fcmService
        .sendToNamespace(session.namespace, fcmPayload)
        .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] }))

    if (result.sent > 0) return
    await this.deliverWebOrToast(session, webPayload)
}

@heavygee
Copy link
Copy Markdown
Collaborator Author

heavygee commented Jun 5, 2026

Status on c0217fc: both CI checks green; one open design disagreement with the bot

pr-review ✓ and test ✓ on the latest head. The remaining bot Major (#3 in this thread) is a real design trade-off rather than a code-level bug, and I'd like @tiann's call before iterating further.

What the bot wants

Invert the channel: PushNotificationChannel calls fcmService.sendToNamespace() first and falls back to web-push when result.sent === 0.

Why I implemented a health-gated probe instead

The bot's exact fix has a subtle gap that the rolling health gate is also subject to, just with different timing:

Scenario Bot's fix (sent === 0 -> fallback) Health probe (5+/8 failures -> fallback) Current PR (probe + health)
FCM happy path ✓ no duplicates ✓ no duplicates ✓ no duplicates
FCM 503 burst (transient) every event misses+fallback fires; if FCM later catches up = duplicate first ~5 events missed, then web-push covers first ~5 events missed, then web-push covers
FCM credentials expired every event has fallback; but if FCM HTTP times out and Google retries it later -> duplicate first ~5 events missed, then web-push covers first ~5 events missed, then web-push covers
FCM cold start, no history always tries FCM first (sequential) -> latency on every event optimistic, no first-event penalty optimistic, no first-event penalty
Architectural cost PushNotificationChannel now imports FcmService + FcmSendPayload; channels lose independence one shared isHealthy() signal; channels stay parallel one shared isHealthy() signal; channels stay parallel

There is no design that gives both "no first-failure missed" and "zero duplicates on FCM HTTP-timeout-then-retry-delivers." The wrist-first contract documented in docs/api/native-companion-contract.md was an explicit choice to prioritise no-duplicates over no-misses. The health probe softens the "credentials silently expired" worst case (which was the bot's original concern) without giving up the duplicate-free guarantee on the happy path.

Two questions for @tiann

  1. Are you OK with the current trade-off (~5 missed events on a sustained outage, never duplicate on transient timing weirdness), or would you rather take the bot's invert-and-fall-back approach (no first-event miss, occasional duplicate on FCM-timeout-then-Google-retries)?
  2. If you prefer the latter, happy to refactor channel orchestration up to NotificationHub so PushNotificationChannel doesn't grow an import on FcmService.

CI is green either way. Defer to your call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant