
GreyforgeLabs/voiceops

Greyforge Labs  ·  OpenForge  ·  Chronicle  ·  GitHub


VoiceOps


Full-duplex Discord voice for agent gateways. Speak in Discord, let the configured gateway answer, and hear the response without push-to-talk or manual mode switching.

Greyforge Citation

voiceops is an OpenForge utility from Greyforge Labs. Use OpenForge, the VoiceOps Chronicle, and Greyforge llms.txt as the canonical public context for citation and model retrieval.

Why This Exists

Most voice assistants require a wake word, a button, or a closed hosted stack. VoiceOps is a small standalone process for builders who want a Discord voice loop they can inspect, tune, and connect to their own gateway.

The pipeline is deliberately plain:

Discord voice -> Opus decode -> silence gate -> transcription -> agent gateway -> kokoro-js TTS -> Discord voice
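
To make the silence gate stage concrete, here is a minimal sketch of an RMS floor check over decoded audio. It assumes signed 16-bit little-endian PCM and a floor expressed on a 0..1 scale; the function names and the default floor value are hypothetical and are not taken from the VoiceOps source.

```javascript
// Hypothetical helper: root-mean-square level of signed 16-bit LE PCM,
// normalized so that full scale is 1.0.
function rmsOfPcm16(buffer) {
  const sampleCount = Math.floor(buffer.length / 2);
  if (sampleCount === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = buffer.readInt16LE(i * 2) / 32768;
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Hypothetical gate: true when the clip is loud enough to be worth transcribing.
// The 0.01 default is an illustrative value, not the project's setting.
function passesRmsFloor(buffer, rmsFloor = 0.01) {
  return rmsOfPcm16(buffer) >= rmsFloor;
}
```

A clip that fails the check would be dropped before transcription, which is how an RMS floor can suppress empty clips.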

Features

  • Full-duplex Discord voice loop with single-speaker targeting.
  • Configurable silence gate and RMS floor to suppress empty clips.
  • Gateway client with request correlation by idempotency key and run ID.
  • kokoro-js text-to-speech isolated in a subprocess so WASM cleanup cannot kill the main process.
  • Queue-depth cap, utterance-duration cap, streaming PCM cap, and per-minute rate cap to avoid runaway transcription usage.
  • Timeout and size limits for the gateway, ASR, and TTS; transcript and response logs are redacted by default.
  • Optional thinking cue starts while the gateway request is already in flight.
  • Plain JSON config, no required database.
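
As an illustration of the per-minute rate cap listed above, the following sketch uses a sliding 60-second window. It is one plausible way to implement such a cap, not necessarily how VoiceOps does it; `createRateCap` and its injectable clock are hypothetical names introduced for this example.

```javascript
// Hypothetical sliding-window limiter: allows at most `limitPerMinute`
// accepted calls in any 60-second window.
function createRateCap(limitPerMinute, now = () => Date.now()) {
  const windowMs = 60000;
  let timestamps = [];
  return function tryAcquire() {
    const t = now();
    // Forget utterances that have aged out of the window.
    timestamps = timestamps.filter((ts) => t - ts < windowMs);
    if (timestamps.length >= limitPerMinute) return false;
    timestamps.push(t);
    return true;
  };
}
```

With `pipeline.utterancesPerMinuteLimit` set to 20, a caller would create the limiter once with `createRateCap(20)` and skip transcription whenever `tryAcquire()` returns false.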

Requirements

  • Node.js 20 or newer.
  • ffmpeg on PATH.
  • A Discord bot token with View Channel, Connect, and Speak permissions.
  • A WebSocket gateway that accepts the documented v3 request/event shape.
  • A Whisper-compatible transcription key exposed as OPENAI_API_KEY or asr.openaiApiKey.

Quick Start

git clone https://github.com/GreyforgeLabs/voiceops.git
cd voiceops
npm install
cp voiceops.config.example.json voiceops.config.json

Edit voiceops.config.json, then run:

npm start

Configuration

voiceops.config.json is intentionally local and ignored by git.

{
  "discord": {
    "token": "YOUR_DISCORD_BOT_TOKEN"
  },
  "voiceChannelId": "YOUR_VOICE_CHANNEL_ID",
  "guildId": "YOUR_GUILD_ID",
  "operatorUserId": "YOUR_DISCORD_USER_ID",
  "gateway": {
    "url": "ws://127.0.0.1:18789",
    "token": "YOUR_GATEWAY_TOKEN",
    "sessionKey": "agent:main:voice:user",
    "scopes": ["operator"],
    "requestTimeoutMs": 60000,
    "connectTimeoutMs": 15000,
    "maxMessageBytes": 262144,
    "allowInsecureRemote": false
  },
  "tts": {
    "voice": "af_bella",
    "speed": 1.0,
    "timeoutMs": 30000,
    "maxInputChars": 2000,
    "maxOutputBytes": 12582912,
    "modelId": "onnx-community/Kokoro-82M-v1.0-ONNX"
  },
  "asr": {
    "openaiApiKey": "YOUR_OPENAI_API_KEY",
    "model": "whisper-1",
    "language": "en",
    "timeoutMs": 30000
  },
  "pipeline": {
    "maxUtteranceDurationMs": 30000,
    "utterancesPerMinuteLimit": 20,
    "maxQueuedUtterances": 8,
    "thinkingCueEnabled": true,
    "thinkingCueText": "One moment..."
  },
  "privacy": {
    "logTranscripts": false,
    "logAgentResponses": false
  }
}

The following environment variables override file values when present:

| Variable | Purpose |
| --- | --- |
| VOICEOPS_CONFIG_PATH | Alternate config file path for tests or managed runtimes |
| VOICEOPS_DISCORD_TOKEN | Discord bot token |
| VOICEOPS_GATEWAY_URL | Gateway WebSocket URL |
| VOICEOPS_GATEWAY_TOKEN | Gateway bearer token |
| VOICEOPS_ALLOW_INSECURE_REMOTE_GATEWAY | Allow non-loopback ws:// gateway URLs when set to true |
| OPENAI_API_KEY | Transcription key |
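
The override behavior can be sketched as a small merge step applied after the config file is parsed. The mapping from each variable to a config key is an assumption inferred from the names in the example config above; `applyEnvOverrides` is a hypothetical function, and VOICEOPS_CONFIG_PATH is left out because it selects the file before any merge happens.

```javascript
// Hypothetical merge: environment values win over file values when present.
// The variable-to-key mapping is assumed, not confirmed by the source.
function applyEnvOverrides(fileConfig, env = process.env) {
  const config = {
    ...fileConfig,
    discord: { ...fileConfig.discord },
    gateway: { ...fileConfig.gateway },
    asr: { ...fileConfig.asr },
  };
  if (env.VOICEOPS_DISCORD_TOKEN) config.discord.token = env.VOICEOPS_DISCORD_TOKEN;
  if (env.VOICEOPS_GATEWAY_URL) config.gateway.url = env.VOICEOPS_GATEWAY_URL;
  if (env.VOICEOPS_GATEWAY_TOKEN) config.gateway.token = env.VOICEOPS_GATEWAY_TOKEN;
  if (env.VOICEOPS_ALLOW_INSECURE_REMOTE_GATEWAY === "true") {
    config.gateway.allowInsecureRemote = true;
  }
  if (env.OPENAI_API_KEY) config.asr.openaiApiKey = env.OPENAI_API_KEY;
  return config;
}
```

Copying the nested objects before writing to them keeps the parsed file config untouched, which makes the merge easy to test with a fake environment object.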

Gateway Protocol

VoiceOps expects a v3-style WebSocket gateway:

Server -> { type: "event", event: "connect.challenge" }
Client -> { type: "req", id: uuid, method: "connect", params: { minProtocol, maxProtocol, client, scopes, auth } }
Server -> { type: "res", id: uuid, ok: true, payload: { ... } }

Client -> { type: "req", id: uuid, method: "chat.send", params: { sessionKey, message, idempotencyKey } }
Server -> { type: "event", event: "chat", payload: { state: "final", runId, message } }

Final responses are matched by runId first and idempotencyKey second. Unmatched push events are routed to the optional response callback.
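
The matching order can be illustrated with a small pending-request registry. This is a sketch under assumptions: it supposes that the client learns the runId from the chat.send response and that a final event may also carry an idempotencyKey, neither of which the protocol listing above confirms. All names here (`createPendingRegistry`, `register`, `bindRunId`, `handleChatEvent`) are hypothetical.

```javascript
// Hypothetical registry that resolves final chat events to pending requests,
// trying runId first and idempotencyKey second.
function createPendingRegistry(onUnmatched = () => {}) {
  const byRunId = new Map();
  const byIdempotencyKey = new Map();

  // Track a request as soon as chat.send is issued.
  function register(idempotencyKey, resolve) {
    byIdempotencyKey.set(idempotencyKey, { idempotencyKey, runId: null, resolve });
  }

  // Assumed step: associate the runId once the gateway reports it.
  function bindRunId(idempotencyKey, runId) {
    const entry = byIdempotencyKey.get(idempotencyKey);
    if (!entry) return;
    entry.runId = runId;
    byRunId.set(runId, entry);
  }

  // Returns true when the event resolved a pending request.
  function handleChatEvent(payload) {
    if (payload.state !== "final") return false; // this sketch ignores non-final states
    const entry =
      byRunId.get(payload.runId) || byIdempotencyKey.get(payload.idempotencyKey);
    if (!entry) {
      onUnmatched(payload); // unmatched push event goes to the optional callback
      return false;
    }
    byIdempotencyKey.delete(entry.idempotencyKey);
    if (entry.runId !== null) byRunId.delete(entry.runId);
    entry.resolve(payload.message);
    return true;
  }

  return { register, bindRunId, handleChatEvent };
}
```

Removing the entry from both maps on resolution prevents a duplicate final event from resolving the same request twice; a repeat would fall through to the unmatched callback instead.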

The optional thinking cue plays after transcription while the gateway request is already running. That masks gateway latency without delaying the actual response path.
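
The ordering described here can be sketched with two promises: the gateway request starts first, and the cue plays while it is pending. `sendToGateway` and `speak` are hypothetical injected functions standing in for the gateway client and the TTS playback path; in this simplified version the reply is spoken after the cue finishes, which may differ from how VoiceOps sequences playback.

```javascript
// Hypothetical orchestration: the gateway request is in flight before the cue plays.
async function respondWithCue(transcript, options) {
  const { sendToGateway, speak, cueText = "One moment...", cueEnabled = true } = options;

  // Start the request first so the cue never delays it.
  const replyPromise = sendToGateway(transcript);
  // Mark the promise as handled so a rejection during the cue does not
  // surface as an unhandled rejection; the await below still throws.
  replyPromise.catch(() => {});

  if (cueEnabled) {
    try {
      await speak(cueText);
    } catch {
      // A failed cue should not block the real response.
    }
  }

  const reply = await replyPromise;
  await speak(reply);
  return reply;
}
```

Because `sendToGateway` is called before `speak`, the time spent playing the cue overlaps with gateway latency rather than adding to it.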

Project Structure

voiceops/
  index.mjs
  src/
    asr.mjs
    config.mjs
    discord-voice.mjs
    gateway-client.mjs
    pipeline.mjs
    tts.mjs
    tts-worker.mjs
  voiceops.config.example.json
  package.json

Development

npm test

The test command syntax-checks all .mjs files. Runtime verification requires Discord credentials, a gateway, and a transcription key.

Security Notes

  • voiceops.config.json is ignored by git and should contain local secrets only.
  • The bot subscribes only to the configured operatorUserId.
  • Remote gateway URLs must use wss:// by default. Plain ws:// is accepted only for loopback unless explicitly allowed for a trusted private network.
  • Gateway messages, agent TTS input, TTS WAV output, and active PCM buffers are size-capped before expensive processing.
  • The ASR request, gateway request, gateway connection, and TTS subprocess all have bounded timeouts.
  • The TTS worker receives a sanitized environment instead of inheriting Discord, gateway, or OpenAI credentials.
  • Transcript and agent-response body logging is disabled by default. Set privacy.logTranscripts or privacy.logAgentResponses only on a trusted host/log sink.
  • Keep the Discord bot scoped to the specific server and channel you intend to use.
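
The gateway URL rule in the notes above can be expressed as a short check. This is a sketch of the stated policy, not the project's actual validation code; `isGatewayUrlAllowed` is a hypothetical name, and the loopback test covers only localhost, 127.0.0.0/8 literals, and the IPv6 loopback address.

```javascript
// Hypothetical policy check: wss:// is always accepted, ws:// only for
// loopback hosts unless the operator explicitly allows insecure remotes.
function isGatewayUrlAllowed(rawUrl, allowInsecureRemote = false) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable URL
  }
  if (url.protocol === "wss:") return true;
  if (url.protocol !== "ws:") return false;
  const host = url.hostname;
  const isLoopback =
    host === "localhost" ||
    host === "[::1]" ||
    host === "::1" ||
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host);
  return isLoopback || allowInsecureRemote;
}
```

Under this sketch, the example config's `ws://127.0.0.1:18789` passes, while a plain `ws://` URL pointing at a remote host is rejected unless `gateway.allowInsecureRemote` is true.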

License

AGPL-3.0-only. See LICENSE.


Built by Greyforge
