Version: 1.1
Status: Implemented (Core)
Component: Voice Gateway
Source: src/voice/
1. Overview
The Voice Platform enables real-time, bidirectional voice interaction with Jorvis. It supports both WebSocket (server-side) and WebRTC (client-side) flows. The RealtimeGateway manages audio streams, and the AudioTranscoderService handles format conversion.
2. Core Components
2.1 Realtime Gateway (src/voice/realtime.gateway.ts)
Protocol: WebSocket (/v1/realtime)
Responsibilities:
Manages persistent connection.
Handles audio_chunk events (binary).
Emits transcript and audio_response events.
2.2 Gemini Live Service (src/voice/gemini-live.service.ts)
Role: Direct integration with Google Gemini Live API (Multimodal).
Flow:
Streams user audio chunks directly to Gemini.
Receives streaming text/audio response chunks.
Keeps response latency low (<500 ms).
2.3 Audio Transcoder (src/voice/audio-transcoder.service.ts)
Input: WebM / Ogg Opus (Browser Default).
Output: Linear16 PCM 24kHz (Required by Gemini).
Library: ffmpeg / prism-media.
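The conversion above can be sketched as an ffmpeg invocation that reads the browser's WebM/Ogg Opus stream on stdin and writes raw Linear16 PCM at 24 kHz to stdout. This is a minimal sketch, not the service's actual code: the function names are hypothetical, and mono output (`channels: 1`) is an assumption, since the channel count is not stated here.

```typescript
// Hypothetical options type; only the 24 kHz Linear16 target comes from this document.
interface TranscodeOptions {
  sampleRateHz: number;
  channels: number; // assumption: mono
}

const DEFAULT_OPTIONS: TranscodeOptions = { sampleRateHz: 24000, channels: 1 };

// Builds ffmpeg arguments for "container on stdin -> raw signed 16-bit LE PCM on stdout".
function buildFfmpegArgs(opts: TranscodeOptions = DEFAULT_OPTIONS): string[] {
  return [
    "-hide_banner",
    "-loglevel", "error",
    "-i", "pipe:0",          // input container is auto-detected (WebM or Ogg)
    "-f", "s16le",           // raw PCM output, no header
    "-acodec", "pcm_s16le",  // Linear16
    "-ar", String(opts.sampleRateHz),
    "-ac", String(opts.channels),
    "pipe:1",
  ];
}

// Linear16 uses 2 bytes per sample, so the output byte rate is rate * 2 * channels.
function pcmBytesPerSecond(opts: TranscodeOptions = DEFAULT_OPTIONS): number {
  return opts.sampleRateHz * 2 * opts.channels;
}
```

Under these assumptions one second of transcoded audio is 48,000 bytes, which is useful when sizing buffers. The argument list could be passed to a child process that runs `ffmpeg`, with incoming chunks piped to its stdin.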
2.4 Voice Intent Router (src/voice/voice-intent-router.service.ts)
Role: Classifies incoming transcripts to intelligently route requests between Conversational and OpenClaw execution paths.
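To illustrate the routing decision, here is a deliberately simple placeholder. The real classification logic is not described in this document, so the verb list, the `VoiceRoute` type, and `routeTranscript` are all hypothetical; the service itself may well use a model rather than keywords.

```typescript
type VoiceRoute = "conversational" | "openclaw";

// Hypothetical trigger verbs, chosen only for illustration.
const ACTION_VERBS: string[] = ["run", "execute", "deploy", "create", "delete", "open"];

// Routes a transcript to OpenClaw when it starts with an action verb
// (optionally preceded by "please"); everything else stays conversational.
function routeTranscript(transcript: string): VoiceRoute {
  const words: string[] = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
  const first: string | undefined = words[0] === "please" ? words[1] : words[0];
  return first !== undefined && ACTION_VERBS.includes(first)
    ? "openclaw"
    : "conversational";
}
```

With this heuristic, "Please run the nightly build" goes to the OpenClaw path, while "What's the weather like?" stays on the conversational path.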
3. Data Flow
Connection: Frontend connects to ws://api.jorvis.io/v1/realtime.
Streaming:
Frontend records microphone → Sends Blob.
Gateway → AudioTranscoder → GeminiLiveService.
Response:
Gemini Stream → Gateway emits audio events.
Frontend plays audio buffer queue.
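The final step, playing the audio buffer queue, can be sketched as a FIFO that plays chunks strictly in arrival order. This is an assumption about how a frontend might do it, not the actual client code: `AudioPlaybackQueue` and the injected `play` callback are hypothetical names, and in a browser the callback would typically decode and play the chunk through the Web Audio API.

```typescript
type PlayChunk = (chunk: ArrayBuffer) => Promise<void>;

class AudioPlaybackQueue {
  private tail: Promise<void> = Promise.resolve();
  private pending = 0;
  private readonly play: PlayChunk;

  constructor(play: PlayChunk) {
    this.play = play;
  }

  // Appends a chunk; it starts playing only after every earlier chunk has finished.
  enqueue(chunk: ArrayBuffer): Promise<void> {
    this.pending += 1;
    this.tail = this.tail
      .then(() => this.play(chunk))
      .catch(() => undefined) // a failed chunk is skipped so the queue keeps draining
      .then(() => {
        this.pending -= 1;
      });
    return this.tail;
  }

  // Number of chunks that have been queued but have not finished playing.
  get size(): number {
    return this.pending;
  }
}
```

Chaining onto a single promise keeps ordering correct even when chunks arrive faster than they can be played.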
4. Protocols
4.1 Client Events (Inbound)
| Event | Payload | Description |
| --- | --- | --- |
| start_session | { config: VoiceConfig } | Init session params |
| audio_chunk | ArrayBuffer | Raw audio data |
| stop_session | {} | End stream |
4.2 Server Events (Outbound)
| Event | Payload | Description |
| --- | --- | --- |
| transcript | { text: string, type: 'user'\|'agent' } | Real-time text |
| audio_chunk | ArrayBuffer | Response audio to play |
| state | { state: 'listening'\|'thinking'\|'speaking' } | UI feedback state |
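The inbound and outbound events above map naturally onto discriminated unions. The sketch below is illustrative only: the `{ event, payload }` envelope is an assumption about the wire shape, `VoiceConfig` is left opaque because its fields are not specified here, and `describeServerEvent` is a hypothetical helper.

```typescript
// Fields of VoiceConfig are not specified in this document, so it is kept opaque.
type VoiceConfig = Record<string, unknown>;

type ClientEvent =
  | { event: "start_session"; payload: { config: VoiceConfig } }
  | { event: "audio_chunk"; payload: ArrayBuffer }
  | { event: "stop_session"; payload: Record<string, never> };

type ServerEvent =
  | { event: "transcript"; payload: { text: string; type: "user" | "agent" } }
  | { event: "audio_chunk"; payload: ArrayBuffer }
  | { event: "state"; payload: { state: "listening" | "thinking" | "speaking" } };

// Narrowing on `event` gives each branch a precisely typed payload.
function describeServerEvent(msg: ServerEvent): string {
  switch (msg.event) {
    case "transcript":
      return `${msg.payload.type}: ${msg.payload.text}`;
    case "audio_chunk":
      return `audio (${msg.payload.byteLength} bytes)`;
    case "state":
      return `state -> ${msg.payload.state}`;
  }
}

// Example of a well-typed client message.
const start: ClientEvent = { event: "start_session", payload: { config: {} } };
```

Typing the events this way lets the compiler reject a handler that forgets one of the three server events.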
5. Security
Auth: Standard Bearer Token via WebSocket Handshake query param ?token=....
Rate Limit: Enforced by connection duration (max 5 min/session).
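Both rules can be sketched as small pure functions: one pair that writes and reads the `token` query parameter, and one pair that computes how much of the 5-minute session budget remains. All function names are hypothetical, and the actual token validation is out of scope for this sketch.

```typescript
const MAX_SESSION_MS = 5 * 60 * 1000; // max 5 min per session

// Client side: appends the bearer token as the ?token= query parameter.
function buildRealtimeUrl(baseUrl: string, token: string): string {
  const url = new URL(baseUrl);
  url.searchParams.set("token", token);
  return url.toString();
}

// Server side: reads the token from the handshake request URL.
// Handshake URLs are often path-only, so they are resolved against a dummy base.
function extractToken(requestUrl: string): string | null {
  const url = new URL(requestUrl, "ws://localhost");
  const token = url.searchParams.get("token");
  return token !== null && token.length > 0 ? token : null;
}

// Milliseconds left before the session must be closed (never negative).
function remainingSessionMs(startedAtMs: number, nowMs: number): number {
  return Math.max(0, MAX_SESSION_MS - (nowMs - startedAtMs));
}

function isSessionExpired(startedAtMs: number, nowMs: number): boolean {
  return remainingSessionMs(startedAtMs, nowMs) === 0;
}
```

A gateway could reject the handshake when `extractToken` returns `null`, and schedule a close using `remainingSessionMs` once the session starts. Passing the clock in as an argument keeps the limit easy to test.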
6. Implementation Status (v0.7.0)
| Feature | Status | Notes |
| --- | --- | --- |
| WebSocket Gateway | ✅ Active | Support for basic audio streaming |
| Gemini Live | ✅ Active | Primary voice engine |
| Transcoding | ✅ Active | Robust ffmpeg integration |
| VAD (Voice Detection) | ⚠️ Partial | Relying on Gemini's internal VAD |