WebSocket API Reference
The ZeebeeAI WebSocket API provides real-time, bidirectional communication for chat and voice applications. WebSockets are ideal for streaming AI responses and handling voice interactions.
Authentication
Authentication for WebSocket connections is handled through an initialization message after connecting to the server. You can authenticate using either a user ID or an API key.
Session Initialization
After establishing a WebSocket connection, send an initialization message with your authentication details:
const ws = new WebSocket('wss://zeebee.ai/ws');

// Send initialization message after connection
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'init',
    user_id: 'YOUR_USER_ID',                     // Option 1: Using user_id
    api_key: 'YOUR_API_KEY',                     // Option 2: Using API key (preferred for integrations)
    conversation_id: 'OPTIONAL_CONVERSATION_ID',
    model: 'gpt-4o',                             // Optional: Specify model (defaults to gpt-4o)
    client_info: {}                              // Optional: Additional client information
  }));
};
The server will validate your credentials and respond with an initialization acknowledgment:
{
  "type": "init_ack",
  "status": "connected",
  "user_id": "user_12345"
}
Authentication Errors
If authentication fails, you'll receive an error message:
{
  "type": "error",
  "message": "Invalid API key. Please check your credentials."
}
Or for missing authentication:
{
  "type": "error",
  "message": "Authentication required. Provide a valid user ID or API key."
}
Connection Management
Establishing a Connection
To connect to the WebSocket API:
const ws = new WebSocket('wss://zeebee.ai/ws');

ws.onopen = () => {
  console.log('Connected to ZeebeeAI WebSocket API');

  // Send initialization message
  ws.send(JSON.stringify({
    type: 'init',
    user_id: 'YOUR_USER_ID' // or use api_key
  }));
};

ws.onclose = (event) => {
  console.log(`Connection closed: ${event.code} ${event.reason}`);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};
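The API does not reconnect for you: if the connection drops, open a new socket and send a fresh init message. A common client-side pattern is exponential backoff on close; the delays below are an assumed policy, not an API requirement:

// Illustrative: reconnect with exponential backoff after the socket closes.
let socket = null;
let reconnectAttempts = 0;

function connectWithBackoff() {
  socket = new WebSocket('wss://zeebee.ai/ws');

  socket.onopen = () => {
    reconnectAttempts = 0; // Reset the backoff once we are connected again
    socket.send(JSON.stringify({ type: 'init', user_id: 'YOUR_USER_ID' }));
  };

  socket.onclose = () => {
    // Wait 1s, 2s, 4s, ... capped at 30s before retrying (assumed values, not mandated by the API)
    const delay = Math.min(1000 * 2 ** reconnectAttempts, 30000);
    reconnectAttempts += 1;
    setTimeout(connectWithBackoff, delay);
  };
}

connectWithBackoff();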
Heartbeat Mechanism
The server sends ping messages every 20 seconds to maintain the connection. Clients should respond to these pings promptly:
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'ping') {
    // Respond with a pong message
    ws.send(JSON.stringify({
      type: 'pong',
      ping_id: data.ping_id,
      timestamp: Date.now()
    }));
  }

  // Process other message types
  // ...
};
Connection Timeouts
The server closes connections that show no activity for 90 seconds or that miss three consecutive pings. Before doing so, it sends a warning message:
{
  "type": "connection_warning",
  "level": "warning",
  "message": "Your connection appears to be inactive. Please respond or send any message to keep the connection alive.",
  "time_remaining": 30,
  "reconnect_required": false,
  "ping_requested": true
}
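One way to handle this warning is to treat ping_requested as a prompt to prove liveness, for example by replying with a ping. A sketch using the fields from the payload above:

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'connection_warning' && data.ping_requested) {
    // The server asked for activity; a ping (or any message) resets the inactivity timer
    ws.send(JSON.stringify({ type: 'ping', timestamp: Date.now() }));
  }

  // Process other message types...
};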
You can also send a ping to the server to maintain the connection:
// Send ping to server
ws.send(JSON.stringify({
  type: 'ping',
  timestamp: Date.now()
}));
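If your application can go quiet for long stretches, a periodic client-side ping keeps the connection inside the 90-second window. The 30-second interval below is an assumption, not a documented requirement:

// Send a keep-alive ping every 30 seconds while the socket is open
const keepAlive = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'ping', timestamp: Date.now() }));
  }
}, 30000);

// Stop pinging once the connection closes
ws.addEventListener('close', () => clearInterval(keepAlive));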
Message Types
All WebSocket messages use a standard JSON format with a type field and additional message-specific data:
{
  "type": "message_type",
  // Message-specific fields
}
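Since every message is just a type field plus message-specific data, a small wrapper can keep client code consistent. The sendMessage helper name is illustrative:

// Hypothetical convenience wrapper around the standard message envelope
function sendMessage(ws, type, fields = {}) {
  ws.send(JSON.stringify({ type, ...fields }));
}

// Example: reply to a server heartbeat
// sendMessage(ws, 'pong', { ping_id: data.ping_id, timestamp: Date.now() });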
Main Message Types
| Type | Direction | Description |
|---|---|---|
| init | Client → Server | Initialize session with user ID or API key and optional conversation ID |
| init_ack | Server → Client | Acknowledge session initialization |
| connection_established | Server → Client | Sent immediately after client connects |
| audio | Client → Server | Send audio data for transcription and processing |
| transcript | Server → Client | Transcribed text from the audio |
| response | Server → Client | AI text response to the user's message |
| status | Server → Client | Processing status updates |
| audio_stream_start | Server → Client | Signals the start of an audio stream response |
| audio_chunk | Server → Client | Chunk of base64-encoded audio data |
| audio_stream_end | Server → Client | Signals the end of an audio stream |
| tts_error | Server → Client | Text-to-speech conversion failed |
| ping | Both | Connection heartbeat (can be sent by server or client) |
| pong | Both | Response to heartbeat ping |
| error | Server → Client | Error information |
Chat Messages
WebSocket connections can be used for real-time AI chat interactions. Chat messages are stored in conversations for future reference.
Creating New Conversations
When the first message is sent without a conversation_id, a new conversation is created automatically:
// Start a new conversation
ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  model: 'gpt-4o',
  stt_provider: 'openai',
  stt_options: {
    model: 'whisper-1'
  }
}));
The server will respond with the transcribed text and AI response, including a conversation_id:
// Transcription response
{
  "type": "transcript",
  "text": "What's the weather like today?"
}

// AI text response
{
  "type": "response",
  "text": "I don't have access to real-time weather data. To get the current weather, you can check a weather website or app, or look outside your window.",
  "conversation_id": "conv_12345"
}
Continuing Existing Conversations
To send messages to an existing conversation, include the conversation_id:
// Initialize with a specific conversation
ws.send(JSON.stringify({
  type: 'init',
  user_id: 'YOUR_USER_ID',
  conversation_id: 'conv_12345'
}));

// Send a message to the existing conversation
ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  conversation_id: 'conv_12345',
  model: 'gpt-4o'
}));
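In practice, a client usually captures the conversation_id from the first response and attaches it to every later message. A minimal sketch of that bookkeeping (sendAudio is an illustrative helper, not part of the API):

// Remember the active conversation so follow-up messages land in the same thread
let activeConversationId = null;

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response' && data.conversation_id) {
    activeConversationId = data.conversation_id;
  }
  // Process other message types...
};

function sendAudio(base64Audio) {
  const message = { type: 'audio', audio_data: base64Audio, model: 'gpt-4o' };
  if (activeConversationId) {
    // Continue the existing conversation; omitting the field starts a new one
    message.conversation_id = activeConversationId;
  }
  ws.send(JSON.stringify(message));
}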
Voice Chat
The ZeebeeAI WebSocket API supports full voice chat capabilities with speech-to-text and text-to-speech functionality.
Sending Audio Data
To send audio data for voice chat:
// Send audio data for transcription and response
ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO', // Base64-encoded audio data
  model: 'gpt-4o',                    // LLM model to use
  stt_provider: 'openai',             // Speech-to-Text provider (default: openai)
  stt_options: {                      // Speech-to-Text options
    model: 'whisper-1'                // STT model to use
  },
  tts_provider: 'openai',             // Text-to-Speech provider (default: openai)
  tts_voice: 'alloy',                 // Voice for Text-to-Speech (default: alloy)
  tts_options: {},                    // Additional TTS options
  language_code: 'en-US'              // Language code for STT/TTS (default: en-US)
}));
Receiving Voice Responses
After sending audio data, you'll receive multiple responses in sequence:
- A transcript of your audio input
- The AI text response
- An audio stream of the AI response
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'transcript':
      console.log('Transcription:', data.text);
      // Update UI with transcribed text
      break;
    case 'response':
      console.log('AI response:', data.text);
      // Update UI with AI text response
      break;
    case 'audio_stream_start':
      console.log('Audio stream starting, chunks expected:', data.total_chunks);
      // Prepare audio player for streaming
      break;
    case 'audio_chunk': {
      console.log(`Received audio chunk ${data.chunk_index}/${data.total_chunks}`);
      // Process audio chunk (e.g., add to buffer)
      const audioChunk = base64ToArrayBuffer(data.data);
      // Add chunk to audio buffer
      break;
    }
    case 'audio_stream_end':
      console.log('Audio stream complete');
      // Play the complete audio
      break;
    case 'tts_error':
      console.error('TTS error:', data.message);
      // Handle TTS error
      break;
  }
};
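The handler above calls a base64ToArrayBuffer helper that is not defined in the snippet; a standard browser-side implementation looks like this:

// Decode a base64 string into an ArrayBuffer for use with the Web Audio API
function base64ToArrayBuffer(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}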
Speech-to-Text (STT) Options
The API supports customizing STT behavior:
| Option | Description | Default |
|---|---|---|
| stt_provider | Provider to use for speech-to-text | openai |
| stt_options.model | Model for speech recognition | whisper-1 |
| language_code | Language for transcription | en-US |
Text-to-Speech (TTS) Options
The API supports customizing TTS behavior:
| Option | Description | Default |
|---|---|---|
| tts_provider | Provider to use for text-to-speech | openai |
| tts_voice | Voice ID to use | OpenAI: alloy; ElevenLabs: Rachel |
| language_code | Language for TTS | en-US |
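For example, switching provider and voice only changes these fields. The table does not give the exact ElevenLabs provider identifier, so the 'elevenlabs' value below is an assumption made to illustrate the request shape:

ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  model: 'gpt-4o',
  tts_provider: 'elevenlabs', // Assumed identifier; confirm the exact provider name for your account
  tts_voice: 'Rachel',        // ElevenLabs default voice per the table above
  language_code: 'en-US'
}));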
Language Support
The WebSocket API supports multiple languages for both speech recognition and synthesis.
Supported Languages
| Language Code | Language Name |
|---|---|
| en-US | English (US) |
| en-GB | English (UK) |
| es-ES | Spanish (Spain) |
| fr-FR | French |
| de-DE | German |
| it-IT | Italian |
| pt-BR | Portuguese (Brazil) |
| ja-JP | Japanese |
| ko-KR | Korean |
| zh-CN | Chinese (Simplified) |
Setting the Language
Specify the language when sending audio data:
ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  language_code: 'fr-FR', // Use French
  model: 'gpt-4o'
}));
Error Handling
The WebSocket API uses a consistent error format for all errors:
{
  "type": "error",
  "message": "Error message describing what went wrong"
}
Common Error Types
| Error Scenario | Error Message |
|---|---|
| Authentication Failure | "Invalid API key. Please check your credentials." |
| Missing Authentication | "Authentication required. Provide a valid user ID or API key." |
| User Not Found | "You need to be logged in to use voice chat. Please login or register first." |
| Missing Audio Data | "No audio data received. Please try recording again." |
| Invalid Audio Format | "Invalid audio data format. Please try again." |
| Transcription Failure | "Could not transcribe audio. Please try again." |
| TTS Failure | "Text-to-speech conversion failed with provider [provider]." |
Handling Errors
Implement proper error handling in your client:
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'error') {
    console.error('Server error:', data.message);

    // Handle different error scenarios
    if (data.message.includes('Invalid API key')) {
      // Handle authentication error
      promptForAuthentication();
    } else if (data.message.includes('transcribe audio')) {
      // Handle transcription failure
      retryTranscription();
    } else {
      // Handle generic errors
      showErrorToUser(data.message);
    }
  }

  // Process other message types...
};
Examples
Complete Voice Chat Implementation
Here's a complete example of integrating voice chat with the WebSocket API:
// Initialize WebSocket connection
const ws = new WebSocket('wss://zeebee.ai/ws');

let audioContext;
let mediaRecorder;
let recordedChunks = [];
let isRecording = false;
let conversationId = null;

// Handle connection events
ws.onopen = () => {
  console.log('Connected to ZeebeeAI WebSocket server');

  // Authenticate
  ws.send(JSON.stringify({
    type: 'init',
    api_key: 'YOUR_API_KEY'
  }));

  // Enable recording button after connection
  document.getElementById('recordButton').disabled = false;
};

ws.onclose = (event) => {
  console.log(`WebSocket connection closed: ${event.code} ${event.reason}`);
  document.getElementById('recordButton').disabled = true;
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

// Handle incoming messages
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Received message:', data.type);

  switch (data.type) {
    case 'init_ack':
      console.log('Authentication successful');
      document.getElementById('status').textContent = 'Connected';
      break;
    case 'ping':
      // Respond to server pings
      ws.send(JSON.stringify({
        type: 'pong',
        ping_id: data.ping_id,
        timestamp: Date.now()
      }));
      break;
    case 'transcript':
      // Show transcribed text
      document.getElementById('transcript').textContent = data.text;
      break;
    case 'response':
      // Save conversation ID for later use
      if (data.conversation_id) {
        conversationId = data.conversation_id;
      }
      // Show AI response text
      document.getElementById('response').textContent = data.text;
      break;
    case 'audio_stream_start':
      console.log('Starting to receive audio stream');
      // Initialize audio buffer
      initializeAudioPlayback(data.total_chunks);
      break;
    case 'audio_chunk':
      // Process audio chunk
      processAudioChunk(data.data, data.chunk_index);
      break;
    case 'audio_stream_end':
      // Finalize and play the audio
      playAudioResponse();
      break;
    case 'error':
      console.error('Server error:', data.message);
      document.getElementById('error').textContent = data.message;
      break;
  }
};

// Record audio and send to server
async function startRecording() {
  try {
    // Get microphone access
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

    // Initialize audio context and recorder
    audioContext = new AudioContext();
    mediaRecorder = new MediaRecorder(stream);
    recordedChunks = [];
    isRecording = true;

    // Update UI
    document.getElementById('status').textContent = 'Recording...';
    document.getElementById('recordButton').textContent = 'Stop';

    // Handle recorded data
    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        recordedChunks.push(event.data);
      }
    };

    // Handle recording stop
    mediaRecorder.onstop = async () => {
      // Release the microphone
      stream.getTracks().forEach((track) => track.stop());

      // Create audio blob
      const audioBlob = new Blob(recordedChunks, { type: 'audio/webm' });

      // Convert to base64
      const base64Audio = await blobToBase64(audioBlob);

      // Send to server
      ws.send(JSON.stringify({
        type: 'audio',
        audio_data: base64Audio,
        model: 'gpt-4o',
        conversation_id: conversationId, // Include conversation ID if continuing a conversation
        language_code: 'en-US',
        tts_voice: 'alloy'
      }));

      // Update UI
      document.getElementById('status').textContent = 'Processing...';
    };

    // Start recording
    mediaRecorder.start();
  } catch (error) {
    console.error('Error starting recording:', error);
    document.getElementById('error').textContent = 'Could not access microphone.';
  }
}

// Stop recording
function stopRecording() {
  if (mediaRecorder && isRecording) {
    mediaRecorder.stop();
    isRecording = false;
    document.getElementById('recordButton').textContent = 'Record';
  }
}

// Toggle recording
function toggleRecording() {
  if (isRecording) {
    stopRecording();
  } else {
    startRecording();
  }
}

// Helper function to convert blob to base64
function blobToBase64(blob) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.onerror = reject;
    reader.readAsDataURL(blob);
  });
}

// Audio playback handling
let audioBuffer = [];
let totalChunksExpected = 0;

function initializeAudioPlayback(totalChunks) {
  audioBuffer = new Array(totalChunks);
  totalChunksExpected = totalChunks;
}

function processAudioChunk(base64Data, chunkIndex) {
  audioBuffer[chunkIndex] = base64Data;
}

function playAudioResponse() {
  // Check if all chunks received
  const allChunksReceived = !audioBuffer.includes(undefined);

  if (allChunksReceived) {
    // Combine all chunks
    const base64Audio = audioBuffer.join('');

    // Convert base64 to a binary blob ('audio/mpeg' is the standard MIME type for MP3)
    const byteCharacters = atob(base64Audio);
    const byteNumbers = new Array(byteCharacters.length);
    for (let i = 0; i < byteCharacters.length; i++) {
      byteNumbers[i] = byteCharacters.charCodeAt(i);
    }
    const byteArray = new Uint8Array(byteNumbers);
    const audioBlob = new Blob([byteArray], { type: 'audio/mpeg' });

    // Create audio element and play
    const audioUrl = URL.createObjectURL(audioBlob);
    const audio = new Audio(audioUrl);
    audio.play();

    document.getElementById('status').textContent = 'Playing response...';
  }
}
HTML Implementation
HTML to accompany the JavaScript example:
<div class="voice-chat-container">
<h2>ZeebeeAI Voice Chat</h2>
<div class="status-container">
<p>Status: <span id="status">Connecting...</span></p>
<p class="error" id="error"></p>
</div>
<div class="controls">
<button id="recordButton" onclick="toggleRecording()" disabled>Record</button>
</div>
<div class="conversation">
<div class="message-container">
<h4>You said:</h4>
<p id="transcript"></p>
</div>
<div class="message-container">
<h4>Zeebee says:</h4>
<p id="response"></p>
</div>
</div>
</div>