WebSocket API Reference

The ZeebeeAI WebSocket API provides real-time, bidirectional communication for chat and voice applications. WebSockets are ideal for streaming AI responses and handling voice interactions.

Authentication

Authentication for WebSocket connections is handled through an initialization message after connecting to the server. You can authenticate using either a user ID or an API key.

Session Initialization

After establishing a WebSocket connection, send an initialization message with your authentication details:

const ws = new WebSocket('wss://zeebee.ai/ws');

// Send initialization message after connection
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'init',
    user_id: 'YOUR_USER_ID',    // Option 1: Using user_id
    api_key: 'YOUR_API_KEY',    // Option 2: Using API key (preferred for integrations)
    conversation_id: 'OPTIONAL_CONVERSATION_ID',
    model: 'gpt-4o',            // Optional: Specify model (defaults to gpt-4o)
    client_info: {}             // Optional: Additional client information
  }));
};

The server will validate your credentials and respond with an initialization acknowledgment:

{
  "type": "init_ack",
  "status": "connected",
  "user_id": "user_12345"
}
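
Waiting for the init_ack before sending further messages is a safe pattern. A minimal readiness gate (a sketch, assuming a single shared flag):

let sessionReady = false;

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'init_ack' && data.status === 'connected') {
    sessionReady = true;  // safe to start sending chat or audio messages now
  }
};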

Authentication Errors

If authentication fails, you'll receive an error message:

{
  "type": "error",
  "message": "Invalid API key. Please check your credentials."
}

Or for missing authentication:

{
  "type": "error",
  "message": "Authentication required. Provide a valid user ID or API key."
}
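
A client can react to these authentication errors by closing the socket and re-prompting for credentials. A sketch, matching on the two documented messages (promptForCredentials is a placeholder for your own UI flow):

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'error' &&
      (data.message.includes('API key') || data.message.includes('Authentication required'))) {
    ws.close();
    promptForCredentials(); // hypothetical: your app's re-authentication flow
  }
};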

Connection Management

Establishing a Connection

To connect to the WebSocket API:

const ws = new WebSocket('wss://zeebee.ai/ws');

ws.onopen = () => {
  console.log('Connected to ZeebeeAI WebSocket API');
  
  // Send initialization message
  ws.send(JSON.stringify({
    type: 'init',
    user_id: 'YOUR_USER_ID'  // or use api_key
  }));
};

ws.onclose = (event) => {
  console.log(`Connection closed: ${event.code} ${event.reason}`);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};
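
The API does not prescribe a reconnection policy; a common client-side pattern is exponential backoff. A sketch, with the delay values as assumptions:

let reconnectAttempts = 0;

function connect() {
  const ws = new WebSocket('wss://zeebee.ai/ws');

  ws.onopen = () => {
    reconnectAttempts = 0; // reset the backoff on a successful connection
    ws.send(JSON.stringify({
      type: 'init',
      user_id: 'YOUR_USER_ID'
    }));
  };

  ws.onclose = () => {
    // Retry with exponential backoff, capped at 30 seconds
    const delay = Math.min(1000 * 2 ** reconnectAttempts, 30000);
    reconnectAttempts++;
    setTimeout(connect, delay);
  };

  return ws;
}

connect();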

Heartbeat Mechanism

The server sends ping messages every 20 seconds to maintain the connection. Clients should respond to these pings promptly:

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === 'ping') {
    // Respond with a pong message
    ws.send(JSON.stringify({
      type: 'pong',
      ping_id: data.ping_id,
      timestamp: Date.now()
    }));
  }
  
  // Process other message types
  // ...
};

Connection Timeouts

The server closes connections after 90 seconds without activity. A connection that misses three consecutive pings is considered inactive and will be closed. Before closing, the server sends a connection warning:

{
  "type": "connection_warning",
  "level": "warning",
  "message": "Your connection appears to be inactive. Please respond or send any message to keep the connection alive.",
  "time_remaining": 30,
  "reconnect_required": false,
  "ping_requested": true
}
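
Because the warning sets ping_requested, a simple handler can reply with a ping immediately (a sketch):

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'connection_warning' && data.ping_requested) {
    // Reply right away to keep the connection open
    ws.send(JSON.stringify({
      type: 'ping',
      timestamp: Date.now()
    }));
  }
};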

You can also send a ping to the server to maintain the connection:

// Send ping to server
ws.send(JSON.stringify({
  type: 'ping',
  timestamp: Date.now()
}));
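
For clients that may otherwise sit idle, a periodic client-side ping keeps the connection alive. The 30-second interval below is an assumption, chosen to stay well under the 90-second timeout:

const keepaliveTimer = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'ping', timestamp: Date.now() }));
  }
}, 30000);

// Stop pinging once the connection closes
ws.addEventListener('close', () => clearInterval(keepaliveTimer));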

Message Types

All WebSocket messages are JSON objects with a type field that identifies the message, plus message-specific fields:

{
  "type": "message_type",
  // Message-specific fields
}

Main Message Types

Type                     Direction         Description
init                     Client → Server   Initialize session with user ID or API key and optional conversation ID
init_ack                 Server → Client   Acknowledge session initialization
connection_established   Server → Client   Sent immediately after the client connects
audio                    Client → Server   Send audio data for transcription and processing
transcript               Server → Client   Transcribed text from the audio
response                 Server → Client   AI text response to the user's message
status                   Server → Client   Processing status updates
audio_stream_start       Server → Client   Signals the start of an audio stream response
audio_chunk              Server → Client   Chunk of base64-encoded audio data
audio_stream_end         Server → Client   Signals the end of an audio stream
tts_error                Server → Client   Text-to-speech conversion failed
ping                     Both              Connection heartbeat (can be sent by server or client)
pong                     Both              Response to a heartbeat ping
error                    Server → Client   Error information
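
A table like this maps naturally onto a dispatch object keyed by type, which scales better than a long switch as handlers grow. A sketch; the handler bodies are placeholders:

const handlers = {
  init_ack:   (msg) => console.log('Session ready for', msg.user_id),
  transcript: (msg) => console.log('You said:', msg.text),
  response:   (msg) => console.log('AI:', msg.text),
  error:      (msg) => console.error('Server error:', msg.message),
  ping:       (msg) => ws.send(JSON.stringify({
    type: 'pong',
    ping_id: msg.ping_id,
    timestamp: Date.now()
  }))
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  const handler = handlers[msg.type];
  if (handler) handler(msg);
};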

Chat Messages

WebSocket connections can be used for real-time AI chat interactions. Chat messages are stored in conversations for future reference.

Creating New Conversations

When sending the first message without a conversation_id, a new conversation will be created automatically:

// Start a new conversation
ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  model: 'gpt-4o',
  stt_provider: 'openai',
  stt_options: {
    model: 'whisper-1'
  }
}));

The server will respond with the transcribed text and AI response, including a conversation_id:

// Transcription response
{
  "type": "transcript",
  "text": "What's the weather like today?"
}

// AI text response
{
  "type": "response",
  "text": "I don't have access to real-time weather data. To get the current weather, you can check a weather website or app, or look outside your window.",
  "conversation_id": "conv_12345"
}
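
Store the returned conversation_id so that follow-up messages land in the same conversation (a sketch):

let conversationId = null;

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'response' && data.conversation_id) {
    conversationId = data.conversation_id; // include this on subsequent 'audio' messages
  }
};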

Continuing Existing Conversations

To send messages to an existing conversation, include the conversation_id:

// Initialize with a specific conversation
ws.send(JSON.stringify({
  type: 'init',
  user_id: 'YOUR_USER_ID',
  conversation_id: 'conv_12345'
}));

// Send a message to the existing conversation
ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  conversation_id: 'conv_12345',
  model: 'gpt-4o'
}));

Voice Chat

The ZeebeeAI WebSocket API supports full voice chat capabilities with speech-to-text and text-to-speech functionality.

Sending Audio Data

To send audio data for voice chat:

// Send audio data for transcription and response
ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO', // Base64-encoded audio data
  model: 'gpt-4o',                     // LLM model to use
  stt_provider: 'openai',              // Speech-to-Text provider (default: openai)
  stt_options: {                       // Speech-to-Text options
    model: 'whisper-1'                 // STT model to use
  },
  tts_provider: 'openai',              // Text-to-Speech provider (default: openai)
  tts_voice: 'alloy',                  // Voice for Text-to-Speech (default: alloy)
  tts_options: {},                     // Additional TTS options
  language_code: 'en-US'               // Language code for STT/TTS (default: en-US)
}));

Receiving Voice Responses

After sending audio data, you'll receive multiple responses in sequence:

  1. A transcript of your audio input
  2. The AI text response
  3. An audio stream of the AI response

Handle each message type as it arrives:

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  switch (data.type) {
    case 'transcript':
      console.log('Transcription:', data.text);
      // Update UI with transcribed text
      break;
      
    case 'response':
      console.log('AI response:', data.text);
      // Update UI with AI text response
      break;
      
    case 'audio_stream_start':
      console.log('Audio stream starting, chunks expected:', data.total_chunks);
      // Prepare audio player for streaming
      break;
      
    case 'audio_chunk': {
      console.log(`Received audio chunk ${data.chunk_index}/${data.total_chunks}`);
      // Decode the chunk (base64ToArrayBuffer is a helper, shown below)
      const audioChunk = base64ToArrayBuffer(data.data);
      // Add chunk to audio buffer
      break;
    }
      
    case 'audio_stream_end':
      console.log('Audio stream complete');
      // Play the complete audio
      break;
      
    case 'tts_error':
      console.error('TTS error:', data.message);
      // Handle TTS error
      break;
  }
};
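
The base64ToArrayBuffer helper used above is not part of the API; a minimal browser implementation:

function base64ToArrayBuffer(base64) {
  const binary = atob(base64); // decode base64 into a binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}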

Speech-to-Text (STT) Options

The API supports customizing STT behavior:

Option              Description                          Default
stt_provider        Provider to use for speech-to-text   openai
stt_options.model   Model for speech recognition         whisper-1
language_code       Language for transcription           en-US
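
For example, to transcribe Spanish audio while leaving the other defaults in place:

ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  model: 'gpt-4o',
  stt_provider: 'openai',
  stt_options: { model: 'whisper-1' },
  language_code: 'es-ES'
}));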

Text-to-Speech (TTS) Options

The API supports customizing TTS behavior:

Option          Description                          Default
tts_provider    Provider to use for text-to-speech   openai
tts_voice       Voice ID to use                      alloy (OpenAI) / Rachel (ElevenLabs)
language_code   Language for TTS                     en-US
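
For example, to request a non-default OpenAI voice ('nova' is one of OpenAI's standard voices, but its availability through this API is an assumption):

ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  model: 'gpt-4o',
  tts_provider: 'openai',
  tts_voice: 'nova' // assumed available; the documented default is 'alloy'
}));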

Language Support

The WebSocket API supports multiple languages for both speech recognition and synthesis.

Supported Languages

Language Code   Language Name
en-US           English (US)
en-GB           English (UK)
es-ES           Spanish (Spain)
fr-FR           French
de-DE           German
it-IT           Italian
pt-BR           Portuguese (Brazil)
ja-JP           Japanese
ko-KR           Korean
zh-CN           Chinese (Simplified)

Setting the Language

Specify the language when sending audio data:

ws.send(JSON.stringify({
  type: 'audio',
  audio_data: 'BASE64_ENCODED_AUDIO',
  language_code: 'fr-FR', // Use French
  model: 'gpt-4o'
}));

Error Handling

The WebSocket API uses a consistent error format for all errors:

{
  "type": "error",
  "message": "Error message describing what went wrong"
}

Common Error Types

Error Scenario           Error Message
Authentication Failure   "Invalid API key. Please check your credentials."
Missing Authentication   "Authentication required. Provide a valid user ID or API key."
User Not Found           "You need to be logged in to use voice chat. Please login or register first."
Missing Audio Data       "No audio data received. Please try recording again."
Invalid Audio Format     "Invalid audio data format. Please try again."
Transcription Failure    "Could not transcribe audio. Please try again."
TTS Failure              "Text-to-speech conversion failed with provider [provider]."

Handling Errors

Implement proper error handling in your client:

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === 'error') {
    console.error('Server error:', data.message);
    
    // Handle different error scenarios
    if (data.message.includes('Invalid API key')) {
      // Handle authentication error
      promptForAuthentication();
    } else if (data.message.includes('transcribe audio')) {
      // Handle transcription failure
      retryTranscription();
    } else {
      // Handle generic errors
      showErrorToUser(data.message);
    }
  }
  
  // Process other message types...
};

Examples

Complete Voice Chat Implementation

Here's a complete example of integrating voice chat with the WebSocket API:

// Initialize WebSocket connection
const ws = new WebSocket('wss://zeebee.ai/ws');
let mediaRecorder;
let recordedChunks = [];
let isRecording = false;
let conversationId = null;

// Handle connection events
ws.onopen = () => {
  console.log('Connected to ZeebeeAI WebSocket server');
  
  // Authenticate
  ws.send(JSON.stringify({
    type: 'init',
    api_key: 'YOUR_API_KEY'
  }));
  
  // Enable recording button after connection
  document.getElementById('recordButton').disabled = false;
};

ws.onclose = (event) => {
  console.log(`WebSocket connection closed: ${event.code} ${event.reason}`);
  document.getElementById('recordButton').disabled = true;
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

// Handle incoming messages
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Received message:', data.type);
  
  switch (data.type) {
    case 'init_ack':
      console.log('Authentication successful');
      document.getElementById('status').textContent = 'Connected';
      break;
      
    case 'ping':
      // Respond to server pings
      ws.send(JSON.stringify({
        type: 'pong',
        ping_id: data.ping_id,
        timestamp: Date.now()
      }));
      break;
      
    case 'transcript':
      // Show transcribed text
      document.getElementById('transcript').textContent = data.text;
      break;
      
    case 'response':
      // Save conversation ID for later use
      if (data.conversation_id) {
        conversationId = data.conversation_id;
      }
      
      // Show AI response text
      document.getElementById('response').textContent = data.text;
      break;
      
    case 'audio_stream_start':
      console.log('Starting to receive audio stream');
      // Initialize audio buffer
      initializeAudioPlayback(data.total_chunks);
      break;
      
    case 'audio_chunk':
      // Process audio chunk
      processAudioChunk(data.data, data.chunk_index);
      break;
      
    case 'audio_stream_end':
      // Finalize and play the audio
      playAudioResponse();
      break;
      
    case 'error':
      console.error('Server error:', data.message);
      document.getElementById('error').textContent = data.message;
      break;
  }
};

// Record audio and send to server
async function startRecording() {
  try {
    // Get microphone access
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    
    // Initialize the recorder
    mediaRecorder = new MediaRecorder(stream);
    recordedChunks = [];
    isRecording = true;
    
    // Update UI
    document.getElementById('status').textContent = 'Recording...';
    document.getElementById('recordButton').textContent = 'Stop';
    
    // Handle recorded data
    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        recordedChunks.push(event.data);
      }
    };
    
    // Handle recording stop
    mediaRecorder.onstop = async () => {
      // Create audio blob
      const audioBlob = new Blob(recordedChunks, { type: 'audio/webm' });
      
      // Convert to base64
      const base64Audio = await blobToBase64(audioBlob);
      
      // Send to server
      ws.send(JSON.stringify({
        type: 'audio',
        audio_data: base64Audio,
        model: 'gpt-4o',
        conversation_id: conversationId, // Include conversation ID if continuing a conversation
        language_code: 'en-US',
        tts_voice: 'alloy'
      }));
      
      // Update UI
      document.getElementById('status').textContent = 'Processing...';
    };
    
    // Start recording
    mediaRecorder.start();
    
  } catch (error) {
    console.error('Error starting recording:', error);
    document.getElementById('error').textContent = 'Could not access microphone.';
  }
}

// Stop recording
function stopRecording() {
  if (mediaRecorder && isRecording) {
    mediaRecorder.stop();
    isRecording = false;
    document.getElementById('recordButton').textContent = 'Record';
  }
}

// Toggle recording
function toggleRecording() {
  if (isRecording) {
    stopRecording();
  } else {
    startRecording();
  }
}

// Helper function to convert blob to base64
function blobToBase64(blob) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.onerror = reject;
    reader.readAsDataURL(blob);
  });
}

// Audio playback handling
let audioBuffer = [];
let totalChunksExpected = 0;

function initializeAudioPlayback(totalChunks) {
  audioBuffer = new Array(totalChunks);
  totalChunksExpected = totalChunks;
}

function processAudioChunk(base64Data, chunkIndex) {
  audioBuffer[chunkIndex] = base64Data;
}

function playAudioResponse() {
  // Check if all chunks received
  const allChunksReceived = !audioBuffer.includes(undefined);
  
  if (allChunksReceived) {
    // Combine all chunks
    const base64Audio = audioBuffer.join('');
    
    // Convert to blob
    const byteCharacters = atob(base64Audio);
    const byteNumbers = new Array(byteCharacters.length);
    
    for (let i = 0; i < byteCharacters.length; i++) {
      byteNumbers[i] = byteCharacters.charCodeAt(i);
    }
    
    const byteArray = new Uint8Array(byteNumbers);
    const audioBlob = new Blob([byteArray], { type: 'audio/mpeg' }); // standard MIME type for MP3 audio
    
    // Create audio element and play
    const audioUrl = URL.createObjectURL(audioBlob);
    const audio = new Audio(audioUrl);
    audio.play();
    
    document.getElementById('status').textContent = 'Playing response...';
  }
}

HTML Implementation

HTML to accompany the JavaScript example:

<div class="voice-chat-container">
  <h2>ZeebeeAI Voice Chat</h2>
  
  <div class="status-container">
    <p>Status: <span id="status">Connecting...</span></p>
    <p class="error" id="error"></p>
  </div>
  
  <div class="controls">
    <button id="recordButton" onclick="toggleRecording()" disabled>Record</button>
  </div>
  
  <div class="conversation">
    <div class="message-container">
      <h4>You said:</h4>
      <p id="transcript"></p>
    </div>
    
    <div class="message-container">
      <h4>Zeebee says:</h4>
      <p id="response"></p>
    </div>
  </div>
</div>