HTTP Streaming API

The ZeebeeAI HTTP Streaming API allows you to receive AI responses in real time through a server-sent events (SSE) stream. This lets your application display responses progressively as they are generated, rather than waiting for the entire response to finish.

Note: Server-Sent Events (SSE) is a technology in which a client receives automatic updates from a server over a single HTTP connection. The connection is unidirectional: the server can send updates to the client, but the client cannot send messages to the server through this connection.
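
For reference, each SSE event on the wire consists of one or more "field: value" lines, followed by a blank line that marks the end of the event:

data: first chunk

data: second chunk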

HTTP Streaming Endpoint

To make a streaming request, use the following endpoint:

POST https://api.zeebee.ai/v1/chat/completions/stream

Headers

Include the following headers in your request:

Header           Value                  Description
Content-Type     application/json       Specifies that the request body is JSON
Authorization    Bearer YOUR_API_KEY    Your API key for authentication
Accept           text/event-stream      Indicates that you want to receive a stream of events

Request Body

The request body is the same as for non-streaming requests:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me about HTTP streaming."
    }
  ],
  "model": "gpt-4-turbo",
  "temperature": 0.7,
  "stream": true
}

Notice the stream: true parameter, which is required for streaming responses.

Response Format

The streaming response will be delivered as a series of server-sent events, where each event contains a chunk of the AI's response.

Event Structure

Each event in the stream follows this format:

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": "Hello"}, "index": 0, "finish_reason": null}]}

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": " world"}, "index": 0, "finish_reason": null}]}

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": "!"}, "index": 0, "finish_reason": null}]}

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}]}

data: [DONE]
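
The final data: [DONE] line is a plain-text sentinel that marks the end of the stream; it is not JSON and should not be passed to a JSON parser.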

Understanding Streaming Responses

Important: To reconstruct the complete message, concatenate all the delta.content values in order.
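
As an illustration, here is a minimal sketch that rebuilds the full message from an array of already-parsed chunk payloads (the assembleMessage name is ours, not part of the API):

function assembleMessage(chunks) {
  // Each chunk carries an incremental piece of the response in
  // choices[0].delta.content; the final chunk has an empty delta.
  return chunks
    .map(chunk => chunk.choices[0].delta.content || '')
    .join('');
}

For the four example events above, assembleMessage would return "Hello world!".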

Example: JavaScript Client

Here's an example of how to consume the streaming API using JavaScript:

async function streamCompletion() {
  const apiKey = 'your-api-key';
  const response = await fetch('https://api.zeebee.ai/v1/chat/completions/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
      'Accept': 'text/event-stream'
    },
    body: JSON.stringify({
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Tell me a short story.' }
      ],
      model: 'gpt-4-turbo',
      temperature: 0.7,
      stream: true
    })
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Create a reader for the response body stream
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const outputElement = document.getElementById('output');
  
  let buffer = '';
  
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    
    // Decode the received bytes to text
    const chunk = decoder.decode(value, { stream: true });
    buffer += chunk;
    
    // Process complete lines in the buffer
    let lineEnd;
    while ((lineEnd = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, lineEnd).trim();
      buffer = buffer.slice(lineEnd + 1);
      
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        
        if (data === '[DONE]') {
          console.log('Stream completed');
          return; // a plain break here would only exit the inner loop
        }
        
        try {
          const json = JSON.parse(data);

          // Error events carry an error object instead of a choices array
          if (json.error) {
            console.error('Stream error:', json.error.message);
            return;
          }

          const content = json.choices[0].delta.content || '';
          
          if (content) {
            // Append the new content to the output
            outputElement.textContent += content;
          }
        } catch (error) {
          console.error('Error parsing JSON:', error);
        }
      }
    }
  }
}
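
Note that this example reads the response body with fetch rather than the browser's built-in EventSource API, because EventSource only supports GET requests and cannot send a JSON body or an Authorization header.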

Handling Errors

If an error occurs during the streaming process, the server will send an error event:

data: {"error": {"message": "Error message", "type": "error_type", "code": "error_code"}}

Common error scenarios include:

- Invalid or missing API key (authentication failure)
- Rate limit exceeded
- Invalid request parameters, such as an unrecognized model name
- The connection dropping before generation completes
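
When parsing events, check for an error payload before reading choices. Here is a minimal sketch (the handleEvent name is ours, not part of the API):

function handleEvent(json) {
  // Error events carry an error object instead of a choices array
  if (json.error) {
    throw new Error(`${json.error.type} (${json.error.code}): ${json.error.message}`);
  }
  return json.choices[0].delta.content || '';
}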

Comparing Streaming vs. Non-Streaming

Feature                      Streaming API                            Non-Streaming API
Perceived latency            Lower (responses appear immediately)     Higher (wait for complete response)
Connection duration          Longer (open for entire generation)      Shorter (single request/response)
Implementation complexity    Higher (stream handling required)        Lower (standard HTTP request)
User experience              More interactive, typing-like effect     Complete responses appear at once

Limits and Considerations

Supported Models

HTTP streaming is supported by all LLM models offered on the ZeebeeAI platform, including: