HTTP Streaming API

The ZeebeeAI HTTP Streaming API allows you to receive AI responses in real time through a server-sent events (SSE) stream. This lets your application display responses progressively as they are generated, rather than waiting for the entire response to finish.

Note: Server-Sent Events (SSE) is a technology in which a client receives automatic updates from a server over a single HTTP connection. The connection is unidirectional: the server can send updates to the client, but the client cannot send messages to the server through this connection.
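
For reference, each SSE event on the wire consists of one or more "field: value" lines, followed by a blank line that marks the end of the event:

data: first chunk

data: second chunk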

HTTP Streaming Endpoint

To make a streaming request, use the following endpoint:

POST https://api.zeebee.ai/v1/chat/completions/stream

Headers

Include the following headers in your request:

Header           Value                  Description
Content-Type     application/json       Specifies that the request body is JSON
Authorization    Bearer YOUR_API_KEY    Your API key for authentication
Accept           text/event-stream      Indicates that you want to receive a stream of events

Request Body

The request body is the same as for non-streaming requests:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me about HTTP streaming."
    }
  ],
  "model": "gpt-4-turbo",
  "temperature": 0.7,
  "stream": true
}

Notice the stream: true parameter, which is required for streaming responses.

Response Format

The streaming response will be delivered as a series of server-sent events, where each event contains a chunk of the AI's response.

Event Structure

Each event in the stream follows this format:

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": "Hello"}, "index": 0, "finish_reason": null}]}

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": " world"}, "index": 0, "finish_reason": null}]}

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": "!"}, "index": 0, "finish_reason": null}]}

data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}]}

data: [DONE]
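
The final data: [DONE] line is a plain-text sentinel that marks the end of the stream; it is not JSON and should not be passed to a JSON parser.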

Understanding Streaming Responses

Important: To reconstruct the complete message, concatenate all the delta.content values in order.
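
As an illustration, here is a minimal sketch that rebuilds the full message from an array of already-parsed chunk payloads (the assembleMessage name is ours, not part of the API):

function assembleMessage(chunks) {
  // Each chunk carries an incremental piece of the response in
  // choices[0].delta.content; the final chunk has an empty delta.
  return chunks
    .map(chunk => chunk.choices[0].delta.content || '')
    .join('');
}

For the four example events above, assembleMessage would return "Hello world!".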

Example: JavaScript Client

Here's an example of how to consume the streaming API using JavaScript:

async function streamCompletion() {
  const apiKey = 'your-api-key';
  const response = await fetch('https://api.zeebee.ai/v1/chat/completions/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
      'Accept': 'text/event-stream'
    },
    body: JSON.stringify({
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Tell me a short story.' }
      ],
      model: 'gpt-4-turbo',
      temperature: 0.7,
      stream: true
    })
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Create a reader for the response body stream
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const outputElement = document.getElementById('output');
  
  let buffer = '';
  
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    
    // Decode the received bytes to text
    const chunk = decoder.decode(value, { stream: true });
    buffer += chunk;
    
    // Process complete lines in the buffer
    let lineEnd;
    while ((lineEnd = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, lineEnd).trim();
      buffer = buffer.slice(lineEnd + 1);
      
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        
        if (data === '[DONE]') {
          console.log('Stream completed');
          return; // a plain break here would only exit the inner loop
        }
        
        try {
          const json = JSON.parse(data);

          // Error events carry an error object instead of a choices array
          if (json.error) {
            console.error('Stream error:', json.error.message);
            return;
          }

          const content = json.choices[0].delta.content || '';
          
          if (content) {
            // Append the new content to the output
            outputElement.textContent += content;
          }
        } catch (error) {
          console.error('Error parsing JSON:', error);
        }
      }
    }
  }
}
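
Note that this example reads the response body with fetch rather than the browser's built-in EventSource API, because EventSource only supports GET requests and cannot send a JSON body or an Authorization header.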

Handling Errors

If an error occurs during the streaming process, the server will send an error event:

data: {"error": {"message": "Error message", "type": "error_type", "code": "error_code"}}

Common error scenarios include:

- Invalid or missing API key (authentication failure)
- Rate limit exceeded
- Invalid request parameters, such as an unrecognized model name
- The connection dropping before generation completes
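
When parsing events, check for an error payload before reading choices. Here is a minimal sketch (the handleEvent name is ours, not part of the API):

function handleEvent(json) {
  // Error events carry an error object instead of a choices array
  if (json.error) {
    throw new Error(`${json.error.type} (${json.error.code}): ${json.error.message}`);
  }
  return json.choices[0].delta.content || '';
}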

Comparing Streaming vs. Non-Streaming

Feature                      Streaming API                            Non-Streaming API
Perceived latency            Lower (responses appear immediately)     Higher (wait for complete response)
Connection duration          Longer (open for entire generation)      Shorter (single request/response)
Implementation complexity    Higher (stream handling required)        Lower (standard HTTP request)
User experience              More interactive, typing-like effect     Complete responses appear at once

Limits and Considerations

Supported Models

HTTP streaming is supported by all LLM models offered on the ZeebeeAI platform, including: