HTTP Streaming API

The ZeebeeAI HTTP Streaming API delivers AI responses in real time over a server-sent events (SSE) stream. This lets your application display responses progressively as they are generated, instead of waiting for the full response to complete.

Note: Server-Sent Events (SSE) is a technology in which a client receives automatic updates from a server over a single long-lived HTTP connection. The connection is unidirectional: the server can push updates to the client, but the client cannot send messages back over the same connection.
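On the wire, each event arrives as a `data:` line followed by a blank line that delimits events. A two-chunk stream looks like this (the payload values here are illustrative):

```
data: {"id": "event-id", "type": "chunk", "data": {"content": "Hello", "is_complete": false, "model": "gpt-4o"}}

data: {"id": "event-id", "type": "chunk", "data": {"content": " world", "is_complete": true, "model": "gpt-4o"}}
```

The client implementations below parse this format by looking for the `data: ` prefix and splitting on the blank-line delimiter.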

HTTP Streaming Endpoint

To make a streaming request, use the following endpoint:

POST https://api.zeebee.ai/v1/chat/completions/stream

Headers

Include the following headers in your request:

Header        | Value               | Description
------------- | ------------------- | -----------------------------------------------------
Content-Type  | application/json    | Indicates that the request body is a JSON object
Authorization | Bearer YOUR_API_KEY | Your API key for authentication
Accept        | text/event-stream   | Indicates that the client expects a server-sent events stream

Request Body

Use the same request body format as for the regular completion endpoint:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about streaming APIs"}
  ],
  "model": "gpt-4o",
  "temperature": 0.7,
  "max_tokens": 1000,
  "user_id": "unique-user-id",
  "conversation_id": "optional-conversation-id"
}

Response Format

The stream consists of a series of server-sent events. Each event is a JSON object with the following structure:

{
  "id": "event-id",
  "type": "chunk",
  "data": {
    "content": "Partial content...",
    "is_complete": false,
    "model": "gpt-4o"
  }
}

The final event in the stream will have is_complete set to true:

{
  "id": "event-id",
  "type": "chunk",
  "data": {
    "content": "Final chunk of content",
    "is_complete": true,
    "model": "gpt-4o",
    "usage": {
      "prompt_tokens": 40,
      "completion_tokens": 350,
      "total_tokens": 390
    }
  }
}

Client Implementation

JavaScript

Here's an example of how to consume the streaming API with JavaScript:

async function streamConversation() {
  const response = await fetch('https://api.zeebee.ai/v1/chat/completions/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY',
      'Accept': 'text/event-stream'
    },
    body: JSON.stringify({
      messages: [
        {role: 'system', content: 'You are a helpful assistant.'},
        {role: 'user', content: 'Tell me about streaming APIs'}
      ],
      model: 'gpt-4o',
      user_id: 'user-123'
    })
  });

  // Abort early on a non-success status before reading the stream
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Create a reader to consume the response body incrementally
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const outputElement = document.getElementById('output');
  
  let buffer = '';
  while (true) {
    const {done, value} = await reader.read();
    if (done) break;

    // Decode the chunk; an event may be split across reads, so buffer it
    buffer += decoder.decode(value, {stream: true});

    // Events are delimited by a blank line ("\n\n")
    const events = buffer.split('\n\n');
    buffer = events.pop(); // keep any incomplete trailing event for the next read
    
    for (const event of events) {
      if (event.startsWith('data: ')) {
        try {
          const data = JSON.parse(event.slice(6));
          
          // Update UI with the streamed content
          if (data.type === 'chunk') {
            outputElement.textContent += data.data.content;
            
            // If this is the last chunk, we're done
            if (data.data.is_complete) {
              console.log('Stream complete');
              console.log('Usage stats:', data.data.usage);
            }
          }
        } catch (e) {
          console.error('Failed to parse event data:', e);
        }
      }
    }
  }
}

Python

Here's an example using Python with the requests library:

import requests
import json

def stream_conversation():
    url = 'https://api.zeebee.ai/v1/chat/completions/stream'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer YOUR_API_KEY',
        'Accept': 'text/event-stream'
    }
    data = {
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'Tell me about streaming APIs'}
        ],
        'model': 'gpt-4o',
        'user_id': 'user-123'
    }
    
    # Make the request with stream=True and explicit connect/read timeouts
    response = requests.post(url, headers=headers, json=data, stream=True,
                             timeout=(5, 300))
    response.raise_for_status()
    
    # Collect full response for debugging
    full_response = ""
    
    # Process the stream
    for line in response.iter_lines():
        if line:
            line_text = line.decode('utf-8')
            
            # SSE events start with "data: "
            if line_text.startswith('data: '):
                try:
                    event_data = json.loads(line_text[6:])
                    
                    if event_data['type'] == 'chunk':
                        content = event_data['data']['content']
                        print(content, end='', flush=True)
                        full_response += content
                        
                        # Check if this is the final chunk
                        if event_data['data']['is_complete']:
                            print("\n\nStream complete")
                            if 'usage' in event_data['data']:
                                print(f"Usage stats: {event_data['data']['usage']}")
                except json.JSONDecodeError as e:
                    print(f"Failed to parse event data: {e}")
    
    return full_response

if __name__ == "__main__":
    response = stream_conversation()
    print(f"\nTotal response length: {len(response)}")

Using the SDK

Our SDK provides built-in support for streaming. See the SDK Documentation for details on how to use streaming with our client libraries.

Timeout Considerations: Be sure to implement proper timeout handling in your client code. The streaming connection may last several seconds or even minutes depending on the length of the response.
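One way to handle this in Python is to pass `requests` a `(connect, read)` timeout tuple and translate timeout exceptions into clear errors. This is a minimal sketch; the `(5, 300)` values and the `stream_with_timeout` helper are illustrative choices, not part of the API:

```python
import requests

# Illustrative values: 5 s to establish the connection,
# up to 300 s between bytes received on the stream.
STREAM_TIMEOUT = (5, 300)

def stream_with_timeout(url, headers, payload):
    """Open a streaming request, converting timeouts into clear errors."""
    try:
        response = requests.post(url, headers=headers, json=payload,
                                 stream=True, timeout=STREAM_TIMEOUT)
        response.raise_for_status()
        return response
    except requests.exceptions.ConnectTimeout:
        raise RuntimeError("Could not reach the API within the connect timeout")
    except requests.exceptions.ReadTimeout:
        raise RuntimeError("Stream stalled: no data received within the read timeout")
```

Note that the read timeout applies between bytes, not to the whole response, so a healthy stream that keeps sending chunks will not be cut off even if it runs for minutes.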

Error Handling

If an error occurs during streaming, you'll receive an event with type: "error":

{
  "id": "error-id",
  "type": "error",
  "data": {
    "code": "rate_limit_exceeded",
    "message": "You have exceeded your rate limit. Please try again later."
  }
}

After receiving an error event, the stream will be closed. Your client should handle these errors appropriately and potentially implement retry logic with backoff.

Limitations

  • Streaming is not available for all models; check the model capabilities in the Available Models section.
  • Streaming connections have a maximum duration of 5 minutes.
  • Rate limits for streaming requests may be lower than for standard API requests.

Example Applications

For complete example applications using streaming, see our Streaming Examples in the Examples Hub.