HTTP Streaming API
The ZeebeeAI HTTP Streaming API delivers AI responses in real time over a server-sent events (SSE) stream. This lets your application display responses progressively as they are generated, rather than waiting for the entire response to complete.
HTTP Streaming Endpoint
To make a streaming request, use the following endpoint:
POST https://api.zeebee.ai/v1/chat/completions/stream
Headers
Include the following headers in your request:
| Header | Value | Description |
|---|---|---|
| Content-Type | application/json | Indicates that the request body is a JSON object |
| Authorization | Bearer YOUR_API_KEY | Your API key for authentication |
| Accept | text/event-stream | Indicates that the client expects a server-sent events stream |
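These headers can be assembled with a small helper in Python; a minimal sketch, where the function name `build_stream_headers` is ours for illustration, not part of any SDK:

```python
def build_stream_headers(api_key):
    """Headers required by the streaming endpoint (per the table above)."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        "Accept": "text/event-stream",
    }
```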
Request Body
Use the same request body format as for the regular completion endpoint:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about streaming APIs"}
  ],
  "model": "gpt-4o",
  "temperature": 0.7,
  "max_tokens": 1000,
  "user_id": "unique-user-id",
  "conversation_id": "optional-conversation-id"
}
```
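Building this payload in Python might look like the following sketch; `build_stream_payload` is a hypothetical helper (not part of any SDK), using the field values shown above as defaults:

```python
def build_stream_payload(messages, user_id, model="gpt-4o",
                         temperature=0.7, max_tokens=1000,
                         conversation_id=None):
    """Request body for the streaming endpoint; conversation_id is optional."""
    payload = {
        "messages": messages,
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "user_id": user_id,
    }
    # Only include conversation_id when continuing an existing conversation
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    return payload
```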
Response Format
The stream consists of a series of server-sent events. Each event is a JSON object with the following structure:
```json
{
  "id": "event-id",
  "type": "chunk",
  "data": {
    "content": "Partial content...",
    "is_complete": false,
    "model": "gpt-4o"
  }
}
```
The final event in the stream will have `is_complete` set to `true`:
```json
{
  "id": "event-id",
  "type": "chunk",
  "data": {
    "content": "Final chunk of content",
    "is_complete": true,
    "model": "gpt-4o",
    "usage": {
      "prompt_tokens": 40,
      "completion_tokens": 350,
      "total_tokens": 390
    }
  }
}
```
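Chunk events of this shape can be assembled client-side into the full response. A minimal sketch in Python; `assemble_response` is a hypothetical helper that operates on already-parsed event dicts:

```python
def assemble_response(events):
    """Concatenate chunk contents until is_complete; return (text, usage)."""
    text = ""
    usage = None
    for event in events:
        if event.get("type") != "chunk":
            continue
        data = event["data"]
        text += data["content"]
        if data["is_complete"]:
            usage = data.get("usage")  # only present on the final chunk
            break
    return text, usage

# Example with two chunks mirroring the payloads above
events = [
    {"id": "e1", "type": "chunk",
     "data": {"content": "Hello, ", "is_complete": False, "model": "gpt-4o"}},
    {"id": "e2", "type": "chunk",
     "data": {"content": "world.", "is_complete": True, "model": "gpt-4o",
              "usage": {"prompt_tokens": 40, "completion_tokens": 350,
                        "total_tokens": 390}}},
]
text, usage = assemble_response(events)
```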
Client Implementation
JavaScript
Here's an example of how to consume the streaming API with JavaScript:
```javascript
async function streamConversation() {
  const response = await fetch('https://api.zeebee.ai/v1/chat/completions/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY',
      'Accept': 'text/event-stream'
    },
    body: JSON.stringify({
      messages: [
        {role: 'system', content: 'You are a helpful assistant.'},
        {role: 'user', content: 'Tell me about streaming APIs'}
      ],
      model: 'gpt-4o',
      user_id: 'user-123'
    })
  });

  // Create a reader for the response body stream
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const outputElement = document.getElementById('output');
  let buffer = '';

  while (true) {
    const {done, value} = await reader.read();
    if (done) break;

    // Decode the chunk and buffer it, since a read may end mid-event
    buffer += decoder.decode(value, {stream: true});

    // Complete SSE events are separated by a blank line
    const events = buffer.split('\n\n');
    buffer = events.pop(); // keep any trailing partial event for the next read

    for (const event of events) {
      if (event.startsWith('data: ')) {
        try {
          const data = JSON.parse(event.slice(6));
          // Update the UI with the streamed content
          if (data.type === 'chunk') {
            outputElement.textContent += data.data.content;
            // If this is the last chunk, we're done
            if (data.data.is_complete) {
              console.log('Stream complete');
              console.log('Usage stats:', data.data.usage);
            }
          }
        } catch (e) {
          console.error('Failed to parse event data:', e);
        }
      }
    }
  }
}
```
Python
Here's an example using Python with the requests library:
```python
import requests
import json

def stream_conversation():
    url = 'https://api.zeebee.ai/v1/chat/completions/stream'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer YOUR_API_KEY',
        'Accept': 'text/event-stream'
    }
    data = {
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'Tell me about streaming APIs'}
        ],
        'model': 'gpt-4o',
        'user_id': 'user-123'
    }

    # Make the request with stream=True so the body is not read eagerly
    response = requests.post(url, headers=headers, json=data, stream=True)

    # Collect the full response for debugging
    full_response = ""

    # Process the stream line by line
    for line in response.iter_lines():
        if line:
            line_text = line.decode('utf-8')
            # SSE data lines start with "data: "
            if line_text.startswith('data: '):
                try:
                    event_data = json.loads(line_text[6:])
                    if event_data['type'] == 'chunk':
                        content = event_data['data']['content']
                        print(content, end='', flush=True)
                        full_response += content
                        # Check whether this is the final chunk
                        if event_data['data']['is_complete']:
                            print("\n\nStream complete")
                            if 'usage' in event_data['data']:
                                print(f"Usage stats: {event_data['data']['usage']}")
                except json.JSONDecodeError as e:
                    print(f"Failed to parse event data: {e}")

    return full_response

if __name__ == "__main__":
    response = stream_conversation()
    print(f"\nTotal response length: {len(response)}")
```
Using the SDK
Our SDK provides built-in support for streaming. See the SDK Documentation for details on how to use streaming with our client libraries.
Error Handling
If an error occurs during streaming, you'll receive an event with `type: "error"`:
```json
{
  "id": "error-id",
  "type": "error",
  "data": {
    "code": "rate_limit_exceeded",
    "message": "You have exceeded your rate limit. Please try again later."
  }
}
```
After receiving an error event, the stream will be closed. Your client should handle these errors appropriately and potentially implement retry logic with backoff.
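Retry with backoff might be sketched as follows in Python. The helper `stream_with_retry` and the set of retryable codes are our assumptions, not part of the API; only `rate_limit_exceeded` is documented above:

```python
import random
import time

# Assumed set of retryable error codes; only rate_limit_exceeded is
# documented above. Extend to match the codes your application sees.
RETRYABLE_CODES = {"rate_limit_exceeded"}

def stream_with_retry(do_stream, max_retries=5, base=1.0, cap=30.0):
    """Retry a streaming call with exponential backoff and full jitter.

    do_stream() should return None on success, or the parsed error-event
    dict (the JSON object shown above) when the stream ends with an error.
    """
    for attempt in range(max_retries):
        error = do_stream()
        if error is None:
            return True  # stream finished cleanly
        if error["data"]["code"] not in RETRYABLE_CODES:
            # Surface non-retryable errors to the caller immediately
            raise RuntimeError(error["data"]["message"])
        # Sleep for a random delay in [0, min(cap, base * 2**attempt)]
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return False  # retries exhausted
```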
Limitations
- Streaming is not available for all models; check the model capabilities in the Available Models section.
- Streaming connections have a maximum duration of 5 minutes.
- Rate limits for streaming requests may be lower than for standard API requests.
Example Applications
For complete example applications using streaming, see our Streaming Examples in the Examples Hub.