HTTP Streaming API
The ZeebeeAI HTTP Streaming API allows you to receive AI responses in real time through a server-sent events (SSE) stream. This enables your application to display responses progressively as they are generated, rather than waiting for the entire response to complete.
HTTP Streaming Endpoint
To make a streaming request, use the following endpoint:
POST https://api.zeebee.ai/v1/chat/completions/stream
Headers
Include the following headers in your request:
Header | Value | Description |
---|---|---|
Content-Type | application/json | Specifies that the request body is JSON |
Authorization | Bearer YOUR_API_KEY | Your API key for authentication |
Accept | text/event-stream | Indicates that you want to receive a stream of events |
Request Body
The request body is the same as for non-streaming requests:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me about HTTP streaming."
    }
  ],
  "model": "gpt-4-turbo",
  "temperature": 0.7,
  "stream": true
}
```
Notice the `stream: true` parameter, which is required for streaming responses.
Response Format
The streaming response will be delivered as a series of server-sent events, where each event contains a chunk of the AI's response.
Event Structure
Each event in the stream follows this format:
data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": "Hello"}, "index": 0, "finish_reason": null}]}
data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": " world"}, "index": 0, "finish_reason": null}]}
data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {"content": "!"}, "index": 0, "finish_reason": null}]}
data: {"id": "msg_123", "object": "chat.completion.chunk", "created": 1677825456, "model": "gpt-4-turbo", "choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}]}
data: [DONE]
Understanding Streaming Responses
- Each line starting with `data:` contains a JSON object with a piece of the response.
- The `delta` field contains the new content being added to the response.
- The `finish_reason` will be `null` until the response is complete.
- The final event has an empty `delta` and `finish_reason` set to `"stop"`.
- The stream ends with `data: [DONE]`.
- To reconstruct the complete response, concatenate the `delta.content` values in order, as shown in the sketch below.
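For illustration, here's a minimal sketch of that reconstruction, assuming the chunk events shown above have already been parsed into an array of objects (the full client example below handles parsing from the raw stream; `assembleMessage` is an illustrative name, not part of the API):

```javascript
// Reassemble the complete message from parsed chunk events.
// `events` is assumed to be an array of the JSON objects shown above.
function assembleMessage(events) {
  let fullText = '';
  for (const event of events) {
    const choice = event.choices[0];
    fullText += choice.delta.content || ''; // the final chunk has an empty delta
    if (choice.finish_reason === 'stop') break;
  }
  return fullText;
}
```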
Example: JavaScript Client
Here's an example of how to consume the streaming API using JavaScript:
```javascript
async function streamCompletion() {
  const apiKey = 'your-api-key';
  const response = await fetch('https://api.zeebee.ai/v1/chat/completions/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
      'Accept': 'text/event-stream'
    },
    body: JSON.stringify({
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Tell me a short story.' }
      ],
      model: 'gpt-4-turbo',
      temperature: 0.7,
      stream: true
    })
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Create a reader for the response body stream
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const outputElement = document.getElementById('output');
  let buffer = '';

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;

    // Decode the received bytes to text
    buffer += decoder.decode(value, { stream: true });

    // Process complete lines in the buffer
    let lineEnd;
    while ((lineEnd = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, lineEnd).trim();
      buffer = buffer.slice(lineEnd + 1);

      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          console.log('Stream completed');
          return; // end of stream: exit both loops
        }
        try {
          const json = JSON.parse(data);
          const content = json.choices[0].delta.content || '';
          if (content) {
            // Append the new content to the output
            outputElement.textContent += content;
          }
        } catch (error) {
          console.error('Error parsing JSON:', error);
        }
      }
    }
  }
}
```
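Note that this example reads the body with `fetch` and a manual reader rather than the browser's built-in `EventSource` API: `EventSource` only supports GET requests, so it cannot send the JSON body or the `Authorization` header this endpoint requires.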
Handling Errors
If an error occurs during the streaming process, the server will send an error event:
data: {"error": {"message": "Error message", "type": "error_type", "code": "error_code"}}
Common error scenarios include:
- Authentication failures
- Rate limiting
- Invalid request parameters
- Server-side issues
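One way to surface these in the parsing loop from the JavaScript example above is to check for an `error` payload before reading `choices`. This is a minimal sketch; `handleEvent` and `showError` are illustrative names, not part of the API:

```javascript
// Check for an error payload before reading `choices`.
// Returns the new content, or null if the event was an error.
function handleEvent(json, showError) {
  if (json.error) {
    // Shape: {"error": {"message": "...", "type": "...", "code": "..."}}
    showError(`${json.error.type} (${json.error.code}): ${json.error.message}`);
    return null;
  }
  return json.choices[0].delta.content || '';
}
```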
Comparing Streaming vs. Non-Streaming
Feature | Streaming API | Non-Streaming API |
---|---|---|
Perceived latency | Lower (responses appear immediately) | Higher (wait for complete response) |
Connection duration | Longer (open for entire generation) | Shorter (single request/response) |
Implementation complexity | Higher (stream handling required) | Lower (standard HTTP request) |
User experience | More interactive, typing-like effect | Complete responses appear at once |
Limits and Considerations
- The connection will time out after 60 seconds if no chunks are available to send.
- If you lose the connection, you'll need to start a new request (see the retry sketch after this list).
- Streaming counts toward your API usage limits in the same way as non-streaming requests.
- For mobile apps or low-bandwidth connections, consider whether streaming is beneficial.
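Because there is no resume mechanism, recovering from a dropped connection means replaying the entire request. A minimal retry wrapper might look like the sketch below, reusing `streamCompletion` from the example above; the attempt count and backoff values are illustrative:

```javascript
// Retry the whole streaming request if the connection drops.
// Each attempt replays the full request (there is no resume),
// so clear any partially rendered output before retrying.
async function streamWithRetry(maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await streamCompletion(); // function from the example above
      return; // success
    } catch (error) {
      console.warn(`Stream attempt ${attempt} failed:`, error);
      if (attempt === maxAttempts) throw error;
      // Simple linear backoff between attempts (illustrative values)
      await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
    }
  }
}
```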
Supported Models
HTTP streaming is supported by all LLM models offered on the ZeebeeAI platform, including:
- gpt-4-turbo
- gpt-3.5-turbo
- claude-3-opus
- claude-3-sonnet
- claude-3-haiku
- gemini-pro