Create Chat Completion
Generates a model response for a conversation. This endpoint is fully compatible with the OpenAI Chat Completions API format.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model ID to use (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct). |
| messages | array | Yes | An array of message objects representing the conversation. |
| max_tokens | integer | No | Maximum number of tokens to generate in the response. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. Default: 0.7. |
| top_p | number | No | Nucleus sampling parameter. Only tokens with cumulative probability up to top_p are considered. Default: 1. |
| stream | boolean | No | If true, partial responses are streamed as server-sent events. Default: false. |
Messages Format
Each message object must include a role and a content field:
| Role | Description |
|---|---|
| system | Sets the behavior and persona of the assistant. |
| user | The user's message or question. |
| assistant | A previous response from the assistant (for multi-turn context). |
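Putting the three roles together, a multi-turn messages array might look like the following sketch (the wording of each turn is purely illustrative):

```python
# System persona first, then alternating user and assistant turns;
# the assistant turn supplies prior context for the follow-up question.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "What is nucleus sampling?"},
    {
        "role": "assistant",
        "content": "It samples only from the smallest set of tokens whose "
        "cumulative probability exceeds top_p.",
    },
    {"role": "user", "content": "How does it differ from temperature?"},
]

# Every message carries exactly the two required fields.
assert all(set(m) == {"role", "content"} for m in messages)
```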
Vision Input
For models that support vision, you can pass images by providing content as an array of content parts:
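A sketch of such a message, assuming the OpenAI-style text and image_url part types; the image URL is a placeholder:

```python
# A user message mixing text and an image. Each element of the content
# array is a content part with a "type" discriminator.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/photo.jpg"},  # placeholder
        },
    ],
}
```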
Example Request
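A minimal request sketch using Python's standard library; the base URL and API key are placeholders, and the /v1/chat/completions path is assumed from the OpenAI-compatible convention:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "YOUR_API_KEY"              # placeholder API key

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain top_p in one sentence."},
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

def create_chat_completion(payload: dict) -> dict:
    """POST the payload to the chat completions endpoint and
    return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```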
Response
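A successful non-streaming response follows the OpenAI chat.completion shape; all field values below are illustrative:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1723456789,
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Top-p restricts sampling to the most probable tokens."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 12,
    "total_tokens": 36
  }
}
```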
Streaming
When stream is set to true, the response is delivered as server-sent events (SSE). Each event contains a JSON chunk with a partial response:
Each data: line contains a JSON object. The stream ends with data: [DONE].
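A small sketch of consuming such a stream; the chunk shape (choices[0].delta.content) is assumed from the OpenAI streaming convention:

```python
import json

def parse_sse_lines(lines):
    """Yield the JSON chunk carried by each data: line,
    stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)

# Two illustrative chunks followed by the terminator.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

# Concatenate the deltas to reassemble the full completion text.
text = "".join(
    chunk["choices"][0]["delta"]["content"] for chunk in parse_sse_lines(stream)
)
# text == "Hello"
```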