The chat completions endpoint powers text generation across Chat, Code, Reasoning, and Vision models. It supports streaming, system prompts, multi-turn conversations, and image inputs for vision-capable models.
Endpoint
POST https://api.runcrate.ai/v1/chat/completions
Basic Usage
curl https://api.runcrate.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
Streaming
Set stream to true (stream=True in Python) to receive tokens as they are generated:
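# Assumption: client is an OpenAI-compatible Python SDK client pointed at the
# Runcrate base URL; this page does not show how client is created.
from openai import OpenAI

client = OpenAI(base_url="https://api.runcrate.ai/v1", api_key="rc_live_YOUR_API_KEY")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; content can be empty or None.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")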
Vision Models
Vision-capable models accept images in the message content. Send images as URLs or base64:
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
Vision-capable models include Gemini 2.5, Llama 4 Maverick, Gemma 3, GPT-4o, and others marked with “Vision” in the Model Catalog.
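The example above uses the URL form. For the base64 form, the following is a minimal sketch. It assumes the endpoint accepts a data URL (data:<mime>;base64,<payload>) in the same image_url.url field, as OpenAI-compatible APIs commonly do; this page does not confirm the exact format. The helper names image_part and vision_message are hypothetical.

```python
import base64
import mimetypes


def image_part(data: bytes, filename: str = "photo.jpg") -> dict:
    """Build an image content part from raw bytes (hypothetical helper).

    Assumption: the API accepts a base64 data URL in image_url.url.
    """
    # Guess the MIME type from the file name; fall back to JPEG if unknown.
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def vision_message(question: str, data: bytes, filename: str = "photo.jpg") -> dict:
    """Combine a text part and an image part into one user message."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": question}, image_part(data, filename)],
    }
```

The returned dictionary can be placed in the messages list in the same way as the URL-based message shown earlier.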
Reasoning Models
Reasoning models (DeepSeek-R1, QwQ, etc.) produce chain-of-thought output. The reasoning steps appear in the reasoning_content field of the streamed response delta, separate from the final answer in content.
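One way to keep the two streams apart is sketched below. It assumes each streamed chunk has the shape used in the streaming example (chunk.choices[0].delta) and that reasoning_content may be missing on some chunks. The names split_reasoning and fake_chunk are hypothetical; fake_chunk only stands in for real API chunks so the logic can be tried offline.

```python
from types import SimpleNamespace


def split_reasoning(stream):
    """Accumulate reasoning text and answer text from streamed chunks."""
    reasoning, answer = [], []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        # reasoning_content may be absent, so read it defensively.
        thought = getattr(delta, "reasoning_content", None)
        if thought:
            reasoning.append(thought)
        text = getattr(delta, "content", None)
        if text:
            answer.append(text)
    return "".join(reasoning), "".join(answer)


def fake_chunk(content=None, reasoning_content=None):
    """Stand-in for a streamed chunk (illustration only, not an API object)."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk(reasoning_content="2 + 2 "),
    fake_chunk(reasoning_content="is 4."),
    fake_chunk(content="The answer is 4."),
]
thoughts, final = split_reasoning(chunks)
```

With a real request, the stream object returned by a streaming call would be passed to split_reasoning in place of the chunks list.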
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model ID (e.g., deepseek-ai/DeepSeek-V3) |
| messages | array | required | Conversation messages with role and content |
| max_tokens | integer | varies | Maximum tokens to generate |
| temperature | number | 0.7 | Randomness (0 = deterministic, 2 = very random) |
| stream | boolean | false | Enable streaming responses |
| top_p | number | 1.0 | Nucleus sampling threshold |
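As a rough illustration of how these parameters map onto a request, the sketch below builds the POST request with the Python standard library without sending it. The function name build_request is hypothetical, and the 0 to 2 temperature check is inferred from the table description rather than from a documented server-side limit.

```python
import json
import urllib.request

API_URL = "https://api.runcrate.ai/v1/chat/completions"


def build_request(api_key, model, messages, *, max_tokens=None,
                  temperature=0.7, stream=False, top_p=1.0):
    """Build (but do not send) a chat completions request (hypothetical helper)."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    # Assumption: 0 to 2 is the valid range, inferred from the table above.
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
        "top_p": top_p,
    }
    # The max_tokens default varies by model, so omit it unless set explicitly.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send the request; the sketch stops before that step so it can be inspected without a live API key.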
Message Roles
| Role | Purpose |
|---|---|
| system | Sets the model’s behavior and personality |
| user | The user’s input |
| assistant | Previous model responses (for multi-turn) |
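The three roles combine into a multi-turn conversation roughly as follows. This is a minimal sketch; build_messages is a hypothetical helper, and it assumes earlier turns are stored as (user text, assistant text) pairs.

```python
def build_messages(system_prompt, history, user_input):
    """Assemble a messages array for a multi-turn request (hypothetical helper).

    history is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    # The system message comes first and sets the model's behavior.
    messages = [{"role": "system", "content": system_prompt}]
    # Replay earlier turns so the model sees the conversation so far.
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user input goes last.
    messages.append({"role": "user", "content": user_input})
    return messages


messages = build_messages(
    "You are a concise assistant.",
    [("What is a qubit?", "A qubit is the basic unit of quantum information.")],
    "How is it different from a classical bit?",
)
```

The resulting list can be sent as the messages field of a request; after each response, appending the new pair to history keeps the conversation state on the client side.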
Popular Chat Models
| Model | Context | Best For |
|---|---|---|
| deepseek-ai/DeepSeek-V3 | 128K | General purpose, cost-effective |
| anthropic/claude-4-sonnet | 200K | Reasoning, analysis, coding |
| google/gemini-2.5-flash | 1M | Fast, multimodal, long context |
| meta-llama/Llama-4-Scout | 128K | Multilingual, efficient |
| Qwen/Qwen3-Max | 128K | Reasoning, multilingual |