Not every request needs the same model. Use a fast, cheap model for classification and routing, then send generation tasks to a stronger model. With Runcrate, every model shares the same API, so switching models means changing one string.
The routing pattern
User request
↓
Classify intent (fast model — DeepSeek V3.2)
↓
┌─────────────────────────────────────────────┐
│ simple question  → DeepSeek V3.2 ($0.30/M)  │
│ creative writing → Claude 4 Sonnet ($3/M)   │
│ code generation  → Qwen3 Coder ($0.20/M)    │
│ unsafe content   → blocked                  │
└─────────────────────────────────────────────┘
↓
Response
Build a model router
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",  # replace with your Runcrate API key
)

# Step 1: Classify the request with a fast, cheap model
def classify_intent(user_message: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3.2",
        messages=[
            {
                "role": "system",
                "content": 'Classify the user message into exactly one category: "simple_qa", "creative", "code", "unsafe". Return only the category string, no quotes, no explanation.',
            },
            {"role": "user", "content": user_message},
        ],
        max_tokens=16,  # the label is short, so cap the output
    )
    # content can be None, so fall back to an empty string before normalizing
    return (response.choices[0].message.content or "").strip().lower()

# Step 2: Route to the right model
MODEL_MAP = {
    "simple_qa": "deepseek-ai/DeepSeek-V3.2",
    "creative": "anthropic/claude-4-sonnet",
    "code": "Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
}

def route_and_generate(user_message: str) -> str:
    intent = classify_intent(user_message)
    # Block unsafe requests before any generation call is made
    if intent == "unsafe":
        return "I can't help with that request."
    # Unrecognized labels fall back to the cheap default model
    model = MODEL_MAP.get(intent, "deepseek-ai/DeepSeek-V3.2")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_message}],
        max_tokens=2048,
    )
    return response.choices[0].message.content or ""

# Try it
queries = [
    "What is the capital of Japan?",
    "Write a short story about a robot learning to paint.",
    "Write a Python function to parse CSV files with error handling.",
]
for query in queries:
    # Classified here only to display the routing decision;
    # route_and_generate classifies the query again internally.
    intent = classify_intent(query)
    model = "blocked" if intent == "unsafe" else MODEL_MAP.get(intent, "deepseek-ai/DeepSeek-V3.2")
    print(f"Query: {query}")
    print(f"Intent: {intent} → Model: {model}")
    print(f"Response: {route_and_generate(query)[:100]}...")
    print()
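The router compares the classifier's reply to the category names exactly, so a reply such as `"Code".` or `Category: code` would silently fall through to the default model. A minimal sketch of a normalization step follows. The names `VALID_INTENTS`, `DEFAULT_INTENT`, and `normalize_intent` are hypothetical helpers, not part of the Runcrate or OpenAI APIs, and the fallback to `simple_qa` is an assumption that mirrors the default used in `route_and_generate`.

```python
import re
from typing import Optional

# The four labels the classifier prompt asks for.
VALID_INTENTS = {"simple_qa", "creative", "code", "unsafe"}
# Assumption: anything unrecognized goes to the cheap default model.
DEFAULT_INTENT = "simple_qa"


def normalize_intent(raw: Optional[str]) -> str:
    """Map raw classifier output onto one of VALID_INTENTS."""
    if not raw:
        return DEFAULT_INTENT
    # Lowercase, then pull out runs of letters and underscores so that
    # quotes, punctuation, and surrounding words are ignored.
    for token in re.findall(r"[a-z_]+", raw.lower()):
        if token in VALID_INTENTS:
            return token  # first recognized label wins
    return DEFAULT_INTENT


for reply in ['"Code".', " simple_qa\n", "Category: unsafe", "poetry", None]:
    print(repr(reply), "->", normalize_intent(reply))
```

One way to use it would be to wrap the return value of `classify_intent` in `normalize_intent(...)`. Because the first recognized label wins, a reply like "not unsafe" would still be treated as `unsafe`, which errs on the side of blocking.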
Cost comparison
| Strategy | Avg cost per request | Quality |
|---|---|---|
| Always use Claude 4 Sonnet | ~$0.015 | Highest |
| Always use DeepSeek V3.2 | ~$0.001 | Good |
| Routed (this example) | ~$0.003 | Highest where it matters |
Routing typically cuts costs by 60-80% compared with always using the strongest model, with minimal quality loss, because most requests are simple Q&A that a fast model handles well. With the figures above, going from ~$0.015 to ~$0.003 per request is an 80% reduction.
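The routed figure depends on your traffic mix, so it helps to work the arithmetic for your own numbers. The sketch below computes a blended per-request cost from the per-request figures in the table. The 85/15 traffic split and the classifier overhead are hypothetical values chosen for illustration, not measurements, and `blended_cost` is a helper defined here rather than a Runcrate function.

```python
import math


def blended_cost(traffic_share, cost_per_request, classifier_cost=0.0):
    """Average cost per request for a routed setup.

    traffic_share: fraction of requests per intent (must sum to 1).
    cost_per_request: average generation cost per intent, in dollars.
    classifier_cost: cost of the classification call, paid on every request.
    """
    if not math.isclose(sum(traffic_share.values()), 1.0):
        raise ValueError("traffic shares must sum to 1")
    generation = sum(
        share * cost_per_request[intent] for intent, share in traffic_share.items()
    )
    return classifier_cost + generation


# Per-request costs taken from the table above.
COSTS = {"simple_qa": 0.001, "creative": 0.015}

# Hypothetical traffic mix and classifier overhead, for illustration only.
MIX = {"simple_qa": 0.85, "creative": 0.15}
CLASSIFIER_COST = 0.0001

routed = blended_cost(MIX, COSTS, CLASSIFIER_COST)
baseline = COSTS["creative"]  # always using the strongest model
savings = 1 - routed / baseline

print(f"Routed: ${routed:.4f} per request, {savings:.0%} cheaper than baseline")
```

Under these assumed inputs the blended cost comes to about $0.0032 per request, roughly 79% below the always-strongest baseline. The saving shrinks as the share of requests routed to the expensive model grows.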
Next steps