Reasoning models think step by step before answering. They produce a reasoning_content trace followed by a final answer, which improves accuracy on math, logic, coding, and multi-step analysis.
Available models
| Model | Parameters | Strengths |
|---|---|---|
| DeepSeek-R1-0528 | 671B MoE | Top-tier math and code reasoning |
| Qwen3-Max-Thinking | Proprietary | Strong multilingual reasoning |
| Qwen3-235B-A22B-Thinking-2507 | 235B MoE | Open weights, cost-effective |
Basic reasoning
from runcrate import Runcrate
client = Runcrate(api_key="rc_live_YOUR_API_KEY")
response = client.models.chat_completion(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "A train leaves Chicago at 9am at 80mph. Another leaves New York (790mi away) at 10am at 100mph toward Chicago. When do they meet?"}],
)
print("Thinking:", response.choices[0].message.reasoning_content)
print("Answer:", response.choices[0].message.content)
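To sanity-check the model's answer to the prompt above, the arithmetic can be worked out directly. The following is a minimal sketch, assuming both departure times are on the same clock (time zones are ignored), the trains travel toward each other, and speeds are constant. The function name meeting_time is illustrative and not part of the Runcrate SDK.

```python
from datetime import datetime, timedelta

def meeting_time(distance_mi, speed_a, start_a, speed_b, start_b):
    """Return when two trains heading toward each other meet.

    Assumes constant speeds and that train A departs no later than train B.
    """
    # Distance train A covers alone before train B departs.
    head_start_hours = (start_b - start_a).total_seconds() / 3600
    remaining = distance_mi - speed_a * head_start_hours
    # Once both are moving, the gap closes at the sum of their speeds.
    hours_together = remaining / (speed_a + speed_b)
    return start_b + timedelta(hours=hours_together)

# The date is arbitrary; only the time of day matters here.
chicago_departure = datetime(2025, 1, 1, 9, 0)
new_york_departure = datetime(2025, 1, 1, 10, 0)
meet = meeting_time(790, 80, chicago_departure, 100, new_york_departure)
print(meet.strftime("%H:%M"))  # 13:56
```

After the first hour the gap is 710 miles, closing at 180 mph, so the trains meet roughly 3 hours 57 minutes after 10am.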
Streaming
# Reuses the client created in the basic example
stream = client.models.chat_completion(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    # Reasoning tokens and answer tokens arrive in separate delta fields
    if delta.get("reasoning_content"):
        print(delta["reasoning_content"], end="", flush=True)
    if delta.get("content"):
        print(delta["content"], end="", flush=True)
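If the trace and the answer need to be kept apart rather than printed as they arrive, the same loop can accumulate them into two strings. This is a sketch under the assumption that chunks are dict-shaped exactly as in the streaming loop above. The helper name collect_stream and the hand-written chunks are hypothetical and only stand in for a real stream.

```python
def collect_stream(chunks):
    """Split a stream of chunks into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)

# Hand-written stand-in chunks, shaped like the ones read in the loop above.
fake_stream = [
    {"choices": [{"delta": {"reasoning_content": "Assume sqrt(2) = p/q. "}}]},
    {"choices": [{"delta": {"reasoning_content": "Then p and q are both even."}}]},
    {"choices": [{"delta": {"content": "sqrt(2) is irrational."}}]},
]
reasoning, answer = collect_stream(fake_stream)
print("Thinking:", reasoning)
print("Answer:", answer)
```

Passing the stream object from the call above in place of fake_stream should work the same way, provided its chunks have the shape assumed here.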
Vercel AI SDK
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';
const result = streamText({
  model: runcrate('deepseek-ai/DeepSeek-R1-0528'),
  prompt: 'Find all primes p where p^2 + 2 is also prime. Prove completeness.',
});
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
Financial analysis
response = client.models.chat_completion(
    model="Qwen/Qwen3-Max-Thinking",
    messages=[
        {"role": "system", "content": "You are a financial analyst. Think step by step."},
        {"role": "user", "content": "Q1: Rev $12.4M, COGS $4.8M, OpEx $5.1M. Q2: $14.1M, $5.2M, $5.4M. Q3: $13.8M, $5.5M, $5.6M. Q4: $16.2M, $5.9M, $5.8M. Calculate margins, identify trends, project Q1 next year."}
    ],
)
print(response.choices[0].message.content)
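The margin figures in the model's answer can be cross-checked locally. The prompt does not say which margins it means, so this sketch assumes the usual definitions: gross margin is (revenue - COGS) / revenue and operating margin is (revenue - COGS - OpEx) / revenue. The margins helper is illustrative, not part of the SDK.

```python
# Quarterly figures from the prompt above, in $M: (revenue, COGS, OpEx).
quarters = {
    "Q1": (12.4, 4.8, 5.1),
    "Q2": (14.1, 5.2, 5.4),
    "Q3": (13.8, 5.5, 5.6),
    "Q4": (16.2, 5.9, 5.8),
}

def margins(revenue, cogs, opex):
    """Return (gross_margin, operating_margin) as fractions of revenue."""
    gross = (revenue - cogs) / revenue
    operating = (revenue - cogs - opex) / revenue
    return gross, operating

for name, figures in quarters.items():
    gross, operating = margins(*figures)
    print(f"{name}: gross {gross:.1%}, operating {operating:.1%}")
```

Comparing these numbers with the model's output is a quick way to catch arithmetic slips before relying on its trend analysis or projection.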
Code debugging
response = client.models.chat_completion(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",
    messages=[{"role": "user", "content": "Find the bug:\ndef merge_sorted(a, b):\n    result, i, j = [], 0, 0\n    while i < len(a) and j < len(b):\n        if a[i] <= b[j]: result.append(a[i]); i += 1\n        else: result.append(b[j]); j += 1\n    return result"}],
)
print(response.choices[0].message.content)
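For reference when judging the model's answer: the function in the prompt stops as soon as one input is exhausted, so the remaining elements of the other list are silently dropped. A corrected version might look like the sketch below. It is one possible fix, not necessarily the one the model will propose.

```python
def merge_sorted(a, b):
    """Merge two already-sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # The loop ends when either list runs out; append the leftover tail.
    result.extend(a[i:])
    result.extend(b[j:])
    return result

print(merge_sorted([1, 4, 9], [2, 3, 10, 11]))  # [1, 2, 3, 4, 9, 10, 11]
```

With the original code, the same call would return [1, 2, 3, 4, 9] and lose 10 and 11.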
Tips
- reasoning_content contains the step-by-step trace. The final answer is in content.
- Longer thinking generally means better answers. 10-30 seconds of reasoning on hard problems is normal.
- Cost scales with thinking tokens. Complex problems generate more reasoning tokens.
- When to use. Math, logic, code debugging, multi-step analysis. For simple Q&A, standard models are faster.