Zhipu AI’s GLM family offers multilingual chat models with competitive reasoning performance. Three generations (GLM-5.1, GLM-5, and GLM-4.7) are available through the Runcrate API.
Available GLM models
| Model | Context | Strengths |
|---|---|---|
| GLM-5.1 | 128K | Latest generation, strongest reasoning |
| GLM-5 | 128K | Strong general-purpose chat |
| GLM-4.7 | 128K | Cost-effective, fast inference |
Basic chat completion
from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="zai-org/GLM-5.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between TCP and UDP for a beginner."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
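Hardcoding the key as in the snippet above is fine for a quick test, but shared code is usually better off reading it from the environment. A minimal sketch, assuming an environment variable named `RUNCRATE_API_KEY`; that name and the helper `load_api_key` are hypothetical choices for this example, not something the Runcrate SDK is known to define:

```python
import os


def load_api_key(env=None, var_name="RUNCRATE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `var_name` is a hypothetical variable name chosen for this sketch.
    Pass a dict as `env` to test the function without touching os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the client")
    return key
```

The result can then be passed to the constructor shown above, for example `Runcrate(api_key=load_api_key())`.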
Streaming with Vercel AI SDK
// app/api/chat/route.ts
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: runcrate('zai-org/GLM-5.1'),
    system: 'You are a helpful assistant specializing in technical explanations.',
    messages,
  });

  return result.toDataStreamResponse();
}
Structured output
import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const AnalysisSchema = z.object({
  topic: z.string(),
  keyPoints: z.array(z.string()).describe('3–5 main arguments'),
  conclusion: z.string(),
  confidence: z.number().min(0).max(1),
});

const { output } = await generateText({
  model: runcrate('zai-org/GLM-5.1'),
  output: Output.object({ schema: AnalysisSchema }),
  prompt: 'Analyze the impact of remote work on software engineering productivity.',
});
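Python callers have no Zod, so a comparable check has to be done by hand once the model's JSON reply has been parsed. The sketch below mirrors the four fields of `AnalysisSchema`, assuming the model returns a single JSON object as text. The function name `parse_analysis` is hypothetical. As in the Zod schema, the "3–5 main arguments" hint is treated as a description rather than an enforced bound.

```python
import json


def parse_analysis(raw: str) -> dict:
    """Parse a JSON reply and check it against the AnalysisSchema fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    for field in ("topic", "conclusion"):
        if not isinstance(data.get(field), str):
            raise ValueError(f"{field} must be a string")

    points = data.get("keyPoints")
    if not isinstance(points, list) or not all(isinstance(p, str) for p in points):
        raise ValueError("keyPoints must be a list of strings")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")

    return data
```

Raising `ValueError` on the first failed check keeps the caller's error handling simple: either the dict is returned and safe to use, or the reply is rejected and can be retried.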
Comparing GLM generations
from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

models = ["zai-org/GLM-5.1", "zai-org/GLM-5", "zai-org/GLM-4.7"]
prompt = "Write a SQL query for the top 10 customers by order value in the last 30 days."

for model in models:
    response = client.models.chat_completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
        temperature=0.3,
    )
    print(f"\n--- {model} ---")
    print(response.choices[0].message.content)
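When comparing generations it often helps to record latency alongside the output. The sketch below wraps any completion callable with a timer. Here `complete` is a hypothetical stand-in for a function that calls `client.models.chat_completion` and returns the reply text, which lets the timing logic run without network access.

```python
import time
from typing import Callable


def compare_models(models: list[str], prompt: str,
                   complete: Callable[[str, str], str]) -> list[dict]:
    """Run `prompt` against each model, recording the reply and wall-clock time."""
    results = []
    for model in models:
        start = time.perf_counter()
        text = complete(model, prompt)
        elapsed = time.perf_counter() - start
        results.append({"model": model, "seconds": elapsed, "text": text})
    return results


if __name__ == "__main__":
    # Fake completion function so the sketch runs offline.
    def fake_complete(model: str, prompt: str) -> str:
        return f"[{model}] reply to: {prompt}"

    rows = compare_models(["zai-org/GLM-5.1", "zai-org/GLM-4.7"], "ping", fake_complete)
    for row in rows:
        print(f"{row['model']}: {row['seconds']:.4f}s")
```

A single run per model is noisy; averaging several calls per model gives a fairer picture of relative speed.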
Choosing the right GLM model
| Use case | Model | Reason |
|---|---|---|
| Complex reasoning | GLM-5.1 | Strongest in the family |
| General chat | GLM-5 | Good balance of quality and speed |
| High-volume, cost-sensitive | GLM-4.7 | Fastest, lowest cost per token |
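The table above can be encoded as a small lookup so the choice lives in one place. This is a sketch only: the use-case keys and the `pick_model` helper are hypothetical labels invented for this example, while the model IDs are the ones used elsewhere on this page.

```python
# Mapping taken from the table above; the dictionary keys are hypothetical labels.
MODEL_BY_USE_CASE = {
    "complex_reasoning": "zai-org/GLM-5.1",
    "general_chat": "zai-org/GLM-5",
    "high_volume": "zai-org/GLM-4.7",
}

# Fallback for use cases the table does not cover.
DEFAULT_MODEL = "zai-org/GLM-5.1"


def pick_model(use_case: str) -> str:
    """Return the model ID for a use case, falling back to the default."""
    return MODEL_BY_USE_CASE.get(use_case, DEFAULT_MODEL)
```

For example, `pick_model("high_volume")` returns `"zai-org/GLM-4.7"`, and an unrecognized label falls back to GLM-5.1.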
Tips
- GLM-5.1 is the recommended default; switch to GLM-4.7 when cost per token or latency matters more than reasoning quality.
- Multilingual: GLM models handle Chinese and English equally well.
- Temperature 0.3–0.5 works best for factual tasks; 0.7–0.9 for creative writing.
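The temperature guidance above can be captured as presets. The sketch below uses the midpoint of each suggested range; the task labels and the `request_params` helper are hypothetical, and the returned keys match the keyword arguments used in the Python examples on this page.

```python
# Midpoints of the ranges in the tips above; the task labels are hypothetical.
TEMPERATURE_BY_TASK = {
    "factual": 0.4,   # suggested range 0.3-0.5
    "creative": 0.8,  # suggested range 0.7-0.9
}


def request_params(prompt: str, task: str = "factual",
                   model: str = "zai-org/GLM-5.1", max_tokens: int = 512) -> dict:
    """Build keyword arguments for a chat completion call."""
    if task not in TEMPERATURE_BY_TASK:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TEMPERATURE_BY_TASK)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": TEMPERATURE_BY_TASK[task],
    }
```

The result can be unpacked into the call, as in `client.models.chat_completion(**request_params("Summarize RFC 793.", task="factual"))`.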