Zhipu AI’s GLM family offers strong multilingual chat models with competitive reasoning. All three generations are available through the Runcrate API.

Available GLM models

Model      Context   Strengths
GLM-5.1    128K      Latest generation, strongest reasoning
GLM-5      128K      Strong general-purpose chat
GLM-4.7    128K      Cost-effective, fast inference

Basic chat completion

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="zai-org/GLM-5.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between TCP and UDP for a beginner."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming with Vercel AI SDK

// app/api/chat/route.ts
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: runcrate('zai-org/GLM-5.1'),
    system: 'You are a helpful assistant specializing in technical explanations.',
    messages,
  });

  return result.toDataStreamResponse();
}

Structured output

import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const AnalysisSchema = z.object({
  topic: z.string(),
  keyPoints: z.array(z.string()).describe('3–5 main arguments'),
  conclusion: z.string(),
  confidence: z.number().min(0).max(1),
});

// `output` is the parsed result, typed according to AnalysisSchema.
const { output } = await generateText({
  model: runcrate('zai-org/GLM-5.1'),
  output: Output.object({ schema: AnalysisSchema }),
  prompt: 'Analyze the impact of remote work on software engineering productivity.',
});

console.log(output.keyPoints);

Comparing GLM generations

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

models = ["zai-org/GLM-5.1", "zai-org/GLM-5", "zai-org/GLM-4.7"]
prompt = "Write a SQL query for the top 10 customers by order value in the last 30 days."

for model in models:
    response = client.models.chat_completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
        temperature=0.3,
    )
    print(f"\n--- {model} ---")
    print(response.choices[0].message.content)

Choosing the right GLM model

Use case                       Model      Reason
Complex reasoning              GLM-5.1    Strongest in the family
General chat                   GLM-5      Good balance of quality and speed
High-volume, cost-sensitive    GLM-4.7    Fastest, lowest cost per token
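
The table above can be encoded as a small lookup so the model choice lives in one place. This is a minimal sketch: the use-case keys (`complex_reasoning`, `general_chat`, `high_volume`) and the `choose_glm_model` helper are hypothetical names invented for illustration and are not part of the Runcrate SDK. Only the model IDs come from the examples on this page.

```python
# Hypothetical helper (not part of the Runcrate SDK): maps a use case
# to one of the model IDs used in the examples on this page.
GLM_MODEL_BY_USE_CASE = {
    "complex_reasoning": "zai-org/GLM-5.1",
    "general_chat": "zai-org/GLM-5",
    "high_volume": "zai-org/GLM-4.7",
}

# Fallback when the use case is not recognized.
DEFAULT_GLM_MODEL = "zai-org/GLM-5.1"


def choose_glm_model(use_case: str) -> str:
    """Return the model ID for a use case, falling back to the default."""
    return GLM_MODEL_BY_USE_CASE.get(use_case, DEFAULT_GLM_MODEL)
```

The returned string can be passed as the `model` argument in the chat completion calls shown earlier, for example `model=choose_glm_model("high_volume")`.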

Tips

  • GLM-5.1 is the recommended default; switch to GLM-4.7 when cost per token or speed matters more than reasoning quality.
  • GLM models are multilingual and handle both Chinese and English well.
  • Temperature 0.3–0.5 works best for factual tasks; 0.7–0.9 for creative writing.
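
The temperature guidance can be captured in a small helper so callers do not hard-code values. This is a sketch under stated assumptions: the task-type labels (`factual`, `creative`), the `temperature_for` function, and the choice to return the midpoint of each range are all illustrative, not something the Runcrate API prescribes.

```python
# Hypothetical helper: the ranges are taken from the temperature tip above;
# the task-type names and the midpoint rule are assumptions for illustration.
TEMPERATURE_RANGES = {
    "factual": (0.3, 0.5),
    "creative": (0.7, 0.9),
}


def temperature_for(task_type: str) -> float:
    """Return the midpoint of the suggested temperature range for a task type."""
    if task_type not in TEMPERATURE_RANGES:
        raise ValueError(f"unknown task type: {task_type!r}")
    low, high = TEMPERATURE_RANGES[task_type]
    # Round to avoid floating-point noise such as 0.7999999999999999.
    return round((low + high) / 2, 2)
```

The comparison loop earlier on this page uses `temperature=0.3`, the low end of the factual range; `temperature_for("factual")` would pick 0.4 instead. Either value falls within the suggested range.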
