DeepSeek V4 Pro offers a 1M-token context window, enough to process entire codebases, full-length books, or hundreds of pages of legal contracts in a single request. Access it through the Runcrate Models API with no GPU management and no waitlists.

Why DeepSeek V4 Pro

| Feature | Detail |
| --- | --- |
| Context window | 1,000,000 tokens |
| Architecture | Mixture-of-Experts |
| Best for | Long document analysis, code generation, multi-step reasoning |
| API compatibility | OpenAI-compatible chat completions |

Basic chat completion

curl https://api.runcrate.ai/v1/chat/completions \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {"role": "user", "content": "Explain the mixture-of-experts architecture in three sentences."}
    ],
    "max_tokens": 512
  }'
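
For readers who prefer Python without extra dependencies, the curl call above can be sketched with the standard library alone. This is a minimal sketch, not an official client: it assumes the response follows the OpenAI chat completions shape (`choices[0].message.content`), and the environment variable name `RUNCRATE_API_KEY` and the helper names `build_request`, `send`, and `main` are hypothetical.

```python
import json
import os
import urllib.request

API_URL = "https://api.runcrate.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        body = json.load(resp)
    # Assumption: the response uses the OpenAI chat completions layout.
    return body["choices"][0]["message"]["content"]


def main() -> None:
    # RUNCRATE_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ["RUNCRATE_API_KEY"]
    prompt = "Explain the mixture-of-experts architecture in three sentences."
    print(send(build_request(prompt, api_key)))
```

Calling `main()` performs the network request; `build_request` on its own is side-effect free, which makes the payload easy to inspect or log before anything is sent.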

Long document analysis (1M context)

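The 1M context window means you can pass an entire document, such as a legal contract, research paper, or codebase, directly in the prompt. As long as the document and the requested output fit within the window, there is no need to chunk the input or build a RAG pipeline, so the model sees the full context at once.

from openai import OpenAI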
from pathlib import Path

client = OpenAI(
    base_url="https://api.runcrate.ai/v1",
    api_key="rc_live_YOUR_API_KEY",
)

# Load the full document (for example a 200-page contract, a novel, or a codebase dump).
document = Path("contract.txt").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {"role": "system", "content": "You are a legal analyst. Read the full contract and answer questions precisely, citing section numbers."},
        {"role": "user", "content": f"Here is the contract:\n\n{document}\n\nList every clause that limits liability, with the exact section number and a one-sentence summary of each."},
    ],
    max_tokens=4096,
)

print(response.choices[0].message.content)
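
Before sending a very large file, it can help to check roughly whether it fits in the window. The sketch below is a heuristic, not a tokenizer: it assumes about 4 characters per token, which is only a rough average for English text and will differ from the model's real token counts. The helper names and the 500-token allowance for the system and user instructions are assumptions made for illustration.

```python
import math

CONTEXT_WINDOW = 1_000_000  # tokens, as stated for DeepSeek V4 Pro


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The 4 chars/token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(document: str, max_tokens: int = 4096, overhead_tokens: int = 500) -> bool:
    """Return True if the document, prompt overhead, and reply budget
    are estimated to fit inside the context window."""
    needed = estimate_tokens(document) + overhead_tokens + max_tokens
    return needed <= CONTEXT_WINDOW


# Example: a 400,000-character contract is estimated at 100,000 tokens.
sample = "x" * 400_000
print(estimate_tokens(sample), fits_context(sample))
```

Because the estimate is approximate, leaving a generous safety margin is sensible when a document lands close to the limit.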

Streaming

Set stream to true in any request (stream=True in the Python SDK) and iterate over the chunks as they arrive. Streaming works with the OpenAI SDK in both Python (for chunk in stream) and TypeScript (for await ... of stream).
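
The streaming loop described above can be sketched in Python as follows. This assumes the endpoint emits OpenAI-style chunks, where each chunk carries its text in `choices[0].delta.content`. The helper names `collect_stream` and `stream_reply` are hypothetical, and `fake_stream` is a stand-in so the loop can be tried without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream, echo: bool = True) -> str:
    """Concatenate text deltas from an OpenAI-style chat completion stream."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices (for example a trailing usage chunk).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if echo:
                print(delta, end="", flush=True)
            parts.append(delta)
    if echo:
        print()
    return "".join(parts)


def stream_reply(client, prompt: str) -> str:
    """Request a streamed completion; `client` is an openai.OpenAI instance
    configured with the Runcrate base URL and API key."""
    stream = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4-Pro",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
        stream=True,
    )
    return collect_stream(stream)


def _fake_chunk(text):
    # Offline stand-in that mimics the shape of a streamed chunk.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk("Hel"),
    SimpleNamespace(choices=[]),
    _fake_chunk(None),
    _fake_chunk("lo"),
]
```

With a real client, `stream_reply(client, "Summarize this contract.")` prints tokens as they arrive and returns the full reply once the stream ends.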

Next steps