Skip to main content

Documentation Index

Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

Runcrate is built around two products on a single account:
  • Inference Engine — One OpenAI-compatible API for 140+ open-source models. Billed per token.
  • Compute — On-demand GPU instances, persistent storage, and dedicated clusters. Containers, VMs, or bare-metal.
Everything is managed from one dashboard with shared billing.
Runcrate is built for AI teams running real workloads:
  • AI product teams building inference-heavy features on open-source models
  • ML teams that need on-demand GPUs for training, fine-tuning, or evaluation
  • AI companies that have outgrown aggregators and want predictable per-token pricing or reserved capacity
  • Research labs that need bare-metal access without long-term commitments
140+ open-source models across 8 categories: Chat, Reasoning, Code, Vision, Image Generation, Video Generation, Text-to-Speech, and Speech-to-Text. Families include Llama, DeepSeek, Qwen, GLM, Kimi, Mistral, FLUX, and more. See the Model Catalog for the full list.
No. The Models API is accessible via HTTP requests from any language or framework. The dashboard is fully web-based. For GPU instances, you only need an SSH client (built into macOS, Linux, and Windows).
There is no free tier. Runcrate uses a prepaid credit system — you add credits and only pay for what you use. You can start with as little as $5.
  • Models API — Instant. Make your first API call as soon as you have a key.
  • GPU Instances — Typically 1 to 3 minutes from deployment to SSH access.