General FAQ

What is Runcrate?

Runcrate is built around two products on a single account:

Inference Engine — One OpenAI-compatible API for 140+ open-source models. Billed per token.
Compute — On-demand GPU instances, persistent storage, and dedicated clusters. Containers, VMs, or bare-metal.

Everything is managed from one dashboard with shared billing.

Who is Runcrate for?

Runcrate is built for AI teams running real workloads:

AI product teams building inference-heavy features on open-source models
ML teams that need on-demand GPUs for training, fine-tuning, or evaluation
AI companies that have outgrown aggregators and want predictable per-token pricing or reserved capacity
Research labs that need bare-metal access without long-term commitments

What models are available?

140+ open-source models across 8 categories: Chat, Reasoning, Code, Vision, Image Generation, Video Generation, Text-to-Speech, and Speech-to-Text. Families include Llama, DeepSeek, Qwen, GLM, Kimi, Mistral, FLUX, and more. See the Model Catalog for the full list.

Do I need to install anything?

No. The Models API is accessible via HTTP requests from any language or framework. The dashboard is fully web-based. For GPU instances, you only need an SSH client (built into macOS, Linux, and Windows).

Is there a free tier?

There is no free tier. Runcrate uses a prepaid credit system — you add credits and only pay for what you use. You can start with as little as $5.

How fast are deployments?

Models API — Instant. Make your first API call as soon as you have a key.
GPU Instances — Typically 1 to 3 minutes from deployment to SSH access.

How can I get support?

Discord — Join our Discord community for real-time help
Email — Contact us at support@runcrate.ai

Welcome

SDKs

CLI

Inference

Compute

Storage

Dedicated Clusters

Billing

Account

FAQ