Skip to main content

Documentation Index

Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

For Agents

Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt Use this file to discover all available pages before exploring further.

Introducing Runcrate

Runcrate is the complete platform for AI teams to access open-source models and GPU compute. One account gives you production inference for 140+ models, on-demand GPU instances, dedicated clusters, and the SDKs to build with all of it.

Quickstart

Make your first API call in under 60 seconds.

Model Catalog

Browse 140+ open-source models across text, image, video, and audio.

SDKs

Python and TypeScript clients. Drop-in OpenAI SDK replacements.

API Reference

Full REST API documentation for inference and infrastructure.

The Runcrate Platform

Everything your AI team needs: production inference, GPU compute, and dedicated clusters — all under one account and one bill.

Inference Engine

OpenAI-compatible API for 140+ open-source models. Chat, image, video, TTS, ASR — billed per token or per generation.

GPU Compute

On-demand instances and dedicated clusters. H100, H200, B200, B300 with root SSH access.

Models API

Chat completions, image generation, video, TTS, and transcription endpoints.

GPU Instances

Deploy containers or VMs with dedicated NVIDIA GPUs in 60 seconds.

Storage

Persistent volumes with a built-in file explorer. Data survives instance termination.

Dedicated Clusters

Reserved bare-metal clusters from 16 to 128+ nodes with InfiniBand.

Explore use cases

See how teams use Runcrate to build AI products, run inference at scale, train models, and deploy custom servers.

AI SaaS Backend

Build a production AI backend with chat, image generation, and RAG.

RAG Pipeline

Build retrieval-augmented generation with embeddings and vector search.

Fine-tune LLMs

Fine-tune Llama, Mistral, or Qwen on your own data with GPU instances.

Video Generation

Generate videos with Kling, Veo, Sora, and Seedance APIs.

Start building

Python SDK

Official Python client. Drop-in replacement for the OpenAI SDK.

TypeScript SDK

Official TypeScript client for Node.js and edge runtimes.

Vercel AI SDK

First-class Runcrate provider for the Vercel AI SDK.

MCP Server

Control Runcrate from Claude, Cursor, or any MCP-compatible AI assistant. Deploy instances, manage storage, and monitor usage with natural language.
Or use the CLI for full terminal control:

CLI Overview

Deploy instances, SSH in, transfer files, and manage volumes from your terminal.

CLI Installation

Install on macOS, Linux, or Windows and authenticate in 30 seconds.

Which product do you need?

Inference EngineCompute
Best forBuilding AI features on open-source modelsTraining, fine-tuning, custom inference servers, reserved capacity
BillingPer token / per generationPer hour (instances) · Monthly (dedicated)
Setup time60 seconds60 seconds (instances) · 1–2 weeks (dedicated)
CommitmentNoneNone (instances) · 12–24 months (dedicated)
AccessSelf-serve · API keySelf-serve (instances) · Contact sales (dedicated)
GPUsManaged for youH100, H200, B200, B300, A100, L40S, RTX 4090
Not sure which fits? Start with the Inference Engine quickstart. Most teams never need anything else.