Skip to main content

Documentation Index

Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

Connect the Runcrate MCP server to your AI coding assistant and manage your entire GPU cloud with natural language. Deploy instances, SSH into machines, check costs, and tear down resources — without leaving your editor.

What you’ll do

Use Claude Code, Cursor, or any MCP-compatible agent to:
  • Deploy and terminate GPU instances on demand
  • Run shell commands on remote machines
  • Monitor GPU utilization and training progress
  • Track spending and manage storage
  • Automate multi-step workflows like “deploy, train, download, teardown”

Setup

Claude Code

// .mcp.json in your project root
{
  "mcpServers": {
    "runcrate": {
      "url": "https://mcp.runcrate.ai/sse"
    }
  }
}

Cursor

Add to Cursor Settings > MCP Servers:
{
  "mcpServers": {
    "runcrate": {
      "url": "https://mcp.runcrate.ai/sse"
    }
  }
}
First connection opens a browser window for OAuth — authorize once, then your agent has access.

Real conversations

”What GPUs do I have running and what are they costing me?”

The agent calls list_instances and summarizes:
You have 3 running instances:
  • training-v2 — 1x A100, 1.20/hr,running14hours(1.20/hr, running 14 hours (16.80 so far)
  • dev-box — 1x RTX 4090, 0.35/hr,running3hours(0.35/hr, running 3 hours (1.05)
  • inference-prod — 1x H100, 2.50/hr,running48hours(2.50/hr, running 48 hours (120.00)
Total burn rate: 4.05/hr.Balance:4.05/hr. Balance: 342.15 (~84 hours at current rate).

”My training script is throwing CUDA OOM errors. Check GPU memory on training-v2.”

The agent calls ssh_execute with nvidia-smi:
training-v2 GPU memory usage:
  • GPU 0: 38,912 MiB / 40,960 MiB (95% used)
You’re almost maxing out the A100 40GB. Options:
  1. Reduce batch size
  2. Enable gradient checkpointing
  3. Switch to an A100 80GB instance

”Deploy a fresh H100 for a quick benchmark, run my benchmark script, and tear it down.”

The agent orchestrates the full workflow:
  1. create_instance — deploys an H100 called “benchmark”
  2. instance_status — polls until deployed
  3. ssh_executepip install torch && python -c "import torch; ...benchmark code..."
  4. Returns the benchmark results
  5. delete_instance — tears down immediately

”Kill anything that’s been idle for more than 6 hours.”

The agent calls list_instances, identifies instances by creation time, uses ssh_execute to check GPU utilization on each, and calls delete_instance on idle ones:
Checked 3 instances:
  • training-v2: GPU at 98% utilization — keeping
  • dev-box: GPU at 0% for 4 hours — terminated
  • old-experiment: GPU at 0% for 12 hours — terminated
Saved 0.35/hr+0.35/hr + 1.20/hr = $1.55/hr.

”I need a persistent volume for my dataset. Create 500GB in us-east, then show me what regions are available.”

The agent calls list_storage_regions, then create_volume:
Created volume imagenet-data (500 GB) in us-east-1. Cost: ~3.75/week(3.75/week (15/month). Available regions: us-east-1, us-west-2, eu-central-1, ap-southeast-1.

”Set up a vLLM server on an A100 with Llama 3.1 70B and give me the endpoint URL.”

The agent chains:
  1. create_instance with gpu: "A100"
  2. instance_status polling
  3. ssh_executepip install vllm
  4. ssh_execute — starts vLLM with --model meta-llama/Llama-3.1-70B-Instruct --port 8000 --host 0.0.0.0
  5. get_instance — returns the IP
Your vLLM server is live at:
http://203.0.113.42:8000/v1/chat/completions
Use it as a drop-in OpenAI endpoint.

”What templates and environments do I have?”

The agent calls list_templates and list_environments:
Templates: ubuntu-devbox, ubuntu-train, ubuntu-inference Environments: main (default), staging, production

”Upload my train.py script to the training-run instance at /workspace/train.py”

The agent calls file_upload with the file content and remote path:
Uploaded train.py (4.2 KB) to training-run at /workspace/train.py.

”Create a new environment called ‘staging’ in my workspace, then list all environments”

The agent calls create_environment with name “staging”, then list_environments:
Created environment staging. Environments: main (default), staging

What it can’t do (yet)

  • Port forwarding or SSH tunnels (use native SSH)
  • Modify billing settings (use the dashboard)
  • Create or delete workspaces (use the dashboard)
Environment create/delete IS supported via MCP — use the create_environment and delete_environment tools.

Tips

  • Be specific with instance names — the agent uses them to target ssh_execute and delete_instance
  • Ask the agent to check nvidia-smi and df -h before debugging — most issues are GPU OOM or disk full
  • Chain requests: “deploy, install, run, download, teardown” in a single message works
  • The agent remembers instance IDs within a conversation, so you can say “check the status of that instance” after deploying one