Connect the Runcrate MCP server to your AI coding assistant and manage your entire GPU cloud with natural language. Deploy instances, SSH into machines, check costs, and tear down resources — without leaving your editor.Documentation Index
Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
What you’ll do
Use Claude Code, Cursor, or any MCP-compatible agent to:- Deploy and terminate GPU instances on demand
- Run shell commands on remote machines
- Monitor GPU utilization and training progress
- Track spending and manage storage
- Automate multi-step workflows like “deploy, train, download, teardown”
Setup
Claude Code
Cursor
Add to Cursor Settings > MCP Servers:Real conversations
”What GPUs do I have running and what are they costing me?”
The agent callslist_instances and summarizes:
You have 3 running instances:Total burn rate: 342.15 (~84 hours at current rate).
- training-v2 — 1x A100, 16.80 so far)
- dev-box — 1x RTX 4090, 1.05)
- inference-prod — 1x H100, 120.00)
”My training script is throwing CUDA OOM errors. Check GPU memory on training-v2.”
The agent callsssh_execute with nvidia-smi:
training-v2 GPU memory usage:You’re almost maxing out the A100 40GB. Options:
- GPU 0: 38,912 MiB / 40,960 MiB (95% used)
- Reduce batch size
- Enable gradient checkpointing
- Switch to an A100 80GB instance
”Deploy a fresh H100 for a quick benchmark, run my benchmark script, and tear it down.”
The agent orchestrates the full workflow:create_instance— deploys an H100 called “benchmark”instance_status— polls until deployedssh_execute—pip install torch && python -c "import torch; ...benchmark code..."- Returns the benchmark results
delete_instance— tears down immediately
”Kill anything that’s been idle for more than 6 hours.”
The agent callslist_instances, identifies instances by creation time, uses ssh_execute to check GPU utilization on each, and calls delete_instance on idle ones:
Checked 3 instances:Saved 1.20/hr = $1.55/hr.
- training-v2: GPU at 98% utilization — keeping
- dev-box: GPU at 0% for 4 hours — terminated
- old-experiment: GPU at 0% for 12 hours — terminated
”I need a persistent volume for my dataset. Create 500GB in us-east, then show me what regions are available.”
The agent callslist_storage_regions, then create_volume:
Created volume imagenet-data (500 GB) in us-east-1. Cost: ~15/month). Available regions: us-east-1, us-west-2, eu-central-1, ap-southeast-1.
”Set up a vLLM server on an A100 with Llama 3.1 70B and give me the endpoint URL.”
The agent chains:create_instancewithgpu: "A100"instance_statuspollingssh_execute—pip install vllmssh_execute— starts vLLM with--model meta-llama/Llama-3.1-70B-Instruct --port 8000 --host 0.0.0.0get_instance— returns the IP
Your vLLM server is live at:Use it as a drop-in OpenAI endpoint.
”What templates and environments do I have?”
The agent callslist_templates and list_environments:
Templates: ubuntu-devbox, ubuntu-train, ubuntu-inference Environments: main (default), staging, production
”Upload my train.py script to the training-run instance at /workspace/train.py”
The agent callsfile_upload with the file content and remote path:
Uploaded train.py (4.2 KB) to training-run at /workspace/train.py.
”Create a new environment called ‘staging’ in my workspace, then list all environments”
The agent callscreate_environment with name “staging”, then list_environments:
Created environment staging. Environments: main (default), staging
What it can’t do (yet)
- Port forwarding or SSH tunnels (use native SSH)
- Modify billing settings (use the dashboard)
- Create or delete workspaces (use the dashboard)
Environment create/delete IS supported via MCP — use the
create_environment and delete_environment tools.Tips
- Be specific with instance names — the agent uses them to target
ssh_executeanddelete_instance - Ask the agent to check
nvidia-smianddf -hbefore debugging — most issues are GPU OOM or disk full - Chain requests: “deploy, install, run, download, teardown” in a single message works
- The agent remembers instance IDs within a conversation, so you can say “check the status of that instance” after deploying one