Manage GPU Infrastructure with AI Agents

Connect the Runcrate MCP server to your AI coding assistant and manage your entire GPU cloud with natural language. Deploy instances, SSH into machines, check costs, and tear down resources — without leaving your editor.

What you’ll do

Use Claude Code, Cursor, or any MCP-compatible agent to:

Deploy and terminate GPU instances on demand
Run shell commands on remote machines
Monitor GPU utilization and training progress
Track spending and manage storage
Automate multi-step workflows like “deploy, train, download, teardown”

Setup

Claude Code

// .mcp.json in your project root
{
  "mcpServers": {
    "runcrate": {
      "url": "https://mcp.runcrate.ai/sse"
    }
  }
}

Cursor

Add to Cursor Settings > MCP Servers:

{
  "mcpServers": {
    "runcrate": {
      "url": "https://mcp.runcrate.ai/sse"
    }
  }
}

First connection opens a browser window for OAuth — authorize once, then your agent has access.

Real conversations

”What GPUs do I have running and what are they costing me?”

The agent calls list_instances and summarizes:

You have 3 running instances:

training-v2 — 1x A100, $1.20/hr, running 14 hours ($ 16.80 so far)

dev-box — 1x RTX 4090, $0.35/hr, running 3 hours ($ 1.05)

inference-prod — 1x H100, $2.50/hr, running 48 hours ($ 120.00)

Total burn rate: $4.05/hr. Balance:$ 342.15 (~84 hours at current rate).

”My training script is throwing CUDA OOM errors. Check GPU memory on training-v2.”

The agent calls ssh_execute with nvidia-smi:

training-v2 GPU memory usage:

GPU 0: 38,912 MiB / 40,960 MiB (95% used)

You’re almost maxing out the A100 40GB. Options:

Reduce batch size

Enable gradient checkpointing

Switch to an A100 80GB instance

”Deploy a fresh H100 for a quick benchmark, run my benchmark script, and tear it down.”

The agent orchestrates the full workflow:

create_instance — deploys an H100 called “benchmark”
instance_status — polls until deployed
ssh_execute — pip install torch && python -c "import torch; ...benchmark code..."
Returns the benchmark results
delete_instance — tears down immediately

”Kill anything that’s been idle for more than 6 hours.”

The agent calls list_instances, identifies instances by creation time, uses ssh_execute to check GPU utilization on each, and calls delete_instance on idle ones:

Checked 3 instances:

training-v2: GPU at 98% utilization — keeping

dev-box: GPU at 0% for 4 hours — terminated

old-experiment: GPU at 0% for 12 hours — terminated

Saved $0.35/hr +$ 1.20/hr = $1.55/hr.

”I need a persistent volume for my dataset. Create 500GB in us-east, then show me what regions are available.”

The agent calls list_storage_regions, then create_volume:

Created volume imagenet-data (500 GB) in us-east-1. Cost: ~ $3.75/week ($ 15/month). Available regions: us-east-1, us-west-2, eu-central-1, ap-southeast-1.

”Set up a vLLM server on an A100 with Llama 3.1 70B and give me the endpoint URL.”

The agent chains:

create_instance with gpu: "A100"
instance_status polling
ssh_execute — pip install vllm
ssh_execute — starts vLLM with --model meta-llama/Llama-3.1-70B-Instruct --port 8000 --host 0.0.0.0
get_instance — returns the IP

Your vLLM server is live at:
http://203.0.113.42:8000/v1/chat/completions
Use it as a drop-in OpenAI endpoint.

”What templates and environments do I have?”

The agent calls list_templates and list_environments:

Templates: ubuntu-devbox, ubuntu-train, ubuntu-inference Environments: main (default), staging, production

”Upload my train.py script to the training-run instance at /workspace/train.py”

The agent calls file_upload with the file content and remote path:

Uploaded train.py (4.2 KB) to training-run at /workspace/train.py.

”Create a new environment called ‘staging’ in my workspace, then list all environments”

The agent calls create_environment with name “staging”, then list_environments:

Created environment staging. Environments: main (default), staging

What it can’t do (yet)

Port forwarding or SSH tunnels (use native SSH)
Modify billing settings (use the dashboard)
Create or delete workspaces (use the dashboard)

Environment create/delete IS supported via MCP — use the create_environment and delete_environment tools.

Tips

Be specific with instance names — the agent uses them to target ssh_execute and delete_instance
Ask the agent to check nvidia-smi and df -h before debugging — most issues are GPU OOM or disk full
Chain requests: “deploy, install, run, download, teardown” in a single message works
The agent remembers instance IDs within a conversation, so you can say “check the status of that instance” after deploying one

​What you’ll do

​Setup

​Claude Code

​Cursor

​Real conversations

​”What GPUs do I have running and what are they costing me?”

​”My training script is throwing CUDA OOM errors. Check GPU memory on training-v2.”

​”Deploy a fresh H100 for a quick benchmark, run my benchmark script, and tear it down.”

​”Kill anything that’s been idle for more than 6 hours.”

​”I need a persistent volume for my dataset. Create 500GB in us-east, then show me what regions are available.”

​”Set up a vLLM server on an A100 with Llama 3.1 70B and give me the endpoint URL.”

​”What templates and environments do I have?”

​”Upload my train.py script to the training-run instance at /workspace/train.py”

​”Create a new environment called ‘staging’ in my workspace, then list all environments”

​What it can’t do (yet)

​Tips

What you’ll do

Setup

Claude Code

Cursor

Real conversations

”What GPUs do I have running and what are they costing me?”

”My training script is throwing CUDA OOM errors. Check GPU memory on training-v2.”

”Deploy a fresh H100 for a quick benchmark, run my benchmark script, and tear it down.”

”Kill anything that’s been idle for more than 6 hours.”

”I need a persistent volume for my dataset. Create 500GB in us-east, then show me what regions are available.”

”Set up a vLLM server on an A100 with Llama 3.1 70B and give me the endpoint URL.”

”What templates and environments do I have?”

”Upload my train.py script to the training-run instance at /workspace/train.py”

”Create a new environment called ‘staging’ in my workspace, then list all environments”

What it can’t do (yet)

Tips