When training crashes or inference slows down, your AI agent can SSH into the instance, inspect GPU state, read logs, and suggest concrete fixes.

“My training script is crashing with CUDA OOM. Can you check what’s happening?”

The agent investigates the GPU and the error:
  1. list_instances — finds your running instances
  2. ssh_execute — runs nvidia-smi on the target instance
  3. ssh_execute — runs tail -50 /root/train.log to find the crash
Instance: training-run (1x A100 40GB)
GPU memory: 39,800 / 40,960 MiB (97%, nearly full)
The log shows:
torch.cuda.OutOfMemoryError: CUDA out of memory.
Tried to allocate 2.00 GiB. GPU 0 has 1.12 GiB free.
You’re running out of memory during the backward pass. Three options:
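  1. Reduce batch size from 16 to 8: roughly halves activation memory, which scales with batch size (weights and optimizer state stay the same size)
  2. Enable gradient checkpointing: for a Hugging Face Transformers model, add model.gradient_checkpointing_enable() before training; this trades extra compute for lower activation memory
  3. Upgrade GPU: deploy on an A100 80GB for 2x the VRAM at $1.60/hr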
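The memory figure in the report above can be reproduced from nvidia-smi's CSV query output. The following is a minimal Python sketch, assuming the command `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` was run on the instance. The function name `gpu_memory_usage` and the sample line are illustrative assumptions, not part of any tool's API.

```python
def gpu_memory_usage(csv_line: str) -> tuple[int, int, float]:
    """Parse one 'used, total' line (values in MiB).

    Returns (used_mib, total_mib, percent_used).
    """
    used_str, total_str = (field.strip() for field in csv_line.split(","))
    used, total = int(used_str), int(total_str)
    return used, total, 100.0 * used / total


# Illustrative line matching the report above, in the format printed by:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
used, total, pct = gpu_memory_usage("39800, 40960")
print(f"GPU memory: {used:,} / {total:,} MiB ({pct:.0f}% used)")
```

A reading this close to the card's capacity, combined with an allocation request larger than the free memory, is what points to the batch size and checkpointing options rather than a leak elsewhere.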

“Check if there are any zombie processes eating GPU memory on my dev-box.”

The agent inspects processes on the GPU:
  1. ssh_execute — runs nvidia-smi to see GPU process list
  2. ssh_execute — runs ps aux | grep python to cross-reference
Found 3 Python processes using GPU memory on dev-box:
| PID | GPU Memory | Command |
| --- | --- | --- |
| 1842 | 12,400 MiB | python train.py (your active run) |
| 1203 | 8,200 MiB | python eval.py (finished 2 hours ago, still holding memory) |
| 980 | 4,100 MiB | python -m jupyter (idle notebook kernel) |
PIDs 1203 and 980 are zombie processes. Want me to kill them? That would free 12.3 GB of VRAM.
If you say yes, the agent runs:
  1. ssh_execute: runs kill 1203 980
  2. ssh_execute: runs nvidia-smi again to confirm memory is freed
Killed 2 zombie processes. GPU memory dropped from 24.7 GB to 12.4 GB. You now have 28.5 GB free.
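The "how much would this free" arithmetic can be sketched in Python. This assumes the per-process list was obtained with `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`. The helper names `parse_compute_apps` and `reclaimable_mib` are hypothetical, and the sample data simply mirrors the table above.

```python
def parse_compute_apps(csv_text: str) -> dict[int, int]:
    """Map PID -> GPU memory in MiB from 'pid, used_memory' CSV lines."""
    usage: dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        pid_str, mem_str = (field.strip() for field in line.split(","))
        usage[int(pid_str)] = int(mem_str)
    return usage


def reclaimable_mib(usage: dict[int, int], stale_pids: set[int]) -> int:
    """Sum the GPU memory held by the PIDs selected for termination."""
    return sum(mem for pid, mem in usage.items() if pid in stale_pids)


# Illustrative sample mirroring the process table above.
SAMPLE = """\
1842, 12400
1203, 8200
980, 4100
"""

usage = parse_compute_apps(SAMPLE)
freed = reclaimable_mib(usage, {1203, 980})
in_use = sum(usage.values())
print(f"In use: {in_use:,} MiB, reclaimable: {freed:,} MiB, "
      f"remaining after kill: {in_use - freed:,} MiB")
```

Which PIDs count as stale is still a judgment call based on the ps output, which is why the agent asks before killing anything.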

“My inference server is responding slowly. What’s wrong?”

The agent checks GPU utilization, system resources, and the serving process:
  1. get_instance — confirms instance specs
  2. ssh_execute — runs nvidia-smi dmon -s u -c 5 (5 samples of GPU utilization)
  3. ssh_execute — runs free -h to check RAM
  4. ssh_execute — runs df -h / to check disk space
  5. ssh_execute — runs tail -30 /root/vllm.log to check the serving logs
Instance: inference-prod (1x H100 80GB)
  • GPU utilization: averaging 15% (not compute-bound)
  • GPU memory: 72 GB / 80 GB (healthy)
  • System RAM: 58 GB / 64 GB (91%, swapping likely)
  • Disk: 94% full, only 12 GB free
The bottleneck is system RAM, not the GPU. vLLM is likely swapping to disk during request preprocessing, and the nearly full disk leaves little headroom. Two fixes:
  1. Free disk space: rm -rf /root/.cache/pip /tmp/* (check first that no running process still needs files under /tmp)
  2. Restart vLLM with --max-num-seqs 64 (down from default 256) to reduce CPU-side memory
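The "averaging 15%" figure comes from averaging the samples that nvidia-smi dmon prints. Below is a minimal Python sketch of that step. It assumes the usual dmon layout, where header lines start with `#` and the first header names the columns (including `sm` for compute utilization). Column sets vary between driver versions, so the code looks the column up by name. The function name and the sample output are illustrative.

```python
def average_sm_utilization(dmon_text: str) -> float:
    """Average the 'sm' (compute utilization, %) column of dmon output."""
    columns = None
    samples = []
    for raw in dmon_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first header line names the columns; the second holds units.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            raise ValueError("no header line found before data rows")
        fields = line.split()
        samples.append(float(fields[columns.index("sm")]))
    if not samples:
        raise ValueError("no data rows found")
    return sum(samples) / len(samples)


# Illustrative output in the shape produced by: nvidia-smi dmon -s u -c 5
SAMPLE_DMON = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0     14      9      0      0
    0     16     11      0      0
    0     15     10      0      0
    0     13      8      0      0
    0     17     12      0      0
"""

print(f"Average GPU utilization: {average_sm_utilization(SAMPLE_DMON):.0f}%")
```

Low compute utilization alongside high system RAM usage is what shifts the diagnosis away from the GPU and toward host memory.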

Tools used in this workflow

| Tool | Purpose |
| --- | --- |
| list_instances | Find the problematic instance |
| get_instance | Check instance specs and configuration |
| ssh_execute | Run nvidia-smi, read logs, kill processes, check system resources |