When training crashes or inference slows down, your AI agent can SSH into the instance, inspect GPU state, read logs, and suggest concrete fixes.
“My training script is crashing with CUDA OOM. Can you check what’s happening?”
The agent investigates the GPU and the error:
- `list_instances`: finds your running instances
- `ssh_execute`: runs `nvidia-smi` on the target instance
- `ssh_execute`: runs `tail -50 /root/train.log` to find the crash
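The agent reads memory figures from plain `nvidia-smi` output. To get the same numbers in a script, `nvidia-smi` can also emit CSV through its `--query-gpu` and `--format` flags. The Python below is a minimal sketch under that assumption. `parse_memory_query` and `query_gpu_memory` are hypothetical helpers, not part of Runcrate's tooling.

```python
from __future__ import annotations

import subprocess

# Machine-readable query: one "used, total" line per GPU, values in MiB.
QUERY_ARGS = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_query(csv_output: str) -> list[tuple[int, int, float]]:
    """Turn the CSV output into (used_mib, total_mib, percent_used) per GPU."""
    rows = []
    for line in csv_output.strip().splitlines():
        used_str, total_str = (field.strip() for field in line.split(","))
        used, total = int(used_str), int(total_str)
        rows.append((used, total, 100.0 * used / total))
    return rows


def query_gpu_memory() -> list[tuple[int, int, float]]:
    """Run nvidia-smi locally (needs an NVIDIA driver) and parse its output."""
    result = subprocess.run(QUERY_ARGS, capture_output=True, text=True, check=True)
    return parse_memory_query(result.stdout)
```

For example, `parse_memory_query("39800, 40960")` returns one tuple whose last element is about 97.2, the 97% reading the agent reports for `training-run`.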
Instance: `training-run` (1x A100 40GB)
GPU memory: 39,800 / 40,960 MiB (97%, nearly full)
The log shows a CUDA out-of-memory error: you’re running out of memory during the backward pass. Three options:
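- Reduce batch size from 16 to 8: roughly halves activation memory, the part of peak usage that scales with batch size
- Enable gradient checkpointing: add `model.gradient_checkpointing_enable()` before training so activations are recomputed during the backward pass instead of stored
- Upgrade GPU: deploy on an A100 80GB for 2x the VRAM at $1.60/hr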
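Reducing the batch size can also be automated by retrying at half the size after each out-of-memory error. The sketch below assumes a hypothetical `run_step(batch_size)` callback that performs one training step. With PyTorch you would likely pass `oom_error=torch.cuda.OutOfMemoryError` (present in recent releases) and release cached memory between attempts. It illustrates the idea rather than a drop-in fix.

```python
from __future__ import annotations

from typing import Callable


def find_fitting_batch_size(
    run_step: Callable[[int], None],
    start: int = 16,
    minimum: int = 1,
    oom_error: type[BaseException] = MemoryError,
) -> int:
    """Halve the batch size after each out-of-memory error until a step succeeds.

    `run_step` is a hypothetical callback: it runs one forward/backward pass
    at the given batch size and raises `oom_error` if it does not fit.
    """
    if minimum < 1:
        raise ValueError("minimum batch size must be at least 1")
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size  # this size fit; use it for the real run
        except oom_error:
            batch_size //= 2  # 16 -> 8 -> 4 -> ...
    raise RuntimeError(f"no batch size >= {minimum} fits in memory")
```

Starting at 16 mirrors the original run; if only batches of 8 or fewer fit, the function tries 16, then returns 8.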
“Check if there are any zombie processes eating GPU memory on my dev-box.”
The agent inspects processes on the GPU:
- `ssh_execute`: runs `nvidia-smi` to see the GPU process list
- `ssh_execute`: runs `ps aux | grep python` to cross-reference
Found 3 Python processes using GPU memory on dev-box:

| PID | GPU Memory | Command |
|---|---|---|
| 1842 | 12,400 MiB | `python train.py` (your active run) |
| 1203 | 8,200 MiB | `python eval.py` (finished 2 hours ago, still holding memory) |
| 980 | 4,100 MiB | `python -m jupyter` (idle notebook kernel) |

PIDs 1203 and 980 are zombie processes. Want me to kill them? That would free 12.3 GB of VRAM.

If you say yes, the agent runs:
- `ssh_execute`: `kill 1203 980`
- `ssh_execute`: `nvidia-smi` again to confirm memory is freed
Killed 2 zombie processes. GPU memory dropped from 24.7 GB to 12.4 GB. You now have 28.5 GB free.
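The 12.3 GB estimate is simply the sum of the memory held by the processes being killed. A minimal sketch of that arithmetic, assuming per-process usage is collected with `nvidia-smi --query-compute-apps=pid,used_gpu_memory --format=csv,noheader,nounits`. Both helper names are hypothetical.

```python
from __future__ import annotations


def parse_compute_apps(csv_output: str) -> dict[int, int]:
    """Map PID -> GPU memory in MiB from the CSV query output."""
    usage: dict[int, int] = {}
    for line in csv_output.strip().splitlines():
        pid_str, mem_str = (field.strip() for field in line.split(","))
        if not mem_str.isdigit():
            continue  # skip rows where the driver reports a non-numeric value
        # A PID can appear once per GPU it uses, so accumulate instead of overwrite.
        usage[int(pid_str)] = usage.get(int(pid_str), 0) + int(mem_str)
    return usage


def memory_after_kill(usage: dict[int, int], pids_to_kill: set[int]) -> tuple[int, int]:
    """Return (MiB freed, MiB still in use) once the given PIDs exit."""
    freed = sum(mem for pid, mem in usage.items() if pid in pids_to_kill)
    remaining = sum(usage.values()) - freed
    return freed, remaining
```

With the dev-box figures (12,400, 8,200 and 4,100 MiB), killing PIDs 1203 and 980 frees 12,300 MiB and leaves 12,400 MiB in use.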
“My inference server is responding slowly. What’s wrong?”
The agent checks GPU utilization, system resources, and the serving process:
- `get_instance`: confirms instance specs
- `ssh_execute`: runs `nvidia-smi dmon -s u -c 5` (5 samples of GPU utilization)
- `ssh_execute`: runs `free -h` to check RAM
- `ssh_execute`: runs `df -h /` to check disk space
- `ssh_execute`: runs `tail -30 /root/vllm.log` to check the serving logs
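`nvidia-smi dmon -s u -c 5` prints five utilization samples per GPU, preceded by `#`-prefixed header lines that name the columns. A sketch that averages the `sm` column, assuming that header layout. The exact set of columns varies by driver version, so the parser looks columns up by name. `average_sm_util` is a hypothetical helper.

```python
from __future__ import annotations


def average_sm_util(dmon_output: str) -> dict[int, float]:
    """Average the 'sm' (compute utilization %) column per GPU index."""
    columns: list[str] | None = None
    samples: dict[int, list[float]] = {}
    for raw in dmon_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first header line holds column names; the units line is skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            raise ValueError("no dmon header line found")
        fields = line.split()
        value = fields[columns.index("sm")]
        if value == "-":
            continue  # metric not available for this sample
        gpu = int(fields[columns.index("gpu")])
        samples.setdefault(gpu, []).append(float(value))
    return {gpu: sum(vals) / len(vals) for gpu, vals in samples.items()}
```

An average near 15%, as on `inference-prod`, means the GPU spends most of its time waiting.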
Instance: `inference-prod` (1x H100 80GB)

- GPU utilization: averaging 15% (not compute-bound)
- GPU memory: 72 GB / 80 GB (healthy)
- System RAM: 58 GB / 64 GB (92%, swapping likely)
- Disk: 94% full, only 12 GB free

The bottleneck is system RAM, not the GPU. The system is likely swapping to disk while vLLM preprocesses requests. Two fixes:

- Free disk space by clearing caches: `rm -rf /root/.cache/pip /tmp/*`
- Restart vLLM with `--max-num-seqs 64` (down from the default 256) to reduce CPU-side memory
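The diagnosis above follows a simple rule: low GPU utilization means the GPU is waiting on something else, so host RAM and disk are checked next. A sketch of that rule in Python; the 50% and 90% thresholds are illustrative assumptions, not Runcrate defaults. GPU memory is left out on purpose, because vLLM preallocates most GPU memory for its KV cache (`gpu_memory_utilization` defaults to 0.9), so a high reading there is expected.

```python
from __future__ import annotations


def triage(
    gpu_util_pct: float,
    ram_used_pct: float,
    disk_used_pct: float,
    busy_threshold: float = 50.0,  # assumed cutoff for "compute-bound"
    full_threshold: float = 90.0,  # assumed cutoff for "nearly full"
) -> list[str]:
    """Return human-readable findings for an inference host snapshot."""
    findings = []
    if gpu_util_pct < busy_threshold:
        findings.append("GPU mostly idle: not compute-bound, check host resources")
    if ram_used_pct >= full_threshold:
        findings.append("System RAM nearly full: swapping likely")
    if disk_used_pct >= full_threshold:
        findings.append("Disk nearly full: clear caches to free space")
    return findings
```

For `inference-prod`, `triage(15, 92, 94)` returns all three findings, while a busy GPU on a host with spare RAM and disk returns an empty list.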
Tools used in this workflow
| Tool | Purpose |
|---|---|
| `list_instances` | Find the problematic instance |
| `get_instance` | Check instance specs and configuration |
| `ssh_execute` | Run `nvidia-smi`, read logs, kill processes, check system resources |