Skip to main content

Documentation Index

Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

GPU compute is billed by the hour. Every idle instance is wasted money.

Monitor your spend

runcrate billing balance     # Current credit balance
runcrate billing usage       # Per-instance spending breakdown
runcrate ps                  # List running instances (all billing right now)

Pick the right GPU

WorkloadRecommended GPUHourly cost
Inference (7B-8B models)RTX 4090~$0.35/hr
Inference (70B models)A100 80 GB~$1.60/hr
Fine-tuning (7B QLoRA)RTX 4090~$0.35/hr
Fine-tuning (70B QLoRA)A100 80 GB~$1.60/hr
Training (custom models)H100~$2.50/hr
runcrate instances types     # Browse GPUs and pricing

Delete instances when done

runcrate instances delete <name>
runcrate ps                  # Verify nothing is left running

Use volumes to avoid re-setup costs

Re-downloading models wastes 10-30 minutes of GPU time per session:
runcrate volumes create --name workspace --size 100
runcrate instances create --name dev --gpu RTX4090 --template ubuntu-devbox --storage workspace
Models and packages at /workspace/ persist across deploys.

Right-size your instance

runcrate ssh <instance> -- nvidia-smi
nvidia-smi readingAction
GPU-Util: 90%+, Memory: 80%+Correctly sized
GPU-Util: 90%+, Memory: 40%Consider a GPU with less VRAM
GPU-Util: 20%, Memory: 20%Overpaying — use a smaller GPU

Batch your work

Deploy, process, tear down — pay only for the minutes your job runs:
runcrate instances create --name batch --gpu A100 --template ubuntu-inference
runcrate cp ./inputs/ batch:/workspace/inputs/
runcrate ssh batch -- "cd /workspace && python process.py"
runcrate cp batch:/workspace/outputs/ ./outputs/
runcrate instances delete batch

Use the Models API for light workloads

For inference under ~1,000 requests/day, the Models API is cheaper than a dedicated GPU:
curl https://api.runcrate.ai/v1/chat/completions \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello."}],
    "max_tokens": 128
  }'

Quick checklist

  • Run runcrate ps daily — kill anything not in use.
  • Run runcrate billing usage weekly — spot unexpected charges early.
  • Use volumes for models and data — avoid re-downloads.
  • Match GPU to workload — check nvidia-smi utilization.
  • Delete instances immediately after batch jobs complete.