Skip to main content

Documentation Index

Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

Before deploying, ask your AI agent to compare GPU options. It can pull live pricing, filter by VRAM requirements, and recommend the best option for your workload.

”Compare A100, H100, and RTX 4090 pricing. I need at least 40GB VRAM.”

The agent queries available GPUs:
  1. list_gpu_types — fetches all available GPU types with pricing and specs
GPUs with 40GB+ VRAM:
GPUVRAMPrice/hrBest for
RTX A600048 GB$0.60/hrDevelopment, small model fine-tuning
A100 40 GB40 GB$1.20/hrTraining, mid-size inference
A100 80 GB80 GB$1.60/hrLarge model training, 70B inference
H100 80 GB80 GB$2.50/hrMaximum throughput, 70B+ models
For your use case:
  • If you’re fine-tuning a 7B model, the A6000 ($0.60/hr) has enough VRAM and saves 75% vs H100.
  • If you’re serving a 70B model, you need the A100 80GB minimum. The H100 gives ~40% more throughput for 56% more cost.
  • The RTX 4090 (24 GB) doesn’t meet your 40GB requirement.

”What’s the cheapest GPU that can run Llama 3.1 70B?”

The agent calculates VRAM requirements and matches:
  1. list_gpu_types — gets all GPU options
Llama 3.1 70B needs ~70 GB VRAM at BF16, or ~35 GB at 8-bit quantization. Cheapest single-GPU options:
OptionGPUVRAMQuantizationPrice/hr
BudgetA100 80 GB80 GBBF16 (full precision)$1.60/hr
PremiumH100 80 GB80 GBBF16 (full precision)$2.50/hr
Multi-GPU option (lower per-GPU cost):
OptionGPUsVRAM totalPrice/hr
Split2x A100 40 GB80 GB$2.40/hr
Recommendation: Single A100 80GB at 1.60/hr.Itsthecheapestpathto70Bservingwithfullprecision.The2xA10040GBsetupcostsmore(1.60/hr. It's the cheapest path to 70B serving with full precision. The 2x A100 40GB setup costs more (2.40/hr) and adds tensor-parallel overhead.

”Show me everything available under $1/hr for quick experiments.”

The agent filters by price:
  1. list_gpu_types — gets all GPU types, filters by hourly rate
GPUs under $1/hr:
GPUVRAMPrice/hrGood for
RTX 409024 GB$0.35/hrDev work, 7-8B models, fine-tuning
RTX A600048 GB$0.60/hrLarger dev work, 13B models
RTX 309024 GB$0.25/hrBudget experiments, prototyping
Best value for quick experiments: RTX 4090 at $0.35/hr. It has the best price-to-performance ratio for single-GPU workloads — fast enough for real training, cheap enough to leave running during iteration. A 4-hour experiment on RTX 4090 costs $1.40 total.

Tools used in this workflow

ToolPurpose
list_gpu_typesFetch all available GPUs with pricing, VRAM, and region info