Documentation Index
Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Before deploying, ask your AI agent to compare GPU options. It can pull live pricing, filter by VRAM requirements, and recommend the best option for your workload.
”Compare A100, H100, and RTX 4090 pricing. I need at least 40GB VRAM.”
The agent queries available GPUs:
list_gpu_types — fetches all available GPU types with pricing and specs
GPUs with 40GB+ VRAM:
| GPU | VRAM | Price/hr | Best for |
|---|
| RTX A6000 | 48 GB | $0.60/hr | Development, small model fine-tuning |
| A100 40 GB | 40 GB | $1.20/hr | Training, mid-size inference |
| A100 80 GB | 80 GB | $1.60/hr | Large model training, 70B inference |
| H100 80 GB | 80 GB | $2.50/hr | Maximum throughput, 70B+ models |
For your use case:
- If you’re fine-tuning a 7B model, the A6000 ($0.60/hr) has enough VRAM and saves 75% vs H100.
- If you’re serving a 70B model, you need the A100 80GB minimum. The H100 gives ~40% more throughput for 56% more cost.
- The RTX 4090 (24 GB) doesn’t meet your 40GB requirement.
”What’s the cheapest GPU that can run Llama 3.1 70B?”
The agent calculates VRAM requirements and matches:
list_gpu_types — gets all GPU options
Llama 3.1 70B needs ~70 GB VRAM at BF16, or ~35 GB at 8-bit quantization.
Cheapest single-GPU options:
| Option | GPU | VRAM | Quantization | Price/hr |
|---|
| Budget | A100 80 GB | 80 GB | BF16 (full precision) | $1.60/hr |
| Premium | H100 80 GB | 80 GB | BF16 (full precision) | $2.50/hr |
Multi-GPU option (lower per-GPU cost):
| Option | GPUs | VRAM total | Price/hr |
|---|
| Split | 2x A100 40 GB | 80 GB | $2.40/hr |
Recommendation: Single A100 80GB at 1.60/hr.It′sthecheapestpathto70Bservingwithfullprecision.The2xA10040GBsetupcostsmore(2.40/hr) and adds tensor-parallel overhead.
”Show me everything available under $1/hr for quick experiments.”
The agent filters by price:
list_gpu_types — gets all GPU types, filters by hourly rate
GPUs under $1/hr:
| GPU | VRAM | Price/hr | Good for |
|---|
| RTX 4090 | 24 GB | $0.35/hr | Dev work, 7-8B models, fine-tuning |
| RTX A6000 | 48 GB | $0.60/hr | Larger dev work, 13B models |
| RTX 3090 | 24 GB | $0.25/hr | Budget experiments, prototyping |
Best value for quick experiments: RTX 4090 at $0.35/hr. It has the best price-to-performance ratio for single-GPU workloads — fast enough for real training, cheap enough to leave running during iteration.
A 4-hour experiment on RTX 4090 costs $1.40 total.
| Tool | Purpose |
|---|
list_gpu_types | Fetch all available GPUs with pricing, VRAM, and region info |