Documentation Index
Fetch the complete documentation index at: https://runcrate.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Run a one-off experiment without any setup overhead. Tell your AI agent what you want to test, and it handles deploy, execute, collect, and cleanup in a single conversation.
The agent handles everything end-to-end:
create_instance — deploys bench-run with RTX 4090
instance_status — waits until running
ssh_execute — pip install torch jax[cuda12]
file_upload — uploads the benchmark script
ssh_execute — python /root/benchmark.py
delete_instance — terminates immediately
Benchmark results (RTX 4090, 4096x4096 matmul, 1000 iterations):
| Framework | Mean time | Std dev | TFLOPS |
|---|
| PyTorch 2.4 | 2.31 ms | 0.04 ms | 59.4 |
| JAX 0.4.31 | 2.28 ms | 0.03 ms | 60.2 |
Both frameworks perform nearly identically on this workload. JAX is marginally faster (~1.3%) due to XLA compilation.
**Total cost: 0.12∗∗(20minutesonRTX4090at0.35/hr).
Instance terminated. Nothing left running.
”Test if my fine-tuned model loads correctly. Deploy a GPU, load it, run 5 prompts, and shut down.”
The agent runs the validation:
create_instance — deploys model-test with RTX 4090
instance_status — waits until running
ssh_execute — pip install torch transformers
file_upload — uploads test_model.py with 5 test prompts
ssh_execute — python /root/test_model.py
billing_usage — checks cost
delete_instance — terminates
Model validation results:
| Prompt | Output | Quality |
|---|
| ”Summarize this paper…“ | 3-sentence summary, accurate | pass |
| ”Translate to French…” | Correct translation | pass |
| ”Extract entities from…” | Found 4/4 entities | pass |
| ”Generate SQL for…” | Valid query, correct joins | pass |
| ”Classify sentiment…” | Correct label, 0.94 confidence | pass |
All 5 test prompts passed. Model loads in 8.2 seconds, inference at ~45 tok/s.
Cost: $0.06 (10 minutes). Instance terminated.
”I want to try 3 different LoRA ranks and see which gives the best eval loss. Quick and dirty.”
The agent runs a fast comparison:
create_instance — deploys lora-test with A100 80GB
instance_status — waits until running
ssh_execute — installs dependencies
file_upload — uploads training script and small eval dataset
ssh_execute — runs with --lora-rank 8 (500 steps)
ssh_execute — runs with --lora-rank 16 (500 steps)
ssh_execute — runs with --lora-rank 32 (500 steps)
delete_instance — terminates
LoRA rank comparison (500 steps each, Llama 8B, 1K sample dataset):
| Rank | Trainable params | Eval loss | Train time |
|---|
| 8 | 3.4M | 1.842 | 4 min |
| 16 | 6.8M | 1.687 | 5 min |
| 32 | 13.6M | 1.691 | 7 min |
Rank 16 wins. It achieves the lowest eval loss — rank 32 doesn’t improve further and trains 40% slower. The extra parameters at rank 32 are likely overfitting on this small dataset.
**Total cost: 0.43∗∗(16minutesonA100at1.60/hr). Instance terminated.
| Tool | Purpose |
|---|
create_instance / instance_status | Provision and wait for GPU |
ssh_execute | Install packages, run experiments |
file_upload | Transfer experiment scripts |
delete_instance | Tear down immediately after results |