Walk through a full training workflow without leaving your editor. Your AI agent handles GPU provisioning, file transfers, training execution, and cleanup.

“Deploy an A100, upload my dataset, and start training.”

The agent orchestrates the entire setup:
  1. list_gpu_types — confirms A100 80GB availability and pricing ($1.60/hr)
  2. create_instance — deploys an instance named fine-tune-run with an A100
  3. instance_status — polls every 10 seconds until the instance is running
  4. file_upload — uploads train.py to /workspace/train.py
  5. file_upload — uploads dataset.jsonl to /workspace/dataset.jsonl
  6. ssh_execute — runs pip install torch transformers datasets accelerate
  7. ssh_execute — runs nohup python /workspace/train.py > /root/train.log 2>&1 &

Deployed fine-tune-run (1x A100 80GB, $1.60/hr). Uploaded train.py and dataset.jsonl. Training started. Check progress with “how’s the training going?”
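
The polling in step 3 can be sketched as a small loop. This is a minimal illustration, not the agent's actual implementation: `get_status` is a hypothetical stand-in for the instance_status tool call, and the status strings "provisioning" and "running" and the 600-second timeout are assumptions made for the example.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status() until it returns "running"; return the number of polls.

    get_status is a hypothetical stand-in for the instance_status tool call.
    Raises TimeoutError if the instance is not running after timeout_s seconds.
    """
    waited = 0.0
    polls = 0
    while True:
        polls += 1
        status = get_status()
        if status == "running":
            return polls
        if waited >= timeout_s:
            raise TimeoutError(f"instance still {status!r} after {waited:.0f}s")
        sleep(interval_s)  # 10 seconds between polls, as in step 3
        waited += interval_s


if __name__ == "__main__":
    # Simulated status sequence; a real agent would call instance_status here.
    states = iter(["provisioning", "provisioning", "running"])
    polls = wait_until_running(lambda: next(states), sleep=lambda s: None)
    print(f"running after {polls} polls")
```

The timeout keeps the loop from waiting indefinitely on an instance that never comes up, and injecting `sleep` lets the loop be exercised without real delays.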

“How’s the training going? Show me GPU usage and the last few log lines.”

The agent checks two things in parallel:
  1. ssh_execute — runs nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader
  2. ssh_execute — runs tail -20 /root/train.log
GPU utilization: 97% | Memory: 68,420 / 81,920 MiB (83%)
Epoch 2/5 | Step 1240/3100 | Loss: 0.4812 | LR: 2.1e-5
Epoch 2/5 | Step 1260/3100 | Loss: 0.4756 | LR: 2.0e-5
Epoch 2/5 | Step 1280/3100 | Loss: 0.4701 | LR: 2.0e-5
Loss is decreasing steadily. At the current rate, training will finish in ~2.5 hours.
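
To turn the raw command output into a summary like the one above, the two results need to be parsed. The sketch below assumes nvidia-smi prints the queried fields with units (for example `97 %, 68420 MiB, 81920 MiB`, the usual `csv,noheader` form) and that train.py logs lines in the format shown in this example. The names `parse_gpu_csv` and `parse_log_line` are hypothetical helpers, not part of any tool.

```python
import re
from typing import NamedTuple, Optional


class GpuSample(NamedTuple):
    utilization_pct: int
    memory_used_mib: int
    memory_total_mib: int


def parse_gpu_csv(line: str) -> GpuSample:
    """Parse one csv,noheader line for utilization.gpu, memory.used, memory.total."""
    # Each field looks like "97 %" or "68420 MiB"; keep only the number.
    util, used, total = (field.strip().split()[0] for field in line.split(","))
    return GpuSample(int(util), int(used), int(total))


# Assumed log format, taken from the sample lines in this walkthrough.
LOG_RE = re.compile(
    r"Epoch (\d+)/(\d+) \| Step (\d+)/(\d+) \| Loss: ([0-9.]+) \| LR: (\S+)"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one training log line, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    epoch, epochs, step, total_steps = (int(g) for g in m.groups()[:4])
    return {
        "epoch": epoch,
        "epochs": epochs,
        "step": step,
        "total_steps": total_steps,
        "loss": float(m.group(5)),
        "lr": float(m.group(6)),
    }


if __name__ == "__main__":
    gpu = parse_gpu_csv("97 %, 68420 MiB, 81920 MiB")
    last = parse_log_line("Epoch 2/5 | Step 1280/3100 | Loss: 0.4701 | LR: 2.0e-5")
    print(gpu)
    print(last)
```

A real log may contain tracebacks or progress bars, which is why `parse_log_line` returns None for lines that do not match instead of raising.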

“Training is done. Download the model checkpoint and tear down the instance.”

The agent wraps up the workflow:
  1. ssh_execute — runs ls -lh /workspace/output/ to confirm checkpoint files
  2. file_download — downloads /workspace/output/adapter_model.safetensors
  3. file_download — downloads /workspace/output/adapter_config.json
  4. file_download — downloads /workspace/output/training_args.json
  5. billing_usage — checks total spend for this session
  6. delete_instance — terminates fine-tune-run
Downloaded 3 files (adapter weights, config, training args). Total cost for this training run: $6.40 (4 hours on A100). Instance fine-tune-run terminated. Billing stopped.
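
The reported total can be checked against the hourly rate from the first step. This is only a sanity check under the assumption that billing is a flat hourly rate multiplied by runtime; the figure returned by billing_usage remains the authoritative one. `session_cost` is a hypothetical helper.

```python
from decimal import Decimal


def session_cost(hours: Decimal, rate_per_hour: Decimal) -> Decimal:
    """Flat-rate estimate: hours x hourly rate, rounded to cents."""
    return (hours * rate_per_hour).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 4 hours on an A100 at $1.60/hr, the figures from this walkthrough.
    print(session_cost(Decimal("4"), Decimal("1.60")))  # prints 6.40
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary rounding.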

Tools used in this workflow

| Tool | Purpose |
| --- | --- |
| list_gpu_types | Check GPU availability and pricing |
| create_instance | Provision the training machine |
| instance_status | Wait for deployment to complete |
| file_upload | Transfer training code and data |
| ssh_execute | Install dependencies, start training, check logs |
| file_download | Retrieve trained model artifacts |
| billing_usage | Verify session cost |
| delete_instance | Clean up and stop billing |