What is the cheapest GPU cloud provider?

Runcrate is the cheapest GPU cloud provider, offering H100 instances at $1.54/hour, A100 at $1.06/hour, and RTX 4090 at $0.52/hour - up to 70% cheaper than AWS, GCP, and Azure.

How much does H100 GPU cost per hour?

H100 GPU instances cost $1.54 per hour on Runcrate, which is 68% cheaper than AWS pricing of $4.90/hour. Deploy in 60 seconds with no setup fees.

What is the cheapest A100 GPU cloud?

Runcrate offers the cheapest A100 GPU cloud at $1.06/hour with 80GB HBM2e memory, 65% cheaper than AWS. Perfect for machine learning training and AI development.

Where can I rent cheap RTX 4090 GPU instances?

Runcrate provides the cheapest RTX 4090 GPU instances at $0.52/hour with 24GB GDDR6X memory, 42% cheaper than competitors. Ideal for AI inference and development.

How fast can I deploy GPU instances?

Deploy GPU instances in under 60 seconds on Runcrate. No approval queues, no quota requests. Select your GPU, configure resources, and deploy instantly.

runcrate

Contact Sales Console

Deploy and scale on
the AI cloud

Name: Cheap GPU Cloud Instances - Affordable AI Infrastructure
Brand: Runcrate
Price: 1.54 USD
Availability: InStock

Inference by the token. Compute by the second. Scale by the month — or don't. We run the infra so you can ship the model.

142+

Open-source models

2–3×

More tokens per GPU

<60s

Cold-start boot

40–60%

Cheaper than aggregators

Get started Talk to an engineer

RUNCRATE INFRA

INFERENCE + COMPUTE

80 TPS

YOUR MODELS

RUNCRATE CLOUD

TOK / SEC12,840

UPTIME99.94%

P50 LATENCY247ms

REQ / MIN2,470

Trusted by teams at

Inference API

Inference Engine

200+ models across chat, code, image, video, audio, and more. One endpoint, every provider.

See the Inference Engine →

Chat & Reasoning

80+ models

Code Generation

25+ models

Image Generation

30+ models

Video & Motion

15+ models

Audio & Speech

20+ models

Vision & OCR

15+ models

Embeddings

12+ models

Transcription

8+ models

Infrastructure

Compute

Raw bare metal. Full root access. Pick your hardware, deploy in 60 seconds. Per-minute billing. Scale from 1 node to 128.

See Compute →

Available Hardware

Current fleet

H100H200B200B300A100L40S

Performance

Key specs

Deploy time60s

BillingPer-minute

Scale1–128 nodes

Root Access

Full control over your environment. SSH, Docker, custom images.

Auto-scaling

Scale horizontally on demand. Add nodes in seconds, release when done.

Pay Per Minute

No minimum commitments. Spin up for 5 minutes or 5 months.

Platform

The last AI cloud you'll sign up for.

(Not another provider to reconcile.)

Inference and compute. One credit balance.

Add funds once. Spend on API calls or GPU hours — same balance, same dashboard. No reconciling invoices from three providers.

Your Balance

$2,847.00

Inference

Compute

$1,204$1,643

One invoice·Auto-recharge·Never expires

Self-Serve

Get an API key. Ship today. Talk to nobody.

per month · pay as you go

What's included

Pay-per-token inference on every model
Per-second GPU compute
OpenAI-compatible endpoint
Public rate card, no negotiations

Deployment

Runcrate Cloud

Get an API key

Dedicated

Enterprise

Run it our way. Or run it in your VPC.

Custom

tailored contract

What's included

Everything in Dedicated
BYOC + self-hosted deployments
Region pinning — US / EU / APAC
Named CSM + on-call engineering

Deployment

Runcrate CloudBYOCHybrid

Talk to sales

From frontier training to production inference.

One platform for every AI workflow. Your team's default.

Managed

3/4 allocated

8× B300node-001

8× B300node-002

8× B300node-003

8× B300node-004

NVLink fabric

32× B300

Pre-training at scale.

Multi-node clusters with NVLink fabric. Submit jobs via squeue. We handle the orchestration.

One platform for training, finetuning, and inference.

Whether you're pre-training a frontier model, finetuning on proprietary data, or serving production traffic — it all runs on the same infrastructure. One dashboard. One bill.

Managed Finetuning

Upload dataset, pick base model. Pay per training second.

Dedicated Inference

Reserved GPUs, p99 SLAs. 40–60% cheaper than hyperscalers.

Managed Slurm

Multi-node clusters. Hundreds of GPUs, one command.

Deepseek v4 Pro·Dedicated

healthy

Throughput

↑14.2ktok/s

P99 latency

↓42ms

Uptime

99.99%

vs. AWS

-54%

3 replicas

us-east

eu-west

ap-south

api.runcrate.ai/v1/chat/completions

Why Runcrate

One platform. Every AI workload.

Inference API and GPU compute, unified. One API key, one bill, one dashboard. Stop juggling providers.

Los Angeles

Chicago

Amsterdam

Frankfurt

Singapore

Mumbai

Sydney

Tokyo

200+ models, one endpoint.

DeepSeek, Llama, Claude, Qwen, FLUX, Sora — all via OpenAI-compatible API. Swap base_url and ship.

GPUs in 60 seconds.

H100, H200, B200, MI300X — per-second billing, no commitments. Stop the instance, stop the meter.

40–60% cheaper.

Public rate card beats every aggregator. Volume discounts on dedicated. No hidden fees, credits never expire.

Start building on the AI cloud.

Deploy Now Talk to Sales

Deploy and scale onthe AI cloud

Inference Engine

Chat & Reasoning

Code Generation

Image Generation

Video & Motion

Audio & Speech

Vision & OCR

Embeddings

Transcription

Compute

Available Hardware

Performance

Root Access

Auto-scaling

Pay Per Minute

The last AI cloud you'll sign up for.

Inference and compute. One credit balance.

Self-Serve

Dedicated

Enterprise

From frontier training to production inference.

Pre-training at scale.

One platform for training, finetuning, and inference.

Managed Finetuning

Dedicated Inference

Managed Slurm

One platform. Every AI workload.

200+ models, one endpoint.

GPUs in 60 seconds.

40–60% cheaper.

Start building on the AI cloud.

Deploy and scale on
the AI cloud