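Run DeepSeek V3 or DeepSeek R1 on your own GPU instances. The full models are mixture-of-experts (MoE) networks with 671B parameters (685B in the published checkpoint, which also includes the multi-token prediction module), of which only ~37B are active per token. That lowers the compute per token, but every expert must still be loaded, so memory requirements follow the total parameter count and the weight precision. The R1 Distill models are smaller dense models.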

GPU requirements

| Model | GPU | VRAM needed | Approx. cost |
| --- | --- | --- | --- |
| DeepSeek R1 Distill 8B | RTX 4090 (24 GB) | ~16 GB | ~$0.35/hr |
| DeepSeek R1 Distill 70B | A100 80 GB | ~70 GB | ~$1.60/hr |
| DeepSeek V3 / R1 full (FP8) | 4x H100 80 GB | ~50 GB each | ~$10.00/hr |
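
The VRAM column can be sanity-checked with a rough weights-only estimate: parameters (in billions) times bytes per parameter gives gigabytes. The sketch below uses a hypothetical helper, `weights_gb`, which is not part of the Runcrate CLI. It ignores KV cache, activations, and framework overhead, so real usage is higher.

```shell
# Hypothetical helper: weights-only memory estimate in GB.
# Usage: weights_gb <params_in_billions> <bytes_per_param>
weights_gb() {
  echo $(( $1 * $2 ))
}

weights_gb 8 2     # 8B at 16-bit (2 bytes/param), prints 16
weights_gb 70 1    # 70B at 8-bit (1 byte/param), prints 70
weights_gb 671 1   # 671B at FP8 (1 byte/param), prints 671
```

By this estimate the 8B row matches 16-bit weights, and the 70B row matches 8-bit weights (16-bit would be about 140 GB). The FP8 weights of the full model alone come to roughly 671 GB, well above the 320 GB that 4x 80 GB provides, so check the last row against the model card before provisioning.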

Deploy DeepSeek R1 Distill 8B (RTX 4090)

# Create the instance and check its status
runcrate instances create --name deepseek-8b --gpu RTX4090
runcrate instances status deepseek-8b

# Install vLLM on the instance
runcrate ssh deepseek-8b -- "pip install vllm"

# Start the OpenAI-compatible server in the background; output goes to /root/vllm.log
runcrate ssh deepseek-8b -- "nohup python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --max-model-len 8192 \
  --port 8000 --host 0.0.0.0 \
  > /root/vllm.log 2>&1 &"
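
The server starts in the background, and downloading and loading the model can take several minutes, so the endpoint will not answer right away. A minimal retry sketch follows. `retry` is a hypothetical helper, and the commented usage assumes vLLM's `/health` route on the port chosen above.

```shell
# Hypothetical helper: run a command until it succeeds.
# Usage: retry <max_attempts> <delay_seconds> <command...>
# The command always runs at least once.
retry() {
  max=$1; delay=$2; shift 2
  attempt=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$attempt" -ge "$max" ] && return 1 # out of attempts: give up
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (replace <INSTANCE_IP> first): poll every 10 s, up to 60 times.
# retry 60 10 curl -sf -o /dev/null "http://<INSTANCE_IP>:8000/health"
```

If the retries run out, the server log is the first place to look.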

Deploy DeepSeek R1 Distill 70B (A100)

# Create the instance and install vLLM
runcrate instances create --name deepseek-70b --gpu A100
runcrate ssh deepseek-70b -- "pip install vllm"

# Start the OpenAI-compatible server in the background; output goes to /root/vllm.log
runcrate ssh deepseek-70b -- "nohup python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --max-model-len 8192 \
  --port 8000 --host 0.0.0.0 \
  > /root/vllm.log 2>&1 &"

Deploy DeepSeek V3 full (4x H100)

# Create a 4-GPU instance and install vLLM
runcrate instances create --name deepseek-v3 --gpu H100 --gpu-count 4
runcrate ssh deepseek-v3 -- "pip install vllm"

# Start the server with the model sharded across all 4 GPUs (tensor parallelism)
runcrate ssh deepseek-v3 -- "nohup python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 4 \
  --max-model-len 16384 \
  --trust-remote-code \
  --port 8000 --host 0.0.0.0 \
  > /root/vllm.log 2>&1 &"
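
`--tensor-parallel-size` cannot exceed the number of GPUs visible on the instance. A small sketch for checking this before launch: `count_gpus` and `check_tp` are hypothetical helpers, and the commented usage relies on `nvidia-smi --query-gpu=name --format=csv,noheader` printing one line per GPU.

```shell
# Hypothetical helper: count GPUs from `nvidia-smi --query-gpu=name --format=csv,noheader`
# output on stdin (one non-empty line per GPU).
count_gpus() {
  grep -c .
}

# Hypothetical helper: succeed only if the requested tensor-parallel size fits.
# Usage: check_tp <tensor_parallel_size> <gpu_count>
check_tp() {
  if [ "$1" -le "$2" ]; then
    echo "ok: tensor-parallel-size $1 fits on $2 GPU(s)"
  else
    echo "error: tensor-parallel-size $1 exceeds $2 GPU(s)" >&2
    return 1
  fi
}

# Example, run from your workstation:
# gpus=$(runcrate ssh deepseek-v3 -- "nvidia-smi --query-gpu=name --format=csv,noheader" | count_gpus)
# check_tp 4 "$gpus"
```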

Test the endpoint

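# Show the instance details to find its IP address
runcrate instances info deepseek-8b

# Send a chat completion request to the OpenAI-compatible endpoint
curl http://<INSTANCE_IP>:8000/v1/chat/completions \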
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "messages": [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
    "max_tokens": 256
  }'
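
R1-family models typically emit their reasoning before the final answer, closed by a `</think>` tag in the message content. If only the answer is wanted, a line-based filter can drop the reasoning. This is a sketch under that assumption: `strip_think` is a hypothetical helper, it expects the message content (not the raw JSON response) on stdin, and it discards anything that shares a line with `</think>`.

```shell
# Hypothetical helper: print only the lines after the first line containing </think>.
# If no </think> tag is present, the input is printed unchanged.
strip_think() {
  awk '
    { lines[NR] = $0 }                 # buffer every line
    /<\/think>/ && !end { end = NR }   # remember the first closing tag
    END {
      for (i = end + 1; i <= NR; i++) print lines[i]
    }'
}

# Example:
# printf '<think>\nsome reasoning\n</think>\nFinal answer\n' | strip_think
```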

Monitoring

# GPU utilization and memory use
runcrate ssh deepseek-8b -- nvidia-smi
# Last 50 lines of the vLLM server log
runcrate ssh deepseek-8b -- "tail -50 /root/vllm.log"
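
When the server does not come up, it can help to filter the log for lines that usually signal a failed start. A rough sketch: `scan_log` is a hypothetical helper, and the patterns are a guess at common failure messages rather than a complete list.

```shell
# Hypothetical helper: print numbered log lines that usually indicate a failed start.
# Reads log text on stdin; exits non-zero when nothing matches (grep semantics).
scan_log() {
  grep -n -i -E 'out of memory|traceback|error'
}

# Example:
# runcrate ssh deepseek-8b -- "cat /root/vllm.log" | scan_log
```

An out-of-memory match usually means the model or the `--max-model-len` setting is too large for the chosen GPU.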

Cleanup

runcrate instances delete deepseek-8b
runcrate instances delete deepseek-70b
runcrate instances delete deepseek-v3