Troubleshooting

Solutions for common issues when deploying, connecting to, and using GPU instances.

Deployment Issues

If your instance has been in the creating state for more than 5 minutes:
  1. Wait a few more minutes. Some GPU configurations take longer to provision, especially multi-GPU setups.
  2. Try terminating and redeploying. Terminate the stuck instance and create a new one.
  3. Try a different region. The GPU you selected may have limited availability in the chosen region.
  4. Contact support if the issue persists across multiple attempts.
If your instance goes to a failed state right after deploying:
  • Insufficient credits. Check your balance on the Billing page. You need enough credits for at least one hour of usage.
  • Invalid configuration. Ensure your CPU, memory, and disk settings are within the allowed ranges for the selected GPU.
  • Try redeploying with a different configuration or region.
GPU availability fluctuates based on demand.
  • Try a different region. The same GPU may be available in another region.
  • Try a different GPU type. Similar-tier GPUs may have availability (e.g., A100 40GB instead of A100 80GB).
  • Try again later. GPUs free up as other users terminate their instances.
  • Join our Discord for availability updates.

Connection Issues

ssh: connect to host 203.0.113.42 port 22: Connection refused
  • Check the instance status. The instance must be in the deployed state before it accepts SSH connections.
  • Verify the port. Some instances use a non-standard SSH port. Check the instance detail page for the correct port.
  • Wait for startup. If the instance was just deployed, networking may need 1-2 minutes to initialize.
  • Check the IP address. Make sure you are using the IP shown on the instance detail page.
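To tell a refused connection apart from a timeout before digging further, a short TCP probe can help. The following is a minimal sketch using only the Python standard library; the function name `probe_ssh_port` is illustrative, and the host and port should be the values shown on your instance detail page.

```python
import socket


def probe_ssh_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        # create_connection performs the TCP handshake only; it does not speak SSH.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on this port (yet).
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: wrong IP, firewall, VPN, or an unhealthy instance.
        return "timeout"
    except OSError:
        # Anything else, e.g. no route to host or DNS failure.
        return "error"
```

For example, `probe_ssh_port("203.0.113.42", 22)` returning "refused" suggests the instance is reachable but SSH is not listening on that port, which points at the status, port, or startup checks above rather than at your local network.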
Permission denied (publickey).
  • Correct key. Make sure you are specifying the private key that matches the public key you attached during deployment: ssh -i ~/.ssh/your_key ...
  • Key permissions. Your private key file must have restricted permissions:
    chmod 600 ~/.ssh/your_key
    
  • Correct user. Always connect as root: ssh root@<ip> ...
  • Key not attached. If you did not select this key during deployment, there is no way to add it after the fact. You will need to redeploy with the correct key.
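The key permission check above can also be scripted. This is a small sketch, assuming the usual OpenSSH behavior of rejecting private keys that are readable or writable by group or others; `key_permissions_ok` is a hypothetical helper name, and `~/.ssh/your_key` is the placeholder path used above.

```python
import os
import stat


def key_permissions_ok(path: str) -> bool:
    """Return True if the file grants no access to group or others (e.g. mode 600 or 400)."""
    mode = stat.S_IMODE(os.stat(os.path.expanduser(path)).st_mode)
    # Any group/other bit set (mask 077) is treated as too permissive.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

If `key_permissions_ok("~/.ssh/your_key")` returns False, running the chmod 600 command shown above should resolve it.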
ssh: connect to host 203.0.113.42 port 22: Operation timed out
  • Check your network. Make sure you can reach the internet and that your firewall or VPN is not blocking outbound SSH.
  • Verify IP and port. Confirm the values on the instance detail page.
  • Instance may be unhealthy. If the instance was previously working, it may have encountered an issue. Try terminating and redeploying.

GPU Issues

bash: nvidia-smi: command not found
  • This should not happen with the default OS images. If you see this:
    1. Check if the NVIDIA driver is loaded: lsmod | grep nvidia
    2. Check whether the binary is on your PATH: which nvidia-smi. If it is installed but not found for your user, try running it with elevated privileges: sudo nvidia-smi
    3. If using a custom image, you may need to install the NVIDIA driver manually.
    4. If the issue persists, terminate and redeploy with a standard image (Ubuntu + CUDA).
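The first two checks can be combined into one script. The sketch below roughly mirrors `which nvidia-smi` and `lsmod | grep nvidia` using the standard library; it assumes a Linux instance where loaded modules are listed in /proc/modules, and `nvidia_diagnostics` is an illustrative name.

```python
import shutil
from pathlib import Path


def nvidia_diagnostics(modules_file: str = "/proc/modules") -> dict:
    """Report whether nvidia-smi is on PATH and whether an nvidia kernel module is loaded."""
    smi_path = shutil.which("nvidia-smi")  # None if the binary is not on PATH
    try:
        lines = Path(modules_file).read_text().splitlines()
    except OSError:
        lines = []  # file missing or unreadable (e.g. not a Linux host)
    # Each line of /proc/modules starts with the module name.
    loaded = any(line.split()[0].startswith("nvidia") for line in lines if line.strip())
    return {"nvidia_smi_path": smi_path, "driver_module_loaded": loaded}
```

A result with a loaded module but no `nvidia_smi_path` suggests a PATH problem; neither being present suggests the driver is not installed, which is the custom-image case above.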
RuntimeError: CUDA out of memory.
  • Reduce batch size. This is the most common fix.
  • Use gradient checkpointing to trade compute for memory.
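  • Use mixed precision (fp16 or bf16) to reduce memory usage. Activations stored in 16-bit take about half the space of fp32; the total saving depends on your model and optimizer.
  • Check for other processes and leaked memory. Run nvidia-smi to see whether another process is holding GPU memory.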
  • Upgrade your GPU. If you consistently run out of memory, deploy an instance with more VRAM (e.g., A100 80GB or H100).
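Reducing the batch size can be automated by retrying with a smaller batch whenever the out-of-memory error appears. The following is a framework-agnostic sketch, not a definitive implementation: `step` is a hypothetical callable that runs one training or inference step at the given batch size, and the error is detected by matching the "out of memory" text of the RuntimeError shown above.

```python
from typing import Callable


def run_with_smaller_batches(
    step: Callable[[int], None], batch_size: int, min_batch_size: int = 1
) -> int:
    """Call step(batch_size); on an out-of-memory RuntimeError, halve and retry.

    Returns the batch size that succeeded. Other errors are re-raised unchanged.
    """
    while True:
        try:
            step(batch_size)
            return batch_size
        except RuntimeError as err:
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise  # not an OOM, or nothing left to shrink
            batch_size = max(min_batch_size, batch_size // 2)
```

In a real PyTorch loop you would likely also release cached memory between attempts (for example with `torch.cuda.empty_cache()`) and keep the effective batch size constant through gradient accumulation; both are left out here to keep the sketch short.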
>>> torch.cuda.is_available()
False
  • Check CUDA installation: nvcc --version
  • Check driver: nvidia-smi
  • PyTorch CUDA mismatch. Your PyTorch build may be compiled for a different CUDA version than the one your driver supports. Reinstall a matching build (the example below targets CUDA 12.1):
    pip install torch --index-url https://download.pytorch.org/whl/cu121
    
  • Use the PyTorch image. The pre-built PyTorch image has a compatible CUDA and PyTorch setup out of the box.
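To see at a glance which of these cases applies, the checks can be gathered into one report. This is a minimal sketch; `torch_cuda_report` is an illustrative name, and the function deliberately returns early when PyTorch is not installed.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Collect the PyTorch version, the CUDA version it was built for, and GPU visibility."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report
    import torch  # imported lazily so the script also runs where PyTorch is missing

    report["torch_version"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report
```

If `built_for_cuda` is None, a CPU-only build is installed and reinstalling from a CUDA index as shown above is the likely fix. If it names a CUDA version but `cuda_available` is still False, the driver check (nvidia-smi) is the next place to look.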

Billing Issues

Instances are automatically terminated when your credit balance reaches zero.
  • Check your balance on the Billing page.
  • Enable auto-recharge to automatically add credits when your balance drops below a threshold.
  • If you believe the termination was an error, contact support with your instance ID.
Billing stops as soon as the instance reaches the terminated state.
  • Check the timeline. Usage is billed per hour, so if you terminated mid-hour, you are charged for the full hour.
  • Verify the instance is terminated. Go to the Instances page and confirm the status shows terminated.
  • If your balance decreased after termination, check if you have other active instances.
  • Contact support if the charges do not match your usage.
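When comparing charges against usage, the per-hour rule above can be worked through as simple arithmetic. The sketch below assumes every started hour is billed in full, as described above; the hourly rate is a hypothetical input, and the exact treatment of hour boundaries is an assumption rather than documented behavior.

```python
import math


def billed_hours(runtime_seconds: float) -> int:
    """Whole hours charged, assuming any started hour is billed in full."""
    return math.ceil(runtime_seconds / 3600)


def estimated_charge(runtime_seconds: float, hourly_rate: float) -> float:
    """Estimated cost in credits for one instance; hourly_rate is a hypothetical value."""
    return billed_hours(runtime_seconds) * hourly_rate
```

For instance, an instance that ran for 90 minutes counts as 2 billed hours under this assumption, so the charge would be twice the hourly rate.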

Still Need Help?

Discord Community

Get help from the team and community in real time.

Email Support

Reach us at support@runcrate.ai for account and billing issues.