Troubleshooting
Solutions for common issues when deploying, connecting to, and using GPU instances.
Deployment Issues
Instance stuck in 'Creating' state
If your instance has been in the creating state for more than 5 minutes:
- Wait a few more minutes. Some GPU configurations take longer to provision, especially multi-GPU setups.
- Try terminating and redeploying. Terminate the stuck instance and create a new one.
- Try a different region. The GPU you selected may have limited availability in the chosen region.
- Contact support if the issue persists across multiple attempts.
Deployment fails immediately
If your instance goes to a failed state right after deploying:
- Insufficient credits. Check your balance on the Billing page. You need enough credits for at least one hour of usage.
- Invalid configuration. Ensure your CPU, memory, and disk settings are within the allowed ranges for the selected GPU.
- Try redeploying with a different configuration or region.
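Before you redeploy, you can check the one-hour credit rule yourself. This is a minimal sketch. It assumes you read your balance from the Billing page and the hourly rate from the GPU configuration you are deploying. `has_hour_of_credits` is a hypothetical helper and is not part of any Runcrate tool.

```bash
# Hypothetical helper: succeeds (exit 0) if BALANCE covers at least one hour at HOURLY_RATE.
# Both arguments are plain decimal numbers in the same currency, e.g. 12.50 and 1.89.
has_hour_of_credits() {
  local balance="$1" hourly_rate="$2"
  # bash arithmetic is integer-only, so let awk do the decimal comparison;
  # awk exits 0 when the condition holds and 1 otherwise
  awk -v b="$balance" -v r="$hourly_rate" 'BEGIN { exit !(b + 0 >= r + 0) }'
}
```

Example: `if has_hour_of_credits 12.50 1.89; then echo "ok to deploy"; else echo "add credits first"; fi`. For a multi-GPU configuration, pass the total hourly rate for the whole instance (an assumption about how the rate is displayed).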
No GPUs available
GPU availability fluctuates based on demand.
- Try a different region. The same GPU may be available in another region.
- Try a different GPU type. Similar-tier GPUs may have availability (e.g., A100 40GB instead of A100 80GB).
- Try again later. GPUs free up as other users terminate their instances.
- Join our Discord for availability updates.
Connection Issues
SSH connection refused
- Check the instance status. It must be deployed to accept SSH connections.
- Verify the port. Some instances use a non-standard SSH port. Check the instance detail page for the correct port.
- Wait for startup. If the instance was just deployed, networking may need 1-2 minutes to initialize.
- Check the IP address. Make sure you are using the IP shown on the instance detail page.
Permission denied (publickey)
- Correct key? Make sure you are specifying the private key that matches the public key you attached during deployment:
  ssh -i ~/.ssh/your_key ...
- Key permissions. Your private key file must have restricted permissions (readable only by you):
  chmod 600 ~/.ssh/your_key
- Correct user. Always connect as root:
  ssh root@<ip> ...
- Key not attached. If you did not select this key during deployment, there is no way to add it after the fact. You will need to redeploy with the correct key.
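You can script the key checks from this list. This is a minimal sketch. It assumes a Linux or macOS shell with OpenSSH's `ssh-keygen` installed. `key_perms_ok` and `key_matches` are hypothetical helper names and are not part of any Runcrate tooling. OpenSSH refuses a private key that group or others can read. `ssh-keygen -y` derives the public key from a private key, so you can compare it with the `.pub` file you attached.

```bash
# Hypothetical helper: succeeds if KEY is not accessible by group or others
# (e.g. mode 600 or 400), which OpenSSH requires before it will use the key.
# Returns 2 if the file cannot be stat'ed.
key_perms_ok() {
  local key="$1" mode
  # GNU/BusyBox stat (Linux) takes -c; BSD/macOS stat takes -f
  mode=$(stat -c '%a' "$key" 2>/dev/null || stat -f '%Lp' "$key" 2>/dev/null) || return 2
  case "$mode" in
    *00) return 0 ;;   # last two octal digits (group, other) are 0
    *)   return 1 ;;
  esac
}

# Hypothetical helper: succeeds if PRIVATE_KEY is the counterpart of PUBLIC_KEY_FILE.
# Prompts for a passphrase if the private key is encrypted. Returns 2 on read errors.
key_matches() {
  local private_key="$1" public_key_file="$2" derived attached
  derived=$(ssh-keygen -y -f "$private_key") || return 2
  attached=$(awk 'NR == 1 { print $1, $2 }' "$public_key_file") || return 2
  # Compare only key type and key data; the trailing comment may differ
  [ -n "$attached" ] && [ "$(echo "$derived" | awk 'NR == 1 { print $1, $2 }')" = "$attached" ]
}
```

Example: `key_perms_ok ~/.ssh/your_key || chmod 600 ~/.ssh/your_key`, then `key_matches ~/.ssh/your_key ~/.ssh/your_key.pub && echo "key pair matches"`. The second command assumes the `.pub` file is the one you attached during deployment.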
Connection timed out
- Check your network. Make sure you can reach the internet and that your firewall or VPN is not blocking outbound SSH.
- Verify IP and port. Confirm the values on the instance detail page.
- Instance may be unhealthy. If the instance was previously working, it may have encountered an issue. Try terminating and redeploying.
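A refused connection and a timed-out connection point to different causes, so tell them apart before you change anything. As a rough guide, "refused" usually means the instance is not deployed yet or the port is wrong. "Timeout" usually points to your network, firewall, or VPN, or to an unhealthy instance. This is a minimal sketch. It assumes bash (for its built-in `/dev/tcp` redirection) and the coreutils `timeout` command, which is standard on Linux but not installed by default on macOS. `probe_port` is a hypothetical helper name.

```bash
# Hypothetical helper: classify a TCP connection attempt to HOST:PORT.
# Prints "open", "timeout" (no answer within SECONDS), or "refused"
# (any immediate failure: connection refused, unreachable network, bad hostname).
probe_port() {
  local host="$1" port="$2" seconds="${3:-5}"
  # bash treats /dev/tcp/HOST/PORT as a TCP socket; the redirect fails if the connect fails
  timeout "$seconds" bash -c '</dev/tcp/"$0"/"$1"' "$host" "$port" 2>/dev/null
  case $? in
    0)   echo "open" ;;
    124) echo "timeout" ;;   # timeout(1) exits with 124 when the time limit is hit
    *)   echo "refused" ;;
  esac
}
```

Example: `probe_port <ip> <port>`, using the IP and port from the instance detail page. An "open" result only shows that something is listening on the port. It does not prove that key authentication will succeed.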
GPU Issues
nvidia-smi not found
- This should not happen with the default OS images. If you see this:
  - Check if the NVIDIA driver is loaded:
    lsmod | grep nvidia
  - Try running it with sudo:
    sudo nvidia-smi
    or check whether it is on your PATH:
    which nvidia-smi
  - If using a custom image, you may need to install the NVIDIA driver manually.
  - If the issue persists, terminate and redeploy with a standard image (Ubuntu + CUDA).
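You can run the driver and PATH checks in one step. This is a minimal sketch. It reads `/proc/modules` (the file that `lsmod` formats) to see whether an `nvidia` module is loaded, and it uses `command -v` to see whether `nvidia-smi` is on your PATH. `gpu_diag` and its optional file argument are hypothetical. The argument exists only so the function can be tested against a sample file.

```bash
# Hypothetical helper: report whether the NVIDIA kernel module is loaded
# and whether nvidia-smi can be found on PATH.
gpu_diag() {
  local modules="${1:-/proc/modules}" smi
  # Each line of /proc/modules starts with the module name, e.g. "nvidia 56467456 ..."
  if grep -q '^nvidia ' "$modules" 2>/dev/null; then
    echo "driver: loaded"
  else
    echo "driver: not loaded"
  fi
  # command -v resolves the name the same way the shell would when you type it
  if smi=$(command -v nvidia-smi); then
    echo "nvidia-smi: $smi"
  else
    echo "nvidia-smi: not found on PATH"
  fi
}
```

If the driver is loaded but `nvidia-smi` is not on PATH, the problem is the PATH or a missing utilities package, not the driver itself.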
CUDA out of memory
- Reduce batch size. This is the most common fix.
- Use gradient checkpointing to trade compute for memory.
- Use mixed precision (fp16 or bf16). 16-bit values take half the memory of fp32 values, which can substantially reduce memory usage.
- Check for memory leaks. Run nvidia-smi to see whether another process (for example, an earlier run that never exited) is holding GPU memory.
- Upgrade your GPU. If you consistently run out of memory, deploy an instance with more VRAM (e.g., A100 80GB or H100).
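Before you change the batch size or GPU, estimate whether the model fits at all and check which processes are already holding GPU memory. This is a minimal sketch. `weights_gib` and `gpu_mem_users` are hypothetical helper names. The estimate counts only the model weights. Gradients, optimizer state, and activations add more on top, so treat the figure as a lower bound. The actual savings from mixed precision depend on how it is applied, for example whether fp32 master weights are kept.

```bash
# Hypothetical helper: GiB needed just to hold a model's weights.
# Usage: weights_gib NUM_PARAMS BYTES_PER_PARAM   (4 = fp32, 2 = fp16/bf16)
weights_gib() {
  awk -v n="$1" -v b="$2" 'BEGIN { printf "%.2f\n", n * b / (1024 ^ 3) }'
}

# List processes currently holding GPU memory, one CSV row per process.
gpu_mem_users() {
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
}
```

For a 7-billion-parameter model, `weights_gib 7000000000 4` gives 26.08 GiB in fp32, and `weights_gib 7000000000 2` gives 13.04 GiB in fp16/bf16.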
PyTorch cannot see GPU
- Check CUDA installation:
  nvcc --version
- Check driver:
  nvidia-smi
- PyTorch CUDA mismatch. Your PyTorch build may be compiled for a CUDA version that your driver does not support. Reinstall PyTorch using a build for the correct CUDA version.
- Use the PyTorch image. The pre-built PyTorch image has a compatible CUDA and PyTorch setup out of the box.
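To spot a mismatch, compare the CUDA version your PyTorch build expects with the CUDA version your driver supports. This is a minimal sketch. It assumes `python3` with PyTorch installed and `nvidia-smi` on PATH. `torch_cuda_info` and `driver_cuda_version` are hypothetical helper names. `torch.version.cuda` is the CUDA version the build was compiled against (`None` for CPU-only builds). The "CUDA Version" shown in the `nvidia-smi` banner is the highest version the driver supports.

```bash
# Hypothetical helper: print the PyTorch version, the CUDA version it was built for,
# and whether it can actually see a GPU.
torch_cuda_info() {
  python3 -c 'import torch
print("torch:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())'
}

# Hypothetical helper: read nvidia-smi output on stdin and print the driver's CUDA version.
driver_cuda_version() {
  grep -o 'CUDA Version: [0-9.]*' | head -n 1 | awk '{ print $3 }'
}
```

Run `torch_cuda_info` and `nvidia-smi | driver_cuda_version`. If "built for CUDA" is `None`, you have a CPU-only build. If its major version is newer than the driver's, the build will typically be unable to use the GPU.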
Billing Issues
Instance terminated unexpectedly
Instances are automatically terminated when your credit balance reaches zero.
- Check your balance on the Billing page.
- Enable auto-recharge to automatically add credits when your balance drops below a threshold.
- If you believe the termination was an error, contact support with your instance ID.
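Instances stop when the balance reaches zero, so it helps to know roughly how long your current balance will last across everything you have running. This is a minimal sketch. `hours_remaining` is a hypothetical helper. You supply the balance from the Billing page and the hourly rate of each running instance yourself. The result is an estimate at the current burn rate, not a billing guarantee.

```bash
# Hypothetical helper: estimate hours until the balance reaches zero.
# Usage: hours_remaining BALANCE RATE [RATE...]   (one hourly rate per running instance)
hours_remaining() {
  local balance="$1"; shift
  printf '%s\n' "$@" | awk -v b="$balance" '
    { total += $1 }                       # sum the hourly rates of all running instances
    END {
      if (total <= 0) { print "inf"; exit }
      printf "%.1f\n", b / total          # hours of runway at the combined rate
    }'
}
```

Example: with a balance of 10 and two instances at 1.25 per hour each, `hours_remaining 10 1.25 1.25` prints `4.0`.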
Still being charged after termination
Billing stops as soon as the instance reaches the terminated state.
- Check the timeline. Charges are billed per hour. If you terminated mid-hour, you are billed for the full hour.
- Verify the instance is terminated. Go to the Instances page and confirm the status shows terminated.
- If your balance decreased after termination, check if you have other active instances.
- Contact support if the charges do not match your usage.
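Before you contact support, you can check a charge against the per-hour rounding rule described in this section (a partial hour is billed as a full hour). This is a minimal sketch. `billed_hours` and `expected_charge` are hypothetical helpers. You supply the run time in whole minutes (assumed positive) and the instance's hourly rate.

```bash
# Hypothetical helper: hours billed for a run of MINUTES, rounding any partial hour up.
billed_hours() {
  local minutes="$1"
  echo $(( (minutes + 59) / 60 ))   # integer ceiling division
}

# Hypothetical helper: expected charge = billed hours x hourly rate, to two decimals.
expected_charge() {
  local minutes="$1" hourly_rate="$2"
  awk -v h="$(billed_hours "$minutes")" -v r="$hourly_rate" 'BEGIN { printf "%.2f\n", h * r }'
}
```

Example: a 90-minute run at 1.89 per hour is billed as 2 hours, so `expected_charge 90 1.89` prints `3.78`.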
Still Need Help?
Discord Community
Get help from the team and community in real time.
Email Support
Reach us at support@runcrate.ai for account and billing issues.