GPU Rental FAQ - Common Questions Answered
Everything you need to know about renting GPUs for AI, machine learning, and LLM training.
What's the cheapest GPU for LLM training?
For budget LLM training, the RTX 3090 (24GB) offers the best value at around $0.08-0.29/hr on cloud providers like Vast.ai and RunPod.
For larger models (70B+), you'll need 80GB VRAM — the A100 80GB is your cheapest option at $1.09-1.49/hr (Crusoe, Hyperstack).
Pro tip: Use spot instances on Vast.ai for 50-80% savings if your training can handle interruptions.
H100 vs A100: which should I rent?
TL;DR: H100 is 2-3x faster but costs 2-3x more. Choose based on your use case:
Choose H100 if:
- Training large models (70B+) and time is critical
- Using FP8/INT8 quantization (H100's Transformer Engine shines here)
- Budget allows $2.50-4.00/hr
Choose A100 if:
- Running inference or fine-tuning smaller models (7B-13B)
- Budget-conscious ($1-2/hr sweet spot)
- Don't need cutting-edge speed
Cost example: Training a 7B model for 24 hours:
• A100 80GB: $1.29/hr × 24h = $31
• H100 PCIe: $2.49/hr × 12h (2x faster) = $30
→ Similar total cost, but H100 finishes in half the time.
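The crossover above is easy to sanity-check yourself. A minimal sketch, assuming the FAQ's rates and a 2x H100 speedup (real speedups vary by workload and precision):

```python
# Compare total training cost for two GPUs. The 2x H100 speedup is an
# assumption from the FAQ, not a benchmark.
def total_cost(hourly_rate: float, hours: float) -> float:
    return hourly_rate * hours

a100_hours = 24
h100_hours = a100_hours / 2  # assumed 2x speedup

a100_cost = total_cost(1.29, a100_hours)  # A100 80GB rate
h100_cost = total_cost(2.49, h100_hours)  # H100 PCIe rate

print(f"A100 80GB: ${a100_cost:.2f} over {a100_hours}h")
print(f"H100 PCIe: ${h100_cost:.2f} over {h100_hours:.0f}h")
```

Plug in your own provider's rates and measured speedup; the faster GPU wins on wall-clock time whenever its price premium is below its speedup.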
How much does it cost to fine-tune a 7B model?
Fine-tuning a 7B model (like Mistral 7B or Llama 2 7B) typically costs anywhere from about $2 to $35 depending on dataset size and GPU choice:
Quick fine-tune (10K samples, 3 epochs):
• RTX 3090 (24GB): ~6 hours × $0.29/hr = $1.74
• A100 40GB: ~3 hours × $1.28/hr = $3.84
• H100 PCIe: ~1.5 hours × $2.49/hr = $3.74
Full fine-tune (100K samples, 5 epochs):
• A100 80GB: ~24 hours × $1.29/hr = $31
• H100 SXM: ~12 hours × $2.95/hr = $35
Pro tips:
- Use LoRA/QLoRA to reduce VRAM needs (can train 7B on 16GB GPUs)
- Vast.ai spot instances can cut costs by 50-80%
- For experimentation, RTX 3090 is unbeatable value
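The estimates above all follow the same arithmetic. A hedged sketch of a cost estimator — the samples-per-hour throughput is an illustrative assumption, not a benchmark, so substitute your own measured rate:

```python
# Rough fine-tuning cost estimator. Throughput (samples/hour) is an
# assumed figure; measure it on a short run before trusting the estimate.
def finetune_cost(samples: int, epochs: int,
                  samples_per_hour: float, hourly_rate: float):
    hours = samples * epochs / samples_per_hour
    return hours, hours * hourly_rate

# Quick fine-tune from the FAQ: 10K samples, 3 epochs on an RTX 3090,
# assuming ~5,000 samples/hour so the job takes ~6 hours.
hours, cost = finetune_cost(10_000, 3, 5_000, 0.29)
print(f"~{hours:.0f}h, ~${cost:.2f}")  # ~6h, ~$1.74
```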
Can I run 70B models on consumer GPUs?
Yes, but you'll need creative solutions:
Inference (running the model):
• Quantization: 4-bit quantized 70B fits in ~40GB VRAM
• Best GPUs: RTX A6000 (48GB), A40 (48GB), or A100 40GB
• Cost: $0.69-1.28/hr on cloud providers
Training/Fine-tuning:
• Requires 80GB+ VRAM (A100 80GB or H100)
• Alternative: Use QLoRA on 2x RTX 3090s (48GB total)
• Cost: $1.29-4.25/hr depending on provider
Local option: Buy 2x RTX 3090 (used ~$800 each) and use model parallelism. Total cost: ~$1,600 one-time vs $0.58/hr ($425/mo if running 24/7).
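The buy-vs-rent decision comes down to a break-even point. A quick sketch using the FAQ's figures ($800 per used RTX 3090, $0.58/hr to rent the equivalent pair):

```python
# Break-even hours for buying 2x used RTX 3090s vs renting equivalent
# capacity, using the FAQ's example prices (electricity not included).
purchase_price = 2 * 800  # $1,600 one-time
rental_rate = 0.58        # $/hr for the pair on cloud providers

break_even_hours = purchase_price / rental_rate
print(f"Break-even after ~{break_even_hours:.0f} GPU-hours "
      f"(~{break_even_hours / 24:.0f} days of 24/7 use)")
```

At roughly 2,750 hours (~4 months of continuous use), buying pays for itself; for occasional experimentation, renting stays cheaper.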
Which cloud GPU provider is the cheapest?
It depends on the GPU, but here are the winners:
Budget GPUs (RTX 3090, RTX 4090):
• Vast.ai: $0.08-0.18/hr (cheapest, but variable availability)
• FluidStack/TensorDock: $0.29-0.44/hr (more reliable)
Mid-range (A100 40GB, A6000):
• Vast.ai: $0.12/hr A100 40GB (spot)
• Jarvis Labs: $0.69/hr A6000 (India datacenter)
• TensorDock: $0.69/hr A6000
High-end (H100, A100 80GB):
• Crusoe: $1.09/hr A100 80GB, $2.49/hr H100
• Lambda: $1.29/hr A100 80GB, $2.49/hr H100
• AWS/GCP/Azure: 2-3x more expensive but better reliability
Pro tip: Use aggregators like Vast.ai for best prices, but expect occasional downtime. For production, stick with RunPod, Lambda, or hyperscalers.
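If you track rates across several providers, picking the winner per GPU is a one-liner. A sketch using the high-end rates quoted above (prices are point-in-time examples and change constantly):

```python
# Pick the cheapest provider per GPU from a rate table. The rates below
# are the FAQ's sample figures, not live prices.
rates = {
    "A100 80GB": {"Crusoe": 1.09, "Lambda": 1.29},
    "H100":      {"Crusoe": 2.49, "Lambda": 2.49},
}

for gpu, providers in rates.items():
    name, price = min(providers.items(), key=lambda kv: kv[1])
    print(f"{gpu}: cheapest is {name} at ${price:.2f}/hr")
```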
How much VRAM do I need for AI?
Quick reference guide:
12-16GB (RTX 3060, RTX 4060 Ti):
• Stable Diffusion, SDXL
• Inference for 7B models (quantized)
• Fine-tuning small models (<1B) with LoRA
24GB (RTX 3090, RTX 4090, A5000):
• 7B-13B model inference (full precision)
• Fine-tuning 7B models with QLoRA
• Multi-modal models (image + text)
40-48GB (A6000, A100 40GB, RTX 6000 Ada):
• 30B-70B model inference (quantized)
• Fine-tuning 13B models
• Large batch training for smaller models
80GB (A100 80GB, H100):
• 70B model inference (full precision)
• Fine-tuning 70B models
• Pre-training models from scratch
Rule of thumb: parameters in billions × 2 bytes per parameter ≈ GB of VRAM needed for the weights alone in FP16 (e.g. 70B × 2 = 140GB). Quantization can cut this by 2-4x; activations and KV cache add overhead on top.
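The rule of thumb generalizes to any precision: bytes per parameter is bits ÷ 8. A minimal weights-only estimator (real usage adds overhead for activations, KV cache, and optimizer state):

```python
# Weights-only VRAM estimate from the rule of thumb: params (billions)
# times bytes per parameter (bits / 8). Overhead not included.
def weights_vram_gb(params_billions: float, bits: int = 16) -> float:
    return params_billions * bits / 8

print(weights_vram_gb(70))      # FP16 70B: 140.0 GB
print(weights_vram_gb(70, 4))   # 4-bit 70B: 35.0 GB
```

The 4-bit figure of 35GB is why the FAQ pegs quantized 70B inference at ~40GB once overhead is counted.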