Official release of SeedVR2 for ComfyUI that enables high-quality video and image upscaling.
It can also run as a multi-GPU standalone CLI; see the Run as Standalone section.


We're actively working on improvements and new features. To stay informed:
2025.11.09 - Version 2.5.5
2025.11.08 - Version 2.5.4
Replaced .view() with .reshape() to handle non-contiguous tensors after spatial padding, resolving the "view size is not compatible with input tensor's size and stride" error
2025.11.08 - Version 2.5.3
"mps" instead of "mps:0", resolving invalid device errors on M-series Macstorch.mps.is_available() to handle PyTorch versions where the method doesn't exist on non-Mac platforms2025.11.07 - Version 2.5.0 ๐
⚠️ BREAKING CHANGE: This is a major update requiring workflow recreation. All nodes and CLI parameters have been redesigned for better usability and consistency. Watch the latest video from AInVFX for a deep dive and check out the usage section.
📦 Official Release: Now available on main branch with ComfyUI Manager support for easy installation and automatic version tracking. Updated dependencies and local imports prevent conflicts with other ComfyUI custom nodes.
New parameters: uniform_batch_size, temporal_overlap, prepend_frames, and max_resolution for better control
2025.08.07
enable_debug now available on main node
cache_model moved to main node, fixed memory leaks with proper RoPE/wrapper cleanup
Refactored into dedicated modules (constants.py, model_registry.py, debug.py), removed legacy code
Improved memory cleanup with torch.cuda.ipc_collect(), improved RoPE handling
2025.07.17
2025.07.11
2025.09.07
2025.07.03
2025.06.30
2025.06.24
2025.06.22
2025.06.20
With the current optimizations (tiling, BlockSwap, GGUF quantization), SeedVR2 can run on a wide range of hardware:
Registry Link: ComfyUI Registry - SeedVR2 Video Upscaler
cd ComfyUI
git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git custom_nodes/seedvr2_videoupscaler
# Install requirements (from same ComfyUI directory)
# Windows:
.venv\Scripts\python.exe -m pip install -r custom_nodes\seedvr2_videoupscaler\requirements.txt
# Linux/macOS:
.venv/bin/python -m pip install -r custom_nodes/seedvr2_videoupscaler/requirements.txt
Models will be automatically downloaded on first use and saved to ComfyUI/models/SEEDVR2.
You can also manually download models from:
Complete walkthrough of version 2.5 by Adrien from AInVFX, covering the new 4-node architecture, GGUF support, memory optimizations, and production workflows:
This comprehensive tutorial covers:
For reference, here's the original tutorial covering the initial release:
Note: This tutorial covers the previous single-node architecture. While the UI has changed significantly in v2.5, the core concepts about BlockSwap and memory management remain valuable.
SeedVR2 uses a modular node architecture with four specialized nodes:

Configure the DiT (Diffusion Transformer) model for video upscaling.
Parameters:
model: Choose your DiT model
seedvr2_ema_3b_fp16.safetensors: FP16 (best quality)
seedvr2_ema_3b_fp8_e4m3fn.safetensors: FP8 8-bit (good quality)
seedvr2_ema_3b-Q4_K_M.gguf: GGUF 4-bit quantized (acceptable quality)
seedvr2_ema_3b-Q8_0.gguf: GGUF 8-bit quantized (good quality)
seedvr2_ema_7b_fp16.safetensors: FP16 (best quality)
seedvr2_ema_7b_fp8_e4m3fn_mixed_block35_fp16.safetensors: FP8 with last block in FP16 to reduce artifacts (good quality)
seedvr2_ema_7b-Q4_K_M.gguf: GGUF 4-bit quantized (acceptable quality)
seedvr2_ema_7b_sharp_*: Sharp variants for enhanced detail
device: GPU device for DiT inference (e.g., cuda:0)
offload_device: Device to offload DiT model when not actively processing
none: Keep model on inference device (fastest, highest VRAM)
cpu: Offload to system RAM (reduces VRAM)
cuda:X: Offload to another GPU (good balance if available)
cache_model: Keep DiT model loaded on offload_device between workflow runs
blocks_to_swap: BlockSwap memory optimization
0: Disabled (default)
1-32: Number of transformer blocks to swap for 3B model
1-36: Number of transformer blocks to swap for 7B model
swap_io_components: Offload input/output embeddings and normalization layers
attention_mode: Attention computation backend
sdpa: PyTorch scaled_dot_product_attention (default, stable, always available)
flash_attn: Flash Attention 2 (faster on supported hardware, requires the flash-attn package; a fallback check is sketched after this parameter list)
torch_compile_args: Connect to SeedVR2 Torch Compile Settings node for 20-40% speedup
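Because flash-attn is an optional dependency, it helps to know what a graceful fallback looks like: if the package is missing, SDPA is the safe choice. The snippet below is an illustrative sketch of such a check, not the node's actual code; the function name pick_attention_mode is hypothetical.

```python
# Illustrative sketch (not the node's actual code): choose an attention backend,
# falling back to SDPA when the optional flash-attn package is not installed.
import importlib.util

def pick_attention_mode(requested: str = "flash_attn") -> str:
    if requested == "flash_attn" and importlib.util.find_spec("flash_attn") is None:
        print("flash-attn not installed, falling back to sdpa")
        return "sdpa"
    return requested

print(pick_attention_mode())  # prints "sdpa" unless flash-attn is importable
```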
BlockSwap Explained:
BlockSwap enables running large models on GPUs with limited VRAM by dynamically swapping transformer blocks between GPU and CPU memory during inference. Here's how it works:
Set offload_device to cpu or another GPU
Start with blocks_to_swap=16 (half the blocks)
Enable swap_io_components for maximum VRAM savings
Example Configuration for Low VRAM (8GB):
model: seedvr2_ema_3b-Q8_0.gguf
device: cuda:0
offload_device: cpu
blocks_to_swap: 32
swap_io_components: True
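Conceptually, BlockSwap keeps only the block currently executing on the GPU and parks the remaining swapped blocks on the offload device. The sketch below illustrates that idea in isolation; it is a simplified assumption of how such a loop can look, not the node's actual implementation.

```python
# Simplified sketch of the BlockSwap idea (not the actual implementation):
# the first `blocks_to_swap` transformer blocks live on the offload device and
# are moved to the GPU only for the duration of their forward pass.
def run_blocks(blocks, x, blocks_to_swap=16, device="cuda:0", offload_device="cpu"):
    for i, block in enumerate(blocks):
        swapped = i < blocks_to_swap
        if swapped:
            block.to(device, non_blocking=True)          # bring block onto the GPU
        x = block(x)
        if swapped:
            block.to(offload_device, non_blocking=True)  # park it again
    return x
```

The trade-off is the per-block transfer time, which is why non-blocking copies and fast system RAM matter.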
Configure the VAE (Variational Autoencoder) model for encoding/decoding video frames.
Parameters:
model: VAE model selection
ema_vae_fp16.safetensors: Default and recommended
device: GPU device for VAE inference (e.g., cuda:0)
offload_device: Device to offload VAE model when not actively processing
none: Keep model on inference device (default, fastest)
cpu: Offload to system RAM (reduces VRAM)
cuda:X: Offload to another GPU (good balance if available)
cache_model: Keep VAE model loaded on offload_device between workflow runs
encode_tiled: Enable tiled encoding to reduce VRAM usage during encoding phase
encode_tile_size: Encoding tile size in pixels (default: 1024)
encode_tile_overlap: Encoding tile overlap in pixels (default: 128)
decode_tiled: Enable tiled decoding to reduce VRAM usage during decoding phase
decode_tile_size: Decoding tile size in pixels (default: 1024)
decode_tile_overlap: Decoding tile overlap in pixels (default: 128)
torch_compile_args: Connect to SeedVR2 Torch Compile Settings node for 15-25% speedup
VAE Tiling Explained:
VAE tiling processes large resolutions in smaller tiles to reduce VRAM requirements. Here's how to use it:
Check VRAM usage (enable_debug on main node)
If encoding runs out of memory, enable encode_tiled
Reduce encode_tile_size if needed (try 768, 512, etc.)
If decoding runs out of memory, enable decode_tiled
Reduce decode_tile_size if needed
Example Configuration for High Resolution (4K):
encode_tiled: True
encode_tile_size: 1024
encode_tile_overlap: 128
decode_tiled: True
decode_tile_size: 1024
decode_tile_overlap: 128
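For a rough feel of the tile layout, the arithmetic below shows how many tiles a 3840x2160 frame splits into with the default 1024-pixel tiles and 128-pixel overlap. This is an illustrative approximation; the node's exact tiling logic may differ in detail.

```python
import math

def count_tiles(width, height, tile_size=1024, overlap=128):
    # Neighbouring tiles share `overlap` pixels so their seams can be blended,
    # so each new tile advances by only (tile_size - overlap) pixels.
    stride = tile_size - overlap
    tiles_x = math.ceil(max(width - overlap, 1) / stride)
    tiles_y = math.ceil(max(height - overlap, 1) / stride)
    return tiles_x, tiles_y

print(count_tiles(3840, 2160))  # (5, 3) -> 15 tiles per frame at 4K defaults
```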
Configure torch.compile optimization for 20-40% DiT speedup and 15-25% VAE speedup.
Requirements:
Parameters:
backend: Compilation backend
inductor: Full optimization with Triton kernel generation and fusion (recommended)
cudagraphs: Lightweight wrapper using CUDA graphs, no kernel optimization
mode: Optimization level (compilation time vs runtime performance)
default: Fast compilation with good speedup (recommended for development)
reduce-overhead: Lower overhead, optimized for smaller models
max-autotune: Slowest compilation, best runtime performance (recommended for production)
max-autotune-no-cudagraphs: Like max-autotune but without CUDA graphs
fullgraph: Compile entire model as single graph without breaks
False: Allow graph breaks for better compatibility (default, recommended)
True: Enforce no breaks for maximum optimization (may fail with dynamic shapes)
dynamic: Handle varying input shapes without recompilation
False: Specialize for exact input shapes (default)
True: Create dynamic kernels that adapt to shape variations (enable when processing different resolutions or batch sizes)
dynamo_cache_size_limit: Max cached compiled versions per function (default: 64)
dynamo_recompile_limit: Max recompilation attempts before falling back to eager mode (default: 128)
Usage:
Connect the output to the torch_compile_args input of DiT and/or VAE loader nodes
When to use:
Recommended Settings:
Development: mode=default, backend=inductor, fullgraph=False
Production: mode=max-autotune, backend=inductor, fullgraph=False
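Under the hood these settings map onto a standard torch.compile call. Below is a minimal standalone sketch, assuming PyTorch 2.0+ with Triton available for the inductor backend; the Linear module is just a stand-in for the DiT/VAE.

```python
import torch

# Corresponds to dynamo_cache_size_limit on the settings node.
torch._dynamo.config.cache_size_limit = 64

model = torch.nn.Linear(16, 16).cuda()   # stand-in module, not the real DiT/VAE
compiled = torch.compile(
    model,
    backend="inductor",    # or "cudagraphs"
    mode="max-autotune",   # "default", "reduce-overhead", "max-autotune", ...
    fullgraph=False,       # allow graph breaks for compatibility
    dynamic=False,         # specialize kernels for the exact input shapes
)
out = compiled(torch.randn(4, 16, device="cuda"))  # first call triggers compilation
```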
Main upscaling node that processes video frames using DiT and VAE models.
Required Inputs:
Parameters:
seed: Random seed for reproducible generation (default: 42)
resolution: Target resolution for shortest edge in pixels (default: 1080)
max_resolution: Maximum resolution for any edge (default: 0 = no limit)
batch_size: Frames per batch (default: 5)
uniform_batch_size (default: False)
Pads the final batch to match batch_size for uniform processing
Prevents temporal artifacts when the last batch is significantly smaller than others
Example: 45 frames with batch_size=33 creates [33, 33] instead of [33, 12]
Recommended when using large batch sizes and video length is not a multiple of batch_size
Increases VRAM usage slightly but ensures consistent temporal coherence across all batches (see the batching sketch after this parameter list)
temporal_overlap: Overlapping frames between batches (default: 0)
prepend_frames: Frames to prepend (default: 0)
color_correction: Color correction method (default: "wavelet")
lab: Full perceptual color matching with detail preservation (recommended for highest fidelity to original)
wavelet: Frequency-based natural colors, preserves details well
wavelet_adaptive: Wavelet base + targeted saturation correction
hsv: Hue-conditional saturation matching
adain: Statistical style transfer
none: No color correction
input_noise_scale: Input noise injection scale 0.0-1.0 (default: 0.0)
latent_noise_scale: Latent space noise scale 0.0-1.0 (default: 0.0)
offload_device: Device for storing intermediate tensors between processing phases (default: "cpu")
none: Keep all tensors on inference device (fastest but highest VRAM)
cpu: Offload to system RAM (recommended for long videos, slower transfers)
cuda:X: Offload to another GPU (good balance if available, faster than CPU)
enable_debug: Enable detailed debug logging (default: False)
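To make the batch_size and uniform_batch_size behavior concrete, here is a small illustrative sketch of how frames could be grouped (ignoring temporal_overlap); the helper plan_batches is hypothetical and not part of the node.

```python
def plan_batches(num_frames, batch_size=33, uniform_batch_size=False):
    # Hypothetical helper: group frames into batches, padding the last one up to
    # batch_size when uniform_batch_size is enabled (temporal_overlap ignored).
    sizes = []
    remaining = num_frames
    while remaining > 0:
        current = min(batch_size, remaining)
        if uniform_batch_size and current < batch_size:
            current = batch_size   # final batch is padded for uniform processing
        sizes.append(current)
        remaining -= min(batch_size, remaining)
    return sizes

print(plan_batches(45, 33, uniform_batch_size=False))  # [33, 12]
print(plan_batches(45, 33, uniform_batch_size=True))   # [33, 33]
```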
Output:
Basic Workflow (High VRAM - 24GB+):
Load Video Frames
↓
SeedVR2 Load DiT Model
├─ model: seedvr2_ema_3b_fp16.safetensors
├─ device: cuda:0
↓
SeedVR2 Load VAE Model
├─ model: ema_vae_fp16.safetensors
├─ device: cuda:0
↓
SeedVR2 Video Upscaler
├─ batch_size: 21
├─ resolution: 1080
↓
Save Video/Frames
Low VRAM Workflow (8-12GB):
Load Video Frames
↓
SeedVR2 Load DiT Model
├─ model: seedvr2_ema_3b-Q8_0.gguf
├─ device: cuda:0
├─ offload_device: cpu
├─ blocks_to_swap: 32
├─ swap_io_components: True
↓
SeedVR2 Load VAE Model
├─ model: ema_vae_fp16.safetensors
├─ device: cuda:0
├─ encode_tiled: True
├─ decode_tiled: True
↓
SeedVR2 Video Upscaler
├─ batch_size: 5
├─ resolution: 720
↓
Save Video/Frames
High Performance Workflow (24GB+ with torch.compile):
Load Video Frames
↓
SeedVR2 Torch Compile Settings
├─ mode: max-autotune
├─ backend: inductor
↓
SeedVR2 Load DiT Model
├─ model: seedvr2_ema_7b_sharp_fp16.safetensors
├─ device: cuda:0
├─ torch_compile_args: connected
↓
SeedVR2 Load VAE Model
├─ model: ema_vae_fp16.safetensors
├─ device: cuda:0
├─ torch_compile_args: connected
↓
SeedVR2 Video Upscaler
├─ batch_size: 81
├─ resolution: 1080
↓
Save Video/Frames
The standalone CLI provides powerful batch processing capabilities with multi-GPU support and sophisticated optimization options.
Choose the appropriate setup based on your installation:
If you've already installed SeedVR2 as part of ComfyUI (via ComfyUI installation), you can use the CLI directly:
# Navigate to your ComfyUI directory
cd ComfyUI
# Run the CLI using standalone Python (display help message)
# Windows:
.venv\Scripts\python.exe custom_nodes\seedvr2_videoupscaler\inference_cli.py --help
# Linux/macOS:
.venv/bin/python custom_nodes/seedvr2_videoupscaler/inference_cli.py --help
Skip to Command Line Usage below.
If you want to use the CLI without ComfyUI installation, follow these steps:
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git seedvr2_videoupscaler
cd seedvr2_videoupscaler
# Create virtual environment with Python 3.13
uv venv --python 3.13
# Activate virtual environment
# Windows:
.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate
# Install PyTorch with CUDA support
# Check command line based on your environment: https://pytorch.org/get-started/locally/
uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu130
# Install SeedVR2 requirements
uv pip install -r requirements.txt
# Run the CLI (display help message)
# Windows:
.venv\Scripts\python.exe inference_cli.py --help
# Linux/macOS:
.venv/bin/python inference_cli.py --help
The CLI provides comprehensive options for single-GPU, multi-GPU, and batch processing workflows.
Basic Usage Examples:
# Basic image upscaling
python inference_cli.py image.jpg
# Basic video upscaling with temporal consistency
python inference_cli.py video.mp4 --resolution 720 --batch_size 33
# Multi-GPU processing with temporal overlap
python inference_cli.py video.mp4 \
--cuda_device 0,1 \
--resolution 1080 \
--batch_size 81 \
--uniform_batch_size \
--temporal_overlap 3 \
--prepend_frames 4
# Memory-optimized for low VRAM (8GB)
python inference_cli.py image.png \
--dit_model seedvr2_ema_3b-Q8_0.gguf \
--resolution 1080 \
--blocks_to_swap 32 \
--swap_io_components \
--dit_offload_device cpu \
--vae_offload_device cpu
# High resolution with VAE tiling
python inference_cli.py video.mp4 \
--resolution 1440 \
--batch_size 31 \
--uniform_batch_size \
--temporal_overlap 3 \
--vae_encode_tiled \
--vae_decode_tiled
# Batch directory processing with model caching
python inference_cli.py media_folder/ \
--output processed/ \
--cuda_device 0 \
--cache_dit \
--cache_vae \
--dit_offload_device cpu \
--vae_offload_device cpu \
--resolution 1080 \
--max_resolution 1920
Input/Output:
<input>: Input file (.mp4, .avi, .png, .jpg, etc.) or directory
--output: Output path (default: auto-generated in 'output/' directory)
--output_format: Output format: 'mp4' (video) or 'png' (image sequence). Default: auto-detect from input type
--model_dir: Model directory (default: ./models/SEEDVR2)
Model Selection:
--dit_model: DiT model to use. Options: 3B/7B with fp16/fp8/GGUF variants (default: 3B FP8)
Processing Parameters:
--resolution: Target short-side resolution in pixels (default: 1080)
--max_resolution: Maximum resolution for any edge. Scales down if exceeded. 0 = no limit (default: 0)
--batch_size: Frames per batch (must follow 4n+1: 1, 5, 9, 13, 17, 21...). Ideally matches shot length for best temporal consistency (default: 5)
--seed: Random seed for reproducibility (default: 42)
--skip_first_frames: Skip N initial frames (default: 0)
--load_cap: Load maximum N frames from video. 0 = load all (default: 0)
--prepend_frames: Prepend N reversed frames to reduce start artifacts (auto-removed) (default: 0)
--temporal_overlap: Frames to overlap between batches/GPUs for smooth blending (default: 0)
Quality Control:
--color_correction: Color correction method: 'lab' (perceptual, recommended), 'wavelet', 'wavelet_adaptive', 'hsv', 'adain', or 'none' (default: lab)
--input_noise_scale: Input noise injection scale (0.0-1.0). Reduces artifacts at high resolutions (default: 0.0)
--latent_noise_scale: Latent space noise scale (0.0-1.0). Softens details if needed (default: 0.0)
Memory Management:
--dit_offload_device: Device to offload DiT model: 'none' (keep on GPU), 'cpu', or 'cuda:X' (default: none)
--vae_offload_device: Device to offload VAE model: 'none', 'cpu', or 'cuda:X' (default: none)
--blocks_to_swap: Number of transformer blocks to swap (0=disabled, 3B: 0-32, 7B: 0-36). Requires dit_offload_device (default: 0)
--swap_io_components: Offload I/O components for additional VRAM savings. Requires dit_offload_device
--use_non_blocking: Use non-blocking memory transfers for BlockSwap (recommended)
VAE Tiling:
--vae_encode_tiled: Enable VAE encode tiling to reduce VRAM during encoding
--vae_encode_tile_size: VAE encode tile size in pixels (default: 1024)
--vae_encode_tile_overlap: VAE encode tile overlap in pixels (default: 128)
--vae_decode_tiled: Enable VAE decode tiling to reduce VRAM during decoding
--vae_decode_tile_size: VAE decode tile size in pixels (default: 1024)
--vae_decode_tile_overlap: VAE decode tile overlap in pixels (default: 128)
--tile_debug: Visualize tiles: 'false' (default), 'encode', or 'decode'
Performance Optimization:
--attention_mode: Attention backend: 'sdpa' (default, stable) or 'flash_attn' (faster, requires package)
--compile_dit: Enable torch.compile for DiT model (20-40% speedup, requires PyTorch 2.0+ and Triton)
--compile_vae: Enable torch.compile for VAE model (15-25% speedup, requires PyTorch 2.0+ and Triton)
--compile_backend: Compilation backend: 'inductor' (full optimization) or 'cudagraphs' (lightweight) (default: inductor)
--compile_mode: Optimization level: 'default', 'reduce-overhead', 'max-autotune', 'max-autotune-no-cudagraphs' (default: default)
--compile_fullgraph: Compile entire model as single graph (faster but less flexible) (default: False)
--compile_dynamic: Handle varying input shapes without recompilation (default: False)
--compile_dynamo_cache_size_limit: Max cached compiled versions per function (default: 64)
--compile_dynamo_recompile_limit: Max recompilation attempts before fallback (default: 128)
Model Caching (batch processing):
--cache_dit: Cache DiT model between files (single GPU only, speeds up directory processing)
--cache_vae: Cache VAE model between files (single GPU only, speeds up directory processing)
Multi-GPU:
--cuda_device: CUDA device id(s). Single id (e.g., '0') or comma-separated list '0,1' for multi-GPU
Debugging:
--debug: Enable verbose debug logging
The CLI's multi-GPU mode automatically distributes the workload across multiple GPUs with intelligent temporal overlap handling:
How it works:
Frames are split evenly across the selected GPUs, and adjacent chunks share --temporal_overlap frames
Example for 2 GPUs with temporal_overlap=4:
GPU 0: Frames 0-50 (includes 4 overlap frames at end)
GPU 1: Frames 46-100 (includes 4 overlap frames at beginning)
Result: Frames 0-100 with smooth transition at frame 48
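The distribution can be sketched as follows. This is a simplified illustration of the behavior described above, not the CLI's actual scheduling code, so the exact boundaries may differ by a frame or two from the example.

```python
def split_frames(num_frames, num_gpus, temporal_overlap=0):
    # Illustrative sketch: return per-GPU (first, last) frame indices, inclusive.
    # Every GPU after the first re-processes `temporal_overlap` frames from the
    # previous chunk so the results can be blended across the boundary.
    chunk = num_frames // num_gpus
    ranges = []
    for gpu in range(num_gpus):
        first = gpu * chunk
        last = num_frames - 1 if gpu == num_gpus - 1 else (gpu + 1) * chunk - 1
        if gpu > 0:
            first -= temporal_overlap
        ranges.append((first, last))
    return ranges

print(split_frames(101, 2, temporal_overlap=4))  # [(0, 49), (46, 100)]
```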
Best practices:
Set --temporal_overlap to 2-8 frames for smooth blending
Use --prepend_frames to reduce artifacts at video start
Batch Size Constraint: The model requires batch_size to follow the 4n+1 formula (1, 5, 9, 13, 17, 21, 25, ...) due to its temporal consistency architecture. All frames in a batch are processed together for temporal coherence, and batches can then be blended using temporal_overlap. Ideally, set batch_size to match your shot length for optimal quality (a helper sketch follows below).
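When scripting runs, a small helper like the one below (an illustrative sketch, not part of the CLI) can snap a shot length to the largest valid 4n+1 batch size.

```python
def nearest_valid_batch_size(frames: int) -> int:
    # Largest value of the form 4n + 1 that does not exceed the shot length.
    if frames < 1:
        raise ValueError("need at least one frame")
    return 4 * ((frames - 1) // 4) + 1

for shot_length in (1, 12, 33, 50, 100):
    print(shot_length, "->", nearest_valid_batch_size(shot_length))
# 1 -> 1, 12 -> 9, 33 -> 33, 50 -> 49, 100 -> 97
```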
VAE Bottleneck: Even with optimized DiT upscaling (BlockSwap, GGUF, torch.compile), the VAE encoding/decoding stages can become the bottleneck, especially at high resolutions; using a larger batch_size helps amortize this cost.
VRAM Usage: While the integration now supports low VRAM systems (8GB or less with proper optimization), VRAM usage varies based on:
Speed: Processing speed depends on:
Contributions are welcome! We value community input and improvements.
For detailed contribution guidelines, see CONTRIBUTING.md.
Quick Start:
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Get Help:
This ComfyUI implementation is a collaborative project by NumZ and AInVFX (Adrien Toupet), based on the original SeedVR2 by ByteDance Seed Team.
Special thanks to our community contributors including benjaminherb, cmeka, FurkanGozukara, JohnAlcatraz, lihaoyun6, Luchuanzhao, Luke2642, naxci1, q5sys, and many others for their improvements, bug fixes, and testing.
The code in this repository is released under the MIT license as found in the LICENSE file.