Multimodal AI Support
Host advanced Text-to-Video (T2V), Image-to-Video (I2V), and Video Auto-Captioning & Editing (VACE) models with support for 1.3B and 14B parameter sizes.
These models can be hosted on dedicated GPU server plans such as:

- Enterprise GPU Dedicated Server - RTX A6000
- Enterprise GPU Dedicated Server - A100
- Enterprise GPU Dedicated Server - RTX 4090
- Enterprise GPU Dedicated Server - RTX 5090
- Multi-GPU Dedicated Server - 2xRTX 5090
- Enterprise GPU Dedicated Server - H100
- Enterprise GPU Dedicated Server - RTX PRO 6000
- Multi-GPU Dedicated Server - 4xRTX A6000
Wan-AI Hosting is the self-hosted deployment of Wan-AI’s multimodal generative models, including:

- Text-to-Video (T2V)
- Image-to-Video (I2V)
- Video Auto-Captioning & Editing (VACE)
These models are developed by Wan-AI and are available in 1.3B and 14B parameter sizes. Hosting them on your own GPU server enables you to run video generation, editing, and captioning pipelines without relying on external APIs or cloud platforms.
| Model Name | Size (4-bit Quantization) | Recommended GPUs |
|---|---|---|
| Wan-AI/Wan2.1-T2V-1.3B | 17.5 GB | RTX 4090 < A100 40GB < RTX 5090 |
| Wan-AI/Wan2.1-VACE-1.3B | 19.05 GB | RTX 4090 < A100 40GB < RTX 5090 |
| Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 19.05 GB | RTX 4090 < A100 40GB < RTX 5090 |
| Wan-AI/Wan2.1-T2V-14B | 69.06 GB | 2x A6000 < A100 80GB < H100 |
| Wan-AI/Wan2.1-VACE-14B | 75.16 GB | 2x A6000 < A100 80GB < H100 |
| Wan-AI/Wan2.1-I2V-14B-720P | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100 |
| Wan-AI/Wan2.1-I2V-14B-480P | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100 |
| Wan-AI/Wan2.1-VACE-14B-diffusers | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100 |
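Before picking a plan from the table above, it can help to confirm how much VRAM the target GPU actually exposes. Here is a minimal PyTorch sketch; the device index 0 assumes a single-GPU machine:

```python
import torch

# Report the name and total VRAM of the first CUDA device so it can be
# matched against the model sizes listed in the table above.
props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {total_gb:.1f} GB")
```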
| Deployment Method | Pros | Cons | Steps |
|---|---|---|---|
| Method 1: Diffusers Pipeline via Hugging Face + PyTorch | Full access, customizable, Hugging Face ecosystem | Requires coding and model-management knowledge | 1. Set up a GPU server with Python ≥ 3.9 and the CUDA toolkit. 2. Install transformers, diffusers, accelerate, torch, and xformers. 3. Load the model via Hugging Face’s from_pretrained(). 4. Run generation with a Diffusers pipeline (e.g., WanPipeline). |
| Method 2: ComfyUI Integration (for Diffusers versions) | Visual interface, modular, community-supported | Needs optimization for large models (esp. 14B) | 1. Install ComfyUI on your server. 2. Load the Wan2.1 Diffusers versions (1.3B or 14B). 3. Connect nodes such as Text Prompt → Model Loader → Video Output. |
| Method 3: Custom FastAPI or Gradio Web UI | Web-accessible, scriptable, shareable | Needs backend development setup | 1. Wrap the Hugging Face model loading and inference in FastAPI or Gradio. 2. Host on the GPU server with nginx + uvicorn. 3. Add endpoints such as /generate-video and /generate-from-image. |
| Method 4: Dockerized Inference Setup | Portable, deployable at scale, good for CI/CD | Slightly heavier setup, slower updates | 1. Create a Dockerfile with PyTorch, CUDA, and dependencies preinstalled. 2. Preload the Wan-AI model weights into the image or a volume. 3. Use the NVIDIA Docker runtime for GPU access. |
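For Method 1, the sketch below follows the Diffusers usage documented on the Wan2.1 model cards, assuming a recent diffusers release that ships WanPipeline and AutoencoderKLWan; the prompt, resolution, and frame count are illustrative:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Load the 1.3B text-to-video checkpoint. The VAE is kept in float32 for
# numerical stability while the rest of the pipeline runs in bfloat16.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Generate a short 480p clip and write it to disk.
frames = pipe(
    prompt="A cat walks on the grass, realistic style",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```

The dependencies from step 2 can be installed beforehand with, for example, `pip install torch diffusers transformers accelerate xformers`.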
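For Method 3, a FastAPI wrapper around the same pipeline might look like the following minimal sketch; the /generate-video endpoint, the request schema, and the output path are illustrative assumptions, and in production the app would sit behind nginx + uvicorn as described above:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video
from fastapi import FastAPI
from fastapi.responses import FileResponse
from pydantic import BaseModel

app = FastAPI()

# Load the pipeline once at startup so every request reuses the weights.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

class GenerateRequest(BaseModel):
    prompt: str
    num_frames: int = 81  # roughly 5 seconds at 15 fps

@app.post("/generate-video")
def generate_video(req: GenerateRequest):
    # Run inference and return the rendered clip as an MP4 file.
    frames = pipe(prompt=req.prompt, height=480, width=832, num_frames=req.num_frames).frames[0]
    out_path = "/tmp/output.mp4"
    export_to_video(frames, out_path, fps=15)
    return FileResponse(out_path, media_type="video/mp4")
```

Launch it with `uvicorn app:app --host 0.0.0.0 --port 8000` and put nginx in front as a reverse proxy.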
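For Method 4, a Dockerfile along these lines captures the idea; the base image tag and the app.py script (e.g., the FastAPI sketch above) are assumptions, and weights can either be baked into the image or mounted as a volume:

```dockerfile
# Base image with PyTorch and CUDA preinstalled (tag is illustrative).
FROM pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime

WORKDIR /app
RUN pip install --no-cache-dir diffusers transformers accelerate fastapi uvicorn

# Inference service, e.g. the FastAPI sketch from Method 3.
COPY app.py .

# Model weights can be preloaded into the image here, or mounted at
# runtime as a volume at /root/.cache/huggingface.
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Run the container with the NVIDIA container runtime so it can see the GPUs, e.g. `docker run --gpus all -p 8000:8000 <image-name>`.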