Wan Hosting Service: Self-Host Wan-AI T2V, I2V, and VACE Models (1.3B/14B)

Wan Hosting Service refers to deploying and running Wan-AI's cutting-edge multimodal models, including Wan2.1-T2V (text-to-video), I2V (image-to-video), and VACE (video auto-captioning and editing), on your own GPU servers. These models are available in both 1.3B and 14B parameter variants, with support for standard PyTorch and Hugging Face Diffusers formats. By self-hosting, you gain full control over generation speed, resolution (e.g., 480p, 720p), prompt privacy, and integration with your custom pipelines.

The Best GPU Plans for Wan-AI Hosting Service

Choose the appropriate GPU plan according to the Wan-AI model size.

Enterprise GPU Dedicated Server - RTX A6000

409.00/mo
  • 256GB RAM
  • GPU: Nvidia Quadro RTX A6000
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU Dedicated Server - A100

639.00/mo
  • 256GB RAM
  • GPU: Nvidia A100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2
  • FP32 Performance: 19.5 TFLOPS

Enterprise GPU Dedicated Server - RTX 4090

409.00/mo
  • 256GB RAM
  • GPU: GeForce RTX 4090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS

Enterprise GPU Dedicated Server - RTX 5090 (New Arrival)

479.00/mo
  • 256GB RAM
  • GPU: GeForce RTX 5090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS

Multi-GPU Dedicated Server - 2xRTX 5090

859.00/mo
  • 256GB RAM
  • GPU: 2 x GeForce RTX 5090
  • Dual E5-2699v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS

Enterprise GPU Dedicated Server - H100

2099.00/mo
  • 256GB RAM
  • GPU: Nvidia H100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Hopper
  • CUDA Cores: 14,592
  • Tensor Cores: 456
  • GPU Memory: 80GB HBM2e
  • FP32 Performance: 183 TFLOPS

Enterprise GPU Dedicated Server - RTX PRO 6000 (New Arrival)

729.00/mo
  • 256GB RAM
  • GPU: Nvidia RTX PRO 6000
  • Dual 24-Core Platinum 8160
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell
  • CUDA Cores: 24,064
  • Tensor Cores: 752
  • GPU Memory: 96GB GDDR7
  • FP32 Performance: 125.10 TFLOPS

Multi-GPU Dedicated Server - 4xRTX A6000

1199.00/mo
  • 512GB RAM
  • GPU: 4 x Quadro RTX A6000
  • Dual 22-Core E5-2699v4
  • 240GB SSD + 4TB NVMe + 16TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS
What is Wan-AI Hosting?

Wan-AI Hosting is the self-hosted deployment of Wan-AI’s multimodal generative models, including:

  • Wan2.1-T2V (Text-to-Video)
  • Wan2.1-I2V (Image-to-Video)
  • Wan2.1-VACE (Video Auto-Captioning & Editing)

These models are developed by Wan-AI and are available in 1.3B and 14B parameter sizes. Hosting them on your own GPU server enables you to run video generation, editing, and captioning pipelines without relying on external APIs or cloud platforms.

The Best GPU for Wan-AI Models from Hugging Face

To self-host the Wan-AI/Wan2.1 models (1.3B or 14B) from Hugging Face, the GPU requirements vary significantly depending on the model version you choose and your latency expectations. Below is a GPU recommendation table, followed by a short sketch for pre-downloading the weights onto your server:
Model Name | Size (4-bit Quantization) | Recommended GPUs
Wan-AI/Wan2.1-T2V-1.3B | 17.5 GB | RTX 4090 < A100 40GB < RTX 5090
Wan-AI/Wan2.1-VACE-1.3B | 19.05 GB | RTX 4090 < A100 40GB < RTX 5090
Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 19.05 GB | RTX 4090 < A100 40GB < RTX 5090
Wan-AI/Wan2.1-T2V-14B | 69.06 GB | 2x A6000 < A100 80GB < H100
Wan-AI/Wan2.1-VACE-14B | 75.16 GB | 2x A6000 < A100 80GB < H100
Wan-AI/Wan2.1-I2V-14B-720P | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100
Wan-AI/Wan2.1-I2V-14B-480P | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100
Wan-AI/Wan2.1-VACE-14B-Diffusers | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100
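
The following is a minimal sketch for pre-fetching a Wan2.1 repository onto the server before the first run, assuming the huggingface_hub package is installed; the repo ID and local directory are placeholders to swap for your chosen model and storage mount.

    # Pre-fetch Wan2.1 weights so the first generation run does not stall on a
    # multi-gigabyte download. Repo ID and local_dir below are examples.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id="Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # pick the repo that fits your GPU plan
        local_dir="/data/models/wan2.1-t2v-1.3b",    # e.g., the server's NVMe volume
    )
    print("Model files cached at:", local_path)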

    Features of Wan-AI Hosting Service

Multimodal AI Support

Host advanced Text-to-Video (T2V), Image-to-Video (I2V), and Video Auto-Captioning & Editing (VACE) models, with support for 1.3B and 14B parameter sizes.

High-Resolution Video Generation

Generate videos in 480p or 720p, with future expandability to higher resolutions depending on your GPU power.

Flexible Deployment Options

Supports PyTorch checkpoints and the Hugging Face Diffusers format, giving you the freedom to integrate with tools like ComfyUI, AUTOMATIC1111, or custom inference pipelines.

GPU Acceleration Ready

Optimized for A100, H100, RTX 4090, and similar GPUs, making it ideal for real-time or batch generation workloads.

Offline & Private Deployment

Self-hosted Wan-AI models give you full control of prompts, outputs, and API integrations, ensuring data privacy and independence from third-party servers.

Fine-Tuning & Extension Ready

Advanced users can fine-tune, extend, or chain outputs with other generative tools such as LoRA, ControlNet, or video editing frameworks.

    Several Common Ways to Deploy Wan-AI Service on GPU Servers

Method 1: Diffusers Pipeline via Hugging Face + PyTorch
Pros: Full access, customizable, Hugging Face ecosystem
Cons: Requires coding and model-management knowledge
Steps:
1. Set up a GPU server with Python ≥ 3.9 and the CUDA toolkit
2. Install transformers, diffusers, accelerate, torch, and xformers
3. Load the model via Hugging Face's from_pretrained()
4. Run generation with the Diffusers pipeline (e.g., WanPipeline), as sketched below
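
A minimal sketch of Method 1 follows, assuming a recent diffusers release (0.33 or later) that ships WanPipeline and AutoencoderKLWan; the prompt, resolution, and frame count are illustrative values only.

    # Minimal text-to-video generation with the Wan2.1-1.3B Diffusers checkpoint.
    # Assumes diffusers >= 0.33; prompt and output settings are examples.
    import torch
    from diffusers import AutoencoderKLWan, WanPipeline
    from diffusers.utils import export_to_video

    model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
    # The VAE is commonly kept in float32 for stability; the rest can run in bfloat16.
    vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
    pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    frames = pipe(
        prompt="A cat walks on the grass, realistic style",
        height=480,
        width=832,
        num_frames=81,
        guidance_scale=5.0,
    ).frames[0]

    export_to_video(frames, "t2v_output.mp4", fps=15)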

Method 2: ComfyUI Integration (for Diffusers versions)
Pros: Visual interface, modular, community-supported
Cons: Needs optimization for large models (especially 14B)
Steps:
1. Install ComfyUI on your server
2. Load the Wan2.1 Diffusers version (1.3B or 14B)
3. Connect nodes such as Text Prompt → Model Loader → Video Output

Method 3: Custom FastAPI or Gradio Web UI
Pros: Web-accessible, scriptable, shareable
Cons: Needs backend development setup
Steps:
1. Wrap the Hugging Face model loading and inference in FastAPI or Gradio
2. Host on the GPU server with nginx + uvicorn
3. Add endpoints for /generate-video, /generate-from-image, etc. (a rough sketch follows)
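
As an illustration of Method 3, here is a rough FastAPI wrapper; the endpoint name, request schema, and file handling are hypothetical and not part of Wan-AI's tooling.

    # Hypothetical FastAPI wrapper around a preloaded Wan2.1 pipeline (Method 3).
    # Run with: uvicorn app:app --host 0.0.0.0 --port 8000 (behind nginx, as noted above).
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    # Load once at startup so every request reuses the same GPU-resident pipeline.
    pipe = WanPipeline.from_pretrained(
        "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
    ).to("cuda")

    class VideoRequest(BaseModel):
        prompt: str
        num_frames: int = 81

    @app.post("/generate-video")
    def generate_video(req: VideoRequest):
        frames = pipe(prompt=req.prompt, num_frames=req.num_frames).frames[0]
        export_to_video(frames, "output.mp4", fps=15)
        return {"file": "output.mp4"}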

Method 4: Dockerized Inference Setup
Pros: Portable, deployable at scale, good for CI/CD
Cons: Slightly heavier setup, slower updates
Steps:
1. Create a Dockerfile with preinstalled PyTorch, CUDA, and dependencies
2. Preload the Wan-AI model weights into the image or a mounted volume
3. Use the NVIDIA Docker runtime for GPU access

FAQs of Wan-AI Hosting Service

    What is Wan-AI Service?

Wan-AI Service refers to the self-hosted deployment of Wan-AI's generative models, including text-to-video (T2V), image-to-video (I2V), and video auto-captioning/editing (VACE), on dedicated GPU servers or VPS with compatible frameworks such as Hugging Face Diffusers or ComfyUI.

    What GPU is recommended for Wan-AI hosting?

Minimum GPU requirements vary by model size; a quick VRAM check sketch follows this list:
  • 1.3B models: 12–16 GB VRAM (e.g., RTX 3080, A4000)
  • 14B models: 24–48 GB VRAM (e.g., RTX 4090, A5000, A6000, A100)
  • High-speed inference: Use NVLink-enabled dual GPU or high-bandwidth memory GPUs
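
As a quick sanity check before choosing a checkpoint, a short PyTorch snippet (a generic sketch, not Wan-specific) can report each GPU's total and free VRAM:

    # Report each GPU's name and VRAM so you can match it against the sizes above.
    import torch

    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            free, total = torch.cuda.mem_get_info(i)
            name = torch.cuda.get_device_name(i)
            print(f"GPU {i}: {name}, total {total / 1e9:.1f} GB, free {free / 1e9:.1f} GB")
    else:
        print("No CUDA-capable GPU detected.")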

Can I use ComfyUI to run Wan2.1 Service?

    Yes. Both Wan2.1-T2V-1.3B-Diffusers and Wan2.1-T2V-14B-Diffusers can be used with ComfyUI by loading the proper nodes and handling video output (MP4/WebM). This offers a visual node-based way to build workflows.

    Which deployment methods are recommended?

  • Hugging Face Transformers + Diffusers (Python script)
  • ComfyUI (drag-and-drop workflows)
  • Dockerized environments (for production scaling)
  • FastAPI + Gradio for web API/UI

Do I need to pay for these models?

As of now, the Wan-AI models are freely available to download and self-host, but always check the specific license on Hugging Face for each model version, especially before commercial use.

    Which models can I host?

You can self-host the following Wan-AI models:
• Text-to-Video: Wan2.1-T2V-1.3B, Wan2.1-T2V-14B
• Image-to-Video: Wan2.1-I2V-14B-480P, Wan2.1-I2V-14B-720P
• Video Auto-Captioning & Editing (VACE): Wan2.1-VACE-1.3B, Wan2.1-VACE-14B
• Diffusers-compatible variants (repo names ending in "-Diffusers") for easier integration

    Do Wan-AI models require vLLM or TGI to run?

    No. These are not LLMs. Wan2.1 models are Diffusers-based multimodal generation models and are best run via Hugging Face’s Diffusers, ComfyUI, or a custom FastAPI backend. vLLM, TGI, and Triton are generally not required unless adapting for advanced inference pipelines.

    Is FFmpeg needed for video output?

Yes, FFmpeg is typically used:
  • To encode image sequences into MP4/WebM
  • To combine video and audio if using VACE models
Make sure FFmpeg is installed and callable in your server environment; a short encoding sketch follows.
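
For example, a saved frame sequence can be encoded into an MP4 by calling FFmpeg from Python; the file names and frame rate below are placeholders.

    # Encode a numbered PNG frame sequence into an MP4 using the system FFmpeg.
    # Paths and fps are placeholders; match fps to your generation settings.
    import subprocess

    subprocess.run(
        [
            "ffmpeg", "-y",
            "-framerate", "15",
            "-i", "frames/frame_%04d.png",
            "-c:v", "libx264",
            "-pix_fmt", "yuv420p",  # widely compatible pixel format
            "output.mp4",
        ],
        check=True,
    )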

    What is the difference between the Hugging Face 'Diffusers' and 'non-Diffusers' versions?

• Diffusers version: Works with the Hugging Face diffusers pipeline or ComfyUI.
• Non-Diffusers version: May require custom integration and may not work out of the box with the from_pretrained() Diffusers pipeline.

Is this suitable for public video generation platforms?

Yes. With sufficient GPU resources, you can integrate these models into a platform or service offering text-to-video, image-to-video, or video+audio generation.