Wan Hosting Service: Self-Host Wan-AI T2V, I2V, and VACE Models (1.3B/14B)

Wan Hosting Service refers to deploying and running Wan-AI's cutting-edge multimodal models, including Wan2.1-T2V (text-to-video), I2V (image-to-video), and VACE (video auto-captioning and editing), on your own GPU servers. These models are available in both 1.3B and 14B parameter variants, with support for standard PyTorch and Hugging Face Diffusers formats. By self-hosting, you gain full control over generation speed, resolution (e.g., 480p, 720p), prompt privacy, and integration with your custom pipelines.

The Best GPU Plans for Wan-AI Hosting Service

Choose the appropriate GPU plan according to the Wan-AI model size.

Enterprise GPU Dedicated Server - RTX A6000

409.00/mo
  • 256GB RAM
  • GPU: Nvidia Quadro RTX A6000
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU Dedicated Server - A100

639.00/mo
  • 256GB RAM
  • GPU: Nvidia A100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2
  • FP32 Performance: 19.5 TFLOPS

Enterprise GPU Dedicated Server - RTX 4090

409.00/mo
  • 256GB RAM
  • GPU: GeForce RTX 4090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS

Enterprise GPU Dedicated Server - RTX 5090 (New Arrival)

479.00/mo
  • 256GB RAM
  • GPU: GeForce RTX 5090
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS

Multi-GPU Dedicated Server - 2xRTX 5090

859.00/mo
  • 256GB RAM
  • GPU: 2 x GeForce RTX 5090
  • Dual E5-2699v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell 2.0
  • CUDA Cores: 21,760
  • Tensor Cores: 680
  • GPU Memory: 32 GB GDDR7
  • FP32 Performance: 109.7 TFLOPS

Enterprise GPU Dedicated Server - H100

2099.00/mo
  • 256GB RAM
  • GPU: Nvidia H100
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Hopper
  • CUDA Cores: 14,592
  • Tensor Cores: 456
  • GPU Memory: 80GB HBM2e
  • FP32 Performance: 183 TFLOPS

Enterprise GPU Dedicated Server - RTX PRO 6000 (New Arrival)

729.00/mo
  • 256GB RAM
  • GPU: Nvidia RTX PRO 6000
  • Dual 24-Core Platinum 8160
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Blackwell
  • CUDA Cores: 24,064
  • Tensor Cores: 752
  • GPU Memory: 96GB GDDR7
  • FP32 Performance: 125.10 TFLOPS

Multi-GPU Dedicated Server - 4xRTX A6000

1199.00/mo
  • 512GB RAM
  • GPU: 4 x Quadro RTX A6000
  • Dual 22-Core E5-2699v4
  • 240GB SSD + 4TB NVMe + 16TB SATA
  • 1Gbps
  • OS: Windows / Linux
  • Single GPU Specifications:
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS
What is Wan-AI Hosting?

Wan-AI Hosting is the self-hosted deployment of Wan-AI’s multimodal generative models, including:

  • Wan2.1-T2V (Text-to-Video)
  • Wan2.1-I2V (Image-to-Video)
  • Wan2.1-VACE (Video Auto-Captioning & Editing)

These models are developed by Wan-AI and are available in 1.3B and 14B parameter sizes. Hosting them on your own GPU server enables you to run video generation, editing, and captioning pipelines without relying on external APIs or cloud platforms.

The Best GPU for Wan-AI Models from Hugging Face

To self-host the Wan-AI/Wan2.1 models (1.3B or 14B) from Hugging Face, the GPU requirements vary significantly depending on the model version you choose and your latency expectations. Below is a GPU recommendation table, followed by a short sketch for pre-downloading the weights onto your server:
Model Name | Size (4-bit Quantization) | Recommended GPUs
Wan-AI/Wan2.1-T2V-1.3B | 17.5 GB | RTX 4090 < A100 40GB < RTX 5090
Wan-AI/Wan2.1-VACE-1.3B | 19.05 GB | RTX 4090 < A100 40GB < RTX 5090
Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 19.05 GB | RTX 4090 < A100 40GB < RTX 5090
Wan-AI/Wan2.1-T2V-14B | 69.06 GB | 2x A6000 < A100 80GB < H100
Wan-AI/Wan2.1-VACE-14B | 75.16 GB | 2x A6000 < A100 80GB < H100
Wan-AI/Wan2.1-I2V-14B-720P | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100
Wan-AI/Wan2.1-I2V-14B-480P | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100
Wan-AI/Wan2.1-VACE-14B-Diffusers | 82.25 GB | 2x A6000 < 2x A100 80GB < 2x H100
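
The following is a minimal sketch for pre-fetching a Wan2.1 repository onto the server before the first run, assuming the huggingface_hub package is installed; the repo ID and local directory are placeholders to swap for your chosen model and storage mount.

    # Pre-fetch Wan2.1 weights so the first generation run does not stall on a
    # multi-gigabyte download. Repo ID and local_dir below are examples.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id="Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # pick the repo that fits your GPU plan
        local_dir="/data/models/wan2.1-t2v-1.3b",    # e.g., the server's NVMe volume
    )
    print("Model files cached at:", local_path)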

    Features of Wan-AI Hosting Service

Multimodal AI Support

Host advanced Text-to-Video (T2V), Image-to-Video (I2V), and Video Auto-Captioning & Editing (VACE) models, with support for 1.3B and 14B parameter sizes.

High-Resolution Video Generation

Generate videos in 480p or 720p, with future expandability to higher resolutions depending on your GPU power.

Flexible Deployment Options

Supports PyTorch checkpoints and the Hugging Face Diffusers format, giving you the freedom to integrate with tools like ComfyUI, AUTOMATIC1111, or custom inference pipelines.

GPU Acceleration Ready

Optimized for A100, H100, RTX 4090, and similar GPUs, making it ideal for real-time or batch generation workloads.

Offline & Private Deployment

Self-hosted Wan-AI models give you full control of prompts, outputs, and API integrations, ensuring data privacy and independence from third-party servers.

Fine-Tuning & Extension Ready

Advanced users can fine-tune, extend, or chain outputs with other generative tools such as LoRA, ControlNet, or video editing frameworks.

    Several Common Ways to Deploy Wan-AI Service on GPU Servers

Method 1: Diffusers Pipeline via Hugging Face + PyTorch
Pros: Full access, customizable, Hugging Face ecosystem
Cons: Requires coding and model-management knowledge
Steps:
1. Set up a GPU server with Python ≥ 3.9 and the CUDA toolkit
2. Install transformers, diffusers, accelerate, torch, and xformers
3. Load the model via Hugging Face's from_pretrained()
4. Run generation with the Diffusers pipeline (e.g., WanPipeline), as sketched below
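
A minimal sketch of Method 1 follows, assuming a recent diffusers release (0.33 or later) that ships WanPipeline and AutoencoderKLWan; the prompt, resolution, and frame count are illustrative values only.

    # Minimal text-to-video generation with the Wan2.1-1.3B Diffusers checkpoint.
    # Assumes diffusers >= 0.33; prompt and output settings are examples.
    import torch
    from diffusers import AutoencoderKLWan, WanPipeline
    from diffusers.utils import export_to_video

    model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
    # The VAE is commonly kept in float32 for stability; the rest can run in bfloat16.
    vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
    pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    frames = pipe(
        prompt="A cat walks on the grass, realistic style",
        height=480,
        width=832,
        num_frames=81,
        guidance_scale=5.0,
    ).frames[0]

    export_to_video(frames, "t2v_output.mp4", fps=15)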

Method 2: ComfyUI Integration (for Diffusers versions)
Pros: Visual interface, modular, community-supported
Cons: Needs optimization for large models (especially 14B)
Steps:
1. Install ComfyUI on your server
2. Load the Wan2.1 Diffusers version (1.3B or 14B)
3. Connect nodes such as Text Prompt → Model Loader → Video Output

Method 3: Custom FastAPI or Gradio Web UI
Pros: Web-accessible, scriptable, shareable
Cons: Needs backend development setup
Steps:
1. Wrap the Hugging Face model loading and inference in FastAPI or Gradio
2. Host on the GPU server with nginx + uvicorn
3. Add endpoints for /generate-video, /generate-from-image, etc. (a rough sketch follows)
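
As an illustration of Method 3, here is a rough FastAPI wrapper; the endpoint name, request schema, and file handling are hypothetical and not part of Wan-AI's tooling.

    # Hypothetical FastAPI wrapper around a preloaded Wan2.1 pipeline (Method 3).
    # Run with: uvicorn app:app --host 0.0.0.0 --port 8000 (behind nginx, as noted above).
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    # Load once at startup so every request reuses the same GPU-resident pipeline.
    pipe = WanPipeline.from_pretrained(
        "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
    ).to("cuda")

    class VideoRequest(BaseModel):
        prompt: str
        num_frames: int = 81

    @app.post("/generate-video")
    def generate_video(req: VideoRequest):
        frames = pipe(prompt=req.prompt, num_frames=req.num_frames).frames[0]
        export_to_video(frames, "output.mp4", fps=15)
        return {"file": "output.mp4"}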

Method 4: Dockerized Inference Setup
Pros: Portable, deployable at scale, good for CI/CD
Cons: Slightly heavier setup, slower updates
Steps:
1. Create a Dockerfile with preinstalled PyTorch, CUDA, and dependencies
2. Preload the Wan-AI model weights into the image or a mounted volume
3. Use the NVIDIA Docker runtime for GPU access

FAQs of Wan-AI Hosting Service

    What is Wan-AI Service?

Wan-AI Service refers to the self-hosted deployment of Wan-AI's generative models, including text-to-video (T2V), image-to-video (I2V), and video auto-captioning/editing (VACE), on dedicated GPU servers or VPS with compatible frameworks such as Hugging Face Diffusers or ComfyUI.

    What GPU is recommended for Wan-AI hosting?

Minimum GPU requirements vary by model size; a quick VRAM check sketch follows this list:
  • 1.3B models: 12–16 GB VRAM (e.g., RTX 3080, A4000)
  • 14B models: 24–48 GB VRAM (e.g., RTX 4090, A5000, A6000, A100)
  • High-speed inference: Use NVLink-enabled dual GPU or high-bandwidth memory GPUs
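
As a quick sanity check before choosing a checkpoint, a short PyTorch snippet (a generic sketch, not Wan-specific) can report each GPU's total and free VRAM:

    # Report each GPU's name and VRAM so you can match it against the sizes above.
    import torch

    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            free, total = torch.cuda.mem_get_info(i)
            name = torch.cuda.get_device_name(i)
            print(f"GPU {i}: {name}, total {total / 1e9:.1f} GB, free {free / 1e9:.1f} GB")
    else:
        print("No CUDA-capable GPU detected.")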

Can I use ComfyUI to run Wan2.1 Service?

    Yes. Both Wan2.1-T2V-1.3B-Diffusers and Wan2.1-T2V-14B-Diffusers can be used with ComfyUI by loading the proper nodes and handling video output (MP4/WebM). This offers a visual node-based way to build workflows.

    Which deployment methods are recommended?

  • Hugging Face Transformers + Diffusers (Python script)
  • ComfyUI (drag-and-drop workflows)
  • Dockerized environments (for production scaling)
  • FastAPI + Gradio for web API/UI

Do I need to pay for these models?

As of now, the Wan-AI models are freely available to download and self-host, but always check the specific license on Hugging Face for each model version, especially before commercial use.

    Which models can I host?

You can self-host the following Wan-AI models:
• Text-to-Video: Wan2.1-T2V-1.3B, Wan2.1-T2V-14B
• Image-to-Video: Wan2.1-I2V-14B-480P, Wan2.1-I2V-14B-720P
• Video Auto-Captioning & Editing (VACE): Wan2.1-VACE-1.3B, Wan2.1-VACE-14B
• Diffusers-compatible variants (repo names ending in "-Diffusers") for easier integration

    Do Wan-AI models require vLLM or TGI to run?

    No. These are not LLMs. Wan2.1 models are Diffusers-based multimodal generation models and are best run via Hugging Face’s Diffusers, ComfyUI, or a custom FastAPI backend. vLLM, TGI, and Triton are generally not required unless adapting for advanced inference pipelines.

    Is FFmpeg needed for video output?

Yes, FFmpeg is typically used:
  • To encode image sequences into MP4/WebM
  • To combine video and audio if using VACE models
Make sure FFmpeg is installed and callable in your server environment; a short encoding sketch follows.
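
For example, a saved frame sequence can be encoded into an MP4 by calling FFmpeg from Python; the file names and frame rate below are placeholders.

    # Encode a numbered PNG frame sequence into an MP4 using the system FFmpeg.
    # Paths and fps are placeholders; match fps to your generation settings.
    import subprocess

    subprocess.run(
        [
            "ffmpeg", "-y",
            "-framerate", "15",
            "-i", "frames/frame_%04d.png",
            "-c:v", "libx264",
            "-pix_fmt", "yuv420p",  # widely compatible pixel format
            "output.mp4",
        ],
        check=True,
    )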

    What is the difference between the Hugging Face 'Diffusers' and 'non-Diffusers' versions?

• Diffusers version: Works with the Hugging Face diffusers pipeline or ComfyUI.
• Non-Diffusers version: May require custom integration and may not work out of the box with the from_pretrained() Diffusers pipeline.

Is this suitable for public video generation platforms?

Yes. With sufficient GPU resources, you can integrate these models into a platform or service offering text-to-video, image-to-video, or video+audio generation.