How to use Wan AI Video Generator (Alibaba)
Following in the footsteps of Goku and DeepSeek

AI video generators are reshaping 2025, with leading tech companies launching advanced models designed for both developers and large-scale applications. The rapid pace of competition is driving improvements in realism, including more lifelike motion, refined physics, and higher visual fidelity.
Alibaba’s Wan AI is one of the newest entrants in AI video generation, going up against ByteDance’s Goku AI and DeepSeek’s Janus Pro. Like its rivals, Wan AI focuses on producing highly realistic videos with smooth movement and enhanced physics.
To support its AI initiatives, Alibaba has committed over $50 billion to cloud computing and infrastructure over the next three years. This article will guide you through using Wan AI, explore its capabilities, and explain how to edit the videos it generates.
Table of Contents
- What is Wan AI?
- Wan 2.1 Video Generation Specs
- How to Generate Videos Using Wan AI
- How to Edit Wan AI Videos
- Wan AI Pricing
What is Wan AI?
Wan AI marks Alibaba's entry into the video generation market, offering multiple models within the broader Wan AI Generator. More specifically, Wan 2.1 is a text-to-video and image-to-video generator that produces six-second videos at up to 30 frames per second.
Wan AI has notable strengths and limitations that may impact its suitability for different applications. It excels at generating subtle movements between characters and objects while maintaining consistent patterns and shapes throughout a scene. This makes it a strong choice for users looking to add motion to still images or enhance existing video content.
Strengths:
- Realistic physics for moving objects
- Above-average pattern consistency
- Generous free credit system
- Supports multiple aspect ratios
- User-friendly generation interface
Limitations:
- Inconsistent results with moving cameras
- Difficulty maintaining character consistency across clips
- Cannot generate branded products
- Slow processing times
- Cannot edit, customize, or resize generations
Wan 2.1 Video Generation Specs
The Wan 2.1 video generation model produces short, high-quality clips with a standard set of specifications. Whether generating from text or images, users can expect the following output:
- Video length: 6 seconds
- Frame rate: 30 frames per second
- File type: MP4
- File size: Approximately 2.5MB per clip
- Aspect ratio: 16:9, 9:16, 1:1, 4:3, or 3:4
- Audio options: Optional background music or sound effects based on prompt input
- Watermark: Wan AI videos will be generated with a watermark graphic in the bottom right corner of the video frame.
These specs are creator-friendly: the clips are smooth, small enough to handle easily, and especially useful when several are merged into a larger project.
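To make the numbers in the spec list concrete, here is a minimal Python sketch of the arithmetic behind them. The function names are hypothetical, the file size is the approximate figure quoted above, and the bitrate assumes decimal megabytes (1 MB = 8 megabits), so treat the results as rough estimates rather than official figures.

```python
import math

# Figures taken from the spec list above.
CLIP_SECONDS = 6
FRAMES_PER_SECOND = 30
APPROX_FILE_SIZE_MB = 2.5  # "approximately 2.5MB per clip"


def total_frames(seconds: int = CLIP_SECONDS, fps: int = FRAMES_PER_SECOND) -> int:
    """Number of frames in a single clip."""
    return seconds * fps


def average_bitrate_mbps(size_mb: float = APPROX_FILE_SIZE_MB,
                         seconds: int = CLIP_SECONDS) -> float:
    """Average bitrate in megabits per second (assumes 1 MB = 8 megabits)."""
    return size_mb * 8 / seconds


def clips_needed(target_seconds: float, seconds: int = CLIP_SECONDS) -> int:
    """Clips required to cover a target duration, rounding up."""
    return math.ceil(target_seconds / seconds)


print(total_frames())                    # 180 frames per clip
print(round(average_bitrate_mbps(), 2))  # roughly 3.33 Mbit/s
print(clips_needed(30))                  # 5 clips for a 30-second sequence
```

In other words, each clip holds 180 frames, and a 30-second sequence would take five generations stitched together.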
How to Generate Videos Using Wan AI
Wan AI is an open-source video generator that can be run locally on your device or accessed through public hosting sites. Public hosting typically has longer processing times, while private hosting requires knowledge of the model’s codebase and API tools.
Not all hosting sites offer the same features, either. Some provide more control over aspect ratio, resolution, and prompt type (text-to-video or image-to-video). Certain platforms also offer side-by-side access to other AI video generators, such as Google's Veo 2 and Kling, allowing for a direct comparison between models.
To generate videos using Wan 2.1, visit the official International Experience Page and select AI Videos from the left-hand menu. This will open the video generation interface, where you can choose between text-to-video and image-to-video.

Use the Text2Video and Image2Video toggles at the top of the prompt menu to switch between these options.
Both methods generate AI-powered videos but function slightly differently.

Text-to-Video
- Generate a video from scratch using a written prompt of up to 800 characters.
- Select an aspect ratio from the available options (16:9, 9:16, 1:1, 4:3, 3:4).
- Use optional tools:
  - Inspiration Mode: Adds more expressive video features.
  - Sound Effects: Generates sound effects if specified in the prompt. If no sounds are specified, background music is added.
  - Prompt Enhancing: Rewrites the prompt to improve clarity and optimize it for the Wan 2.1 model.
Image-to-Video
- Upload one or two images to serve as the start and end frames. If only one image is uploaded, it will be used as the start frame.
- Optionally include a supplemental prompt of up to 800 characters.
- The generated video will match the aspect ratio of the reference image.
- Use the same optional tools as text-to-video: inspiration mode, sound effects, and prompt enhancing.
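The limits described in the two lists above can be summarized as a small pre-flight check. This is only an illustrative sketch: `check_request` and its parameters are hypothetical names, not part of any Wan AI API, and the rules encode nothing beyond what the lists state (800-character prompts, five aspect ratios for text-to-video, one or two images for image-to-video).

```python
from typing import List, Optional

MAX_PROMPT_CHARS = 800
TEXT_TO_VIDEO_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def check_request(mode: str, prompt: str = "",
                  aspect_ratio: Optional[str] = None,
                  image_count: int = 0) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(
            f"prompt is {len(prompt)} characters; the limit is {MAX_PROMPT_CHARS}")
    if mode == "text":
        if not prompt.strip():
            problems.append("text-to-video needs a written prompt")
        if aspect_ratio not in TEXT_TO_VIDEO_RATIOS:
            problems.append(
                f"aspect ratio {aspect_ratio!r} is not one of the listed options")
    elif mode == "image":
        # The prompt is optional here, and the output follows the image's aspect ratio.
        if image_count not in (1, 2):
            problems.append("image-to-video needs one or two reference images")
    else:
        problems.append(f"unknown mode {mode!r}")
    return problems


print(check_request("text", "A fox running through snow", "16:9"))  # no problems
print(check_request("image", image_count=3))                         # one problem
```

Checking a prompt this way before submitting can save a generation attempt, which matters given the long processing times.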
Each video generation, whether from a text or image prompt, costs 10 credits and produces a six-second clip. These credits function as Wan AI’s proprietary currency; pricing is covered in more detail later in this article.
Wan AI is not the fastest video generator, but its speed is expected to improve over time. Enterprise users will likely have a different experience since they can run the model locally or on dedicated servers, avoiding the bandwidth limitations of public access.
Currently, image-to-video generations take approximately 15 minutes, while text-to-video is somewhat quicker at around 10 minutes. Generation times vary significantly depending on the complexity of the prompt, concurrent site usage, and which optional tools are enabled.
Resizing Your Reference Image
Since Wan AI’s image-to-video generator matches the aspect ratio of the uploaded reference image, it is important to size your image correctly before uploading. A free online image resizer can help with this process.
Kapwing makes it easy to resize images with over 15 preset aspect ratios or custom dimensions in just a few clicks. To resize an image, start by uploading it to the editor. Once uploaded, select the background of the image before choosing the Resize Project tool on the right-hand side.

Then, either enter custom image dimensions or choose from the available preset aspect ratios to resize your image.

To ensure your image looks exactly how you want it to after resizing, double-click on it to adjust its crop within the frame. This allows you to control the framing while maintaining the new aspect ratio.

Once completed, your image is ready to be exported and used as a reference for a Wan 2.1 video generation.
You can also use a video resizing tool to adjust the aspect ratio of any video generated by Wan AI if you need a different format than the default options: 16:9, 9:16, 1:1, 4:3, and 3:4.
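For readers who would rather script the crop than do it by hand, the geometry is simple. The sketch below computes the largest centered crop of an image for a target aspect ratio; `center_crop_box` is a hypothetical helper, not part of Kapwing or Wan AI, and it only calculates coordinates rather than editing any file.

```python
from typing import Tuple


def center_crop_box(width: int, height: int,
                    ratio_w: int, ratio_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered crop
    whose shape is ratio_w:ratio_h."""
    if min(width, height, ratio_w, ratio_h) <= 0:
        raise ValueError("dimensions and ratio terms must be positive")
    # Compare width/height with ratio_w/ratio_h by cross-multiplying,
    # which avoids floating-point rounding.
    if width * ratio_h > height * ratio_w:
        # Image is too wide: keep the full height and trim the sides.
        new_w = height * ratio_w // ratio_h
        new_h = height
    else:
        # Image is too tall or already matches: keep the full width.
        new_w = width
        new_h = width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


# A 4000x3000 photo (4:3) cropped for a 16:9 generation:
print(center_crop_box(4000, 3000, 16, 9))  # (0, 375, 4000, 2625)
```

The returned tuple uses the (left, top, right, bottom) order that common imaging libraries such as Pillow accept for cropping, so it could be passed along to whichever tool performs the actual edit.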
How to Edit Wan AI Videos
Generating short AI video clips can be a great way to fill brief gaps in your content with relevant visuals. Many content creators also combine generated clips with real footage for a more dynamic result.
To edit your Wan AI videos online, open the Kapwing editor in your browser. Start a new project or make small adjustments to existing clips. Use the media sidebar to add videos, text, graphics, images, audio, or AI-powered features like automatic subtitles.
Easily arrange layers in the project timeline by dragging and dropping them into place.

To streamline your editing process, here are a few key shortcuts:
- Spacebar: Play/Pause the video
- Ctrl/Cmd + A: Select all layers in the scene
- Ctrl/Cmd + Z: Undo
- Ctrl/Cmd + Shift + Z (or Ctrl + Y on Windows): Redo
- S: Split selected layers at the playhead
- H: Hide selected layers
- Backspace/Delete: Remove selected layer or gap
- Arrow Keys: Move layers (Up/Down to switch tracks, Left/Right to adjust position)
- Ctrl/Cmd + ] or [: Bring layer forward/backward in order
Beyond filling gaps in longer videos, many creators use AI-generated clips to create short promotional content like trailers or teasers. These are especially useful for podcasters, YouTubers, and social media managers looking to build anticipation for an upcoming release without revealing actual footage.
To create a video in this style using Kapwing, upload and arrange your clips in the editor. Then, add automatic narration by selecting the AI Voice tool from the left-hand sidebar.

Enter or paste your script into the prompt box (up to 5,000 characters at a time). Once you've confirmed your script, selected a voice, and optionally chosen a persona, click Add Layer to generate the voiceover.
Kapwing will automatically add subtitles that sync with the spoken content, giving your video a polished finish. To adjust the subtitles, use the editing menu on the right to change the font, size, color, border, and animations. If you need to modify the narration, reopen the AI Voice tool and update your script.

When you're finished, export your video by clicking Export in the top-right corner. For best compatibility, export as an MP4 and use the compression slider if you need to reduce file size.

The final video will be an effective promotional piece that integrates Wan AI-generated content into your production.
Example video trailer using a Wan AI-generated video clip
Wan AI Pricing
For enterprise users, Wan AI offers a customized pricing structure based on usage and support needs. To learn more about enterprise pricing, visit the official Wan AI enterprise website.
For individual users, Wan AI operates on a credit system. New users receive 50 credits upon creating an account, which is enough for five video generations. Each video, whether created from a text or image prompt, costs 10 credits.

Currently, there is no option to purchase credits, but users can earn free credits through platform interactions:
- Daily Check-In: Clicking the Check In button once per day adds 50 credits.
- Publishing Videos: Sharing a generated video earns 20 credits. To publish, select the three-dot icon in the video window and choose Publish from the menu.
- Providing Feedback: Rating a generated video with a thumbs-up or thumbs-down gives 5 credits, up to 10 credits per day.
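The credit rules above are easy to turn into a quick budget estimate. The following sketch uses only the figures quoted in this section; the function names are hypothetical, and it assumes there is no daily cap on publishing rewards, which the source does not state either way.

```python
# Figures quoted in this section.
GENERATION_COST = 10     # credits per video
SIGNUP_CREDITS = 50      # one-time credits for new accounts
DAILY_CHECK_IN = 50      # credits per daily check-in
PUBLISH_REWARD = 20      # credits per published video
FEEDBACK_REWARD = 5      # credits per rating
FEEDBACK_DAILY_CAP = 10  # maximum feedback credits per day


def credits_earned_in_a_day(published: int = 0, ratings: int = 0,
                            checked_in: bool = True) -> int:
    """Credits earned in one day from check-in, publishing, and feedback."""
    feedback = min(ratings * FEEDBACK_REWARD, FEEDBACK_DAILY_CAP)
    check_in = DAILY_CHECK_IN if checked_in else 0
    # Assumption: publishing rewards are not capped per day.
    return check_in + published * PUBLISH_REWARD + feedback


def generations_affordable(credits: int) -> int:
    """Whole videos that a credit balance can pay for."""
    return credits // GENERATION_COST


print(generations_affordable(SIGNUP_CREDITS))  # 5 videos from the sign-up bonus
day = credits_earned_in_a_day(published=2, ratings=5)
print(day, generations_affordable(day))        # 100 credits, 10 videos
```

A daily check-in alone covers five generations, so most casual users should be able to keep generating without running out.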

A paid option may be introduced later to increase generation limits.