Video Generation
Generate videos from text or images, or interpolate between keyframes.
Video Generation creates short video clips using AI. You can generate from text alone, use an image as a starting frame, or interpolate between two keyframes.
For pricing details, see the Pricing page.
Sub-Modes
| Sub-Mode | Description | Input Required |
|---|---|---|
| Text to Video | Generate from a text prompt | Text only |
| Start Frame | Animate from a source image | 1 image + text |
| Interpolation | Transition between two frames | 2 images (start + end) |
| References | Generate with reference guidance | Reference images + text |
| Edit | Modify an existing clip with a prompt | 1 video + text (+ optional reference images) |
| Continuation | Extend a clip from its final frame | 1 video + text |
The Edit and Continuation sub-modes are available with WAN 2.7 models.
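
The input requirements in the table can be checked before a request is submitted. The following is a minimal sketch, not the product's API: the sub-mode keys, the `REQUIREMENTS` table, and `missing_inputs` are hypothetical names introduced here. It assumes Interpolation needs no text prompt (the table lists only two images) and that reference images have no upper limit, since the table states none.

```python
# Hypothetical requirement table transcribed from the Sub-Modes table.
# Each entry: (text required, min images, max images or None, videos required)
REQUIREMENTS = {
    "text_to_video": (True, 0, 0, 0),
    "start_frame": (True, 1, 1, 0),
    "interpolation": (False, 2, 2, 0),   # assumption: text is not required
    "references": (True, 1, None, 0),    # assumption: no stated upper limit
    "edit": (True, 0, None, 1),          # reference images are optional
    "continuation": (True, 0, 0, 1),
}


def missing_inputs(sub_mode, has_text, n_images, n_videos):
    """Return a list of problems; an empty list means the inputs fit the sub-mode."""
    needs_text, min_img, max_img, n_vid = REQUIREMENTS[sub_mode]
    problems = []
    if needs_text and not has_text:
        problems.append("text prompt required")
    if n_images < min_img:
        problems.append(f"needs at least {min_img} image(s)")
    if max_img is not None and n_images > max_img:
        problems.append(f"accepts at most {max_img} image(s)")
    if n_videos != n_vid:
        problems.append(f"needs exactly {n_vid} video(s)")
    return problems
```

For example, `missing_inputs("interpolation", False, 1, 0)` reports that a second image is needed, while `missing_inputs("edit", True, 2, 1)` returns an empty list.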
Providers
| Provider | Key Features |
|---|---|
| VEO (Google) | Up to 4K; 4, 6, or 8 seconds |
| WAN (Alibaba) | Cost-effective, multiple sub-modes, optional audio |
| Kling (KlingAI) | Motion control, premium quality, 4K on Kling V3 and V3 Omni |
| SeedAnce (ByteDance) | Audio-inclusive generation |
| xAI (Grok) | Budget-friendly |
| LTX | Up to 4K, audio-driven mode |
| OmniHuman (ByteDance) | Human-focused video |
See Providers & Models for the full model list.
Settings
Resolution
The maximum video resolution you can render depends on your plan:
- 720p — Available on all paid plans
- 1080p — Plus, Pro, and Team plans
- 4K — Pro and Team plans (on models that support it, e.g. VEO, Kling V3/V3 Omni, and LTX)
Every model is available on every paid plan; plans limit only the output resolution.
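
One way to read the rules above is that the rendered resolution is the lowest of three values: what you request, what your plan allows, and what the model supports. The sketch below illustrates that reading. The function name, the lowercase plan keys, and the idea that a request is clamped rather than rejected are assumptions made for illustration, not documented behavior.

```python
# Resolutions ordered from lowest to highest.
RESOLUTIONS = ["720p", "1080p", "4K"]

# Plan caps as listed above; any other paid plan falls back to 720p.
PLAN_CAP = {"plus": "1080p", "pro": "4K", "team": "4K"}


def effective_resolution(plan: str, requested: str, model_max: str) -> str:
    """Return the lowest of the requested, plan-capped, and model-capped resolutions."""
    plan_cap = PLAN_CAP.get(plan.lower(), "720p")
    # Compare by position in RESOLUTIONS rather than by string value.
    return min(requested, plan_cap, model_max, key=RESOLUTIONS.index)
```

Under these assumptions, a Plus user who requests 4K on a 4K-capable model gets 1080p, and a Pro user who requests 4K on a model that tops out at 1080p also gets 1080p.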
Duration
Duration varies by provider:
- VEO — 4, 6, or 8 seconds; 4K locks to 8 seconds
- WAN — 2 to 15 seconds (model-dependent)
- Kling — 3 to 15 seconds on Kling V3, 5 or 10 seconds on Kling V2.6
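
The duration rules above can be expressed as a small lookup. This is a sketch under stated assumptions: the model keys (`"veo"`, `"wan"`, `"kling-v3"`, `"kling-v2.6"`) and the function name are hypothetical, the ranges are assumed to mean whole seconds, and the WAN entry uses the widest listed range even though individual WAN models may allow less.

```python
def allowed_durations(model: str, resolution: str = "1080p") -> list[int]:
    """Return the whole-second durations listed above for a model family."""
    if model == "veo":
        # 4K output locks VEO to 8 seconds.
        return [8] if resolution == "4K" else [4, 6, 8]
    if model == "wan":
        # Widest listed range; the actual limit is model-dependent.
        return list(range(2, 16))
    if model == "kling-v3":
        return list(range(3, 16))
    if model == "kling-v2.6":
        return [5, 10]
    raise ValueError(f"no duration rule listed for {model!r}")
```

A caller could check `seconds in allowed_durations(model, resolution)` before submitting a job and show the valid options when the check fails.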
Aspect Ratio
- 16:9 — Standard landscape video
- 9:16 — Vertical video (stories, reels)
- 1:1 — Square
- Additional ratios available with some providers
Audio
Some providers support audio generation alongside video:
- WAN — Optional audio track
- SeedAnce — Audio-inclusive generation
- LTX Pro Audio — Audio-driven video
Tips
- Start Frame mode gives the best control — upload an image you like and describe the motion
- VEO 3.1 Fast is a good default for quick iterations
- WAN Flash models are cost-effective for high-volume work
- Video generation is slower than image generation; expect 30 seconds to several minutes