Video Generation
Video Generation creates short AI-generated video clips. You can generate from a text prompt alone, animate a source image as the starting frame, interpolate between two keyframes, or guide the output with reference images.
For pricing details, see the Pricing page.
Sub-Modes
| Sub-Mode | Description | Input Required |
|---|---|---|
| Text to Video | Generate from a text prompt | Text only |
| Start Frame | Animate from a source image | 1 image + text |
| Interpolation | Transition between two frames | 2 images (start + end) |
| References | Guide generation with reference images | Reference images + text |
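The input requirements in the table can be checked before a request is submitted. The following is a minimal sketch, not the product's API: the sub-mode keys, `SubModeRule`, and `validate_inputs` are hypothetical names, and it assumes that References needs at least one image and that a text prompt is optional for Interpolation, since the table lists only images there.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SubModeRule:
    """Input requirements for one sub-mode (hypothetical structure)."""
    min_images: int
    max_images: Optional[int]  # None means the table states no upper bound
    needs_text: bool


# Mirrors the Sub-Modes table; keys are illustrative, not real identifiers.
RULES = {
    "text_to_video": SubModeRule(0, 0, True),
    "start_frame": SubModeRule(1, 1, True),
    "interpolation": SubModeRule(2, 2, False),   # text treated as optional (assumption)
    "references": SubModeRule(1, None, True),    # at least one image (assumption)
}


def validate_inputs(sub_mode: str, image_count: int, prompt: str = "") -> List[str]:
    """Return a list of problems; an empty list means the inputs fit the table."""
    rule = RULES.get(sub_mode)
    if rule is None:
        return [f"unknown sub-mode: {sub_mode}"]
    problems = []
    if image_count < rule.min_images:
        problems.append(f"{sub_mode} needs at least {rule.min_images} image(s)")
    if rule.max_images is not None and image_count > rule.max_images:
        problems.append(f"{sub_mode} accepts at most {rule.max_images} image(s)")
    if rule.needs_text and not prompt.strip():
        problems.append(f"{sub_mode} needs a text prompt")
    return problems
```

For example, `validate_inputs("start_frame", 0, "pan left")` reports the missing source image, while `validate_inputs("interpolation", 2)` returns an empty list.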
Providers
| Provider | Key Features |
|---|---|
| VEO (Google) | Up to 4K, 4-8 seconds |
| WAN (Alibaba) | Cost-effective, multiple sub-modes, optional audio |
| Kling (KlingAI) | Motion control, premium quality (Pro only) |
| SeedAnce (ByteDance) | Audio-inclusive generation |
| xAI (Grok) | Budget-friendly |
| LTX | Up to 4K, audio-driven mode |
| OmniHuman (ByteDance) | Human-focused video |
See Providers & Models for the full model list.
Settings
Resolution
- 720p — HD (fastest, all plans)
- 1080p — Full HD (Basic+ plans)
- 4K — Ultra HD (Pro plan only, VEO and LTX)
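The plan and provider limits above amount to a small lookup. The sketch below is illustrative only: `resolution_allowed` is a hypothetical helper, and the tier order is an assumption. Only Basic and Pro are named in the list, so `"free"` stands in for whatever tier sits below Basic. It models the 4K provider restriction but no other per-provider limits.

```python
# Assumed tier order, lowest to highest. "free" is a placeholder name.
PLAN_ORDER = ["free", "basic", "pro"]

# Lowest plan that unlocks each resolution, per the list above.
MIN_PLAN = {"720p": "free", "1080p": "basic", "4K": "pro"}

# Providers listed as supporting 4K.
PROVIDERS_4K = {"VEO", "LTX"}


def resolution_allowed(resolution: str, plan: str, provider: str) -> bool:
    """Check a resolution against the plan tier and the 4K provider limit."""
    if resolution not in MIN_PLAN or plan not in PLAN_ORDER:
        return False
    if PLAN_ORDER.index(plan) < PLAN_ORDER.index(MIN_PLAN[resolution]):
        return False
    if resolution == "4K" and provider not in PROVIDERS_4K:
        return False
    return True
```

Under these assumptions, `resolution_allowed("4K", "pro", "VEO")` is true, while the same request with WAN or on a Basic plan is rejected.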
Duration
Duration varies by provider:
- VEO — 4 or 8 seconds
- WAN — 2 to 15 seconds (model-dependent)
- Kling — 5 or 10 seconds
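A duration check following the three rules above might look like this. It is a sketch under stated assumptions: `duration_allowed` is a hypothetical helper, only the providers with listed durations are covered, and the WAN entry checks the outer 2 to 15 second bounds only, since the exact range depends on the model.

```python
from typing import Callable, Dict

# One predicate per provider, taken from the list above.
DURATION_CHECKS: Dict[str, Callable[[int], bool]] = {
    "VEO": lambda s: s in (4, 8),
    "WAN": lambda s: 2 <= s <= 15,  # outer bounds only; individual models may allow less
    "Kling": lambda s: s in (5, 10),
}


def duration_allowed(provider: str, seconds: int) -> bool:
    """Return True if the duration fits the provider's listed options."""
    check = DURATION_CHECKS.get(provider)
    if check is None:
        # No durations are documented for this provider, so fail loudly
        # rather than guess.
        raise KeyError(f"no duration rule listed for provider {provider!r}")
    return check(seconds)
```

Raising for unlisted providers is a deliberate choice here: the list gives no durations for them, so the sketch avoids inventing a default.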
Aspect Ratio
- 16:9 — Standard landscape video
- 9:16 — Vertical video (stories, reels)
- 1:1 — Square
- Additional ratios available with some providers
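To see how a resolution label and an aspect ratio combine into frame dimensions, the arithmetic can be sketched as follows. This assumes the label names the shorter side of the frame (720, 1080, or 2160 pixels), which is the common convention; the dimensions a given provider actually outputs may differ, and `frame_size` is a hypothetical helper.

```python
from typing import Tuple

# Assumed shorter-side length in pixels for each resolution label.
SHORT_SIDE = {"720p": 720, "1080p": 1080, "4K": 2160}


def frame_size(resolution: str, aspect_ratio: str) -> Tuple[int, int]:
    """Return (width, height) for a resolution label and a 'W:H' ratio string."""
    w_part, h_part = (int(p) for p in aspect_ratio.split(":"))
    short = SHORT_SIDE[resolution]
    if w_part >= h_part:
        # Landscape or square: the height is the shorter side.
        return round(short * w_part / h_part), short
    # Portrait: the width is the shorter side.
    return short, round(short * h_part / w_part)
```

Under that assumption, `frame_size("720p", "16:9")` gives `(1280, 720)` and `frame_size("720p", "9:16")` gives `(720, 1280)`.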
Audio
Some providers support audio generation alongside video:
- WAN — Optional audio track
- SeedAnce — Audio-inclusive generation
- LTX Pro Audio — Audio-driven video
Tips
- Start Frame mode gives the most control over the result: upload an image you like, then describe the motion
- VEO 3.1 Fast is a good default for quick iterations
- WAN Flash models are cost-effective for high-volume work
- Video generation is slower than image generation: expect 30 seconds to several minutes per clip