
Video Generation

Generate videos from text or images, or interpolate between keyframes.

Video Generation creates short video clips using AI. You can generate from text alone, animate an image as the starting frame, interpolate between two keyframes, guide the output with reference images, or edit and extend existing clips.

For pricing details, see the Pricing page.

Sub-Modes

Sub-Mode | Description | Input Required
--- | --- | ---
Text to Video | Generate from a text prompt | Text only
Start Frame | Animate from a source image | 1 image + text
Interpolation | Transition between two frames | 2 images (start + end)
References | Generate with reference guidance | Reference images + text
Edit | Modify an existing clip with a prompt | 1 video + text (+ optional reference images)
Continuation | Extend a clip from its final frame | 1 video + text

Edit and Continuation sub-modes are available with Wan 2.7 models.
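The input requirements in the sub-mode table can be expressed as a small validation sketch. This is not the product's API: the sub-mode keys, the `Rule` fields, and the `input_problems` helper are hypothetical names chosen for illustration. The sketch only counts inputs per type; it does not check model availability (Edit and Continuation also need a Wan 2.7 model), and because the table gives no limit on the number of reference images, none is enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    min_images: int
    max_images: Optional[int]  # None = no documented upper limit
    videos: int                # exact number of source videos
    text_required: bool


# Transcribed from the Sub-Modes table. The keys are hypothetical identifiers.
RULES = {
    "text_to_video": Rule(0, 0, 0, True),
    "start_frame":   Rule(1, 1, 0, True),
    "interpolation": Rule(2, 2, 0, False),   # start + end frame; text not listed
    "references":    Rule(1, None, 0, True), # reference image count not documented
    "edit":          Rule(0, None, 1, True), # reference images are optional
    "continuation":  Rule(0, 0, 1, True),
}


def input_problems(sub_mode: str, text: str, images: int, videos: int) -> list[str]:
    """Return human-readable problems; an empty list means the inputs fit."""
    rule = RULES[sub_mode]
    problems = []
    if rule.text_required and not text.strip():
        problems.append("text prompt required")
    if images < rule.min_images:
        problems.append(f"needs at least {rule.min_images} image(s)")
    if rule.max_images is not None and images > rule.max_images:
        problems.append(f"accepts at most {rule.max_images} image(s)")
    if videos != rule.videos:
        problems.append(f"needs exactly {rule.videos} video(s)")
    return problems
```

For example, `input_problems("start_frame", "slow pan", images=0, videos=0)` reports the missing source image, while an Interpolation request with two images and no prompt passes with an empty list.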

Providers

Provider | Key Features
--- | ---
VEO (Google) | Up to 4K, 4-8 seconds
WAN (Alibaba) | Cost-effective, multiple sub-modes, optional audio
Kling (KlingAI) | Motion control, premium quality, 4K on Kling V3 and V3 Omni
SeedAnce (ByteDance) | Audio-inclusive generation
xAI (Grok) | Budget-friendly
LTX | Up to 4K, audio-driven mode
OmniHuman (ByteDance) | Human-focused video

See Providers & Models for the full model list.

Settings

Resolution

The maximum video resolution you can render depends on your plan:

  • 720p — Available on all paid plans
  • 1080p — Plus, Pro, and Team plans
  • 4K — Pro and Team plans (on models that support it, e.g. VEO, Kling V3/V3 Omni, and LTX)

Every model is available on every paid plan; your plan only caps the output resolution.
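One way to read the resolution rule: the usable tiers are everything up to the lower of the plan cap and the model's own maximum. The sketch below assumes a paid plan, uses hypothetical lowercase plan keys, and assumes a model that supports a tier also supports every tier beneath it. `model_max` stands in for whatever a given model supports and is not a documented field.

```python
# Resolution tiers from lowest to highest.
TIERS = ["720p", "1080p", "4K"]

# Highest tier each plan unlocks, per the Resolution section.
# Plan keys are hypothetical; any other paid plan is assumed to cap at 720p.
PLAN_CAP = {"plus": "1080p", "pro": "4K", "team": "4K"}


def allowed_resolutions(plan: str, model_max: str) -> list[str]:
    """Return the tiers usable on a paid plan with a model whose top tier is model_max.

    Assumes a model that supports a tier also supports every lower tier.
    """
    plan_cap = PLAN_CAP.get(plan.lower(), "720p")
    limit = min(TIERS.index(plan_cap), TIERS.index(model_max))
    return TIERS[: limit + 1]
```

For instance, on the Plus plan a 4K-capable model such as VEO yields `["720p", "1080p"]`, because the plan cap is lower than the model's maximum.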

Duration

Duration varies by provider:

  • VEO — 4, 6, or 8 seconds; 4K locks to 8 seconds
  • WAN — 2 to 15 seconds (model-dependent)
  • Kling — 3 to 15 seconds on Kling V3, 5 or 10 seconds on Kling V2.6
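The duration rules can be summarized as a lookup. The model keys such as `"kling-v3"` are hypothetical labels, whole-second durations are an assumption (the ranges above do not say whether fractional values are allowed), and the WAN range is only the outer bound, since the exact limits depend on the model.

```python
def allowed_durations(model: str, resolution: str = "1080p") -> list[int]:
    """Whole-second clip lengths allowed by the Duration section.

    Model keys are hypothetical labels, not real identifiers.
    """
    if model == "veo":
        # VEO offers 4, 6, or 8 seconds, but 4K output is fixed at 8 seconds.
        return [8] if resolution == "4K" else [4, 6, 8]
    if model == "wan":
        # Outer bound only: each WAN model may allow a narrower range.
        return list(range(2, 16))
    if model == "kling-v3":
        return list(range(3, 16))
    if model == "kling-v2.6":
        return [5, 10]
    raise ValueError(f"no documented durations for {model!r}")
```

For example, `allowed_durations("veo", "4K")` returns `[8]`, reflecting the 4K lock on VEO.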

Aspect Ratio

  • 16:9 — Standard landscape video
  • 9:16 — Vertical video (stories, reels)
  • 1:1 — Square
  • Additional ratios available with some providers
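To estimate the frame size a given ratio produces, the sketch below assumes each resolution tier names the short side of the frame (720, 1080, or 2160 pixels). This is a common convention, not something confirmed here for every provider, so actual output sizes may differ. `SHORT_SIDE` and `frame_size` are hypothetical names.

```python
# Assumed short-side length in pixels for each tier (a common convention,
# not confirmed for every provider).
SHORT_SIDE = {"720p": 720, "1080p": 1080, "4K": 2160}


def frame_size(ratio: str, tier: str) -> tuple[int, int]:
    """Return (width, height) for a ratio like "16:9" at the given tier.

    The tier fixes the shorter side; the longer side is scaled by the ratio
    and rounded down to whole pixels.
    """
    w, h = (int(part) for part in ratio.split(":"))
    short = SHORT_SIDE[tier]
    if w >= h:  # landscape or square: height is the short side
        return (short * w // h, short)
    return (short, short * h // w)  # portrait: width is the short side
```

Under this assumption, 9:16 at 1080p gives `(1080, 1920)`, the usual size for vertical stories and reels.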

Audio

Some providers support audio generation alongside video:

  • WAN — Optional audio track
  • SeedAnce — Audio-inclusive generation
  • LTX Pro Audio — Audio-driven video

Tips

  • Start Frame mode gives the most control: upload an image you like and describe the motion you want
  • VEO 3.1 Fast is a good default for quick iterations
  • WAN Flash models are cost-effective for high-volume work
  • Video generation is slower than image generation; expect anywhere from 30 seconds to several minutes
