Video Generation

Generate videos from text, a start frame, interpolated keyframes, references, motion control, or a talking avatar.

Video Generation creates a new clip from scratch, driven by text, a starting image, two keyframes to interpolate between, reference media, a motion source, or an audio performance for a talking avatar. To transform a video you already have (edit, extend, or reframe it), use the separate Video Edit mode.

For pricing details, see the Pricing page. For full model tables, see Video Providers.

Sub-modes

Text to Video: Prompt-only generation with no visual input required.

Start Frame: Guide the opening frame with a single image input.

Start + End Frame: Blend from a starting image into a target ending frame.

References: Drive the video with image or video reference inputs.

Avatar: Animate a character image with an audio performance.

Motion Control: Transfer movement from a motion clip onto a start frame.

Availability is per model: the picker only shows the sub-modes that the selected model supports.
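The per-model filtering can be pictured as a lookup from model to supported sub-modes. The following is a minimal sketch, not the product's actual implementation: the map name, the sub-mode identifiers, and the function are hypothetical. The three entries paraphrase the provider highlights on this page and assume that I2V corresponds to Start Frame and R2V to References.

```python
# Illustrative sub-mode map. Keys and sub-mode identifiers are hypothetical;
# the entries paraphrase the provider highlights on this page
# (assuming I2V = Start Frame and R2V = References).
SUB_MODES_BY_MODEL = {
    "Wan 2.7": {"text_to_video", "start_frame", "references"},
    "Grok Imagine Video 1.5": {"start_frame"},
    "OmniHuman": {"avatar"},
}

# Display order of the six sub-modes listed above.
ALL_SUB_MODES = [
    "text_to_video",
    "start_frame",
    "start_end_frame",
    "references",
    "avatar",
    "motion_control",
]


def visible_sub_modes(model: str) -> list[str]:
    """Return the sub-modes a picker would show for `model`, in display order."""
    supported = SUB_MODES_BY_MODEL.get(model, set())
    return [mode for mode in ALL_SUB_MODES if mode in supported]
```

Under these assumptions, `visible_sub_modes("Grok Imagine Video 1.5")` yields only `start_frame`, and a model missing from the map yields an empty list.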

Providers (current)

Provider	Highlights
Veo (Google)	Lite / Fast / Full · up to 4K · 4–8s
Gemini Omni Flash	Fast 720p generation · 3–10s
Wan (Alibaba)	2.7 suite: T2V, I2V, R2V · optional audio
HappyHorse	1.1 T2V / I2V / R2V
Kling	3.0 Turbo / 3.0 / 3.0 Omni · Avatar · motion control · up to 4K
Seedance	2.0, 2.0 Fast, 1.5 Pro · multimodal refs
OmniHuman	Image + audio → talking portrait
LTX	2.3 Fast / Pro · up to 4K · long Fast durations
Grok Imagine Video	1.0 full suite · 1.5 start-frame only
Luma Ray 3.2	HDR, loop, keyframe interpolation · 540p–1080p

See Video Providers for the full model list and sub-mode map.

Settings

Resolution

Resolution depends on the model, not on your plan: commonly 480p, 720p, 1080p, 1440p, or 4K where supported.

Duration

Duration varies widely by provider:

Veo — 4 / 6 / 8s (higher res often forces 8s)
Gemini Omni Flash — 3–10s
Wan / HappyHorse / Kling / Seedance — typically multi-second ranges up to ~15s
LTX Fast — up to 20s · LTX Pro — up to 10s
Grok — 1–15s
Luma — 5 or 10s (HDR/loop often force 5s)
OmniHuman / Kling Avatar — driven by audio length
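The duration rules above can be transcribed into a small validation table. This is a sketch under stated assumptions: the table and function names are hypothetical, only providers with explicit numbers in the list are included, and lower bounds that the list does not state are left open rather than guessed. It also ignores the resolution, HDR, and loop interactions noted in parentheses.

```python
from typing import Optional

# Providers that accept only specific durations (seconds), per the list above.
DISCRETE_DURATIONS = {
    "Veo": (4, 6, 8),
    "Luma": (5, 10),
}

# Providers with a range (min, max) in seconds. None means the list above
# gives no lower bound, so none is enforced here.
DURATION_RANGES = {
    "Gemini Omni Flash": (3, 10),
    "Grok": (1, 15),
    "LTX Fast": (None, 20),
    "LTX Pro": (None, 10),
}


def is_duration_allowed(provider: str, seconds: int) -> Optional[bool]:
    """Check `seconds` against the table; return None when no rule is known
    (for example, audio-driven providers such as OmniHuman)."""
    if provider in DISCRETE_DURATIONS:
        return seconds in DISCRETE_DURATIONS[provider]
    if provider in DURATION_RANGES:
        low, high = DURATION_RANGES[provider]
        return (low is None or seconds >= low) and seconds <= high
    return None
```

Returning `None` for unlisted providers keeps "unknown" distinct from "rejected", which matters for avatar models whose clip length follows the audio.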

Aspect Ratio

Common options include 16:9, 9:16, and 1:1. Many providers add 4:3, 3:4, 21:9, etc. Some start-frame or edit flows inherit ratio from the source media.

Audio

Optional generated audio on Wan, Seedance, LTX
Required audio input for OmniHuman and Kling Avatar
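The two audio cases can be expressed as a pre-submit check. The sketch below is illustrative only: the function, its return labels, and the `audio_path` parameter are hypothetical, and the provider sets simply restate the two lines above.

```python
from typing import Optional

# Restated from the two lines above.
AUDIO_INPUT_REQUIRED = {"OmniHuman", "Kling Avatar"}
GENERATED_AUDIO_OPTIONAL = {"Wan", "Seedance", "LTX"}


def check_audio(provider: str, audio_path: Optional[str]) -> str:
    """Classify the audio handling for `provider` and reject a missing
    audio input where one is required."""
    if provider in AUDIO_INPUT_REQUIRED:
        if not audio_path:
            raise ValueError(f"{provider} needs an audio input")
        return "audio_input"
    if provider in GENERATED_AUDIO_OPTIONAL:
        return "generated_audio_optional"
    # This section says nothing about audio for other providers.
    return "unspecified"
```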

Tips

Start Frame usually gives the most control — pick a strong still, then describe motion
Veo 3.1 Lite or Wan 2.7 are good iteration defaults
Match the sub-mode to the model (e.g. Grok 1.5 is start-frame only)
Video jobs are slower than stills — expect tens of seconds to several minutes
