Video Generator
Create short-form videos using natural-language text prompts. Choose from multiple AI models, guide generation with images or audio, and explore creative motion, all with flexible control over duration, resolution, and style.
What you can do:
- Generate videos from text prompts (2-20 seconds, depending on the model)
- Guide generation with start images, end images, or audio
- Choose from 16 specialized models with different capabilities
- Control aspect ratio, resolution, and duration
- Enhance prompts for improved clarity
- Explore creative variations with probabilistic generation
Use Video Generator for exploratory, AI-generated motion where variation and experimentation are part of the creative process.
Quick Start
Think of the Video Generator as describing what should happen over time.
- You describe the motion or scene in text
- You optionally guide it with images or audio (if the model allows)
- A selected model generates a short video within its fixed limits
Each generation is independent and may produce different results.
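If it helps to think of a generation as a single self-contained request, the flow above maps onto a handful of fields. The sketch below is a minimal illustration; the `GenerationRequest` type and its field names are hypothetical stand-ins, not a documented API.

```python
# Minimal sketch of the Quick Start flow. The GenerationRequest type and
# its field names are hypothetical illustrations, not a documented API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRequest:
    prompt: str                        # what should happen over time
    model: str                         # the model enforces its own limits
    start_image: Optional[str] = None  # optional image grounding
    audio: Optional[str] = None        # optional audio grounding

request = GenerationRequest(
    prompt="A paper boat drifts down a rain-filled gutter as the "
           "camera follows at water level.",
    model="Seedance 1.0 Fast",
)
# Re-running the same request is a fresh, independent generation and
# may produce a noticeably different video.
```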
Use Cases
Video Generator excels at creating short-form video content for exploration and ideation.
Ideal for:
- Social media content: Create engaging short videos for Instagram, TikTok, or YouTube Shorts
- Marketing and advertising: Generate quick promotional clips or product demonstrations
- Concept visualization: Explore motion ideas before committing to full production
- Creative experimentation: Test different visual directions and motion styles
- Storyboarding: Visualize scene transitions and camera movements
- Content variations: Generate multiple versions to compare and select the best
Best results when:
- You focus on describing motion and progression
- You're open to creative variations
- You iterate and refine prompts based on results
- You use the right model for your specific needs
How Video Generation Works
At a high level, video generation follows three stages:
- Prompt interpretation: The text prompt is analyzed to infer motion, subject behavior, environment, and visual style.
- Optional visual or audio grounding: A start image, end image, or audio input (if supported by the selected model) is used to constrain or guide generation.
- Model execution: The selected model applies its internal constraints to generate a video within its supported duration, resolution, and aspect ratio.
The available controls and outputs depend entirely on the chosen model.
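Read as a pipeline, the three stages chain directly into one another. The sketch below is a conceptual stand-in: every function in it is a hypothetical stub, not the generator's actual implementation.

```python
# Conceptual sketch of the three stages; every function here is a
# hypothetical stub, not the generator's real implementation.
from typing import Optional

def interpret_prompt(prompt: str) -> dict:
    # Stage 1: infer motion, subject behavior, environment, and style.
    return {"prompt": prompt}

def apply_grounding(plan: dict, start_image: Optional[str],
                    end_image: Optional[str], audio: Optional[str]) -> dict:
    # Stage 2: constrain the plan with whatever supported inputs exist.
    plan.update(start_image=start_image, end_image=end_image, audio=audio)
    return plan

def run_model(plan: dict, model: str) -> str:
    # Stage 3: the model clamps duration, resolution, and aspect ratio
    # to its own supported ranges before rendering.
    return f"video generated by {model} from {plan['prompt']!r}"

plan = interpret_prompt("A kite climbs into a cloudy sky, string taut.")
plan = apply_grounding(plan, start_image=None, end_image=None, audio=None)
print(run_model(plan, "LTX-2"))
```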
Generation Modes
Users can choose between two generation modes depending on their creative needs:
Text-to-Video: Generate videos from text prompts alone. The AI interprets your description and creates motion, scenes, and visual content based purely on the text input.
Image-to-Video: Generate videos using a start image as the foundation. The AI animates the provided image according to your text prompt, creating motion that extends from the initial visual.
The selected mode determines which input options are available and how the generation process interprets your creative intent.
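In practice, the mode is a constraint on which inputs must (or must not) be present. A hedged sketch of that check, with hypothetical function and parameter names:

```python
# Hypothetical check of what each mode requires; the function and
# parameter names are illustrative, not a documented API.
from typing import Optional

def validate_mode(mode: str, prompt: str, start_image: Optional[str]) -> None:
    if not prompt:
        raise ValueError("Every mode requires a text prompt.")
    if mode == "image-to-video" and start_image is None:
        raise ValueError("Image-to-video needs a start image to animate.")
    if mode == "text-to-video" and start_image is not None:
        raise ValueError("Text-to-video builds the scene from text alone.")

validate_mode("image-to-video", "The portrait slowly turns its head.",
              start_image="portrait.png")
```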
Input Options
The Video Generator supports multiple input types that influence how a video is generated. Depending on the selected model, users can provide text prompts, images, or audio to guide the generation process.
Text Prompt
All models require a text prompt. The prompt describes what should happen in the video, including motion, scene progression, and stylistic intent. Prompt clarity directly affects output quality. Vague prompts may result in unpredictable or unfocused motion.
How to think about video prompts
Video prompts work differently from image prompts. Instead of describing how something looks, effective video prompts describe what changes over time.
A strong video prompt focuses on:
- Motion and action
- How the scene evolves
- Transitions or progression from start to finish
Prompts that only describe static appearance often result in limited or repetitive motion.
Each video generation is unique, offering creative variations to explore. Small changes in wording can produce noticeably different motion, pacing, or framing.
Structuring an effective video prompt
You don't need a strict formula, but most successful video prompts naturally include:
- Starting state: What the scene looks like at the beginning
- Action or motion: What moves or happens
- Progression: How the motion changes over time
- End behavior (optional): How the scene settles or concludes
Focusing on motion and progression generally produces more coherent videos than adding visual detail alone.
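One way to internalize this structure is to assemble prompts from the four parts explicitly. The helper below is purely a writing aid under that assumption; the generator itself takes free-form text and exposes no such function.

```python
# Illustrative helper that assembles a prompt from the four parts above.
# Purely a writing aid; the generator itself takes free-form text.
def build_video_prompt(start_state: str, action: str,
                       progression: str, end_behavior: str = "") -> str:
    parts = [start_state, action, progression, end_behavior]
    return ", ".join(p for p in parts if p) + "."

print(build_video_prompt(
    start_state="A quiet harbor at dawn",
    action="fishing boats begin to drift out past the breakwater",
    progression="the camera slowly pans from the docks to the open sea",
    end_behavior="settling on the horizon as the sun clears the water",
))
```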
Examples: static vs motion-aware prompts
Example 1: Static description (weak motion)
A futuristic city at night with neon lights and tall buildings.
This prompt describes appearance but gives little guidance on how the scene should move.
Example 1: Motion-aware description (stronger motion)
A futuristic city at night, with flying vehicles moving between skyscrapers as neon lights flicker and the camera slowly glides forward through the streets.
This version introduces movement, pacing, and camera progression.
Example 2: Vague action (unfocused motion)
A person walking through a forest.
The action is present, but the motion lacks direction or evolution.
Example 2: Progressive action (more coherent motion)
A person walking through a forest, leaves rustling as sunlight shifts through the trees, gradually transitioning from a wide shot to a closer view as the person moves forward.
This prompt guides how the scene changes over time.
Example 3: Overloaded prompt (conflicting motion)
A car driving fast, cinematic lighting, dramatic weather, explosions, slow motion, futuristic city, sunset, cyberpunk style.
Too many competing ideas can lead to inconsistent or unclear motion.
Example 3: Focused motion prompt (clear intent)
A car driving quickly through a futuristic city at sunset, with light rain and reflections on the road as the camera tracks smoothly alongside the vehicle.
This version prioritizes one main action and supports it with context.
Avoid image-style prompts
Prompts written like image descriptions often limit motion quality.
Common pitfalls include:
- Listing objects, colors, or styles without actions
- Describing a scene without verbs or transitions
- Combining too many unrelated ideas in a single prompt
Using verbs and temporal language such as “moves,” “gradually,” “transitions,” “shifts,” or “over time” helps guide motion.
Iterating on results
Video generation is designed for experimentation.
If the output isn't what you expect:
- Adjust one idea at a time instead of rewriting everything
- Simplify the prompt before adding more detail
- Re-run the generation to explore variations
Treat prompts as instructions to refine, not commands with guaranteed outcomes.
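If you script your experiments, changing one idea at a time is easy to enforce: hold a base prompt fixed and swap a single clause per run. A minimal sketch of that practice:

```python
# Minimal sketch of one-change-at-a-time iteration: the base prompt is
# held fixed and a single clause is varied per run.
base = "A hawk circles over a canyon as the camera {camera_move}"
camera_moves = [
    "rises slowly to meet it",
    "tracks it from below",
    "holds still while the hawk drifts out of frame",
]
for move in camera_moves:
    prompt = base.format(camera_move=move)
    print(prompt)  # submit each variant as its own generation
```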
Start Image
Many models support a start image, which defines the first frame of the video. When provided, the model treats the image as visual context rather than generating the scene from scratch.
Using a start image is useful when:
- Visual consistency matters
- A specific subject or composition must be preserved
- The video should evolve from an existing asset
End Image
Some models allow an optional end image. In these cases, the video transitions from the start image toward the end image over the specified duration.
End images are best suited for:
- Controlled transitions
- Before-and-after style motion
- Predictable visual endpoints
Audio Input
Certain models support audio input, either by enabling sound generation or by attaching a provided audio track.
When audio files are provided:
- Audio longer than the video is trimmed
- Audio shorter than the video results in silence for the remaining duration
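The trimming rule is easy to state precisely. The sketch below models the documented behavior with hypothetical names; the actual alignment is handled by the generator.

```python
# Sketch of the documented audio-length rule; names are illustrative.
def align_audio(video_sec: float, audio_sec: float) -> tuple[float, float]:
    """Return (audio_used_sec, silent_tail_sec) for a given pairing."""
    if audio_sec >= video_sec:
        return video_sec, 0.0                     # excess audio is trimmed
    return audio_sec, video_sec - audio_sec       # remainder plays silent

print(align_audio(video_sec=8.0, audio_sec=12.0))  # (8.0, 0.0)
print(align_audio(video_sec=8.0, audio_sec=5.0))   # (5.0, 3.0)
```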
Prompt Enhancement
Users can optionally enhance their prompt before generation. Prompt enhancement restructures or expands the input text to improve descriptive clarity.
This feature is intended to reduce ambiguity, not to change the user’s intent. It may improve consistency for complex prompts but does not guarantee higher-quality results.
Video Generation Models
Each video generation model is optimized for a specific balance of quality, control, duration, and credit consumption. Model selection determines the supported inputs, output characteristics, and available advanced settings.
MiniMax Hailuo 02
MiniMax Hailuo 02 supports both start and end images, enabling controlled transitions within a fixed-duration video. Aspect ratio handling is managed internally by the model.
Capabilities:
- Start image: Supported
- End image: Supported
- Supported aspect ratios: Model-defined
- Duration: 5 seconds
- Resolution: 1080p
- Audio: Not supported
MiniMax Hailuo 2.3
MiniMax Hailuo 2.3 supports image-guided video generation with a fixed duration and resolution. Aspect ratio selection is determined by the model.
Capabilities:
- Start image: Supported
- End image: Not supported
- Supported aspect ratios: Model-defined
- Duration: 5 seconds
- Resolution: 1080p
- Audio: Not supported
Kling 2.1 Master
Kling 2.1 Master is designed for higher-fidelity video generation with controlled duration. It supports image-guided generation and is suited for scenarios where visual coherence is more important than generation speed or cost.
Capabilities:
- Start image: Supported
- End image: Not supported
- Duration: 5-10 seconds
- Resolution: Model-defined
- Audio: Not supported
Kling 2.6
Kling 2.6 extends earlier Kling models with support for both start and end images, enabling more controlled visual transitions. It supports common aspect ratios and allows optional sound, making it suitable for guided motion-based generation.
Capabilities:
- Start image: Supported
- End image: Supported
- Supported aspect ratios: 9:16, 16:9, 1:1
- Duration: 5-10 seconds
- Resolution: Model-defined
- Audio: Optional
LTX-2
LTX-2 focuses on high-resolution video generation with controlled duration. It supports image-guided generation and optional sound, producing outputs suitable for higher-quality visual use cases.
Capabilities:
- Start image: Supported
- End image: Not supported
- Supported aspect ratios: 16:9
- Duration: 6-10 seconds
- Resolution: 1080p-2160p
- Audio: Optional
LTX-2 Fast
LTX-2 Fast prioritizes longer video duration while maintaining high-resolution output. It supports image-guided generation with optional sound and is optimized for faster generation.
Capabilities:
- Start image: Supported
- End image: Not supported
- Supported aspect ratios: 16:9
- Duration: 6-20 seconds
- Resolution: 1080p-2160p
- Audio: Optional
Seedance 1.0 Fast
Seedance 1.0 Fast emphasizes flexibility and iteration speed. It supports a wide range of aspect ratios and resolutions, making it suitable for generating videos across multiple formats with moderate credit usage.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, auto
- Duration: 2-12 seconds
- Resolution: 480p-1080p
- Audio: Not supported
Seedance 1.0 Light
Seedance 1.0 Light is optimized for cost efficiency while retaining support for controlled transitions. It allows both start and optional end images, making it suitable for simple animations and constrained visual progressions.
Capabilities:
- Start image: Supported
- End image: Optional
- Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, auto
- Duration: 2-12 seconds
- Resolution: 480p-1080p
- Audio: Not supported
Seedance 1.0 Pro
Seedance 1.0 Pro extends the Light variant with additional compute. It supports the same inputs and output ranges while consuming more credits to deliver improved motion consistency.
Capabilities:
- Start image: Supported
- End image: Optional
- Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, auto
- Duration: 2-12 seconds
- Resolution: 480p-1080p
- Audio: Not supported
OpenAI Sora 2
Sora 2 supports image-guided video generation with fixed resolution output. It is suitable for straightforward text-to-video or image-to-video use cases without extensive configuration.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9, auto
- Duration: 4-12 seconds
- Resolution: 720p
- Audio: Not supported
Google Veo 2
Google Veo 2 focuses on visual quality within short durations. It supports image-guided generation and produces videos with consistent output characteristics at higher credit cost.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9
- Duration: 5-8 seconds
- Resolution: 720p
- Audio: Not supported
Google Veo 3
Google Veo 3 expands on Veo 2 by supporting additional aspect ratios, higher resolutions, and optional sound. It is designed for richer audiovisual outputs with higher computational requirements.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9, 1:1, auto
- Duration: 4-8 seconds
- Resolution: 720p-1080p
- Audio: Optional
Google Veo 3.1 Fast
Veo 3.1 Fast introduces support for both start and end images while maintaining shorter durations. It offers controlled transitions with optional sound at a reduced credit cost compared to Veo 3.
Capabilities:
- Start image: Supported
- End image: Supported
- Aspect ratios: 9:16, 16:9, 1:1, auto
- Duration: 4-8 seconds
- Resolution: 720p
- Audio: Optional
Google Veo 3 Fast
Google Veo 3 Fast prioritizes faster generation while retaining support for higher resolutions. It supports image-guided generation with optional sound.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9
- Duration: 4-8 seconds
- Resolution: 720p-1080p
- Audio: Optional
Wan 2.2
Wan 2.2 is a cost-efficient model that supports basic audiovisual generation with optional end images. It is suitable for constrained workflows requiring moderate control.
Capabilities:
- Start image: Supported
- End image: Optional
- Aspect ratios: 9:16, 16:9, 1:1, auto
- Duration: 4-8 seconds
- Resolution: 480p-720p
- Audio: Optional
Wan 2.5
Wan 2.5 extends Wan 2.2 by supporting external audio input. Provided audio is treated as background music and adjusted to match video duration.
Capabilities:
- Start image: Supported
- End image: Not supported
- Audio input: Supported (URL or file)
- Audio behavior:
  - Longer than video: trimmed
  - Shorter than video: remaining video plays without audio
- Duration: 5-10 seconds
- Resolution: 480p-1080p
Model Capability Summary
| Model | Start Image | End Image | Audio Support | Duration (sec) | Resolution |
|---|---|---|---|---|---|
| Kling 2.1 Master | Yes | No | No | 5-10 | Model-defined |
| Kling 2.6 | Yes | Yes | Optional | 5-10 | Model-defined |
| Seedance 1.0 Fast | Yes | No | No | 2-12 | 480p-1080p |
| Seedance 1.0 Light | Yes | Optional | No | 2-12 | 480p-1080p |
| Seedance 1.0 Pro | Yes | Optional | No | 2-12 | 480p-1080p |
| OpenAI Sora 2 | Yes | No | No | 4-12 | 720p |
| Google Veo 2 | Yes | No | No | 5-8 | 720p |
| Google Veo 3 | Yes | No | Optional | 4-8 | 720p-1080p |
| Google Veo 3.1 Fast | Yes | Yes | Optional | 4-8 | 720p |
| Google Veo 3 Fast | Yes | No | Optional | 4-8 | 720p-1080p |
| LTX-2 | Yes | No | Optional | 6-10 | 1080p-2160p |
| LTX-2 Fast | Yes | No | Optional | 6-20 | 1080p-2160p |
| MiniMax Hailuo 02 | Yes | Yes | No | 5 | 1080p |
| MiniMax Hailuo 2.3 | Yes | No | No | 5 | 1080p |
| Wan 2.2 | Yes | Optional | Optional | 4-8 | 480p-720p |
| Wan 2.5 | Yes | No | Yes (external audio) | 5-10 | 480p-1080p |
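Because inputs and limits vary per model, it can help to treat the table above as data and check a request against it before spending credits. The sketch below includes two illustrative entries; the values are copied from the summary table, while the structure and function names are hypothetical.

```python
# The capability table as data; the two rows are copied from the summary
# above, but the structure and names are hypothetical, not a documented API.
CAPABILITIES = {
    "Seedance 1.0 Light": {
        "end_image": True, "audio": False,
        "duration": (2, 12), "resolutions": ("480p", "1080p"),  # range ends
    },
    "Google Veo 3.1 Fast": {
        "end_image": True, "audio": True,
        "duration": (4, 8), "resolutions": ("720p",),
    },
}

def check(model: str, duration: int, wants_audio: bool) -> None:
    caps = CAPABILITIES[model]
    lo, hi = caps["duration"]
    if not lo <= duration <= hi:
        raise ValueError(f"{model} supports {lo}-{hi} s, got {duration} s")
    if wants_audio and not caps["audio"]:
        raise ValueError(f"{model} does not support audio")

check("Google Veo 3.1 Fast", duration=6, wants_audio=True)  # passes
```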
Credit Consumption
Credit usage for video generation depends on the selected model and generation settings.
Credits scale based on:
- Video duration
- Resolution
- Audio usage (when supported and enabled)
Changing the aspect ratio does not affect credit consumption.
Credit usage is calculated before generation begins. Re-running a generation consumes credits again, even when the same settings are used. Credit costs may change as models and capabilities evolve.
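As a mental model, credit cost behaves like a function of duration, resolution, and audio usage, fixed per model and computed up front. The sketch below shows only the shape of that calculation; every rate in it is made up for illustration and does not reflect real pricing.

```python
# Shape of the credit calculation only. Every number below is fictional;
# real rates are model-specific and may change over time.
def estimate_credits(seconds: int, resolution: str, audio: bool) -> int:
    per_second = {"480p": 1, "720p": 2, "1080p": 4}[resolution]  # fictional
    cost = seconds * per_second
    if audio:
        cost += seconds  # fictional audio surcharge
    return cost  # computed before generation begins

# Aspect ratio is intentionally absent: it does not affect cost.
print(estimate_credits(seconds=8, resolution="1080p", audio=True))  # 40
```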