Video Generator
Create short-form videos using natural-language text prompts. Choose from multiple AI models, guide generation with images or audio, and explore creative motion, all with flexible control over duration, resolution, and style.
What you can do:
- Generate videos from text prompts (2-20 seconds, depending on the model)
- Guide generation with start images, end images, or audio
- Choose from 16 specialized models with different capabilities
- Control aspect ratio, resolution, and duration
- Enhance prompts for improved clarity
- Explore creative variations with probabilistic generation
Use Video Generator for exploratory, AI-generated motion where variation and experimentation are part of the creative process.
Quick Start
Think of the Video Generator as describing what should happen over time.
- You describe the motion or scene in text
- You optionally guide it with images or audio (if the model allows)
- A selected model generates a short video within its fixed limits
Each generation is independent and may produce different results.
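If it helps to think of a generation as a single self-contained request, the flow above maps onto a handful of fields. The sketch below is a minimal illustration; the `GenerationRequest` type and its field names are hypothetical stand-ins, not a documented API.

```python
# Minimal sketch of the Quick Start flow. The GenerationRequest type and
# its field names are hypothetical illustrations, not a documented API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRequest:
    prompt: str                        # what should happen over time
    model: str                         # the model enforces its own limits
    start_image: Optional[str] = None  # optional image grounding
    audio: Optional[str] = None        # optional audio grounding

request = GenerationRequest(
    prompt="A paper boat drifts down a rain-filled gutter as the "
           "camera follows at water level.",
    model="Seedance 1.0 Fast",
)
# Re-running the same request is a fresh, independent generation and
# may produce a noticeably different video.
```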
Use Cases
Video Generator excels at creating short-form video content for exploration and ideation.
Ideal for:
- Social media content: Create engaging short videos for Instagram, TikTok, or YouTube Shorts
- Marketing and advertising: Generate quick promotional clips or product demonstrations
- Concept visualization: Explore motion ideas before committing to full production
- Creative experimentation: Test different visual directions and motion styles
- Storyboarding: Visualize scene transitions and camera movements
- Content variations: Generate multiple versions to compare and select the best
Best results when:
- You focus on describing motion and progression
- You're open to creative variations
- You iterate and refine prompts based on results
- You use the right model for your specific needs
How Video Generation Works
At a high level, video generation follows three stages:
- Prompt interpretation: The text prompt is analyzed to infer motion, subject behavior, environment, and visual style.
- Optional visual or audio grounding: A start image, end image, or audio input (if supported by the selected model) is used to constrain or guide generation.
- Model execution: The selected model applies its internal constraints to generate a video within its supported duration, resolution, and aspect ratio.
The available controls and outputs depend entirely on the chosen model.
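Read as a pipeline, the three stages chain directly into one another. The sketch below is a conceptual stand-in: every function in it is a hypothetical stub, not the generator's actual implementation.

```python
# Conceptual sketch of the three stages; every function here is a
# hypothetical stub, not the generator's real implementation.
from typing import Optional

def interpret_prompt(prompt: str) -> dict:
    # Stage 1: infer motion, subject behavior, environment, and style.
    return {"prompt": prompt}

def apply_grounding(plan: dict, start_image: Optional[str],
                    end_image: Optional[str], audio: Optional[str]) -> dict:
    # Stage 2: constrain the plan with whatever supported inputs exist.
    plan.update(start_image=start_image, end_image=end_image, audio=audio)
    return plan

def run_model(plan: dict, model: str) -> str:
    # Stage 3: the model clamps duration, resolution, and aspect ratio
    # to its own supported ranges before rendering.
    return f"video generated by {model} from {plan['prompt']!r}"

plan = interpret_prompt("A kite climbs into a cloudy sky, string taut.")
plan = apply_grounding(plan, start_image=None, end_image=None, audio=None)
print(run_model(plan, "LTX-2"))
```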
Generation Modes
Users can choose between two generation modes depending on their creative needs:
Text-to-Video: Generate videos from text prompts alone. The AI interprets your description and creates motion, scenes, and visual content based purely on the text input.
Image-to-Video: Generate videos using a start image as the foundation. The AI animates the provided image according to your text prompt, creating motion that extends from the initial visual.
The selected mode determines which input options are available and how the generation process interprets your creative intent.
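In practice, the mode is a constraint on which inputs must (or must not) be present. A hedged sketch of that check, with hypothetical function and parameter names:

```python
# Hypothetical check of what each mode requires; the function and
# parameter names are illustrative, not a documented API.
from typing import Optional

def validate_mode(mode: str, prompt: str, start_image: Optional[str]) -> None:
    if not prompt:
        raise ValueError("Every mode requires a text prompt.")
    if mode == "image-to-video" and start_image is None:
        raise ValueError("Image-to-video needs a start image to animate.")
    if mode == "text-to-video" and start_image is not None:
        raise ValueError("Text-to-video builds the scene from text alone.")

validate_mode("image-to-video", "The portrait slowly turns its head.",
              start_image="portrait.png")
```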
Input Options
The Video Generator supports multiple input types that influence how a video is generated. Depending on the selected model, users can provide text prompts, images, or audio to guide the generation process.
Text Prompt
All models require a text prompt. The prompt describes what should happen in the video, including motion, scene progression, and stylistic intent. Prompt clarity directly affects output quality. Vague prompts may result in unpredictable or unfocused motion.
How to think about video prompts
Video prompts work differently from image prompts. Instead of describing how something looks, effective video prompts describe what changes over time.
A strong video prompt focuses on:
- Motion and action
- How the scene evolves
- Transitions or progression from start to finish
Prompts that only describe static appearance often result in limited or repetitive motion.
Each video generation is unique, offering creative variations to explore. Small changes in wording can produce noticeably different motion, pacing, or framing.
Structuring an effective video prompt
You don't need a strict formula, but most successful video prompts naturally include:
- Starting state: What the scene looks like at the beginning
- Action or motion: What moves or happens
- Progression: How the motion changes over time
- End behavior (optional): How the scene settles or concludes
Focusing on motion and progression generally produces more coherent videos than adding visual detail alone.
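One way to internalize this structure is to assemble prompts from the four parts explicitly. The helper below is purely a writing aid under that assumption; the generator itself takes free-form text and exposes no such function.

```python
# Illustrative helper that assembles a prompt from the four parts above.
# Purely a writing aid; the generator itself takes free-form text.
def build_video_prompt(start_state: str, action: str,
                       progression: str, end_behavior: str = "") -> str:
    parts = [start_state, action, progression, end_behavior]
    return ", ".join(p for p in parts if p) + "."

print(build_video_prompt(
    start_state="A quiet harbor at dawn",
    action="fishing boats begin to drift out past the breakwater",
    progression="the camera slowly pans from the docks to the open sea",
    end_behavior="settling on the horizon as the sun clears the water",
))
```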
Examples: static vs motion-aware prompts
Example 1: Static description (weak motion)
A futuristic city at night with neon lights and tall buildings.
This prompt describes appearance but gives little guidance on how the scene should move.
Example 1: Motion-aware description (stronger motion)
A futuristic city at night, with flying vehicles moving between skyscrapers as neon lights flicker and the camera slowly glides forward through the streets.
This version introduces movement, pacing, and camera progression.
Example 2: Vague action (unfocused motion)
A person walking through a forest.
The action is present, but the motion lacks direction or evolution.
Example 2: Progressive action (more coherent motion)
A person walking through a forest, leaves rustling as sunlight shifts through the trees, gradually transitioning from a wide shot to a closer view as the person moves forward.
This prompt guides how the scene changes over time.
Example 3: Overloaded prompt (conflicting motion)
A car driving fast, cinematic lighting, dramatic weather, explosions, slow motion, futuristic city, sunset, cyberpunk style.
Too many competing ideas can lead to inconsistent or unclear motion.
Example 3: Focused motion prompt (clear intent)
A car driving quickly through a futuristic city at sunset, with light rain and reflections on the road as the camera tracks smoothly alongside the vehicle.
This version prioritizes one main action and supports it with context.
Avoid image-style prompts
Prompts written like image descriptions often limit motion quality.
Common pitfalls include:
- Listing objects, colors, or styles without actions
- Describing a scene without verbs or transitions
- Combining too many unrelated ideas in a single prompt
Using verbs and temporal language such as “moves,” “gradually,” “transitions,” “shifts,” or “over time” helps guide motion.
Iterating on results
Video generation is designed for experimentation.
If the output isn't what you expect:
- Adjust one idea at a time instead of rewriting everything
- Simplify the prompt before adding more detail
- Re-run the generation to explore variations
Treat prompts as instructions to refine, not commands with guaranteed outcomes.
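If you script your experiments, changing one idea at a time is easy to enforce: hold a base prompt fixed and swap a single clause per run. A minimal sketch of that practice:

```python
# Minimal sketch of one-change-at-a-time iteration: the base prompt is
# held fixed and a single clause is varied per run.
base = "A hawk circles over a canyon as the camera {camera_move}"
camera_moves = [
    "rises slowly to meet it",
    "tracks it from below",
    "holds still while the hawk drifts out of frame",
]
for move in camera_moves:
    prompt = base.format(camera_move=move)
    print(prompt)  # submit each variant as its own generation
```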
Start Image
Many models support a start image, which defines the first frame of the video. When provided, the model treats the image as visual context rather than generating the scene from scratch.
Using a start image is useful when:
- Visual consistency matters
- A specific subject or composition must be preserved
- The video should evolve from an existing asset
End Image
Some models allow an optional end image. In these cases, the video transitions from the start image toward the end image over the specified duration.
End images are best suited for:
- Controlled transitions
- Before-and-after style motion
- Predictable visual endpoints
Audio Input
Certain models support audio input, either by enabling sound generation or by attaching a provided audio track.
When audio files are provided:
- Audio longer than the video is trimmed
- Audio shorter than the video results in silence for the remaining duration
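The trimming rule is easy to state precisely. The sketch below models the documented behavior with hypothetical names; the actual alignment is handled by the generator.

```python
# Sketch of the documented audio-length rule; names are illustrative.
def align_audio(video_sec: float, audio_sec: float) -> tuple[float, float]:
    """Return (audio_used_sec, silent_tail_sec) for a given pairing."""
    if audio_sec >= video_sec:
        return video_sec, 0.0                     # excess audio is trimmed
    return audio_sec, video_sec - audio_sec       # remainder plays silent

print(align_audio(video_sec=8.0, audio_sec=12.0))  # (8.0, 0.0)
print(align_audio(video_sec=8.0, audio_sec=5.0))   # (5.0, 3.0)
```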
Prompt Enhancement
Users can optionally enhance their prompt before generation. Prompt enhancement restructures or expands the input text to improve descriptive clarity.
This feature is intended to reduce ambiguity, not to change the user’s intent. It may improve consistency for complex prompts but does not guarantee higher-quality results.
Video Generation Models
Each video generation model is optimized for a specific balance of quality, control, duration, and credit consumption. Model selection determines the supported inputs, output characteristics, and available advanced settings.
MiniMax Hailuo 02
MiniMax Hailuo 02 supports both start and end images, enabling controlled transitions within a fixed-duration video. Aspect ratio handling is managed internally by the model.
Capabilities:
- Start image: Supported
- End image: Supported
- Supported aspect ratios: Model-defined
- Duration: 5 seconds
- Resolution: 1080p
- Audio: Not supported
MiniMax Hailuo 2.3
MiniMax Hailuo 2.3 supports image-guided video generation with a fixed duration and resolution. Aspect ratio selection is determined by the model.
Capabilities:
- Start image: Supported
- End image: Not supported
- Supported aspect ratios: Model-defined
- Duration: 5 seconds
- Resolution: 1080p
- Audio: Not supported
Kling 2.1 Master
Kling 2.1 Master is designed for higher-fidelity video generation with controlled duration. It supports image-guided generation and is suited for scenarios where visual coherence is more important than generation speed or cost.
Capabilities:
- Start image: Supported
- End image: Not supported
- Duration: 5-10 seconds
- Resolution: Model-defined
- Audio: Not supported
Kling 2.6
Kling 2.6 extends earlier Kling models with support for both start and end images, enabling more controlled visual transitions. It supports common aspect ratios and allows optional sound, making it suitable for guided motion-based generation.
Capabilities:
- Start image: Supported
- End image: Supported
- Supported aspect ratios: 9:16, 16:9, 1:1
- Duration: 5-10 seconds
- Resolution: Model-defined
- Audio: Optional
LTX-2
LTX-2 focuses on high-resolution video generation with controlled duration. It supports image-guided generation and optional sound, producing outputs suitable for higher-quality visual use cases.
Capabilities:
- Start image: Supported
- End image: Not supported
- Supported aspect ratios: 16:9
- Duration: 6-10 seconds
- Resolution: 1080p-2160p
- Audio: Optional
LTX-2 Fast
LTX-2 Fast prioritizes longer video duration while maintaining high-resolution output. It supports image-guided generation with optional sound and is optimized for faster generation.
Capabilities:
- Start image: Supported
- End image: Not supported
- Supported aspect ratios: 16:9
- Duration: 6-20 seconds
- Resolution: 1080p-2160p
- Audio: Optional
Seedance 1.0 Fast
Seedance 1.0 Fast emphasizes flexibility and iteration speed. It supports a wide range of aspect ratios and resolutions, making it suitable for generating videos across multiple formats with moderate credit usage.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, auto
- Duration: 2-12 seconds
- Resolution: 480p-1080p
- Audio: Not supported
Seedance 1.0 Light
Seedance 1.0 Light is optimized for cost efficiency while retaining support for controlled transitions. It allows both start and optional end images, making it suitable for simple animations and constrained visual progressions.
Capabilities:
- Start image: Supported
- End image: Optional
- Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, auto
- Duration: 2-12 seconds
- Resolution: 480p-1080p
- Audio: Not supported
Seedance 1.0 Pro
Seedance 1.0 Pro extends the Light variant with additional compute. It supports the same inputs and output ranges while consuming more credits to deliver improved motion consistency.
Capabilities:
- Start image: Supported
- End image: Optional
- Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, auto
- Duration: 2-12 seconds
- Resolution: 480p-1080p
- Audio: Not supported
OpenAI Sora 2
Sora 2 supports image-guided video generation with fixed resolution output. It is suitable for straightforward text-to-video or image-to-video use cases without extensive configuration.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9, auto
- Duration: 4-12 seconds
- Resolution: 720p
- Audio: Not supported
Google Veo 2
Google Veo 2 focuses on visual quality within short durations. It supports image-guided generation and produces videos with consistent output characteristics at higher credit cost.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9
- Duration: 5-8 seconds
- Resolution: 720p
- Audio: Not supported
Google Veo 3
Google Veo 3 expands on Veo 2 by supporting additional aspect ratios, higher resolutions, and optional sound. It is designed for richer audiovisual outputs with higher computational requirements.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9, 1:1, auto
- Duration: 4-8 seconds
- Resolution: 720p-1080p
- Audio: Optional
Google Veo 3.1 Fast
Veo 3.1 Fast introduces support for both start and end images while maintaining shorter durations. It offers controlled transitions with optional sound at a reduced credit cost compared to Veo 3.
Capabilities:
- Start image: Supported
- End image: Supported
- Aspect ratios: 9:16, 16:9, 1:1, auto
- Duration: 4-8 seconds
- Resolution: 720p
- Audio: Optional
Google Veo 3 Fast
Google Veo 3 Fast prioritizes faster generation while retaining support for higher resolutions. It supports image-guided generation with optional sound.
Capabilities:
- Start image: Supported
- End image: Not supported
- Aspect ratios: 9:16, 16:9
- Duration: 4-8 seconds
- Resolution: 720p-1080p
- Audio: Optional
Wan 2.2
Wan 2.2 is a cost-efficient model that supports basic audiovisual generation with optional end images. It is suitable for constrained workflows requiring moderate control.
Capabilities:
- Start image: Supported
- End image: Optional
- Aspect ratios: 9:16, 16:9, 1:1, auto
- Duration: 4-8 seconds
- Resolution: 480p-720p
- Audio: Optional
Wan 2.5
Wan 2.5 extends Wan 2.2 by supporting external audio input. Provided audio is treated as background music and adjusted to match video duration.
Capabilities:
- Start image: Supported
- End image: Not supported
- Audio input: Supported (URL or file)
- Audio behavior:
  - Longer than video: trimmed
  - Shorter than video: remaining video plays without audio
- Duration: 5-10 seconds
- Resolution: 480p-1080p
Model Capability Summary
| Model | Start Image | End Image | Audio Support | Duration (sec) | Resolution |
|---|---|---|---|---|---|
| Kling 2.1 Master | Yes | No | No | 5-10 | Model-defined |
| Kling 2.6 | Yes | Yes | Optional | 5-10 | Model-defined |
| Seedance 1.0 Fast | Yes | No | No | 2-12 | 480p-1080p |
| Seedance 1.0 Light | Yes | Optional | No | 2-12 | 480p-1080p |
| Seedance 1.0 Pro | Yes | Optional | No | 2-12 | 480p-1080p |
| OpenAI Sora 2 | Yes | No | No | 4-12 | 720p |
| Google Veo 2 | Yes | No | No | 5-8 | 720p |
| Google Veo 3 | Yes | No | Optional | 4-8 | 720p-1080p |
| Google Veo 3.1 Fast | Yes | Yes | Optional | 4-8 | 720p |
| Google Veo 3 Fast | Yes | No | Optional | 4-8 | 720p-1080p |
| LTX-2 | Yes | No | Optional | 6-10 | 1080p-2160p |
| LTX-2 Fast | Yes | No | Optional | 6-20 | 1080p-2160p |
| MiniMax Hailuo 02 | Yes | Yes | No | 5 | 1080p |
| MiniMax Hailuo 2.3 | Yes | No | No | 5 | 1080p |
| Wan 2.2 | Yes | Optional | Optional | 4-8 | 480p-720p |
| Wan 2.5 | Yes | No | Yes (external audio) | 5-10 | 480p-1080p |
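Because inputs and limits vary per model, it can help to treat the table above as data and check a request against it before spending credits. The sketch below includes two illustrative entries; the values are copied from the summary table, while the structure and function names are hypothetical.

```python
# The capability table as data; the two rows are copied from the summary
# above, but the structure and names are hypothetical, not a documented API.
CAPABILITIES = {
    "Seedance 1.0 Light": {
        "end_image": True, "audio": False,
        "duration": (2, 12), "resolutions": ("480p", "1080p"),  # range ends
    },
    "Google Veo 3.1 Fast": {
        "end_image": True, "audio": True,
        "duration": (4, 8), "resolutions": ("720p",),
    },
}

def check(model: str, duration: int, wants_audio: bool) -> None:
    caps = CAPABILITIES[model]
    lo, hi = caps["duration"]
    if not lo <= duration <= hi:
        raise ValueError(f"{model} supports {lo}-{hi} s, got {duration} s")
    if wants_audio and not caps["audio"]:
        raise ValueError(f"{model} does not support audio")

check("Google Veo 3.1 Fast", duration=6, wants_audio=True)  # passes
```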
Credit Consumption
Credit usage for video generation depends on the selected model and generation settings.
Credits scale based on:
- Video duration
- Resolution
- Audio usage (when supported and enabled)
Changing the aspect ratio does not affect credit consumption.
Credit usage is calculated before generation begins. Re-running a generation consumes credits again, even when the same settings are used. Credit costs may change as models and capabilities evolve.
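As a mental model, credit cost behaves like a function of duration, resolution, and audio usage, fixed per model and computed up front. The sketch below shows only the shape of that calculation; every rate in it is made up for illustration and does not reflect real pricing.

```python
# Shape of the credit calculation only. Every number below is fictional;
# real rates are model-specific and may change over time.
def estimate_credits(seconds: int, resolution: str, audio: bool) -> int:
    per_second = {"480p": 1, "720p": 2, "1080p": 4}[resolution]  # fictional
    cost = seconds * per_second
    if audio:
        cost += seconds  # fictional audio surcharge
    return cost  # computed before generation begins

# Aspect ratio is intentionally absent: it does not affect cost.
print(estimate_credits(seconds=8, resolution="1080p", audio=True))  # 40
```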