Music Video Agent
Create complete music videos with AI Agent in one click
What is Music Video Agent?
Music Video Agent is your intelligent video director assistant. Simply upload your song audio, and the AI will automatically analyze the lyrics, plan scenes, generate visuals, and compile everything into a complete music video. No professional skills are required to bring your music to life visually.
Core Workflow
Step 1: Create Project
- Enter your creative description (prompt) describing your desired visual style
- Upload your full song audio
- Choose aspect ratio, resolution, and duration
Step 2: AI Auto-Analysis
AI Agent will automatically:
- Transcribe the lyrics
- Divide scenes based on lyrical emotion and rhythm
- Generate professional-level prompts for each scene
Step 3: Generate Reference Images (Important)
[!IMPORTANT] Strongly recommended: Generate reference images first!
In the Storyboard panel, you can generate preview images for each scene. Confirm the visuals match your expectations before generating videos. This helps:
- Ensure visuals align with your creative vision
- Avoid wasting credits on mismatched results
- Achieve more precise control over video output
Step 4: One-Click Video Generation
After confirming all scene reference images, click "Generate All" and AI will automatically create video clips for each scene.
Step 5: Final Merge
Once all scene videos are generated, AI will automatically merge them with your audio to create the complete music video.
Mode Options
Auto Mode
When "Auto Mode" is enabled, AI will automatically start generating all scenes after analyzing the lyrics, without manual confirmation.
Best for users who:
- Are confident in the AI's creative output
- Want quick results
- Don't need to adjust each scene individually
Manual Mode (Recommended for Beginners)
When "Auto Mode" is disabled, you can review and adjust the output at each step before moving on.
Best for users who:
- Need precise control over each frame
- Want to confirm each scene's effect
- Have high visual quality requirements
Settings Guide
Choosing the Right Resolution
[!IMPORTANT] Resolution is set at project creation and applies to all scenes throughout the project. It cannot be changed mid-project.
| Resolution | Description | Credit Cost |
|---|---|---|
| 720p | Clear quality, cost-effective | 10 credits/second |
| 1080p | HD quality, outstanding output | 15 credits/second |
[!TIP] To save credits, we recommend choosing 720p resolution. 720p quality is sufficient for most use cases.
Choosing Aspect Ratio
| Aspect Ratio | Best For |
|---|---|
| Landscape (16:9) | YouTube, desktop viewing, traditional MV |
| Portrait (9:16) | TikTok, Instagram, Reels |
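As a rough illustration of how the resolution and aspect ratio settings combine, the sketch below maps each pairing to the conventional pixel dimensions for that label. This guide does not state the exact output dimensions, so the sizes and the `frame_size` helper are assumptions for illustration only.

```python
# Assumption: "720p" and "1080p" name the short side of the frame in pixels,
# which is the usual convention. The product's real output size may differ.
SHORT_SIDE = {"720p": 720, "1080p": 1080}


def frame_size(resolution, aspect_ratio):
    """Return (width, height) for a resolution label and a 'W:H' ratio string."""
    short = SHORT_SIDE[resolution]
    w_ratio, h_ratio = (int(part) for part in aspect_ratio.split(":"))
    # Scale the short side up by the ratio to get the long side.
    long_side = short * max(w_ratio, h_ratio) // min(w_ratio, h_ratio)
    # Landscape puts the long side on the width; portrait puts it on the height.
    return (long_side, short) if w_ratio >= h_ratio else (short, long_side)


landscape_720 = frame_size("720p", "16:9")
portrait_1080 = frame_size("1080p", "9:16")
print(landscape_720, portrait_1080)
```

The same project settings apply to every scene, so a single call per project is enough to know the frame every clip would share.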
Setting Video Duration
You can select a specific segment of your audio or use the full length:
- 5-60 seconds: Quick preview or short clips
- Full length: Use the entire song
Prompt Writing Tips
When creating a project, your prompt should describe the overall visual style. For detailed prompt writing techniques, refer to AI Video Generator - Prompt Guide, which covers:
- Core elements of a prompt (subject, action, environment, style)
- Good prompts vs bad prompts comparison
- Camera movement techniques
- Time markers (multi-shots mode)
- Narrative pacing guide
[!TIP] In the Storyboard panel, you can adjust prompts for each scene individually. Ensure visual style consistency across all scenes.
Using the Storyboard Panel
Preview and Adjust
Each scene card displays:
- Time range (e.g., 0:00 - 0:05)
- Corresponding lyrics snippet
- Image prompt
- Video motion prompt
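The fields on a scene card can be pictured as a small record. The sketch below is illustrative only: `SceneCard`, `format_timestamp`, and `split_evenly` are hypothetical names, and the even split is a stand-in, since the Agent divides scenes by lyrical emotion and rhythm rather than fixed lengths.

```python
from dataclasses import dataclass


def format_timestamp(seconds):
    """Format whole seconds as M:SS, matching the card's time range style."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


@dataclass
class SceneCard:
    start: int               # scene start, in seconds
    end: int                 # scene end, in seconds
    lyrics: str = ""         # corresponding lyrics snippet
    image_prompt: str = ""   # prompt for the reference image
    motion_prompt: str = ""  # prompt for the video motion

    @property
    def time_range(self):
        return f"{format_timestamp(self.start)} - {format_timestamp(self.end)}"


def split_evenly(duration, scene_length):
    """Hypothetical fixed-length split; the last scene is trimmed to the duration."""
    return [
        SceneCard(start=start, end=min(start + scene_length, duration))
        for start in range(0, duration, scene_length)
    ]


cards = split_evenly(60, 5)
print(len(cards), cards[0].time_range, cards[-1].time_range)
```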
You can:
- Click "Generate Image" to preview individual scene effects
- Edit prompts and regenerate
- Generate the video only after you are satisfied with the reference image
Batch Operations
- Generate All: One-click generation for all pending scene videos
- Merge Video: Combine all scenes into a complete music video
Credit Consumption
The Agent feature consumes credits as follows:
| Operation | Credit Cost |
|---|---|
| Audio Analysis | 1 credit |
| Generate Reference Image | 4 credits/image |
| Generate Video (720p) | 10 credits/second |
| Generate Video (1080p) | 15 credits/second |
| Final Merge | Free |
Example Calculation:
A 60-second song divided into 12 scenes (5 seconds each), generated once at 720p:
- Audio analysis: 1 credit
- Reference images: 12 × 4 = 48 credits
- Videos (720p): 60 × 10 = 600 credits
- Total: 649 credits
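The arithmetic in this example can be sketched as a small estimator. The rates come from the credit table in this section; the function and constant names are hypothetical, and the estimate assumes one reference image per scene, a single audio analysis, and no regenerations.

```python
# Rates copied from the credit table; names are illustrative, not a product API.
VIDEO_CREDITS_PER_SECOND = {"720p": 10, "1080p": 15}
IMAGE_CREDITS = 4            # per reference image
AUDIO_ANALYSIS_CREDITS = 1   # assumed to be charged once per project


def estimate_credits(duration_seconds, scene_count, resolution="720p"):
    """Estimate credits for a single pass: one image per scene, no retries."""
    images = scene_count * IMAGE_CREDITS
    video = duration_seconds * VIDEO_CREDITS_PER_SECOND[resolution]
    return {
        "audio_analysis": AUDIO_ANALYSIS_CREDITS,
        "reference_images": images,
        "video": video,
        "total": AUDIO_ANALYSIS_CREDITS + images + video,  # merge is free
    }


estimate = estimate_credits(60, 12, "720p")
print(estimate)
```

Every regenerated image or video adds to this baseline, which is why confirming reference images before generating videos keeps the total close to the estimate.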
FAQ
What if the generated video doesn't match expectations?
Recommendations:
- Generate reference images for each scene first
- Confirm image effects match your expectations
- If unsatisfied, adjust prompts and regenerate images
- Generate videos only after all reference images are confirmed
[!TIP] Using image-to-video is more stable and controllable than text-to-video.
How to avoid wasting credits?
- Use lower resolution: Choose 720p to save costs
- Preview with images first: Generating images is much cheaper than videos
- Avoid repeated generation: Make sure you are satisfied with the prompt before generating
- Upload full audio: Let AI analyze automatically for better accuracy
How long does video generation take?
- Single scene: 1-2 minutes
- Complete project (e.g., 12 scenes): 15-30 minutes
- Final merge: 1-3 minutes
[!WARNING] Don't close the page during generation. Generated video links are valid for 1 hour only, so download your video promptly!
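If you track your own generations, the 1-hour window can be checked with a few lines of Python. This is a minimal sketch: it assumes the hour is counted from the moment the video finishes generating, which this guide does not specify, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The 1-hour lifetime is from the warning above; counting it from the
# generation time is an assumption.
LINK_LIFETIME = timedelta(hours=1)


def download_deadline(generated_at):
    """Return the moment after which the link should be treated as expired."""
    return generated_at + LINK_LIFETIME


def is_link_valid(generated_at, now):
    """True while the current time is still before the deadline."""
    return now < download_deadline(generated_at)


generated = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(download_deadline(generated).isoformat())
print(is_link_valid(generated, generated + timedelta(minutes=45)))
```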
Best Practices Summary
- ✅ Upload full audio and let AI analyze lyrics and auto-plan scenes
- ✅ Generate reference images first to confirm visuals before video generation
- ✅ Choose appropriate resolution: 720p for cost savings, 1080p for better quality
- ✅ Review each scene's prompts to ensure consistent visual style
- ✅ Generate all at once using the "Generate All" button for batch processing