Music Video Agent

Create complete music videos with AI Agent in one click

What is Music Video Agent?

Music Video Agent is an AI video director assistant. Upload your song audio, and the agent analyzes the lyrics, plans scenes, generates visuals, and compiles everything into a complete music video. No professional video skills are required.

Core Workflow

Step 1: Create Project

  1. Enter your creative description (prompt) describing your desired visual style
  2. Upload your full song audio
  3. Choose aspect ratio, resolution, and duration

Step 2: AI Auto-Analysis

AI Agent will automatically:

  • Transcribe the lyrics
  • Divide scenes based on lyrical emotion and rhythm
  • Generate professional-level prompts for each scene

Step 3: Generate Reference Images (Important)

[!IMPORTANT] Strongly recommended: Generate reference images first!

In the Storyboard panel, you can generate preview images for each scene. Confirm the visuals match your expectations before generating videos. This helps:

  • Ensure visuals align with your creative vision
  • Avoid wasting credits on mismatched results
  • Achieve more precise control over video output

Step 4: One-Click Video Generation

After confirming all scene reference images, click "Generate All" and AI will automatically create video clips for each scene.

Step 5: Final Merge

Once all scene videos are generated, AI will automatically merge them with your audio to create the complete music video.


Mode Options

Auto Mode

When "Auto Mode" is enabled, the AI starts generating all scenes as soon as it finishes analyzing the lyrics, without waiting for manual confirmation.

Best for users who:

  • Are confident in the AI's creative output
  • Want quick results
  • Don't need to adjust each scene individually

When "Auto Mode" is disabled, you can review and adjust the output at each step.

Best for users who:

  • Need precise control over each frame
  • Want to confirm each scene's look before continuing
  • Have high visual quality requirements

Settings Guide

Choosing the Right Resolution

[!IMPORTANT] Resolution is set at project creation and applies to all scenes throughout the project. It cannot be changed mid-project.

| Resolution | Description | Credit Cost |
| --- | --- | --- |
| 720p | Clear quality, cost-effective | 10 credits/second |
| 1080p | HD quality, outstanding output | 15 credits/second |

[!TIP] To save credits, we recommend choosing 720p resolution. 720p quality is sufficient for most use cases.

Choosing Aspect Ratio

| Aspect Ratio | Best For |
| --- | --- |
| Landscape (16:9) | YouTube, desktop viewing, traditional MV |
| Portrait (9:16) | TikTok, Instagram, Reels |

Setting Video Duration

You can select a specific segment of your audio or use the full length:

  • 5-60 seconds: Quick preview or short clips
  • Full length: Use the entire song

Prompt Writing Tips

When creating a project, write a prompt that describes the overall visual style. For detailed prompt writing techniques, refer to AI Video Generator - Prompt Guide, which covers:

  • Core elements of a prompt (subject, action, environment, style)
  • Good prompts vs bad prompts comparison
  • Camera movement techniques
  • Time markers (multi-shots mode)
  • Narrative pacing guide

[!TIP] In the Storyboard panel, you can adjust the prompts for each scene individually. When you do, keep the visual style consistent across all scenes.


Using the Storyboard Panel

Preview and Adjust

Each scene card displays:

  • Time range (e.g., 0:00 - 0:05)
  • Corresponding lyrics snippet
  • Image prompt
  • Video motion prompt
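As a rough mental model, the fields on a scene card can be sketched as a small Python data structure. This is an illustration only: `SceneCard`, its field names, and `split_evenly` are hypothetical and not part of any Music Video Agent API. The agent divides scenes by lyrical emotion and rhythm; the fixed-length split here is a simplifying assumption used only to produce example time ranges.

```python
from dataclasses import dataclass


def format_timestamp(seconds: int) -> str:
    """Format whole seconds as m:ss, e.g. 65 -> '1:05'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


@dataclass
class SceneCard:
    """Hypothetical model of one Storyboard card; field names are illustrative."""
    start_seconds: int
    end_seconds: int
    lyrics: str = ""
    image_prompt: str = ""
    video_prompt: str = ""

    @property
    def time_range(self) -> str:
        # Same shape as the time range shown on a card, e.g. "0:00 - 0:05".
        return f"{format_timestamp(self.start_seconds)} - {format_timestamp(self.end_seconds)}"


def split_evenly(duration_seconds: int, scene_seconds: int) -> list[SceneCard]:
    """Cut a duration into fixed-length scenes; the last scene may be shorter."""
    return [
        SceneCard(start, min(start + scene_seconds, duration_seconds))
        for start in range(0, duration_seconds, scene_seconds)
    ]


cards = split_evenly(60, 5)
print(len(cards), cards[0].time_range, cards[-1].time_range)
```

A 60-second clip cut into 5-second scenes yields 12 cards, the first covering 0:00 - 0:05 and the last 0:55 - 1:00.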

You can:

  • Click "Generate Image" to preview how an individual scene will look
  • Edit the prompts and regenerate
  • Generate the video only once you are satisfied with the image

Batch Operations

  • Generate All: One-click generation for all pending scene videos
  • Merge Video: Combine all scenes into a complete music video

Credit Consumption

The Agent feature consumes credits as follows:

| Operation | Credit Cost |
| --- | --- |
| Audio Analysis | 1 credit |
| Generate Reference Image | 4 credits/image |
| Generate Video (720p) | 10 credits/second |
| Generate Video (1080p) | 15 credits/second |
| Final Merge | Free |

Example Calculation:

A 60-second song divided into 12 scenes (5 seconds each), generated at 720p:

  • Reference images: 12 × 4 = 48 credits
  • Videos (720p): 60 × 10 = 600 credits
  • Total: 648 credits, plus 1 credit for audio analysis
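The same arithmetic can be sketched in Python for other durations and scene counts. This is a minimal illustration built only from the rates in the Credit Consumption table; the function name `estimate_credits` and its parameters are hypothetical and do not correspond to any Music Video Agent API. It assumes each reference image and each video is generated exactly once, and its total includes the 1-credit audio analysis.

```python
# Credit rates copied from the Credit Consumption table.
AUDIO_ANALYSIS_CREDITS = 1
IMAGE_CREDITS = 4
VIDEO_CREDITS_PER_SECOND = {"720p": 10, "1080p": 15}


def estimate_credits(duration_seconds, scene_count, resolution="720p"):
    """Return a credit breakdown, assuming one image and one video pass per scene."""
    if resolution not in VIDEO_CREDITS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    images = scene_count * IMAGE_CREDITS
    video = duration_seconds * VIDEO_CREDITS_PER_SECOND[resolution]
    return {
        "analysis": AUDIO_ANALYSIS_CREDITS,
        "images": images,
        "video": video,
        "total": AUDIO_ANALYSIS_CREDITS + images + video,
    }


print(estimate_credits(60, 12))
# {'analysis': 1, 'images': 48, 'video': 600, 'total': 649}
```

Calling `estimate_credits(60, 12, "1080p")` raises the video portion to 900 credits, which shows how much of the budget the resolution choice controls. Regenerated images or videos are not counted, so treat the result as a lower bound.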

FAQ

What if the generated video doesn't match expectations?

Recommendations:

  1. Generate reference images for each scene first
  2. Confirm image effects match your expectations
  3. If unsatisfied, adjust prompts and regenerate images
  4. Generate videos only after all reference images are confirmed

[!TIP] Using image-to-video is more stable and controllable than text-to-video.

How to avoid wasting credits?

  1. Use a lower resolution: choose 720p to save credits
  2. Preview with images first: generating an image costs far less than generating a video
  3. Avoid repeated generation: make sure you are satisfied with the prompt before generating
  4. Upload the full audio: this lets the AI analyze the song more accurately
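To see why previewing with images saves credits, the sketch below compares the two ways of iterating on a single scene. It uses the published rates (4 credits per image, 10 or 15 credits per second of video); the function `iteration_cost` is hypothetical, and it assumes the video is accepted on its first render once the reference image has been approved, which may not always hold.

```python
IMAGE_CREDITS = 4
VIDEO_CREDITS_PER_SECOND = {"720p": 10, "1080p": 15}


def iteration_cost(attempts, scene_seconds, resolution="720p", preview_with_images=True):
    """Credits spent on one scene that takes `attempts` tries to get right."""
    video_once = scene_seconds * VIDEO_CREDITS_PER_SECOND[resolution]
    if preview_with_images:
        # Iterate on the cheap reference image, then render the video once.
        return attempts * IMAGE_CREDITS + video_once
    # Iterate directly on the video, paying the full price each time.
    return attempts * video_once


print(iteration_cost(3, 5, preview_with_images=True))   # 62
print(iteration_cost(3, 5, preview_with_images=False))  # 150
```

For a 5-second scene at 720p that takes three tries, iterating on images costs 62 credits versus 150 when regenerating the video each time.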

How long does video generation take?

  • Single scene: 1-2 minutes
  • Complete project (e.g., 12 scenes): 15-30 minutes
  • Final merge: 1-3 minutes

[!WARNING] Don't close the page during generation. Generated video links are valid for 1 hour only, so download your videos promptly!


Best Practices Summary

  1. Upload full audio and let AI analyze lyrics and auto-plan scenes
  2. Generate reference images first to confirm visuals before video generation
  3. Choose appropriate resolution: 720p for cost savings, 1080p for better quality
  4. Review each scene's prompts to ensure consistent visual style
  5. Generate all at once using the "Generate All" button for batch processing