Quick Start
Install the Musid AI skill for your AI coding agent and generate music videos, images, and music tracks with natural language, no code required.
Skill: github.com/musidai/skills
1. Get your API key
Sign in at musid.ai, then go to Settings → API Keys and create a key.
2. Install the skill
Pick your agent tool:
Claude Code
```shell
mkdir -p ~/.claude/skills/musidai
curl -fsSL https://raw.githubusercontent.com/musidai/skills/main/SKILL.md \
  -o ~/.claude/skills/musidai/SKILL.md
```

Then set your API key:

```shell
claude config set MUSID_API_KEY your-key-here
```

Use it: type /musidai in any Claude Code session.
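If the curl download fails silently, the skill file can end up missing or empty. A quick sanity check, using only standard shell tools (the `check_skill` helper is illustrative, not part of the skill):

```shell
# check_skill FILE: report whether FILE exists and is non-empty
check_skill() {
  if [ -s "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or empty: $1"
  fi
}

# Path used in the Claude Code step above:
check_skill ~/.claude/skills/musidai/SKILL.md
```

The same check works for any of the install paths below; just pass the file your agent reads.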
Cursor
Create .cursor/rules/musidai.mdc in your project:
```shell
curl -fsSL https://raw.githubusercontent.com/musidai/skills/main/SKILL.md \
  -o .cursor/rules/musidai.mdc
```

Then add to your .env or Cursor environment settings:

```shell
MUSID_API_KEY=your-key-here
```

Cursor will automatically apply the rule when you ask about music video generation.
Codex
Global (applies to all projects):
```shell
mkdir -p ~/.codex
curl -fsSL https://raw.githubusercontent.com/musidai/skills/main/SKILL.md \
  >> ~/.codex/instructions.md
```

Project-level (add to your repo's AGENTS.md):

```shell
curl -fsSL https://raw.githubusercontent.com/musidai/skills/main/SKILL.md >> AGENTS.md
```

Set the key in your environment:

```shell
export MUSID_API_KEY=your-key-here
```

Gemini CLI

```shell
curl -fsSL https://raw.githubusercontent.com/musidai/skills/main/SKILL.md >> GEMINI.md
export MUSID_API_KEY=your-key-here
```

OpenCode
```shell
mkdir -p ~/.opencode/skills/musidai
curl -fsSL https://raw.githubusercontent.com/musidai/skills/main/SKILL.md \
  -o ~/.opencode/skills/musidai/SKILL.md
```

Cline / OpenClaw (VS Code)

Open the Cline settings panel → Custom Instructions and paste the skill content:

```shell
curl -fsSL https://raw.githubusercontent.com/musidai/skills/main/SKILL.md
```

Then add MUSID_API_KEY=your-key-here to your VS Code environment or .env.
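Whichever agent you use, the key must be visible in its environment. A minimal check you can run in the same shell (the `check_key` function is illustrative only):

```shell
# check_key: verify MUSID_API_KEY is set and non-empty in this shell
check_key() {
  if [ -n "${MUSID_API_KEY:-}" ]; then
    echo "MUSID_API_KEY is set"
  else
    echo "MUSID_API_KEY is not set"
  fi
}

check_key
```

Note that keys set only in a project .env file are not visible to `check_key` until something loads that file into the environment.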
3. Use it
Just describe what you want in natural language. Your agent handles the API calls.
The full music video pipeline now defaults to Wan 2.7. If you provide audio, the agent will first wait for audio analysis to finish, then plan scenes, generate storyboard frames, and render scene videos.
```
Generate a music video: neon cyberpunk city at night, vertical format
Create a music video with my track: https://example.com/song.mp3
Create a lip-synced music video in Wan 2.7: moody rain-soaked rooftop performance, 16:9, cinematic lighting
Create a lower-cost non-lipsync video with Grok Imagine: surreal desert dance sequence, vertical format
Generate an image: album cover, bold geometric shapes, 1:1 square
Generate a music track: upbeat pop song about summer, no vocals
```

What it generates
| Command | Output |
|---|---|
| Music video (full) | Audio analysis + character design → scene planning → storyboard images → Wan 2.7 or Grok scene videos |
| Image | Single AI-generated image |
| Video clip | Single video scene with Wan 2.7 or Grok Imagine |
| Music track | Original AI-generated song |
Current model behavior
- Wan 2.7 is the default full-pipeline model and the recommended choice when you need lip-sync.
- Wan 2.7 supports 2-15s scene durations and can use multi-shot prompts with time markers like [0-3s], [3-8s].
- Grok Imagine is the lower-cost alternative for non-lip-sync scenes.
- Final merge still happens on the project page after all scene clips finish rendering.
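As an illustration of the time-marker syntax above, a multi-shot Wan 2.7 prompt for a 12-second scene (within the 2-15s range) might look like this; the scene content is invented for illustration:

```
Create a lip-synced scene video in Wan 2.7, 16:9:
[0-3s] slow push-in on the singer under a flickering streetlight
[3-8s] handheld side profile as the rain intensifies, neon reflections
[8-12s] wide rooftop shot, camera rising as the chorus hits
```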
View all results at musid.ai/my-creations.
