audiocraft-audio-generation — text-to-music and audio generation skill for Claude Code, Cursor, and Windsurf

v1.0.0
GitHub

About this Skill

audiocraft-audio-generation is a comprehensive guide to using Meta's AudioCraft for text-to-music and text-to-audio generation with MusicGen, AudioGen, and EnCodec. It is ideal for creative agents that need advanced music and audio generation from text descriptions.

Features

Generates music from text descriptions using MusicGen
Creates sound effects and environmental audio with AudioGen
Supports melody-conditioned music generation
Produces stereo audio output
Allows controllable music generation with style transfer using EnCodec

mohammedatiaa · Updated: 2/27/2026

Agent Capability Analysis

The audiocraft-audio-generation skill by mohammedatiaa is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance. Optimized for audiocraft-audio-generation install, text-to-music generation, audio generation tools.

Ideal Agent Persona

Perfect for creative agents needing advanced music and audio generation capabilities from text descriptions.

Core Value

Empowers agents to generate music and audio from text descriptions using Meta's AudioCraft, enabling melody-conditioned music generation, style transfer, and stereo audio output with MusicGen, AudioGen, and EnCodec

Capabilities Granted for audiocraft-audio-generation

Generating background music for videos from descriptive text
Creating sound effects and environmental audio for immersive experiences
Building music generation applications with controllable style transfer

Prerequisites & Limits

  • Requires text descriptions as input
  • Limited to music and audio generation tasks
  • Dependent on Meta's AudioCraft and its supported libraries

SKILL.md

AudioCraft: Audio Generation

Comprehensive guide to using Meta's AudioCraft for text-to-music and text-to-audio generation with MusicGen, AudioGen, and EnCodec.

When to use AudioCraft

Use AudioCraft when:

  • Need to generate music from text descriptions
  • Creating sound effects and environmental audio
  • Building music generation applications
  • Need melody-conditioned music generation
  • Want stereo audio output
  • Require controllable music generation with style transfer

Key features:

  • MusicGen: Text-to-music generation with melody conditioning
  • AudioGen: Text-to-sound effects generation
  • EnCodec: High-fidelity neural audio codec
  • Multiple model sizes: Small (300M) to Large (3.3B)
  • Stereo support: Full stereo audio generation
  • Style conditioning: MusicGen-Style for reference-based generation

Use alternatives instead:

  • Stable Audio: For longer commercial music generation
  • Bark: For text-to-speech with music/sound effects
  • Riffusion: For spectrogram-based music generation
  • OpenAI Jukebox: For raw audio generation with lyrics

Quick start

Installation

```bash
# From PyPI
pip install audiocraft

# From GitHub (latest)
pip install git+https://github.com/facebookresearch/audiocraft.git

# Or use HuggingFace Transformers
pip install transformers torch torchaudio
```

Basic text-to-music (AudioCraft)

```python
import torchaudio
from audiocraft.models import MusicGen

# Load model
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Set generation parameters
model.set_generation_params(
    duration=8,      # seconds
    top_k=250,
    temperature=1.0
)

# Generate from text
descriptions = ["happy upbeat electronic dance music with synths"]
wav = model.generate(descriptions)

# Save audio
torchaudio.save("output.wav", wav[0].cpu(), sample_rate=32000)
```

Using HuggingFace Transformers

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration
import scipy.io.wavfile

# Load model and processor
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
model.to("cuda")

# Generate music
inputs = processor(
    text=["80s pop track with bassy drums and synth"],
    padding=True,
    return_tensors="pt"
).to("cuda")

audio_values = model.generate(
    **inputs,
    do_sample=True,
    guidance_scale=3,
    max_new_tokens=256
)

# Save
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("output.wav", rate=sampling_rate, data=audio_values[0, 0].cpu().numpy())
```

Text-to-sound with AudioGen

```python
import torchaudio
from audiocraft.models import AudioGen

# Load AudioGen
model = AudioGen.get_pretrained('facebook/audiogen-medium')

model.set_generation_params(duration=5)

# Generate sound effects
descriptions = ["dog barking in a park with birds chirping"]
wav = model.generate(descriptions)

torchaudio.save("sound.wav", wav[0].cpu(), sample_rate=16000)
```

Core concepts

Architecture overview

AudioCraft Architecture:
┌──────────────────────────────────────────────────────────────┐
│                    Text Encoder (T5)                          │
│                         │                                     │
│                    Text Embeddings                            │
└────────────────────────┬─────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│              Transformer Decoder (LM)                         │
│     Auto-regressively generates audio tokens                  │
│     Using efficient token interleaving patterns               │
└────────────────────────┬─────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                EnCodec Audio Decoder                          │
│        Converts tokens back to audio waveform                 │
└──────────────────────────────────────────────────────────────┘

Model variants

| Model | Size | Description | Use Case |
|---|---|---|---|
| musicgen-small | 300M | Text-to-music | Quick generation |
| musicgen-medium | 1.5B | Text-to-music | Balanced |
| musicgen-large | 3.3B | Text-to-music | Best quality |
| musicgen-melody | 1.5B | Text + melody | Melody conditioning |
| musicgen-melody-large | 3.3B | Text + melody | Best melody |
| musicgen-stereo-* | Varies | Stereo output | Stereo generation |
| musicgen-style | 1.5B | Style transfer | Reference-based |
| audiogen-medium | 1.5B | Text-to-sound | Sound effects |
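As a rough guide, the model variants above can be encoded as a small lookup helper. The checkpoint IDs are the real Hugging Face names from the table, but `pick_checkpoint` itself is a hypothetical convenience for illustration, not part of AudioCraft:

```python
# Hypothetical helper mapping a use case to a checkpoint from the table above.
# Checkpoint IDs are real; the helper is illustrative only.
CHECKPOINTS = {
    "quick": "facebook/musicgen-small",      # 300M, fastest
    "balanced": "facebook/musicgen-medium",  # 1.5B
    "quality": "facebook/musicgen-large",    # 3.3B, best quality
    "melody": "facebook/musicgen-melody",    # text + melody conditioning
    "style": "facebook/musicgen-style",      # reference-based style transfer
    "sfx": "facebook/audiogen-medium",       # sound effects (AudioGen)
}

def pick_checkpoint(use_case: str) -> str:
    """Return the checkpoint ID for a use case, defaulting to 'balanced'."""
    return CHECKPOINTS.get(use_case, CHECKPOINTS["balanced"])
```

A mapping like this keeps notebook code readable when switching between quality/speed trade-offs.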

Generation parameters

| Parameter | Default | Description |
|---|---|---|
| duration | 8.0 | Length in seconds (1-120) |
| top_k | 250 | Top-k sampling |
| top_p | 0.0 | Nucleus sampling (0 = disabled) |
| temperature | 1.0 | Sampling temperature |
| cfg_coef | 3.0 | Classifier-free guidance |
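Note that the Transformers API controls length via `max_new_tokens` rather than `duration`. MusicGen's EnCodec tokenizer runs at roughly 50 token frames per second, so a rough conversion (a sketch, assuming that 50 Hz frame rate) is:

```python
FRAME_RATE_HZ = 50  # approximate MusicGen token rate at 32 kHz

def duration_to_tokens(seconds: float) -> int:
    """Approximate max_new_tokens for a target duration in seconds."""
    return int(seconds * FRAME_RATE_HZ)

def tokens_to_duration(tokens: int) -> float:
    """Approximate audio length in seconds for a token budget."""
    return tokens / FRAME_RATE_HZ
```

Under this assumption, `max_new_tokens=256` corresponds to roughly 5 seconds of audio, which matches the short clips produced by the Transformers quick-start example.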

MusicGen usage

Text-to-music generation

```python
from audiocraft.models import MusicGen
import torchaudio

model = MusicGen.get_pretrained('facebook/musicgen-medium')

# Configure generation
model.set_generation_params(
    duration=30,      # Up to 30 seconds
    top_k=250,        # Sampling diversity
    top_p=0.0,        # 0 = use top_k only
    temperature=1.0,  # Creativity (higher = more varied)
    cfg_coef=3.0      # Text adherence (higher = stricter)
)

# Generate multiple samples
descriptions = [
    "epic orchestral soundtrack with strings and brass",
    "chill lo-fi hip hop beat with jazzy piano",
    "energetic rock song with electric guitar"
]

# Generate (returns [batch, channels, samples])
wav = model.generate(descriptions)

# Save each
for i, audio in enumerate(wav):
    torchaudio.save(f"music_{i}.wav", audio.cpu(), sample_rate=32000)
```

Melody-conditioned generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load melody model
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=30)

# Load melody audio
melody, sr = torchaudio.load("melody.wav")

# Generate with melody conditioning
descriptions = ["acoustic guitar folk song"]
wav = model.generate_with_chroma(descriptions, melody, sr)

torchaudio.save("melody_conditioned.wav", wav[0].cpu(), sample_rate=32000)
```

Stereo generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load stereo model
model = MusicGen.get_pretrained('facebook/musicgen-stereo-medium')
model.set_generation_params(duration=15)

descriptions = ["ambient electronic music with wide stereo panning"]
wav = model.generate(descriptions)

# wav shape: [batch, 2, samples] for stereo
print(f"Stereo shape: {wav.shape}")  # [1, 2, 480000]
torchaudio.save("stereo.wav", wav[0].cpu(), sample_rate=32000)
```

Audio continuation

```python
import torchaudio
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-medium")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-medium")

# Load audio to continue
audio, sr = torchaudio.load("intro.wav")

# Process with text and audio
inputs = processor(
    audio=audio.squeeze().numpy(),
    sampling_rate=sr,
    text=["continue with an epic chorus"],
    padding=True,
    return_tensors="pt"
)

# Generate continuation
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=512)
```

MusicGen-Style usage

Style-conditioned generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load style model
model = MusicGen.get_pretrained('facebook/musicgen-style')

# Configure generation with style
model.set_generation_params(
    duration=30,
    cfg_coef=3.0,
    cfg_coef_beta=5.0  # Style influence
)

# Configure style conditioner
model.set_style_conditioner_params(
    eval_q=3,            # RVQ quantizers (1-6)
    excerpt_length=3.0   # Style excerpt length
)

# Load style reference
style_audio, sr = torchaudio.load("reference_style.wav")

# Generate with text + style
descriptions = ["upbeat dance track"]
wav = model.generate_with_style(descriptions, style_audio, sr)
```

Style-only generation (no text)

```python
# Generate matching style without text prompt
model.set_generation_params(
    duration=30,
    cfg_coef=3.0,
    cfg_coef_beta=None  # Disable double CFG for style-only
)

wav = model.generate_with_style([None], style_audio, sr)
```

AudioGen usage

Sound effect generation

```python
from audiocraft.models import AudioGen
import torchaudio

model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=10)

# Generate various sounds
descriptions = [
    "thunderstorm with heavy rain and lightning",
    "busy city traffic with car horns",
    "ocean waves crashing on rocks",
    "crackling campfire in forest"
]

wav = model.generate(descriptions)

for i, audio in enumerate(wav):
    torchaudio.save(f"sound_{i}.wav", audio.cpu(), sample_rate=16000)
```

EnCodec usage

Audio compression

```python
from audiocraft.models import CompressionModel
import torch
import torchaudio

# Load EnCodec
model = CompressionModel.get_pretrained('facebook/encodec_32khz')

# Load audio
wav, sr = torchaudio.load("audio.wav")

# Ensure correct sample rate
if sr != 32000:
    resampler = torchaudio.transforms.Resample(sr, 32000)
    wav = resampler(wav)

# Encode to tokens
with torch.no_grad():
    encoded = model.encode(wav.unsqueeze(0))
    codes = encoded[0]  # Audio codes

# Decode back to audio
with torch.no_grad():
    decoded = model.decode(codes)

torchaudio.save("reconstructed.wav", decoded[0].cpu(), sample_rate=32000)
```

Common workflows

Workflow 1: Music generation pipeline

```python
import torch
import torchaudio
from audiocraft.models import MusicGen

class MusicGenerator:
    def __init__(self, model_name="facebook/musicgen-medium"):
        self.model = MusicGen.get_pretrained(model_name)
        self.sample_rate = 32000

    def generate(self, prompt, duration=30, temperature=1.0, cfg=3.0):
        self.model.set_generation_params(
            duration=duration,
            top_k=250,
            temperature=temperature,
            cfg_coef=cfg
        )

        with torch.no_grad():
            wav = self.model.generate([prompt])

        return wav[0].cpu()

    def generate_batch(self, prompts, duration=30):
        self.model.set_generation_params(duration=duration)

        with torch.no_grad():
            wav = self.model.generate(prompts)

        return wav.cpu()

    def save(self, audio, path):
        torchaudio.save(path, audio, sample_rate=self.sample_rate)

# Usage
generator = MusicGenerator()
audio = generator.generate(
    "epic cinematic orchestral music",
    duration=30,
    temperature=1.0
)
generator.save(audio, "epic_music.wav")
```

Workflow 2: Sound design batch processing

```python
from pathlib import Path

import torchaudio
from audiocraft.models import AudioGen

def batch_generate_sounds(sound_specs, output_dir):
    """
    Generate multiple sounds from specifications.

    Args:
        sound_specs: list of {"name": str, "description": str, "duration": float}
        output_dir: output directory path
    """
    model = AudioGen.get_pretrained('facebook/audiogen-medium')
    output_dir = Path(output_dir)
    output_dir.mkdir(exist_ok=True)

    results = []

    for spec in sound_specs:
        model.set_generation_params(duration=spec.get("duration", 5))

        wav = model.generate([spec["description"]])

        output_path = output_dir / f"{spec['name']}.wav"
        torchaudio.save(str(output_path), wav[0].cpu(), sample_rate=16000)

        results.append({
            "name": spec["name"],
            "path": str(output_path),
            "description": spec["description"]
        })

    return results

# Usage
sounds = [
    {"name": "explosion", "description": "massive explosion with debris", "duration": 3},
    {"name": "footsteps", "description": "footsteps on wooden floor", "duration": 5},
    {"name": "door", "description": "wooden door creaking and closing", "duration": 2}
]

results = batch_generate_sounds(sounds, "sound_effects/")
```

Workflow 3: Gradio demo

```python
import gradio as gr
import torch
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')

def generate_music(prompt, duration, temperature, cfg_coef):
    model.set_generation_params(
        duration=duration,
        temperature=temperature,
        cfg_coef=cfg_coef
    )

    with torch.no_grad():
        wav = model.generate([prompt])

    # Save to temp file
    path = "temp_output.wav"
    torchaudio.save(path, wav[0].cpu(), sample_rate=32000)
    return path

demo = gr.Interface(
    fn=generate_music,
    inputs=[
        gr.Textbox(label="Music Description", placeholder="upbeat electronic dance music"),
        gr.Slider(1, 30, value=8, label="Duration (seconds)"),
        gr.Slider(0.5, 2.0, value=1.0, label="Temperature"),
        gr.Slider(1.0, 10.0, value=3.0, label="CFG Coefficient")
    ],
    outputs=gr.Audio(label="Generated Music"),
    title="MusicGen Demo"
)

demo.launch()
```

Performance optimization

Memory optimization

```python
import torch
from audiocraft.models import MusicGen

# Use smaller model
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Clear cache between generations
torch.cuda.empty_cache()

# Generate shorter durations
model.set_generation_params(duration=10)  # Instead of 30

# Use half precision
model = model.half()
```

Batch processing efficiency

```python
# Process multiple prompts at once (more efficient)
descriptions = ["prompt1", "prompt2", "prompt3", "prompt4"]
wav = model.generate(descriptions)  # Single batch

# Instead of:
for desc in descriptions:
    wav = model.generate([desc])  # Multiple batches (slower)
```

GPU memory requirements

| Model | FP32 VRAM | FP16 VRAM |
|---|---|---|
| musicgen-small | ~4GB | ~2GB |
| musicgen-medium | ~8GB | ~4GB |
| musicgen-large | ~16GB | ~8GB |
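The memory figures above can drive a simple selection rule: load the largest variant expected to fit in available VRAM. The helper below is a hypothetical sketch using the approximate FP16 numbers from the table, not an AudioCraft API:

```python
# Approximate FP16 VRAM needs (GB) from the table above — illustrative only.
FP16_VRAM_GB = [
    ("facebook/musicgen-large", 8.0),
    ("facebook/musicgen-medium", 4.0),
    ("facebook/musicgen-small", 2.0),
]

def pick_model_for_vram(available_gb: float, headroom_gb: float = 1.0) -> str:
    """Return the largest checkpoint expected to fit, leaving some headroom
    for activations and the EnCodec decoder."""
    for name, need in FP16_VRAM_GB:
        if available_gb - headroom_gb >= need:
            return name
    return FP16_VRAM_GB[-1][0]  # fall back to the smallest model
```

In real use, `available_gb` might come from `torch.cuda.get_device_properties(0).total_memory`; actual usage also depends on duration and batch size, so treat the thresholds as estimates.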

Common issues

| Issue | Solution |
|---|---|
| CUDA OOM | Use smaller model, reduce duration |
| Poor quality | Increase cfg_coef, better prompts |
| Generation too short | Check max duration setting |
| Audio artifacts | Try different temperature |
| Stereo not working | Use stereo model variant |
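One way to apply the "CUDA OOM" remedy programmatically is a retry wrapper that halves the requested duration on each failure. This is a sketch, not part of AudioCraft; `generate_fn` stands in for any of the generation calls above, and in real use the exception caught would be `torch.cuda.OutOfMemoryError` (a `MemoryError` subclass in recent PyTorch):

```python
def generate_with_fallback(generate_fn, prompt, duration=30, min_duration=5):
    """Retry generation with progressively shorter durations on memory errors.

    generate_fn(prompt, duration) should raise MemoryError when it runs
    out of VRAM; any successful result is returned as-is.
    """
    while duration >= min_duration:
        try:
            return generate_fn(prompt, duration)
        except MemoryError:
            duration //= 2  # halve the clip length and retry
    raise RuntimeError("generation failed even at the minimum duration")
```

Stepping down to a smaller checkpoint (see the VRAM table above) is the complementary fix when even short durations fail.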


FAQ & Installation Steps


Frequently Asked Questions

What is audiocraft-audio-generation?

audiocraft-audio-generation is a comprehensive guide to using Meta's AudioCraft for text-to-music and text-to-audio generation with MusicGen, AudioGen, and EnCodec, aimed at creative agents that need advanced music and audio generation from text descriptions.

How do I install audiocraft-audio-generation?

Run the command: npx killer-skills add mohammedatiaa/ai-project-network/audiocraft-audio-generation. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for audiocraft-audio-generation?

Key use cases include generating background music for videos from descriptive text, creating sound effects and environmental audio for immersive experiences, and building music generation applications with controllable style transfer.

Which IDEs are compatible with audiocraft-audio-generation?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for audiocraft-audio-generation?

Requires text descriptions as input. Limited to music and audio generation tasks. Dependent on Meta's AudioCraft and its supported libraries.

How To Install

  1. Open your terminal

     Open the terminal or command line in your project directory.

  2. Run the install command

     Run: npx killer-skills add mohammedatiaa/ai-project-network/audiocraft-audio-generation. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

     The skill is now active. Your AI agent can use audiocraft-audio-generation immediately in the current project.
