multimodal-io: community AI agent skill from the Xorium-Stealer-Pulsar repository, for Claude Code, Cursor, and Windsurf

v1.0.0
GitHub

About this Skill

A skill for AI agents needing unified multimodal content processing and generation via the Gemini 3 API. Distributed as part of the Xorium-Stealer-Pulsar repository.

Trongdepzai-dev
Updated: 3/5/2026

Agent Capability Analysis

The multimodal-io skill by Trongdepzai-dev is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

Perfect for AI Agents needing unified multimodal content processing and generation via the Gemini 3 API.

Core Value

Empowers agents to process and generate multimodal content using the Gemini 3 API, leveraging libraries like google-genai and pillow, and supporting environment variable configuration through .env files.

Capabilities Granted for multimodal-io

Generating multimodal content for diverse applications
Processing multimodal inputs via the Gemini 3 API
Automating content creation through a unified interface

Prerequisites & Limits

  • Requires Gemini API Key
  • Dependent on google-genai and pillow libraries
  • Limited to Gemini 3 API compatibility
Labs Demo

Browser Sandbox Environment


Experience this Agent in a zero-setup browser environment powered by WebContainers. No installation required.


multimodal-io

Install multimodal-io, an AI agent skill for AI agent workflows and automation. Works with Claude Code, Cursor, and Windsurf with one-command setup.

SKILL.md

Multimodal I/O

Unified interface for processing and generating multimodal content via Gemini 3 API.

Setup

```bash
pip install google-genai pillow
```

API Key Configuration

The script auto-loads GEMINI_API_KEY from .env files in priority order:

  1. Skill dir: .gemini/extensions/multimodal-io/.env (highest)
  2. Project root: .env (searches up to git root)
  3. User home: ~/.gemini/.env, ~/.gemini.env, ~/.env

Environment variables set directly (export GEMINI_API_KEY=...) take precedence over all .env files.

CLI Usage

```bash
# Process any media
python scripts/mmio.py process <file> --prompt "describe this"

# Generate image
python scripts/mmio.py imagine "sunset over mountains" --ratio 16:9

# Generate video
python scripts/mmio.py video "waves on beach" --duration 8

# Transcribe audio/video
python scripts/mmio.py transcribe <file> --timestamps

# Convert document to markdown
python scripts/mmio.py convert <pdf|docx> --output result.md
```

Python API

```python
from mmio import MMIO

mm = MMIO()

# Analyze any media
result = mm.process("photo.jpg", "what objects are visible?")

# Generate image
img = mm.imagine("cyberpunk city", ratio="16:9", size="2K")

# Generate video
vid = mm.video("flying through clouds", resolution="1080p")

# Transcribe with timestamps
text = mm.transcribe("meeting.mp3", timestamps=True)
```

Models

| Task | Default | Alternatives |
|------|---------|--------------|
| Analysis | gemini-3-flash-preview | gemini-3-pro-preview |
| Image Gen | gemini-2.5-flash-image | imagen-4.0-*, gemini-3-pro-preview-image |
| Video Gen | veo-3.1-generate-preview | veo-3.1-fast-* |
| Transcription | gemini-3-flash-preview | gemini-3-pro-preview |

Key Features

  • Media Resolution: Control quality/token tradeoff (low/medium/high)
  • Thinking Levels: Adjust reasoning depth (minimal/low/high)
  • Streaming: Real-time output for long operations
  • Auto-chunking: Handles large files automatically

References

| Topic | File |
|-------|------|
| Input Processing | references/inputs.md |
| Image Generation | references/image-gen.md |
| Video Generation | references/video-gen.md |
| Streaming & Live | references/streaming.md |

Limits

  • Inline: 20MB | File API: 2GB
  • Audio: 9.5h max | Video: 1h max
  • Image gen: 4 per request | Video: 8s duration
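The size limits above suggest a simple pre-flight check before uploading: small payloads go inline, larger ones through the File API, and anything over 2 GB is rejected. The `choose_upload_path` helper is hypothetical, used here only to make the documented thresholds concrete.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024   # 20 MB inline limit
FILE_API_LIMIT_BYTES = 2 * 1024 ** 3    # 2 GB File API limit


def choose_upload_path(size_bytes: int) -> str:
    """Pick inline vs. File API upload based on the documented size limits."""
    if size_bytes <= INLINE_LIMIT_BYTES:
        return "inline"
    if size_bytes <= FILE_API_LIMIT_BYTES:
        return "file_api"
    raise ValueError(f"{size_bytes} bytes exceeds the 2 GB File API limit")
```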

FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

Frequently Asked Questions

What is multimodal-io?

multimodal-io is a community AI agent skill providing unified multimodal content processing and generation via the Gemini 3 API, distributed as part of the Xorium-Stealer-Pulsar repository.

How do I install multimodal-io?

Run the command: npx killer-skills add Trongdepzai-dev/Xorium-Stealer-Pulsar/multimodal-io. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for multimodal-io?

Key use cases include generating multimodal content for diverse applications, processing multimodal inputs via the Gemini 3 API, and automating content creation through a unified interface.

Which IDEs are compatible with multimodal-io?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for multimodal-io?

Requires Gemini API Key. Dependent on google-genai and pillow libraries. Limited to Gemini 3 API compatibility.

How To Install

  1. Open your terminal. Open the terminal or command line in your project directory.

  2. Run the install command. Run `npx killer-skills add Trongdepzai-dev/Xorium-Stealer-Pulsar/multimodal-io`. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill. The skill is now active. Your AI agent can use multimodal-io immediately in the current project.

Related Skills

Looking for an alternative to multimodal-io or another community skill for your workflow? Explore these related open-source skills.


  • widget-generator (by f)
  • flags (by vercel): a Next.js feature management skill for efficiently adding or modifying framework feature flags in React applications.
  • zustand (by lobehub)
  • data-fetching (by lobehub)