What is multimodal-io?

Perfect for AI Agents needing unified multimodal content processing and generation via the Gemini 3 API. An Xorium Stealer Pulsar that I stole from the darknet and modified.

How do I install multimodal-io?

Run the command: npx killer-skills add Trongdepzai-dev/Xorium-Stealer-Pulsar/multimodal-io. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for multimodal-io?

Key use cases include: Generating multimodal content for diverse applications, Processing multimodal inputs via the Gemini 3 API, Automating content creation with unified interface.

Which IDEs are compatible with multimodal-io?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for multimodal-io?

Requires Gemini API Key. Dependent on google-genai and pillow libraries. Limited to Gemini 3 API compatibility.

multimodal-io

Install multimodal-io, an AI agent skill for AI agent workflows and automation. Works with Claude Code, Cursor, and Windsurf with one-command setup.

SKILL.md

Readonly

Multimodal I/O

Name: multimodal-io
Availability: InStock
Author: Trongdepzai-dev

Unified interface for processing and generating multimodal content via Gemini 3 API.

Setup

bash
1pip install google-genai pillow

API Key Configuration

The script auto-loads GEMINI_API_KEY from .env files in priority order:

Skill dir: .gemini/extensions/multimodal-io/.env (highest)
Project root: .env (searches up to git root)
User home: ~/.gemini/.env, ~/.gemini.env, ~/.env

Environment variables set directly (export GEMINI_API_KEY=...) take precedence over all .env files.

CLI Usage

bash
1# Process any media
2python scripts/mmio.py process <file> --prompt "describe this"
3
4# Generate image
5python scripts/mmio.py imagine "sunset over mountains" --ratio 16:9
6
7# Generate video
8python scripts/mmio.py video "waves on beach" --duration 8
9
10# Transcribe audio/video
11python scripts/mmio.py transcribe <file> --timestamps
12
13# Convert document to markdown
14python scripts/mmio.py convert <pdf|docx> --output result.md

Python API

python
1from mmio import MMIO
2
3mm = MMIO()
4
5# Analyze any media
6result = mm.process("photo.jpg", "what objects are visible?")
7
8# Generate image
9img = mm.imagine("cyberpunk city", ratio="16:9", size="2K")
10
11# Generate video
12vid = mm.video("flying through clouds", resolution="1080p")
13
14# Transcribe with timestamps
15text = mm.transcribe("meeting.mp3", timestamps=True)

Models

Task	Default	Alternatives
Analysis	gemini-3-flash-preview	gemini-3-pro-preview
Image Gen	gemini-2.5-flash-image	imagen-4.0-*, gemini-3-pro-preview-image
Video Gen	veo-3.1-generate-preview	veo-3.1-fast-*
Transcription	gemini-3-flash-preview	gemini-3-pro-preview

Key Features

Media Resolution: Control quality/token tradeoff (low/medium/high)
Thinking Levels: Adjust reasoning depth (minimal/low/high)
Streaming: Real-time output for long operations
Auto-chunking: Handles large files automatically

References

Topic	File
Input Processing	`references/inputs.md`
Image Generation	`references/image-gen.md`
Video Generation	`references/video-gen.md`
Streaming & Live	`references/streaming.md`

Limits

Inline: 20MB | File API: 2GB
Audio: 9.5h max | Video: 1h max
Image gen: 4 per request | Video: 8s duration

multimodal-io — community multimodal-io, Xorium-Stealer-Pulsar, community, ide skills, Claude Code, Cursor, Windsurf

Agent Capability Analysis

Ideal Agent Persona

Core Value

↓ Capabilities Granted for multimodal-io

! Prerequisites & Limits

Browser Sandbox Environment

⚡️ Ready to unleash?

multimodal-io

Multimodal I/O

Setup

API Key Configuration

CLI Usage

Python API

Models

Key Features

References

Limits

FAQ & Installation Steps

? Frequently Asked Questions

What is multimodal-io?

How do I install multimodal-io?

What are the use cases for multimodal-io?

Which IDEs are compatible with multimodal-io?

Are there any limitations for multimodal-io?

↓ How To Install

Related Skills

Looking for an alternative to multimodal-io or another community skill for your workflow? Explore these related open-source skills.

widget-generator

flags

zustand

data-fetching

multimodal-io — community multimodal-io, Xorium-Stealer-Pulsar, community, ide skills, Claude Code, Cursor, Windsurf

About this Skill

Agent Capability Analysis

Ideal Agent Persona

Core Value

↓ Capabilities Granted for multimodal-io

! Prerequisites & Limits

Browser Sandbox Environment

⚡️ Ready to unleash?

multimodal-io

Multimodal I/O

Setup

API Key Configuration

CLI Usage

Python API

Models

Key Features

References

Limits

FAQ & Installation Steps

? Frequently Asked Questions

What is multimodal-io?

How do I install multimodal-io?

What are the use cases for multimodal-io?

Which IDEs are compatible with multimodal-io?

Are there any limitations for multimodal-io?

↓ How To Install

Related Skills

Looking for an alternative to multimodal-io or another community skill for your workflow? Explore these related open-source skills.

widget-generator

flags

zustand

data-fetching