browser-use — community browser-use, agentic_learning_coach, community, ide skills, Claude Code, Cursor, Windsurf

v1.0.0
GitHub

About this Skill

Ideal for Automation Agents requiring persistent browser sessions and multi-step workflow capabilities. Intelligent multi-agent system for personalized coding education. Features 7 AI agents, adaptive learning, secure code execution, gamification, and social learning. Built with FastAPI, React, PostgreSQL, and Docker. 356 tests, 90%+ coverage. Spec-driven development with Kiro CLI.

khushparag khushparag
[0]
[0]
Updated: 3/5/2026

Agent Capability Analysis

The browser-use skill by khushparag is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

Ideal for Automation Agents requiring persistent browser sessions and multi-step workflow capabilities.

Core Value

Empowers agents to automate complex browser interactions using FastAPI and Docker, while maintaining sessions across commands with the browser-use CLI, enabling seamless execution of workflows involving Chromium browser dependencies.

Capabilities Granted for browser-use

Automating multi-step browser workflows
Debugging browser-based applications with persistent sessions
Generating browser automation scripts for Chromium

! Prerequisites & Limits

  • Requires installation of browser dependencies (Chromium)
  • Dependent on uvx and pip for installation and execution
Labs Demo

Browser Sandbox Environment

⚡️ Ready to unleash?

Experience this Agent in a zero-setup browser environment powered by WebContainers. No installation required.

Boot Container Sandbox

browser-use

Install browser-use, an AI agent skill for AI agent workflows and automation. Works with Claude Code, Cursor, and Windsurf with one-command setup.

SKILL.md
Readonly

Browser Automation with browser-use CLI

The browser-use command provides fast, persistent browser automation. It maintains browser sessions across commands, enabling complex multi-step workflows.

Installation

bash
1# Run without installing (recommended for one-off use) 2uvx browser-use[cli] open https://example.com 3 4# Or install permanently 5uv pip install browser-use[cli] 6 7# Install browser dependencies (Chromium) 8browser-use install

Quick Start

bash
1browser-use open https://example.com # Navigate to URL 2browser-use state # Get page elements with indices 3browser-use click 5 # Click element by index 4browser-use type "Hello World" # Type text 5browser-use screenshot # Take screenshot 6browser-use close # Close browser

Core Workflow

  1. Navigate: browser-use open <url> - Opens URL (starts browser if needed)
  2. Inspect: browser-use state - Returns clickable elements with indices
  3. Interact: Use indices from state to interact (browser-use click 5, browser-use input 3 "text")
  4. Verify: browser-use state or browser-use screenshot to confirm actions
  5. Repeat: Browser stays open between commands

Browser Modes

bash
1browser-use --browser chromium open <url> # Default: headless Chromium 2browser-use --browser chromium --headed open <url> # Visible Chromium window 3browser-use --browser real open <url> # User's Chrome with login sessions 4browser-use --browser remote open <url> # Cloud browser (requires API key)
  • chromium: Fast, isolated, headless by default
  • real: Uses your Chrome with cookies, extensions, logged-in sessions
  • remote: Cloud-hosted browser with proxy support (requires BROWSER_USE_API_KEY)

Commands

bash
1browser-use open <url> # Navigate to URL 2browser-use back # Go back in history 3browser-use scroll down # Scroll down 4browser-use scroll up # Scroll up

Page State

bash
1browser-use state # Get URL, title, and clickable elements 2browser-use screenshot # Take screenshot (outputs base64) 3browser-use screenshot path.png # Save screenshot to file 4browser-use screenshot --full path.png # Full page screenshot

Interactions (use indices from browser-use state)

bash
1browser-use click <index> # Click element 2browser-use type "text" # Type text into focused element 3browser-use input <index> "text" # Click element, then type text 4browser-use keys "Enter" # Send keyboard keys 5browser-use keys "Control+a" # Send key combination 6browser-use select <index> "option" # Select dropdown option

Tab Management

bash
1browser-use switch <tab> # Switch to tab by index 2browser-use close-tab # Close current tab 3browser-use close-tab <tab> # Close specific tab

JavaScript & Data

bash
1browser-use eval "document.title" # Execute JavaScript, return result 2browser-use extract "all product prices" # Extract data using LLM (requires API key)

Python Execution (Persistent Session)

bash
1browser-use python "x = 42" # Set variable 2browser-use python "print(x)" # Access variable (outputs: 42) 3browser-use python "print(browser.url)" # Access browser object 4browser-use python --vars # Show defined variables 5browser-use python --reset # Clear Python namespace 6browser-use python --file script.py # Execute Python file

The Python session maintains state across commands. The browser object provides:

  • browser.url - Current page URL
  • browser.title - Page title
  • browser.goto(url) - Navigate
  • browser.click(index) - Click element
  • browser.type(text) - Type text
  • browser.screenshot(path) - Take screenshot
  • browser.scroll() - Scroll page
  • browser.html - Get page HTML

Agent Tasks (Requires API Key)

bash
1browser-use run "Fill the contact form with test data" # Run AI agent 2browser-use run "Extract all product prices" --max-steps 50

Agent tasks use an LLM to autonomously complete complex browser tasks. Requires BROWSER_USE_API_KEY or configured LLM API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc).

Session Management

bash
1browser-use sessions # List active sessions 2browser-use close # Close current session 3browser-use close --all # Close all sessions

Server Control

bash
1browser-use server status # Check if server is running 2browser-use server stop # Stop server 3browser-use server logs # View server logs

Setup

bash
1browser-use install # Install Chromium and system dependencies

Global Options

OptionDescription
--session NAMEUse named session (default: "default")
--browser MODEBrowser mode: chromium, real, remote
--headedShow browser window (chromium mode)
--profile NAMEChrome profile (real mode only)
--jsonOutput as JSON
--api-key KEYOverride API key

Session behavior: All commands without --session use the same "default" session. The browser stays open and is reused across commands. Use --session NAME to run multiple browsers in parallel.

API Key Configuration

Some features (run, extract, --browser remote) require an API key. The CLI checks these locations in order:

  1. --api-key command line flag
  2. BROWSER_USE_API_KEY environment variable
  3. ~/.config/browser-use/config.json file

To configure permanently:

bash
1mkdir -p ~/.config/browser-use 2echo '{"api_key": "your-key-here"}' > ~/.config/browser-use/config.json

Examples

Form Submission

bash
1browser-use open https://example.com/contact 2browser-use state 3# Shows: [0] input "Name", [1] input "Email", [2] textarea "Message", [3] button "Submit" 4browser-use input 0 "John Doe" 5browser-use input 1 "john@example.com" 6browser-use input 2 "Hello, this is a test message." 7browser-use click 3 8browser-use state # Verify success

Multi-Session Workflows

bash
1browser-use --session work open https://work.example.com 2browser-use --session personal open https://personal.example.com 3browser-use --session work state # Check work session 4browser-use --session personal state # Check personal session 5browser-use close --all # Close both sessions

Data Extraction with Python

bash
1browser-use open https://example.com/products 2browser-use python " 3products = [] 4for i in range(20): 5 browser.scroll('down') 6browser.screenshot('products.png') 7" 8browser-use python "print(f'Captured {len(products)} products')"

Using Real Browser (Logged-In Sessions)

bash
1browser-use --browser real open https://gmail.com 2# Uses your actual Chrome with existing login sessions 3browser-use state # Already logged in!

Tips

  1. Always run browser-use state first to see available elements and their indices
  2. Use --headed for debugging to see what the browser is doing
  3. Sessions persist - the browser stays open between commands
  4. Use --json for parsing output programmatically
  5. Python variables persist across browser-use python commands within a session
  6. Real browser mode preserves your login sessions and extensions
  7. CLI aliases: bu, browser, and browseruse all work identically to browser-use

Troubleshooting

Browser won't start?

bash
1browser-use install # Install/reinstall Chromium 2browser-use server stop # Stop any stuck server 3browser-use --headed open <url> # Try with visible window

Element not found?

bash
1browser-use state # Check current elements 2browser-use scroll down # Element might be below fold 3browser-use state # Check again

Session issues?

bash
1browser-use sessions # Check active sessions 2browser-use close --all # Clean slate 3browser-use open <url> # Fresh start

Cleanup

Always close the browser when done. Run this after completing browser automation:

bash
1browser-use close

FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

? Frequently Asked Questions

What is browser-use?

Ideal for Automation Agents requiring persistent browser sessions and multi-step workflow capabilities. Intelligent multi-agent system for personalized coding education. Features 7 AI agents, adaptive learning, secure code execution, gamification, and social learning. Built with FastAPI, React, PostgreSQL, and Docker. 356 tests, 90%+ coverage. Spec-driven development with Kiro CLI.

How do I install browser-use?

Run the command: npx killer-skills add khushparag/agentic_learning_coach. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for browser-use?

Key use cases include: Automating multi-step browser workflows, Debugging browser-based applications with persistent sessions, Generating browser automation scripts for Chromium.

Which IDEs are compatible with browser-use?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for browser-use?

Requires installation of browser dependencies (Chromium). Dependent on uvx and pip for installation and execution.

How To Install

  1. 1. Open your terminal

    Open the terminal or command line in your project directory.

  2. 2. Run the install command

    Run: npx killer-skills add khushparag/agentic_learning_coach. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. 3. Start using the skill

    The skill is now active. Your AI agent can use browser-use immediately in the current project.

Related Skills

Looking for an alternative to browser-use or another community skill for your workflow? Explore these related open-source skills.

View All

widget-generator

Logo of f
f

f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.

149.6k
0
AI

flags

Logo of vercel
vercel

flags is a Next.js feature management skill that enables developers to efficiently add or modify framework feature flags, streamlining React application development.

138.4k
0
Browser

zustand

Logo of lobehub
lobehub

The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harness to the next level — enabling multi-agent collaboration, effortless agent team design, and introducing agents as the unit of work interaction.

72.8k
0
AI

data-fetching

Logo of lobehub
lobehub

The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harness to the next level — enabling multi-agent collaboration, effortless agent team design, and introducing agents as the unit of work interaction.

72.8k
0
AI