promptfoo — community skill from route-agent, for Claude Code, Cursor, and Windsurf

v1.0.0
GitHub

About this Skill

Ideal for LLM analysis agents requiring advanced output testing and comparison capabilities. Part of route-agent, an agent for building cycling routes.

bendrucker · Updated: 3/5/2026

Agent Capability Analysis

The promptfoo skill by bendrucker is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

Ideal for LLM Analysis Agents requiring advanced output testing and comparison capabilities.

Core Value

Empowers agents to test and compare LLM outputs using a CLI tool with support for configuration files in .yaml, .json, and .js formats, enabling prompt substitution with variables and auto-discovery of config files.

Capabilities Granted for promptfoo

Testing LLM outputs against predefined prompts
Comparing performance of different LLM models
Debugging prompt engineering using inline prompts with variable substitution

Prerequisites & Limits

  • Requires a config file in a supported format (.yaml, .json, .js)
  • CLI tool only; integration with other agents may require additional setup
Labs Demo

Experience this skill in a zero-setup browser sandbox powered by WebContainers. No installation required.

promptfoo

Install promptfoo, an AI agent skill for testing and comparing LLM outputs. Works with Claude Code, Cursor, and Windsurf with one-command setup.

SKILL.md

Promptfoo

Promptfoo is a CLI tool for testing and comparing LLM outputs.

Config File

The CLI auto-discovers promptfooconfig.yaml in the current directory. Use -c path for other locations.

Supported extensions: .yaml, .json, .js

Configuration

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "What this eval tests"

prompts:
  - file://prompt.txt
  - |
    Inline prompt with {{variable}} substitution

providers:
  - anthropic:messages:claude-sonnet-4-5-20250929

defaultTest:
  options:
    provider:
      config:
        temperature: 0.0
        max_tokens: 4096

tests:
  - description: "What this case tests"
    vars:
      variable: "value"
      from_file: file://data/input.txt
    assert:
      - type: contains
        value: "expected substring"

# Or load tests from files
tests: file://cases/all.yaml

outputPath: ./results.json

evaluateOptions:
  maxConcurrency: 4
```

Provider IDs

| Model | ID |
| --- | --- |
| Opus 4.5 | anthropic:messages:claude-opus-4-5-20251101 |
| Sonnet 4.5 | anthropic:messages:claude-sonnet-4-5-20250929 |
| Haiku 4.5 | anthropic:messages:claude-haiku-4-5-20251001 |

Provider config: temperature, max_tokens, top_p, top_k, tools, tool_choice
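Per-provider settings go under a config block when the provider is written as an object rather than a bare ID string; a minimal sketch (the parameter values are illustrative):

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      temperature: 0.2   # lower values for more repeatable evals
      max_tokens: 1024
      top_p: 0.9
```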

Prompts

  • file://path.txt — load from file (path relative to config)
  • Inline string with {{variable}} Nunjucks substitution
  • Chat format via JSON: [{"role": "system", "content": "..."}, {"role": "user", "content": "{{input}}"}]
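Combining the bullet points above, a chat-format prompt can be written inline as a JSON messages array with Nunjucks variables; a minimal sketch (the message content is illustrative):

```yaml
prompts:
  - |
    [
      {"role": "system", "content": "You are a cycling route planner."},
      {"role": "user", "content": "{{input}}"}
    ]
```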

Assertion Types

| Type | Use | Value |
| --- | --- | --- |
| contains | Substring match | "expected text" |
| icontains | Case-insensitive substring | "expected text" |
| equals | Exact match | "exact value" |
| regex | Pattern match | "\\d{4}-\\d{2}-\\d{2}" |
| is-json | Valid JSON output | |
| contains-json | Output contains JSON | |
| starts-with | Prefix match | "prefix" |
| cost | Max cost | threshold: 0.01 |
| latency | Max response time (ms) | threshold: 5000 |
| javascript | Custom JS expression | output.includes('x') |
| python | Custom Python | file://check.py:fn_name |
| llm-rubric | LLM-as-judge | rubric text |
| similar | Semantic similarity | value: "text", threshold: 0.8 |
| model-graded-factuality | Fact checking | |

Prefix any assertion with not- to negate (e.g., not-contains).
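For example, a test can require that certain text appears while refusal or error text never does (the values are illustrative):

```yaml
assert:
  - type: contains
    value: "route"       # must appear verbatim
  - type: not-contains
    value: "I cannot"    # must not appear
  - type: not-icontains
    value: "error"       # must not appear in any casing
```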

llm-rubric

Uses an LLM to grade output against a rubric:

```yaml
assert:
  - type: llm-rubric
    value: |
      The response should:
      - Mention at least 3 factors
      - Include specific examples
    threshold: 0.7
    provider: anthropic:messages:claude-sonnet-4-5-20250929
```

javascript

Inline expressions or functions. Access output (string) and context (with vars, prompt):

```yaml
assert:
  - type: javascript
    value: output.length > 100 && output.includes('route')
  - type: javascript
    value: |
      const data = JSON.parse(output);
      return data.calories >= 200 && data.calories <= 300;
```

Test Organization

Split cases into separate files and reference them:

```yaml
tests:
  - file://cases/basic.yaml
  - file://cases/edge-cases.yaml
```

Each case file contains a YAML array of test objects.
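A case file is simply the array that would otherwise sit under tests: in the main config; a sketch of what cases/basic.yaml might contain (the cases themselves are illustrative):

```yaml
# cases/basic.yaml — a bare YAML array of test objects
- description: "Metric units"
  vars:
    input: "Plan a 30 km loop"
  assert:
    - type: icontains
      value: "km"
- description: "Valid JSON output"
  vars:
    input: "Return the route as JSON"
  assert:
    - type: is-json
```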

CLI

```bash
npx promptfoo eval                            # Run with auto-discovered config
npx promptfoo eval -c path/to/config.yaml     # Specific config
npx promptfoo eval --filter-metadata key=v    # Filter tests
npx promptfoo view                            # Web UI for results
npx promptfoo cache clear                     # Clear result cache
```
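The --filter-metadata flag selects tests whose metadata block contains the given key=value pair; a minimal sketch (the category key and its values are illustrative):

```yaml
tests:
  - description: "Gravel routing"
    metadata:
      category: gravel
    vars:
      input: "40km gravel loop"
  - description: "Road routing"
    metadata:
      category: road
    vars:
      input: "60km road ride"
```

With this config, npx promptfoo eval --filter-metadata category=gravel runs only the first case.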

References

Consult the configuration reference and Anthropic provider docs for full details.

FAQ & Installation Steps


Frequently Asked Questions

What is promptfoo?

promptfoo is a community skill ideal for LLM analysis agents requiring advanced output testing and comparison capabilities. It ships as part of route-agent, an agent for building cycling routes.

How do I install promptfoo?

Run the command: npx killer-skills add bendrucker/route-agent/promptfoo. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for promptfoo?

Key use cases include testing LLM outputs against predefined prompts, comparing the performance of different LLM models, and debugging prompt engineering using inline prompts with variable substitution.

Which IDEs are compatible with promptfoo?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for promptfoo?

Requires a config file in a supported format (.yaml, .json, .js). CLI tool only, may require additional setup for integration with other agents.

How To Install

  1. Open your terminal

     Open the terminal or command line in your project directory.

  2. Run the install command

     Run: npx killer-skills add bendrucker/route-agent/promptfoo. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

     The skill is now active. Your AI agent can use promptfoo immediately in the current project.

Related Skills

Looking for an alternative to promptfoo or another community skill for your workflow? Explore these related open-source skills.

widget-generator (by f)

f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.

flags (by vercel)

flags is a Next.js feature management skill that enables developers to efficiently add or modify framework feature flags, streamlining React application development.

zustand (by lobehub)

data-fetching (by lobehub)