promptfoo — community skill from route-agent, for Claude Code, Cursor, and Windsurf

v1.0.0
GitHub

About this Skill

Ideal for LLM analysis agents requiring advanced output testing and comparison capabilities. Part of route-agent, an agent for building cycling routes.

bendrucker · Updated: 3/5/2026

Agent Capability Analysis

The promptfoo skill by bendrucker is an open-source community AI agent skill for Claude Code and other IDE workflows, helping agents execute tasks with better context, repeatability, and domain-specific guidance.

Ideal Agent Persona

Ideal for LLM Analysis Agents requiring advanced output testing and comparison capabilities.

Core Value

Empowers agents to test and compare LLM outputs using a CLI tool with support for configuration files in .yaml, .json, and .js formats, enabling prompt substitution with variables and auto-discovery of config files.

Capabilities Granted for promptfoo

Testing LLM outputs against predefined prompts
Comparing performance of different LLM models
Debugging prompt engineering using inline prompts with variable substitution

Prerequisites & Limits

  • Requires a config file in a supported format (.yaml, .json, .js)
  • CLI tool only; integration with other agents may require additional setup
Labs Demo

Experience this skill in a zero-setup browser sandbox powered by WebContainers. No installation required.

promptfoo

Install promptfoo, an AI agent skill for testing and comparing LLM outputs. Works with Claude Code, Cursor, and Windsurf with one-command setup.

SKILL.md

Promptfoo

Promptfoo is a CLI tool for testing and comparing LLM outputs.

Config File

The CLI auto-discovers promptfooconfig.yaml in the current directory. Use -c path for other locations.

Supported extensions: .yaml, .json, .js

Configuration

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "What this eval tests"

prompts:
  - file://prompt.txt
  - |
    Inline prompt with {{variable}} substitution

providers:
  - anthropic:messages:claude-sonnet-4-5-20250929

defaultTest:
  options:
    provider:
      config:
        temperature: 0.0
        max_tokens: 4096

tests:
  - description: "What this case tests"
    vars:
      variable: "value"
      from_file: file://data/input.txt
    assert:
      - type: contains
        value: "expected substring"

# Or load tests from files
tests: file://cases/all.yaml

outputPath: ./results.json

evaluateOptions:
  maxConcurrency: 4
```

Provider IDs

| Model | ID |
| --- | --- |
| Opus 4.5 | anthropic:messages:claude-opus-4-5-20251101 |
| Sonnet 4.5 | anthropic:messages:claude-sonnet-4-5-20250929 |
| Haiku 4.5 | anthropic:messages:claude-haiku-4-5-20251001 |

Provider config: temperature, max_tokens, top_p, top_k, tools, tool_choice
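Per-provider settings go under a config block when the provider is written as an object rather than a bare ID string; a minimal sketch (the parameter values are illustrative):

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      temperature: 0.2   # lower values for more repeatable evals
      max_tokens: 1024
      top_p: 0.9
```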

Prompts

  • file://path.txt — load from file (path relative to config)
  • Inline string with {{variable}} Nunjucks substitution
  • Chat format via JSON: [{"role": "system", "content": "..."}, {"role": "user", "content": "{{input}}"}]
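Combining the bullet points above, a chat-format prompt can be written inline as a JSON messages array with Nunjucks variables; a minimal sketch (the message content is illustrative):

```yaml
prompts:
  - |
    [
      {"role": "system", "content": "You are a cycling route planner."},
      {"role": "user", "content": "{{input}}"}
    ]
```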

Assertion Types

| Type | Use | Value |
| --- | --- | --- |
| contains | Substring match | "expected text" |
| icontains | Case-insensitive substring | "expected text" |
| equals | Exact match | "exact value" |
| regex | Pattern match | "\\d{4}-\\d{2}-\\d{2}" |
| is-json | Valid JSON output | |
| contains-json | Output contains JSON | |
| starts-with | Prefix match | "prefix" |
| cost | Max cost | threshold: 0.01 |
| latency | Max response time (ms) | threshold: 5000 |
| javascript | Custom JS expression | output.includes('x') |
| python | Custom Python | file://check.py:fn_name |
| llm-rubric | LLM-as-judge | rubric text |
| similar | Semantic similarity | value: "text", threshold: 0.8 |
| model-graded-factuality | Fact checking | |

Prefix any assertion with not- to negate (e.g., not-contains).
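For example, a test can require that certain text appears while refusal or error text never does (the values are illustrative):

```yaml
assert:
  - type: contains
    value: "route"       # must appear verbatim
  - type: not-contains
    value: "I cannot"    # must not appear
  - type: not-icontains
    value: "error"       # must not appear in any casing
```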

llm-rubric

Uses an LLM to grade output against a rubric:

```yaml
assert:
  - type: llm-rubric
    value: |
      The response should:
      - Mention at least 3 factors
      - Include specific examples
    threshold: 0.7
    provider: anthropic:messages:claude-sonnet-4-5-20250929
```

javascript

Inline expressions or functions. Access output (string) and context (with vars, prompt):

```yaml
assert:
  - type: javascript
    value: output.length > 100 && output.includes('route')
  - type: javascript
    value: |
      const data = JSON.parse(output);
      return data.calories >= 200 && data.calories <= 300;
```

Test Organization

Split cases into separate files and reference them:

```yaml
tests:
  - file://cases/basic.yaml
  - file://cases/edge-cases.yaml
```

Each case file contains a YAML array of test objects.
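A case file is simply the array that would otherwise sit under tests: in the main config; a sketch of what cases/basic.yaml might contain (the cases themselves are illustrative):

```yaml
# cases/basic.yaml — a bare YAML array of test objects
- description: "Metric units"
  vars:
    input: "Plan a 30 km loop"
  assert:
    - type: icontains
      value: "km"
- description: "Valid JSON output"
  vars:
    input: "Return the route as JSON"
  assert:
    - type: is-json
```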

CLI

```bash
npx promptfoo eval                            # Run with auto-discovered config
npx promptfoo eval -c path/to/config.yaml     # Specific config
npx promptfoo eval --filter-metadata key=v    # Filter tests
npx promptfoo view                            # Web UI for results
npx promptfoo cache clear                     # Clear result cache
```
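The --filter-metadata flag selects tests whose metadata block contains the given key=value pair; a minimal sketch (the category key and its values are illustrative):

```yaml
tests:
  - description: "Gravel routing"
    metadata:
      category: gravel
    vars:
      input: "40km gravel loop"
  - description: "Road routing"
    metadata:
      category: road
    vars:
      input: "60km road ride"
```

With this config, npx promptfoo eval --filter-metadata category=gravel runs only the first case.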

References

Consult the configuration reference and Anthropic provider docs for full details.

FAQ & Installation Steps


Frequently Asked Questions

What is promptfoo?

promptfoo is a community skill ideal for LLM analysis agents requiring advanced output testing and comparison capabilities. It ships as part of route-agent, an agent for building cycling routes.

How do I install promptfoo?

Run the command: npx killer-skills add bendrucker/route-agent/promptfoo. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for promptfoo?

Key use cases include testing LLM outputs against predefined prompts, comparing the performance of different LLM models, and debugging prompt engineering using inline prompts with variable substitution.

Which IDEs are compatible with promptfoo?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for promptfoo?

Requires a config file in a supported format (.yaml, .json, .js). CLI tool only, may require additional setup for integration with other agents.

How To Install

  1. Open your terminal

     Open the terminal or command line in your project directory.

  2. Run the install command

     Run: npx killer-skills add bendrucker/route-agent/promptfoo. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

     The skill is now active. Your AI agent can use promptfoo immediately in the current project.

Related Skills

Looking for an alternative to promptfoo or another community skill for your workflow? Explore these related open-source skills.

widget-generator (by f)

f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.

flags (by vercel)

flags is a Next.js feature management skill that enables developers to efficiently add or modify framework feature flags, streamlining React application development.

zustand (by lobehub)

data-fetching (by lobehub)