evaluation
Evaluation is the process of assessing agent systems. It requires approaches that account for dynamic decision-making and non-deterministic behavior.
Browse AI and ML workflow skills for model integration, prompt engineering, evaluations, and LLM automation across major IDEs.
This directory brings installable AI agent skills into one place so you can filter by search, category, topic, and official source, then install them directly into Claude Code, Cursor, Windsurf, and other supported environments.
sop-code-review is a code review workflow that uses parallel automated testing and specialized reviewers to ensure high-quality code. It follows a four-phase process: automated checks, specialized reviews, integration review, and final approval.
mcp-client is a universal client for connecting to MCP servers, allowing for efficient browser session management and multi-step operations.
SurrealDB Expert is a high-risk, high-reward skill for elite developers, specializing in multi-model database systems, SurrealQL, graph modeling, and advanced security features.
validate-historical is a skill that audits historical data integrity across a date range, identifying gaps and providing specific remediation steps to resolve cascade effects.
agent-browser is a browser automation tool that allows developers to navigate, snapshot, and interact with web pages using specific commands and references.
skill-improver is a technical skill that researches current best practices and improves existing Claude skills through analysis and recommendations.
organon-tools-developer is a Claude skill for agents developing organon-tools, ensuring that the tools follow the methodology specification.
mcp-cloudflare is a skill that allows interaction with Cloudflare services via MCP, facilitating file-based pipeline management and workflow integration.
User-defined-command is a skill that allows developers to create custom YAML-based command definitions stored as markdown files in the Steward/Commands folder.
review-prompt is a skill that generates smart App Store review prompting with configurable trigger conditions, platform detection, and proper timing logic.
Planning is a skill that generates and optimizes Product Requirements Documents and Implementation Plans as AI artifacts, using file-based context caches.