typescript-sdk
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Browse and install thousands of AI Agent skills in the Killer-Skills directory. Supports Claude Code, Windsurf, Cursor, and more.
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
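To illustrate the model-index format mentioned above, here is a minimal sketch of evaluation results embedded in a model card's YAML front matter (the model name, dataset, and metric values are illustrative, not taken from a real model card):

```yaml
model-index:
- name: example-model          # illustrative model name
  results:
  - task:
      type: text-classification
    dataset:
      name: GLUE (SST-2)       # illustrative dataset
      type: glue
    metrics:
    - type: accuracy
      value: 0.91              # illustrative score
```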
The platform for LLM evaluations and AI agent testing
agent-evaluation is an LLM-as-judge evaluation framework that assesses the quality of AI-generated content using a weighted composite score and a structured verdict with evidence citations.
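As a rough illustration of the weighted-composite-score idea behind such LLM-as-judge frameworks, the sketch below combines per-criterion judge scores into a single composite (the interface and function names are hypothetical, not agent-evaluation's actual API):

```typescript
// Hypothetical per-criterion score produced by an LLM judge.
interface CriterionScore {
  name: string;   // e.g. "accuracy", "relevance"
  score: number;  // judge's score in [0, 1]
  weight: number; // relative importance of this criterion
}

// Combine per-criterion scores into one weighted composite in [0, 1].
function compositeScore(criteria: CriterionScore[]): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight === 0) return 0;
  const weighted = criteria.reduce((sum, c) => sum + c.score * c.weight, 0);
  return weighted / totalWeight;
}

// Example: accuracy weighted twice as heavily as style.
const verdict = compositeScore([
  { name: "accuracy", score: 0.9, weight: 2 },
  { name: "style", score: 0.6, weight: 1 },
]);
// verdict === 0.8
```

The normalization by total weight keeps the composite on the same scale as the individual scores regardless of how the weights are chosen.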
Evaluating agent systems requires approaches that account for dynamic decision-making and non-deterministic behavior.
Comprehensive UI/UX and Backend component design skills for AI-assisted development with Claude
deep-research is a skill that uses the Firecrawl and Exa MCPs to synthesize findings from multiple sources, delivering comprehensive reports with source attribution.
Open source agents, skills, and lore for AI-powered creative work. Transform your AI assistant into a creative companion.
Centralized LLM configuration and documentation management system. Tools for building skills, commands, agents, prompts, and managing MCP servers. Multi-LLM support (Claude Code, Codex, OpenCode).
Advisor template - Read-only advisory agents with persona differentiation. Extends the Worker archetype.
Worker template - Foundation archetype for all AGET agents. Implements 35 CAP-* capabilities.