lamport-distributed-systems
A distributed framework for adversarial behavioral security evaluation of LLMs, using swarm-based parallel probing with cryptographic consensus.
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
A Multi-Agent System (MAS) evaluation framework using PydanticAI that generates and evaluates scientific paper reviews through a three-tiered assessment approach: traditional metrics, LLM-as-a-Judge, and graph-based complexity analysis.
review-paper is a specialized AI agent skill that helps editors review and evaluate manuscripts using a dual-agent approach.
Benchmark Manager is a skill that manages AILANG evaluation benchmarks with features like prompt integration, debugging, and best practices for AI model evaluation.
manuscript-reviewer is a specialized AI agent skill that evaluates manuscripts against platform-specific requirements, applies weighted assessments, and generates detailed review reports.
ck:research is a research methodology that follows a phased approach, including scope definition and systematic information gathering, to provide concise and honest research results.
dspy-expert is an AI agent skill that helps developers build and optimize programs with the DSPy framework. It emphasizes simple program structure, measurable iteration through a dataset→metric→compile loop, and compatibility with local OpenAI-compatible endpoints such as Ollama, vLLM, and LM Studio.
Taiga-api is a Python-based interface for querying the Taiga evaluation platform API, enabling developers to retrieve job results, transcripts, and problem runs.
TEST - A modular Rust-based self-learning episodic memory system for AI agents, featuring hybrid storage with Turso (SQL) and redb (KV), async execution tracking, reward scoring, reflection, and pattern-based skill evolution. Designed for real-world applicability, maintainability, and scalable agent workflows.