Best Eval AI Skills & MCP Servers
64 curated Eval skills and MCP servers — install any of them into Claude, Cursor, ChatGPT, n8n, or any AI stack with one command.
PDF Reader
MCP server for efficient PDF text extraction, search, and metadata retrieval for Claude Code
Paper Search Agent
MCP server for paper-search-agent: academic paper discovery, access planning, and full-text retrieval via campus network
Skar
Skar turns a captured AI agent trace into a committed pytest regression test. MCP server + CLI. Use when a tool-using agent run fails and you want to lock the failure as an executable test.
Recourse CLI
MCP server for AI agents to evaluate consequences before destructive actions. Analyzes Terraform plans, shell commands, and MCP tool calls.
Prism
Prism Coder — Cognitive memory + tool-calling intelligence for AI agents. Mind Palace persistent memory (BFCL Gold Certified, 100% Tool-Call Accuracy, 54 Agent Skills, Zero-Search HDC/HRR retrieval, HRR Semantic Drift Detection across BCBA/Coding/AAC doma
Mnemo
Structured fact memory MCP server — SQLite + FTS5, trust scoring, entity graph, bilingual retrieval for Claude Code & Codex
Tuning Engines CLI
Tuning Engines CLI, MCP server, and Python agent runtime adapters for governed model, agent, skill, and MCP workflows. Fine-tune open-source LLMs, run inference, manage datasets/evaluations, and connect LangGraph or Temporal while Tuning Engines handles p
Superlocalmemory
Information-geometric agent memory with mathematical guarantees. 4-channel retrieval, Fisher-Rao similarity, zero-LLM mode, EU AI Act compliant. Works with Claude, Cursor, Windsurf, and 17+ AI tools.
Calculator
Evaluate, simplify, and differentiate mathematical expressions via MCP. STDIO or Streamable HTTP.
AI Agent Guidelines
MCP server exposing public instruction workflows as tools, backed by hidden AI agent skills for requirements, orchestration, quality, research, evaluation, governance, resilience, and physics-inspired analysis
Clawmem
On-device memory layer for AI agents (Claude Code, OpenClaw, and Hermes). Hooks + MCP server + hybrid RAG search.
Sciverse
Sciverse MCP server — exposes academic paper retrieval (search_papers / semantic_search / read_content) to MCP-compatible coding agents (Claude Code, Cursor, Codex CLI, Windsurf, ...).
Enquire
MCP server giving AI agents (Claude Code, Claude Desktop, Cursor, ChatGPT, Codex, OpenClaw) persistent long-term memory backed by your local Obsidian markdown vault. Hybrid retrieval (BM25 + ML embeddings + BGE reranker, RRF-fused), HNSW + int8 quantizati
WorkPaper
WorkPaper API, CLI evaluator, and MCP server for headless spreadsheet formulas in Node.js services and agents.
CogmemAi
CogmemAi: Autonomous Cognitive Memory for Any Ai System. 95.10% on LongMemEval (top published score on the field's hardest long-term memory benchmark) and 91% on LoCoMo (above human performance). Autonomous memory capture: your Ai's work is saved even whe
Server
The agent eval standard for MCP. Score every agent output for quality, safety, and cost.
Judges
45 specialized judges that evaluate AI-generated code for security, cost, and quality.
MD Feedback
MCP server for markdown plan review — companion to the MD Feedback VS Code extension. AI agents read annotations, mark tasks done, evaluate quality gates, and generate session handoffs. 27 tools for Claude Code, Cursor, and other MCP-compatible clients.
Formulon
MCP server for Formulon Excel-compatible formula and workbook evaluation
Vulcn
Security evals for the AI era. Probes · Targets · Graders · Proof. Confirmed XSS / SQLi / BOLA / prompt-injection / MCP-RCE with reproducible proof attached to every finding.
Ori Memory
Cognitive architecture for persistent AI agent memory. Knowledge graph with learning retrieval, ACT-R decay, and spreading activation. Markdown-native, local-first, zero cloud. MCP server + CLI.
Sigil
Persistent memory for AI coding agents. Local-first knowledge engine with atomic facts, entity graph, and hybrid retrieval. Auto-integrated with Claude Code via hooks; MCP-native for Cursor, Continue, Cline, Windsurf, and any other MCP client.
LightRAG
Model Context Protocol (MCP) server for LightRAG: 30 fully working tools with complete RAG and Knowledge Graph integration
Memory LanceDB
MCP server for LanceDB-backed long-term memory with hybrid retrieval (Vector + BM25), cross-encoder rerank, multi-scope isolation, and memory lifecycle management
About Eval skills on iClaude
iClaude is the universal install layer for AI skills. Every Eval skill on this page can be installed into Claude Code, Claude Desktop, Cursor, ChatGPT, n8n, Codex, and more — using a single copy-paste command. No config drift, no per-stack adapters, no manual MCP wiring.