Agent Architecture Rubric

All agents are scored on 7 architectural dimensions. Scores are based on published benchmarks, official documentation, and community data. A ±1 point tolerance applies to all scores; they reflect best-effort research, not hard metrics.

๐Ÿ† Agent Scores Overview

The 7 Dimensions

Each dimension measures a specific aspect of agent architecture quality. The maximum score varies with each dimension's importance; the seven maxima sum to 100 points.

🧠 Multi-Agent Orchestration

Max: 20pts

Task decomposition, parallel sub-agents, Coordinator mode, agent isolation. Higher impact on complex multi-file tasks. Sources: Requesty 2026, Dev.to 2026, Packmind matrix.

💾 Memory & Context

Max: 15pts

Cross-session persistence, context window size, memory types, retrieval quality, auto-consolidation.

🔧 Tool System

Max: 20pts

Number and quality of tools, MCP support, lifecycle management, extensibility. Sources: Packmind coding-agents-matrix, MCP Atlas leaderboard.

💰 Prompt Cache & Cost

Max: 10pts

Token optimization, prompt cache strategy, BYOK flexibility, cost efficiency, cache-break tracking.

🛡️ Safety & Permissions

Max: 15pts

Permission chain depth, sandboxing, side-model classification, command vetting, attestation. Sources: Anthropic docs, OpenAI Codex sandbox docs.

⚡ Reliability & Recovery

Max: 10pts

Error handling, retry logic, timeout management, git-based recovery, graceful degradation.

📊 Community & Ecosystem

Max: 10pts

GitHub stars, update frequency, documentation quality, plugin/MCP ecosystem, community responsiveness.
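
The point budget listed for the seven dimensions can be written as a small lookup table. The sketch below is illustrative only: it records each dimension's maximum, checks a score sheet against those maxima, and reports a low, nominal, and high total. It assumes the ±1 tolerance applies to each dimension score independently and is clamped to the range 0 to that dimension's maximum; the rubric does not say how the tolerance combines across dimensions. The function name and the example scores are hypothetical and do not come from the rubric.

```python
# Maximum points per dimension, as listed in the rubric.
MAX_POINTS = {
    "Multi-Agent Orchestration": 20,
    "Memory & Context": 15,
    "Tool System": 20,
    "Prompt Cache & Cost": 10,
    "Safety & Permissions": 15,
    "Reliability & Recovery": 10,
    "Community & Ecosystem": 10,
}

TOLERANCE = 1  # the rubric's stated +/-1 point tolerance


def total_with_tolerance(scores):
    """Return (low, nominal, high) totals for one agent's score sheet.

    Assumption: the tolerance applies to each dimension separately,
    and a score can never leave the range [0, dimension maximum].
    """
    low = nominal = high = 0
    for name, max_pts in MAX_POINTS.items():
        score = scores[name]
        if not 0 <= score <= max_pts:
            raise ValueError(f"{name}: {score} is outside 0..{max_pts}")
        nominal += score
        low += max(0, score - TOLERANCE)
        high += min(max_pts, score + TOLERANCE)
    return low, nominal, high


# Hypothetical scores for illustration only; not taken from the rubric.
example_scores = {
    "Multi-Agent Orchestration": 16,
    "Memory & Context": 12,
    "Tool System": 20,
    "Prompt Cache & Cost": 7,
    "Safety & Permissions": 13,
    "Reliability & Recovery": 8,
    "Community & Ecosystem": 0,
}

print(sum(MAX_POINTS.values()))              # 100
print(total_with_tolerance(example_scores))  # (70, 76, 82)
```

Under this reading, two agents whose ranges overlap should be treated as effectively tied rather than ranked by their nominal totals.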

Note: Agent scores reflect architectural quality only; they do not include LLM benchmark performance. All scores carry a ±1 tolerance. Open-source agent scores are verified through source code analysis; proprietary agent scores are estimated from published documentation. Removed: Claw Code (experimental concept, no public data), OpenClaude (no verifiable source).