[ agent + llm stack leaderboard ]

updated / 0 stacks / about_scoring →

🤖

🧩

📐

📖

🎯

new Opus 4.8 released: SWE-bench 88.6% ▸

all_stacks

coding_agents

by_llm

open_source

image

video

voice

#	agent + llm	score ▼	ctx	cost_i/o	tier

// latest_news

Jun 09, 2026GPT-5.6 "iris-alpha" leaked via Codex — reportedly 85% SWE-bench with 1.5M context, expected this month

Jun 03, 2026DeepSeek V4 Flash price cut by Tencent Cloud — now $0.14/$0.28, up to 97% off, 79% SWE-bench

Jun 01, 2026Nvidia unveils RTX Spark superchip at Computex — 20-core Arm + Blackwell, 1 petaflop AI, 120B param on-device

May 28, 2026Claude Opus 4.8 launches at $5/$25 — 88.6% SWE-bench, 1M context, Anthropic's best model

May 26, 2026DeepSWE audit reveals Opus 4.7/4.6 SWE-bench Pro scores inflated 12%+ via git history contamination

May 20, 2026Google launches Gemini Spark at I/O — personal AI agent, $100/mo Ultra, runs tasks 24/7 on Google Cloud

May 19, 2026Gemini 3.5 Pro announced at Google I/O — expected June release, competes with Opus 4.8

May 07, 2026AMD launches MI350P PCIe AI card — 144GB HBM3e, 4.6 petaFLOPS, fits standard servers for on-prem AI

Apr 24, 2026DeepSeek V4 Flash preview released at $0.14/$0.28 — 79% SWE-bench, 429x cheaper than Opus