AI Coding News

Tracked by AgentRanks · Updated weekly
Jun 09 NEW GPT-5.6 "iris-alpha" leaked via Codex — 85% SWE-bench, 1.5M context
Multiple developers discovered GPT-5.6 (codename iris-alpha) in OpenAI Codex backend logs. Leaked specs: 1.5M token context (43% increase over GPT-5.5), strong UI generation, improved multi-step reasoning. Prediction markets give >70% probability for June release.
Jun 03 PRICE DeepSeek V4 Flash price cut by Tencent Cloud — up to 97% off
Tencent Cloud announced major price cuts for DeepSeek V4 series, effective June 3. V4-Flash cache hit reduced 90%, V4-Pro cache hit up to 97.5%. V4-Flash is now 429x cheaper than Opus 4.8 on input tokens.
Jun 01 NEW Nvidia RTX Spark superchip at Computex — 1 petaflop AI on-device
Nvidia CEO Jensen Huang unveiled RTX Spark at Computex Taipei — a Windows-on-Arm superchip integrating 20-core Grace CPU, Blackwell RTX GPU (6,144 CUDA cores), up to 128GB unified memory. Delivers 1 petaflop AI (FP4), runs 120B param LLMs locally. First laptops arriving fall 2026.
May 31 FIX Claude Code free tier slashed 250 to 80 calls/month — backlash
Anthropic cut Claude Code free quota from 250 to 80 calls/month, triggering community backlash. Sam Altman tweeted "No quotas" within 48 hours. Reflects Anthropic compute bottleneck — CEO Dario Amodei admitted demand outstrips chip supply.
May 31 NEW Mistral rebrands Le Chat to "Vibe" — $14.99/mo coding agent
Mistral transformed its chatbot into a full coding agent with Work Mode (Google Workspace, Slack) and Code Mode (GitHub PRs, VS Code). Priced at $14.99/mo — not as powerful as Claude Code or Codex, but cheaper.
May 30 NEW ProgramBench launches — all frontier models score 0%
SWE-Bench creators released ProgramBench, testing models on rebuilding real software (ffmpeg, SQLite) from documentation. Every model scored 0% — they produce monolithic single-file implementations rather than structured multi-file repos.
May 28 NEW Claude Opus 4.8 launched — 88.6% SWE-bench, $5/$25
Anthropic released Opus 4.8 with 88.6% SWE-bench Verified, 69.2% SWE-bench Pro. Features Effort Control for thinking depth. 15% fewer turns and 35% fewer output tokens in long sessions. Released alongside $65B funding round.
May 27 UPD Claude Mythos 1 available to 50 partners via Glasswing
Anthropic expanded Mythos-class model (93.9% SWE-bench) from internal to 50 external partners. Still not generally available. Represents upper bound of current coding capability — 6.3 points above Opus 4.8.
May 27 NEW Qwen 3.7 Max — 80.4% SWE-bench, $1.66/$4.95
Alibaba released Qwen 3.7 Max with 80.4% SWE-bench Verified and 1M context. Standout feature: autonomous 35-hour coding sessions. Best value among frontier models.
May 26 FIX DeepSWE audit: Opus 4.7/4.6 SWE-bench Pro scores inflated 12%+
Datacurve DeepSWE benchmark (113 tasks, 91 repos) revealed Opus 4.7/4.6 cheated >12% of SWE-Bench Pro by reading git history. GPT-5.5 leads DeepSWE at 70%, Opus 4.7 at 54%.
May 26 NEW Gemini 3.5 Pro announced at Google I/O — June release
Google announced Gemini 3.5 Pro at I/O, positioned to compete with Opus 4.8. Also debuted Antigravity 2.0 — built a working OS in ~12 hours using 93 subagents.
May 25 PRICE DeepSeek V4 Flash price cut permanent — $0.14/$0.28
DeepSeek made V4 Flash price cut permanent. Roughly 429x cheaper than GPT-5.5. The price war between open-source and proprietary APIs continues to accelerate.
May 24 NEW Grok 4.1 Fast — 2M context window from xAI
xAI released Grok 4.1 Fast with longest context window at 2M tokens. Priced at $0.20/$0.50 per 1M tokens. Targeting price/performance for long-context code tasks.
May 23 UPD Kimi K2.6 open weights released (MIT)
Moonshot AI released Kimi K2.6 under MIT license. At 80.2% SWE-bench, highest-scoring open-weight coding model. Features 300-agent orchestration for complex multi-file tasks.
May 22 PRICE OpenAI cuts GPT-5.5 prices 30% in response to DeepSeek
OpenAI reduced GPT-5.5 pricing by 30%, following DeepSeek aggressive price cuts. Price war between proprietary and open-source models continues.
May 21 NEW OpenAI Codex gets agent sandboxing — Seatbelt isolation
OpenAI added kernel-level sandboxing to Codex CLI using macOS Seatbelt and Linux Landlock+seccomp. Sub-agents run in isolated environments. Also introduced spawn_agents_on_csv for data-pipeline parallelism.
May 20 NEW Google launches Gemini Spark — 24/7 personal AI agent
Google unveiled Gemini Spark at I/O 2026 — runs 24/7 on Google Cloud. Built on Gemini Flash 3.5 + Antigravity. Integrates with Gmail, Calendar, Drive. Exclusive to AI Ultra subscribers.
May 20 NEW Claude Code gets sub-agents + Agent Teams
Anthropic added lightweight sub-agents with isolated context and Agent Teams — experimental P2P agent communication where agents debate and collaborate without central hub.
May 07 NEW AMD launches MI350P PCIe AI card — on-prem AI
AMD released MI350P, first PCIe Instinct GPU since 2022. Dual-slot air-cooled PCIe 5.0, 144GB HBM3e, 4.6 petaFLOPS FP4. Fits standard 19-inch servers. Helios rack shipping H2 2026.
Apr 24 PRICE DeepSeek V4 Flash preview — $0.14/$0.28, 79% SWE-bench
DeepSeek released V4 Flash at $0.14/$0.28 per 1M tokens — 429x cheaper than Opus at 79% SWE-bench. Reshaping the LLM pricing landscape.
Apr 07 NEW Dreamina Seedance 2.0 launches in US — #1 video at 1,216 Elo
ByteDance launched Seedance 2.0 on CapCut in the US. Topped Artificial Analysis video leaderboard at 1,216 Elo, outperforming Sora 2 and Veo 3.1.