AI BENCHY
A simple log of product and benchmark updates, grouped by date. We use it to note newly tested models, re-tests, benchmark changes, and shipped UX/product work.
2026-05-22
- New Models Tested: Qwen3.7 Max. Added benchmark coverage for Qwen 3.7 Max.
- New Tests Added: Added a new Coding test category focused on bug-finding in C++ solutions.
Changelog page created
We started this changelog after launch, so some older updates are missing.