AI BENCHY Category
Coding Ranking
See which AI models perform best on Coding, which ones stay reliable, and where the biggest gaps appear. Rows are sorted by Tests Correct, descending.
| Rank | Model | Company | Coding Score | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #24 | Gemini 3.5 Flash minimal | Google | 7.0 | 7.9 | 1/2 | 3.39s |
| #25 | Qwen3.5-27B medium | Qwen | 7.0 | 7.9 | 1/2 | 123.9s |
| #26 | Qwen3.7 Max none | Qwen | 6.8 | 7.9 | 1/2 | 1.39s |
| #27 | GPT-5.4 medium | OpenAI | 8.2 | 7.9 | 1/2 | 55.0s |
| #28 | GLM 5 Turbo medium | Z.ai | 7.3 | 7.9 | 1/2 | 53.9s |
| #30 | Qwen3.6 35B A3B medium | Qwen | 6.6 | 7.8 | 1/2 | 59.3s |
| #31 | Grok 4.3 medium | X AI | 7.4 | 7.8 | 1/2 | 55.3s |
| #34 | Gemini 3.1 Flash Lite Preview medium | Google | 6.8 | 7.7 | 1/2 | 3.98s |
| #35 | Gemini 3.1 Flash Lite medium | Google | 6.8 | 7.7 | 1/2 | 3.59s |
| #36 | Gemini 2.5 Flash medium | Google | 6.6 | 7.7 | 1/2 | 54.6s |
| #39 | Gemini 3 Flash Preview none | Google | 6.8 | 7.7 | 1/2 | 2.19s |
| #40 | MiMo-V2.5-Pro medium | Xiaomi | 7.0 | 7.6 | 1/2 | 81.7s |
| #41 | Gemini 3.1 Flash Lite Preview low | Google | 6.8 | 7.6 | 1/2 | 1.56s |
| #42 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 5.4 | 7.6 | 1/2 | 137.5s |
| #43 | GPT-5.2 Chat none | OpenAI | 8.2 | 7.6 | 1/2 | 8.05s |