AI BENCHY Category
Coding Ranking
See which AI models perform best on Coding, which ones stay reliable, and where the biggest gaps appear. Sort by: Tests Correct ↑.
| Rank | Model | Company | Coding Score | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #21 | Seed-2.0-Lite medium | Bytedance Seed | 7.0 | 8.1 | 1/2 | 107.7s |
| #24 | Gemini 3.5 Flash minimal | Google | 7.0 | 7.9 | 1/2 | 3.39s |
| #25 | Qwen3.5-27B medium | Qwen | 7.0 | 7.9 | 1/2 | 123.9s |
| #26 | Qwen3.7 Max none | Qwen | 6.8 | 7.9 | 1/2 | 1.39s |
| #27 | GPT-5.4 medium | OpenAI | 8.2 | 7.9 | 1/2 | 55.0s |
| #28 | GLM 5 Turbo medium | Z.ai | 7.3 | 7.9 | 1/2 | 53.9s |
| #30 | Qwen3.6 35B A3B medium | Qwen | 6.6 | 7.8 | 1/2 | 59.3s |
| #31 | Grok 4.3 medium | X AI | 7.4 | 7.8 | 1/2 | 55.3s |
| #34 | Gemini 3.1 Flash Lite Preview medium | Google | 6.8 | 7.7 | 1/2 | 3.98s |
| #35 | Gemini 3.1 Flash Lite medium | Google | 6.8 | 7.7 | 1/2 | 3.59s |
| #36 | Gemini 2.5 Flash medium | Google | 6.6 | 7.7 | 1/2 | 54.6s |
| #37 | Grok Build 0.1 medium | X AI | 7.2 | 7.7 | 1/2 | 61.2s |
| #40 | Gemini 3 Flash Preview none | Google | 6.8 | 7.7 | 1/2 | 2.19s |
| #41 | MiMo-V2.5-Pro medium | Xiaomi | 7.0 | 7.6 | 1/2 | 81.7s |
| #42 | Gemini 3.1 Flash Lite Preview low | Google | 6.8 | 7.6 | 1/2 | 1.56s |