AI BENCHY Category
Programming leaderboard
See which AI models perform best on Programming, which stay reliable, and where the biggest differences lie. Sorted by: Correct tests.
| Rank | Model | Company | Programming score | Score | Correct tests | Response time (avg.) |
|---|---|---|---|---|---|---|
| #25 | Gemini 3.5 Flash minimal | Google | 7.0 | 7.9 | 1/2 | 3.39s |
| #26 | Qwen3.5-27B medium | Qwen | 7.0 | 7.9 | 1/2 | 123.9s |
| #27 | Qwen3.7 Max none | Qwen | 6.8 | 7.9 | 1/2 | 1.39s |
| #28 | GPT-5.4 medium | OpenAI | 8.2 | 7.9 | 1/2 | 55.0s |
| #29 | GLM 5 Turbo medium | Z.ai | 7.3 | 7.9 | 1/2 | 53.9s |
| #30 | GPT-5.2 Chat none | OpenAI | 8.2 | 7.9 | 1/2 | 8.05s |
| #32 | Qwen3.6 35B A3B medium | Qwen | 6.6 | 7.8 | 1/2 | 59.3s |
| #33 | Grok 4.3 medium | X AI | 7.4 | 7.8 | 1/2 | 55.3s |
| #36 | Gemini 3.1 Flash Lite Preview medium | Google | 6.8 | 7.7 | 1/2 | 3.98s |
| #37 | Gemini 3.1 Flash Lite medium | Google | 6.8 | 7.7 | 1/2 | 3.59s |
| #38 | Gemini 2.5 Flash medium | Google | 6.6 | 7.7 | 1/2 | 54.6s |
| #41 | Gemini 3 Flash Preview none | Google | 6.8 | 7.7 | 1/2 | 2.19s |
| #42 | Grok Build 0.1 medium | X AI | 7.0 | 7.7 | 1/2 | 62.6s |
| #44 | DeepSeek V4 Flash high | DeepSeek | 6.8 | 7.6 | 1/2 | 58.1s |
| #45 | MiMo-V2.5-Pro medium | Xiaomi | 7.0 | 7.6 | 1/2 | 81.7s |