AI BENCHY Category
Puzzle Solving Leaderboard
See which AI models perform best on Puzzle Solving, which stay reliable, and where the biggest gaps are. Sorted by: Correct tests.
| Rank | Model | Company | Puzzle Solving score | Score | Correct tests | Response time (avg.) |
|---|---|---|---|---|---|---|
| #19 | Seed-2.0-Lite medium | Bytedance Seed | 9.0 | 8.2 | 2/3 | 10.2s |
| #21 | GPT-5.4 medium | OpenAI | 8.2 | 8.0 | 2/3 | 9.14s |
| #23 | GLM 5 Turbo medium | Z.ai | 8.7 | 8.0 | 2/3 | 5.23s |
| #24 | GPT-5.2 Chat none | OpenAI | 7.7 | 7.9 | 2/3 | 4.10s |
| #28 | Gemini 2.5 Flash medium | Google | 7.7 | 7.8 | 2/3 | 3.18s |
| #30 | Qwen3.5-27B medium | Qwen | 8.2 | 7.8 | 2/3 | 59.6s |
| #31 | DeepSeek V4 Flash high | DeepSeek | 8.2 | 7.7 | 2/3 | 26.1s |
| #33 | Hy3 preview medium | Tencent | 7.7 | 7.7 | 2/3 | 11.1s |
| #36 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 8.2 | 7.6 | 2/3 | 17.7s |
| #39 | Qwen3.6 Flash medium | Qwen | 8.2 | 7.5 | 2/3 | 6.29s |
| #40 | Gemini 3.1 Flash Lite Preview medium | Google | 7.7 | 7.5 | 2/3 | 5.30s |
| #42 | GPT-5.2 medium | OpenAI | 7.5 | 7.5 | 2/3 | 5.80s |
| #44 | Gemini 3.1 Flash Lite medium | Google | 7.6 | 7.5 | 2/3 | 1.95s |
| #45 | GPT-5.4 Mini medium | OpenAI | 7.8 | 7.5 | 2/3 | 4.37s |
| #46 | Qwen3.6 35B A3B medium | Qwen | 8.0 | 7.4 | 2/3 | 5.95s |