AI BENCHY Category
Instruction Following Leaderboard
See which AI models perform best on Instruction Following, which stay reliable, and where the largest gaps are. Sorted by: Response time (avg.), ascending.
| Rank | Model | Company | Instruction Following score | Score | Correct tests | Response time (avg.) |
|---|---|---|---|---|---|---|
| #115 | Qwen3.5-27B none | Qwen | 6.3 | 5.7 | 1/2 | 1.03s |
| #137 | Elephant Alpha none | Openrouter | 9.8 | 5.1 | 2/2 | 1.03s |
| #110 | Seed-2.0-Lite none | Bytedance Seed | 10.0 | 5.8 | 2/2 | 1.06s |
| #81 | Mercury 2 medium | Inception | 10.0 | 6.6 | 2/2 | 1.07s |
| #125 | GPT-5.4 none | OpenAI | 6.5 | 5.5 | 1/2 | 1.07s |
| #128 | Qwen3.6 Flash none | Qwen | 6.3 | 5.4 | 1/2 | 1.10s |
| #147 | GPT-4o-mini none | OpenAI | 6.3 | 4.8 | 1/2 | 1.11s |
| #58 | Gemini 3.1 Flash Lite Preview none | | 10.0 | 7.2 | 2/2 | 1.13s |
| #91 | GPT-5.5 none | OpenAI | 6.2 | 6.4 | 1/2 | 1.15s |
| #114 | Qwen3.5 Plus 2026-04-20 none | Qwen | 6.2 | 5.7 | 1/2 | 1.17s |
| #68 | Claude Opus 4.8 none | Anthropic | 9.9 | 7.0 | 2/2 | 1.37s |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 7.3 | 4.6 | 1/2 | 1.37s |
| #132 | Mistral Small 4 medium | Mistral | 7.3 | 5.3 | 1/2 | 1.38s |
| #74 | Qwen3.6 Max Preview none | Qwen | 9.8 | 6.9 | 2/2 | 1.40s |
| #8 | Claude Opus 4.7 none | Anthropic | 10.0 | 8.9 | 2/2 | 1.46s |