AI BENCHY category failures
General intelligence: Did not follow instructions
See which AI models are most likely to receive a "Did not follow instructions" failure in General intelligence, so you can spot weaknesses quickly.
Failure reasons
| Rank | Model | Company | "Did not follow instructions" count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #111 | Owl Alpha medium | OpenRouter | 1 | 4.3 | 0/1 | 58.6s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 1 | 4.3 | 0/1 | 3.75s |
| #114 | Qwen3.5 Plus 2026-04-20 none | Qwen | 1 | 4.8 | 0/1 | 1.41s |
| #115 | Qwen3.5-27B none | Qwen | 1 | 5.0 | 0/1 | 2.51s |
| #116 | Hunter Alpha none | OpenRouter | 1 | 6.1 | 0/1 | 2.71s |
| #117 | Qwen3.5-35B-A3B none | Qwen | 1 | 6.5 | 0/1 | 1.19s |
| #118 | Qwen3.6 27B none | Qwen | 1 | 5.2 | 0/1 | 1.07s |
| #119 | Cobuddy medium | Baidu | 1 | 4.2 | 0/1 | 23.2s |
| #120 | Mimo V2 PRO none | Xiaomi | 1 | 4.3 | 0/1 | 2.44s |
| #121 | Owl Alpha none | OpenRouter | 1 | 4.3 | 0/1 | 4.61s |
| #124 | Kimi K2.6 none | Moonshot AI | 1 | 5.4 | 0/1 | 1.55s |
| #129 | MiniMax M2.5 medium | MiniMax | 1 | 3.8 | 0/1 | 6.63s |
| #130 | MiniMax M2.7 medium | MiniMax | 1 | 3.9 | 0/1 | 38.7s |
| #131 | Qwen3.5-122B-A10B none | Qwen | 1 | 5.0 | 0/1 | 1.12s |
| #132 | Mistral Small 4 medium | Mistral | 1 | 4.8 | 0/1 | 2.05s |