AI BENCHY category failures
General intelligence: Did not follow instructions
See which AI models are most likely to fail by not following instructions in the General intelligence category, so you can spot weaknesses quickly.
Failure reasons
| Rank | Model | Company | Did not follow instructions (count) | Category score | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|
| #38 | GPT-5.4 Nano medium | OpenAI | 1 | 4.5 | 0/1 | 4.15s |
| #39 | Seed-2.0-Mini medium | Bytedance Seed | 1 | 5.1 | 0/1 | 36.7s |
| #40 | GPT-5.2 medium | OpenAI | 1 | 3.7 | 0/1 | 4.32s |
| #41 | MiMo-V2-Flash medium | Xiaomi | 1 | 4.0 | 0/1 | 4.20s |
| #42 | Claude Sonnet 4.6 none | Anthropic | 1 | 6.1 | 0/1 | 2.56s |
| #44 | GPT-5.4 Mini medium | OpenAI | 1 | 4.5 | 0/1 | 3.72s |
| #45 | GPT-5 Mini medium | OpenAI | 1 | 4.5 | 0/1 | 13.5s |
| #46 | Kimi K2.5 medium | Moonshot AI | 1 | 6.5 | 0/1 | 69.7s |
| #47 | Grok 4.20 medium | X AI | 1 | 5.8 | 0/1 | 7.09s |
| #50 | Hunter Alpha medium | OpenRouter | 1 | 7.0 | 0/1 | 6.44s |
| #51 | Nemotron 3 Super medium | NVIDIA | 1 | 3.8 | 0/1 | 27.9s |
| #52 | Grok 4.1 Fast medium | X AI | 1 | 4.2 | 0/1 | 16.2s |
| #54 | Mercury 2 medium | Inception | 1 | 4.8 | 0/1 | 821ms |
| #55 | MiMo-V2-Omni none | Xiaomi | 1 | 4.5 | 0/1 | 1.19s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 1 | 5.8 | 0/1 | 6.40s |