AI BENCHY category failures
Instruction following: Did not follow instructions
See which AI models are most likely to receive a "Did not follow instructions" failure in the Instruction following category, so you can spot weaknesses quickly.
Failure reasons
| Rank | Model | Company | "Did not follow instructions" count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #62 | Step 3.5 Flash medium | Stepfun | 1 | 8.3 | 1/2 | 4.78s |
| #80 | Mimo V2 Omni medium | Xiaomi | 1 | 8.3 | 1/2 | 4.99s |
| #86 | Grok 4.1 Fast medium | X AI | 1 | 6.5 | 1/2 | 4.63s |
| #105 | Nemotron 3 Super medium | NVIDIA | 1 | 7.3 | 1/2 | 6.97s |
| #129 | MiniMax M2.5 medium | Minimax | 1 | 7.5 | 1/2 | 621ms |
| #130 | MiniMax M2.7 medium | Minimax | 1 | 3.8 | 0/2 | 12.8s |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 1 | 7.3 | 1/2 | 1.37s |
| #151 | Trinity Large Preview none | Arcee AI | 1 | 3.5 | 0/2 | 822ms |
| #157 | Grok 4.1 Fast none | X AI | 1 | 3.0 | 0/2 | 685ms |
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 1 | 4.8 | 0/2 | 541ms |
| #163 | Granite 4.1 8B none | IBM Granite | 1 | 3.6 | 0/2 | 344ms |