AI BENCHY category failures
Category: Instruction following
Failure reason: Did not follow instructions
See which AI models are most likely to fail with "Did not follow instructions" in the Instruction following category, so you can spot weaknesses quickly. Sorted by average response time, descending.
Models shown: 9
Total failures: 9
Most affected model: Gemini 3.1 Flash Lite Preview (1)
| Rank | Model | Company | "Did not follow instructions" count | Category score | Correct attempts | Avg. response time |
|---|---|---|---|---|---|---|
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 1 | 9.0 | 1/2 | 70.1s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 7.5 | 1/2 | 15.7s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 9.0 | 1/2 | 11.9s |
| #50 | Qwen3 Coder Next medium | Qwen | 1 | 4.5 | 0/2 | 7.34s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 5.5 | 1/2 | 5.30s |
| #13 | Step 3.5 Flash medium | Stepfun | 1 | 9.0 | 1/2 | 4.98s |
| #43 | MiniMax M2.5 medium | Minimax | 1 | 8.0 | 1/2 | 4.64s |
| #47 | GPT-4o-mini none | OpenAI | 1 | 4.5 | 0/2 | 1.27s |
| #45 | Trinity Large Preview none | Arcee AI | 1 | 3.5 | 0/2 | 1.09s |
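The summary figures above (models shown, total failures) and the table order can be reproduced from the rows themselves. The following is a minimal sketch, assuming the table is transcribed by hand into a list; the `Row` type and `summarize` helper are hypothetical names for illustration, not part of AI BENCHY. The company for the Gemini row is assumed to be Google, since that cell is blank in the scraped source.

```python
from dataclasses import dataclass


@dataclass
class Row:
    rank: int
    model: str
    company: str
    failures: int      # "Did not follow instructions" count
    score: float       # category score
    correct: int       # correct attempts...
    attempts: int      # ...out of this many attempts
    latency_s: float   # average response time, in seconds


# Hand-transcribed from the table; "Google" is an assumption (blank in source).
ROWS = [
    Row(8, "Gemini 3.1 Flash Lite Preview high", "Google", 1, 9.0, 1, 2, 70.1),
    Row(32, "GPT-5 Mini medium", "OpenAI", 1, 7.5, 1, 2, 15.7),
    Row(34, "GPT-5 Nano medium", "OpenAI", 1, 9.0, 1, 2, 11.9),
    Row(50, "Qwen3 Coder Next medium", "Qwen", 1, 4.5, 0, 2, 7.34),
    Row(30, "Grok 4.1 Fast medium", "X AI", 1, 5.5, 1, 2, 5.30),
    Row(13, "Step 3.5 Flash medium", "Stepfun", 1, 9.0, 1, 2, 4.98),
    Row(43, "MiniMax M2.5 medium", "Minimax", 1, 8.0, 1, 2, 4.64),
    Row(47, "GPT-4o-mini none", "OpenAI", 1, 4.5, 0, 2, 1.27),
    Row(45, "Trinity Large Preview none", "Arcee AI", 1, 3.5, 0, 2, 1.09),
]


def summarize(rows):
    """Return (models shown, total failures, rows sorted by avg latency, slowest first)."""
    ordered = sorted(rows, key=lambda r: r.latency_s, reverse=True)
    return len(rows), sum(r.failures for r in rows), ordered


models, failures, ordered = summarize(ROWS)
print(models, failures)    # → 9 9
print(ordered[0].model)    # → Gemini 3.1 Flash Lite Preview high
```

Because every model here has exactly one failure, the failure count alone does not separate them; the page's chosen sort key (average response time) is what determines the row order.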