AI BENCHY category failures
Instruction following
Did not follow instructions
See which AI models are most likely to receive a "Did not follow instructions" failure in the Instruction following category, so you can spot weaknesses quickly. Sorted by: Failure count ↑.
Models shown
9
Total failures
9
Most affected model
Gemini 3.1 Flash Lite Preview (1 failure)
| Rank | Model | Company | "Did not follow instructions" count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #8 | Gemini 3.1 Flash Lite Preview high | | 1 | 9.0 | 1/2 | 70.1s |
| #13 | Step 3.5 Flash medium | Stepfun | 1 | 9.0 | 1/2 | 4.98s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 5.5 | 1/2 | 5.30s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 7.5 | 1/2 | 15.7s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 9.0 | 1/2 | 11.9s |
| #43 | MiniMax M2.5 medium | Minimax | 1 | 8.0 | 1/2 | 4.64s |
| #45 | Trinity Large Preview none | Arcee AI | 1 | 3.5 | 0/2 | 1.09s |
| #47 | GPT-4o-mini none | OpenAI | 1 | 4.5 | 0/2 | 1.27s |
| #50 | Qwen3 Coder Next medium | Qwen | 1 | 4.5 | 0/2 | 7.34s |
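
The two summary counts at the top of the page (number of models shown and total failures) can be cross-checked against the table rows. The following is a minimal sketch, not the site's own code: the `rows` list is transcribed by hand from the table, the `summarize` helper is hypothetical, and breaking ties by overall rank is an assumption, since the page only states that rows are sorted by failure count.

```python
# Rows transcribed from the table: (overall rank, model, failure count, category score).
rows = [
    (8, "Gemini 3.1 Flash Lite Preview high", 1, 9.0),
    (13, "Step 3.5 Flash medium", 1, 9.0),
    (30, "Grok 4.1 Fast medium", 1, 5.5),
    (32, "GPT-5 Mini medium", 1, 7.5),
    (34, "GPT-5 Nano medium", 1, 9.0),
    (43, "MiniMax M2.5 medium", 1, 8.0),
    (45, "Trinity Large Preview none", 1, 3.5),
    (47, "GPT-4o-mini none", 1, 4.5),
    (50, "Qwen3 Coder Next medium", 1, 4.5),
]


def summarize(rows):
    """Hypothetical helper: recompute the summary counts and the sort order."""
    models_shown = len(rows)
    total_failures = sum(failures for _, _, failures, _ in rows)
    # Ascending by failure count; ties broken by overall rank (an assumption).
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    return models_shown, total_failures, ordered


models_shown, total_failures, ordered = summarize(rows)
print(models_shown, total_failures)
```

Because every listed model has exactly one failure of this type, the sort key alone does not distinguish the rows, and the visible order matches the overall rank.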