AI BENCHY category failures
Puzzle solving
Did not follow instructions
See which AI models are most likely to receive a "Did not follow instructions" failure in Puzzle solving, so you can spot weaknesses quickly. Sorted by: Response time (average) ↓.
| Rank | Model | Company | "Did not follow instructions" count | Category score | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|
| #7 | Qwen3.5-27B medium | Qwen | 1 | 8.3 | 2/3 | 64.6s |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 4.0 | 1/3 | 45.4s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 4.0 | 1/3 | 19.8s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 4.3 | 1/3 | 14.1s |
| #52 | GLM 4.7 Flash medium | Z.ai | 1 | 10.0 | 0/3 | 12.9s |
| #39 | gpt-oss-120b medium | OpenAI | 2 | 1.7 | 0/3 | 11.8s |
| #9 | GPT-5.4 medium | OpenAI | 1 | 7.0 | 2/3 | 9.13s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 4.0 | 1/3 | 8.08s |
| #13 | Step 3.5 Flash medium | Stepfun | 1 | 4.0 | 1/3 | 7.72s |
| #37 | Qwen3.5-Flash none | Qwen | 1 | 1.3 | 0/3 | 5.90s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 7.0 | 2/3 | 5.47s |
| #3 | GPT-5.3-Codex medium | OpenAI | 1 | 9.3 | 2/3 | 5.12s |
| #50 | Qwen3 Coder Next medium | Qwen | 2 | 10.0 | 0/3 | 2.30s |
| #55 | LFM2-24B-A2B none | Liquid | 1 | 3.3 | 0/3 | 1.69s |
| #44 | GPT-5.4 none | OpenAI | 1 | 4.0 | 1/3 | 1.52s |
| #41 | Qwen3.5-27B none | Qwen | 1 | 6.3 | 1/3 | 1.37s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 1 | 1.7 | 0/3 | 1.34s |
| #49 | GLM 4.7 Flash none | Z.ai | 2 | 3.7 | 0/3 | 1.00s |
| #36 | Mercury 2 medium | Inception | 2 | 1.7 | 0/3 | 934ms |
| #38 | Gemini 2.5 Flash none | | 1 | 4.7 | 1/3 | 576ms |
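
The response-time column mixes seconds (`64.6s`) and milliseconds (`934ms`), so comparing or re-sorting the rows means normalizing the unit first. The snippet below is a minimal sketch of one way to do that; the helper name `to_seconds` and the `rows` list are illustrative assumptions, not part of AI BENCHY, and only a few rows from the table are copied in.

```python
import re


def to_seconds(text: str) -> float:
    """Convert a duration such as '64.6s' or '934ms' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


# A few (model, average response time) pairs copied from the table above.
rows = [
    ("Mercury 2 medium", "934ms"),
    ("Qwen3.5-27B medium", "64.6s"),
    ("GLM 4.7 Flash none", "1.00s"),
    ("Gemini 2.5 Flash none", "576ms"),
]

# Slowest first, matching the descending sort used in the table.
slowest_first = sorted(rows, key=lambda row: to_seconds(row[1]), reverse=True)
for model, duration in slowest_first:
    print(f"{model}: {to_seconds(duration):.3f}s")
```

Sorting on the raw strings would not be reliable, because `934ms` and `1.00s` only compare correctly once both are expressed in the same unit.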