AI BENCHY category failures
Anti-AI tricks: Did not follow instructions
See which AI models are most likely to receive a "Did not follow instructions" failure in the Anti-AI tricks category, so you can spot weaknesses quickly. Sorted by: Response time (average) ↓.
Failure reasons
| Rank | Model | Company | "Did not follow instructions" count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #80 | MiniMax M2.7 medium | Minimax | 1 | 7.9 | 2/4 | 40.3s |
| #71 | MiniMax M2.5 medium | Minimax | 1 | 7.9 | 2/4 | 20.8s |
| #93 | GLM 4.7 Flash medium | Z.ai | 1 | 4.7 | 1/4 | 15.0s |
| #45 | GPT-5 Mini medium | OpenAI | 1 | 7.1 | 2/4 | 13.9s |
| #34 | Kimi K2.6 medium | Moonshot AI | 1 | 7.0 | 2/4 | 11.6s |
| #31 | GLM 5V Turbo medium | Z.ai | 1 | 7.2 | 2/4 | 10.8s |
| #68 | gpt-oss-120b medium | OpenAI | 1 | 6.7 | 2/4 | 10.2s |
| #92 | Qwen3 Coder Next medium | Qwen | 1 | 3.5 | 0/4 | 8.64s |
| #40 | GPT-5.2 medium | OpenAI | 1 | 6.5 | 2/4 | 7.81s |
| #84 | gpt-oss-120b none | OpenAI | 1 | 6.6 | 2/4 | 6.03s |
| #36 | GPT-5.3 Chat none | OpenAI | 1 | 6.7 | 2/4 | 3.86s |
| #87 | Qwen3 Coder Next none | Qwen | 1 | 3.6 | 0/4 | 3.31s |
| #17 | Gemini 3.1 Flash Lite Preview medium | Google | 1 | 9.1 | 3/4 | 2.33s |
| #54 | Mercury 2 medium | Inception | 1 | 6.9 | 2/4 | 1.12s |
| #95 | Grok 4.1 Fast none | X AI | 1 | 3.2 | 0/4 | 1.07s |