AI BENCHY category failures
Anti-AI tricks
Did not follow instructions
See which AI models are most likely to receive a "Did not follow instructions" failure in the Anti-AI tricks category, so you can spot weaknesses quickly. Sorted by: Response time (average), ascending (↑).
Models shown: 12
Total failures: 12
Most affected model: Gemini 3.1 Flash Lite Preview (1)
| Rank | Model | Company | "Did not follow instructions" count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #22 | Gemini 3.1 Flash Lite Preview none | | 1 | 6.0 | 1/3 | 1.16s |
| #36 | Mercury 2 medium | Inception | 1 | 7.3 | 2/3 | 1.30s |
| #53 | Grok 4.1 Fast none | X AI | 1 | 1.3 | 0/3 | 1.73s |
| #12 | Gemini 3.1 Flash Lite Preview medium | | 1 | 9.0 | 2/3 | 2.53s |
| #48 | Qwen3 Coder Next none | Qwen | 1 | 2.3 | 0/3 | 4.39s |
| #19 | GPT-5.3 Chat none | OpenAI | 1 | 7.3 | 2/3 | 4.72s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 7.0 | 2/3 | 14.3s |
| #50 | Qwen3 Coder Next medium | Qwen | 1 | 1.3 | 0/3 | 15.3s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 7.0 | 2/3 | 16.5s |
| #39 | gpt-oss-120b medium | OpenAI | 1 | 7.0 | 2/3 | 19.8s |
| #52 | GLM 4.7 Flash medium | Z.ai | 1 | 4.0 | 1/3 | 27.1s |
| #43 | MiniMax M2.5 medium | Minimax | 1 | 9.3 | 2/3 | 32.4s |