AI BENCHY category failures
Anti-AI tricks: Wrong answer
See which AI models are most likely to give a Wrong answer in Anti-AI tricks, so you can spot weaknesses quickly.
Failure reasons
| Rank | Model | Company | Wrong answer count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #42 | Claude Sonnet 4.6 none | Anthropic | 1 | 4.8 | 1/4 | 2.94s |
| #44 | GPT-5.4 Mini medium | OpenAI | 1 | 8.6 | 3/4 | 4.05s |
| #45 | GPT-5 Mini medium | OpenAI | 1 | 7.1 | 2/4 | 13.9s |
| #46 | Kimi K2.5 medium | Moonshot AI | 1 | 7.3 | 2/4 | 51.4s |
| #47 | Grok 4.20 medium | X AI | 1 | 8.2 | 3/4 | 3.36s |
| #52 | Grok 4.1 Fast medium | X AI | 1 | 8.7 | 3/4 | 3.81s |
| #54 | Mercury 2 medium | Inception | 1 | 6.9 | 2/4 | 1.12s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 1 | 6.9 | 2/4 | 3.46s |
| #60 | Gemma 4 26B A4B none | | 1 | 8.3 | 3/4 | 1.28s |
| #68 | gpt-oss-120b medium | OpenAI | 1 | 6.7 | 2/4 | 10.2s |
| #80 | MiniMax M2.7 medium | Minimax | 1 | 7.9 | 2/4 | 40.3s |
| #84 | gpt-oss-120b none | OpenAI | 1 | 6.6 | 2/4 | 6.03s |
| #85 | Elephant none | Openrouter | 1 | 6.6 | 2/4 | 963ms |
| #97 | Qwen3.5-9B medium | Qwen | 1 | 5.1 | 1/4 | 34.4s |