AI BENCHY category failures
Anti-AI Tricks: Wrong answer
See which AI models are most likely to give a wrong answer in the Anti-AI Tricks category, so you can spot weaknesses quickly. Sorted by: Response time (average) ↓.
Failure reasons
| Rank | Model | Company | Wrong answer count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #39 | Seed-2.0-Mini medium | Bytedance Seed | 1 | 6.6 | 2/4 | 74.7s |
| #46 | Kimi K2.5 medium | Moonshot AI | 1 | 7.3 | 2/4 | 51.4s |
| #8 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 1 | 8.2 | 3/4 | 45.8s |
| #80 | MiniMax M2.7 medium | Minimax | 1 | 7.9 | 2/4 | 40.3s |
| #97 | Qwen3.5-9B medium | Qwen | 1 | 5.1 | 1/4 | 34.4s |
| #27 | DeepSeek V3.2 medium | DeepSeek | 1 | 8.4 | 3/4 | 30.7s |
| #57 | GPT-5 Nano medium | OpenAI | 2 | 6.5 | 2/4 | 25.5s |
| #6 | Seed-2.0-Lite medium | Bytedance Seed | 1 | 8.3 | 3/4 | 18.0s |
| #93 | GLM 4.7 Flash medium | Z.ai | 2 | 4.7 | 1/4 | 15.0s |
| #45 | GPT-5 Mini medium | OpenAI | 1 | 7.1 | 2/4 | 13.9s |
| #34 | Kimi K2.6 medium | Moonshot AI | 1 | 7.0 | 2/4 | 11.6s |
| #31 | GLM 5V Turbo medium | Z.ai | 1 | 7.2 | 2/4 | 10.8s |
| #68 | gpt-oss-120b medium | OpenAI | 1 | 6.7 | 2/4 | 10.2s |
| #92 | Qwen3 Coder Next medium | Qwen | 3 | 3.5 | 0/4 | 8.64s |
| #40 | GPT-5.2 medium | OpenAI | 1 | 6.5 | 2/4 | 7.81s |