Kushindwa kwa kategoria za AI BENCHY
Mbinu za kupinga AI
Jibu lisilo sahihi
Mbinu za kupinga AI
Jibu lisilo sahihi
Ona ni modeli gani za AI zina uwezekano mkubwa wa kupata Jibu lisilo sahihi katika Mbinu za kupinga AI, ili uone udhaifu haraka. Panga kwa: Muda wa majibu (wastani) ↓.
Sababu zinazohusiana za kushindwa
| Nafasi | Modeli | Kampuni | Idadi ya Jibu lisilo sahihi | Alama ya kategoria | Majaribio sahihi | Muda wa majibu (wastani) |
|---|---|---|---|---|---|---|
| #34 | GPT-5 Nano medium | OpenAI | 1 | 7.0 | 2/3 | 37.7s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.0 | 2/3 | 33.4s |
| #52 | GLM 4.7 Flash medium | Z.ai | 1 | 4.0 | 1/3 | 27.1s |
| #50 | Qwen3 Coder Next medium | Qwen | 2 | 1.3 | 0/3 | 15.3s |
| #46 | Kimi K2.5 none | Moonshot AI | 3 | 2.7 | 0/3 | 11.4s |
| #33 | DeepSeek V3.2 none | DeepSeek | 1 | 10.0 | 0/3 | 8.79s |
| #16 | Gemini 2.5 Flash medium | 1 | 7.3 | 2/3 | 6.98s | |
| #49 | GLM 4.7 Flash none | Z.ai | 3 | 10.0 | 0/3 | 6.59s |
| #48 | Qwen3 Coder Next none | Qwen | 1 | 2.3 | 0/3 | 4.39s |
| #45 | Trinity Large Preview none | Arcee AI | 3 | 10.0 | 0/3 | 3.59s |
| #31 | GLM 5 none | Z.ai | 2 | 4.0 | 1/3 | 3.39s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 2 | 4.0 | 1/3 | 2.74s |
| #17 | Gemini 3.1 Flash Lite Preview low | 1 | 7.0 | 2/3 | 2.18s | |
| #47 | GPT-4o-mini none | OpenAI | 2 | 4.0 | 1/3 | 1.83s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 3 | 10.0 | 0/3 | 1.76s |
| #53 | Grok 4.1 Fast none | X AI | 2 | 1.3 | 0/3 | 1.73s |
| #37 | Qwen3.5-Flash none | Qwen | 3 | 2.3 | 0/3 | 1.62s |
| #20 | Gemini 3 Flash Preview none | 1 | 7.0 | 2/3 | 1.59s | |
| #44 | GPT-5.4 none | OpenAI | 3 | 10.0 | 0/3 | 1.41s |
| #54 | MiMo-V2-Flash none | Xiaomi | 3 | 10.0 | 0/3 | 1.36s |
| #22 | Gemini 3.1 Flash Lite Preview none | 1 | 6.0 | 1/3 | 1.16s | |
| #40 | Qwen3.5-122B-A10B none | Qwen | 2 | 4.0 | 1/3 | 927ms |
| #41 | Qwen3.5-27B none | Qwen | 2 | 4.0 | 1/3 | 796ms |
| #38 | Gemini 2.5 Flash none | 3 | 10.0 | 0/3 | 668ms | |
| #55 | LFM2-24B-A2B none | Liquid | 3 | 10.0 | 0/3 | 471ms |
| #51 | Mercury 2 none | Inception | 3 | 10.0 | 0/3 | 466ms |