AI BENCHY category failures
Domain-specific
Timeout
See which AI models are most likely to hit a Timeout in the Domain-specific category, so you can spot weaknesses quickly. Sorted by: Correct tests ↓.
| Rank | Model | Company | Timeout count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 1 | 4.0 | 1/3 | 17.5s |
| #7 | Qwen3.5-27B medium | Qwen | 1 | 4.0 | 1/3 | 79.5s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 4.0 | 1/3 | 39.3s |
| #24 | Qwen3.5-Flash medium | Qwen | 1 | 4.0 | 1/3 | 146.5s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 4.0 | 1/3 | 77.8s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 4.0 | 1/3 | 121.8s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 4.0 | 1/3 | 204.0s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 10.0 | 0/3 | 0ms |
| #14 | GLM 5 medium | Z.ai | 1 | 10.0 | 0/3 | 0ms |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 3 | 10.0 | 0/3 | 0ms |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 10.0 | 0/3 | 137.3s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 10.0 | 0/3 | 44.6s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 2 | 10.0 | 0/3 | 88.3s |
| #43 | MiniMax M2.5 medium | Minimax | 1 | 10.0 | 0/3 | 237.3s |