AI BENCHY category failures
General intelligence: API error
See which AI models are most likely to hit an API error in General intelligence, so you can spot weaknesses quickly.
Models shown: 12
Total failures: 12
Most affected model: Nemotron 3 Ultra 550b A55b (1 failure)

Failure reasons
| Rank | Model | Company | API error count | Category score | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|
| #41 | Nemotron 3 Ultra 550b A55b medium | NVIDIA | 1 | 3.7 | 0/1 | 2.52s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 1 | 3.4 | 0/1 | 58.3s |
| #82 | Hy3 preview high | Tencent | 1 | 3.0 | 0/1 | 0ms |
| #89 | Hy3 preview low | Tencent | 1 | 3.0 | 0/1 | 0ms |
| #92 | Laguna M.1 medium | Poolside | 1 | 3.0 | 0/1 | 0ms |
| #93 | Qwen3.6 Plus Preview medium | Qwen | 1 | 3.0 | 0/1 | 0ms |
| #107 | Laguna Xs.2 medium | Poolside | 1 | 3.0 | 0/1 | 0ms |
| #133 | DeepSeek V3.2 none | DeepSeek | 1 | 4.7 | 0/1 | 9.32s |
| #145 | Laguna M.1 none | Poolside | 1 | 3.0 | 0/1 | 0ms |
| #146 | Laguna Xs.2 none | Poolside | 1 | 3.0 | 0/1 | 0ms |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 1 | 3.0 | 0/1 | 0ms |
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 1 | 3.0 | 0/1 | 0ms |