AI BENCHY Failures
API Error Failures
See which AI models run into API errors most often, so you can spot reliability risks before choosing one. Sort by: Correct tests ↑.
Models shown: 15
Total failures: 144
Most affected model: Nemotron 3 Nano Omni 30b A3b Reasoning (6)

Failures by category
| Category | Failures |
|---|---|
| Coding | 43 |
| Data analysis and extraction | 16 |
| Tool calling | 15 |
| Anti-AI techniques | 13 |
| Mixed | 13 |
| Puzzle solving | 13 |
| General intelligence | 12 |
| General knowledge | 12 |
| Domain-specific | 6 |
| Instruction following | 1 |

| Rank | Model | Company | API error count | Score | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|
| #136 | Elephant Alpha medium | OpenRouter | 3 | 5.1 | 6/21 | 1.27s |
| #138 | Ling-2.6-flash none | Inclusionai | 2 | 5.0 | 6/21 | 9.34s |
| #107 | Laguna Xs.2 medium | Poolside | 4 | 5.8 | 6/19 | 6.73s |
| #126 | gpt-oss-120b none | OpenAI | 3 | 5.4 | 6/19 | 21.6s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 1 | 5.7 | 7/21 | 12.4s |
| #116 | Hunter Alpha none | OpenRouter | 1 | 5.7 | 6/18 | 4.70s |
| #119 | Cobuddy medium | Baidu | 1 | 5.6 | 7/21 | 39.9s |
| #120 | Mimo V2 PRO none | Xiaomi | 1 | 5.6 | 7/21 | 2.27s |
| #100 | Grok Build 0.1 none | X AI | 3 | 6.0 | 7/19 | 28.7s |
| #101 | Mimo V2 Omni none | Xiaomi | 1 | 6.0 | 8/21 | 2.44s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 5 | 6.0 | 8/21 | 65.2s |
| #105 | Nemotron 3 Super medium | NVIDIA | 3 | 5.8 | 8/21 | 32.0s |
| #111 | Owl Alpha medium | OpenRouter | 1 | 5.7 | 8/21 | 11.9s |
| #96 | Ring-2.6-1T none | Inclusionai | 5 | 6.2 | 9/21 | 55.1s |
| #79 | Hunter Alpha medium | OpenRouter | 1 | 6.7 | 8/18 | 10.3s |