Kushindwa kwa kategoria za AI BENCHY
Utatuzi wa mafumbo: Hitilafu ya API
Utatuzi wa mafumbo
Hitilafu ya API
Ona ni modeli gani za AI zina uwezekano mkubwa wa kupata Hitilafu ya API katika Utatuzi wa mafumbo, ili uone udhaifu haraka. Panga kwa: Muda wa majibu (wastani) ↑.
Modeli zilizoonyeshwa
12
Jumla ya kushindwa
13
Modeli iliyoathirika zaidi
Nemotron 3 Nano Omni 30b A3b Reasoning 1Sababu za kushindwa
| Nafasi | Modeli | Kampuni | Idadi ya Hitilafu ya API | Alama ya kategoria | Majaribio sahihi | Muda wa majibu (wastani) |
|---|---|---|---|---|---|---|
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 1 | 3.0 | 0/3 | 532ms |
| #146 | Laguna Xs.2 none | Poolside | 1 | 5.3 | 1/3 | 650ms |
| #145 | Laguna M.1 none | Poolside | 1 | 3.0 | 0/3 | 891ms |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 1 | 2.9 | 0/3 | 1.40s |
| #160 | LFM2-24B-A2B none | Liquid | 1 | 3.8 | 0/3 | 1.78s |
| #107 | Laguna Xs.2 medium | Poolside | 1 | 5.3 | 1/3 | 1.93s |
| #133 | DeepSeek V3.2 none | DeepSeek | 1 | 7.6 | 2/3 | 6.91s |
| #89 | Hy3 preview low | Tencent | 1 | 5.3 | 1/3 | 7.51s |
| #93 | Qwen3.6 Plus Preview medium | Qwen | 2 | 5.3 | 1/3 | 7.52s |
| #92 | Laguna M.1 medium | Poolside | 1 | 5.3 | 1/3 | 10.2s |
| #82 | Hy3 preview high | Tencent | 1 | 7.7 | 2/3 | 27.9s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 5.9 | 1/3 | 34.8s |