AI BENCHY Failures
API Error Failures
See which AI models run into API errors most often, so you can spot reliability risks before choosing one. Sorted by: Score ↑.
| Rank | Model | Company | API error count | Score | Correct attempts | Response time (avg) |
|---|---|---|---|---|---|---|
| #99 | Step 3.5 Flash none | Stepfun | 1 | 3.0 | 0/1 | 0ms |
| #98 | LFM2-24B-A2B none | Liquid | 4 | 4.1 | 1/16 | 811ms |
| #94 | MiMo-V2-Flash none | Xiaomi | 1 | 4.5 | 3/18 | 2.79s |
| #84 | gpt-oss-120b none | OpenAI | 3 | 5.2 | 4/18 | 12.0s |
| #73 | Mistral Small 4 medium | Mistral | 2 | 5.7 | 5/18 | 5.64s |
| #72 | Hunter Alpha none | OpenRouter | 1 | 5.7 | 6/18 | 4.58s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 2 | 6.4 | 7/18 | 9.80s |
| #51 | Nemotron 3 Super medium | NVIDIA | 1 | 6.7 | 9/18 | 19.1s |
| #50 | Hunter Alpha medium | OpenRouter | 1 | 6.7 | 8/18 | 10.3s |
| #48 | Gemma 4 31B none | | 2 | 6.9 | 10/18 | 4.02s |
| #47 | Grok 4.20 medium | X AI | 1 | 7.0 | 9/18 | 10.3s |
| #43 | Qwen3.5-35B-A3B medium | Qwen | 1 | 7.4 | 10/18 | 44.5s |
| #41 | MiMo-V2-Flash medium | Xiaomi | 1 | 7.5 | 11/18 | 23.4s |
| #33 | GLM 5.1 medium | Z.ai | 1 | 7.8 | 12/18 | 24.1s |
| #32 | Qwen3.5-Flash medium | Qwen | 1 | 7.8 | 11/18 | 66.7s |
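
Raw error counts are hard to compare when models ran a different number of tests (one row above has a single attempt, another has 16). As a rough sketch, the counts can be normalized into an error rate per attempt. This assumes that the denominator of the "correct attempts" column is the total number of attempts and that API errors are counted among them; the page does not state either point. The `Row` class and `parse_attempts` helper are hypothetical names used only for illustration, and only four rows from the table are included.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    api_errors: int
    correct: int
    attempts: int

    @property
    def error_rate(self) -> float:
        # Assumption: API errors are counted among the attempts.
        return self.api_errors / self.attempts if self.attempts else 0.0


def parse_attempts(cell: str) -> tuple[int, int]:
    """Split a cell such as '4/18' into (correct, attempts)."""
    correct, attempts = cell.split("/")
    return int(correct), int(attempts)


# A few rows copied from the table above.
rows = [
    Row("Step 3.5 Flash none", 1, *parse_attempts("0/1")),
    Row("LFM2-24B-A2B none", 4, *parse_attempts("1/16")),
    Row("gpt-oss-120b none", 3, *parse_attempts("4/18")),
    Row("GLM 5.1 medium", 1, *parse_attempts("12/18")),
]

# Highest error rate first.
ranked = sorted(rows, key=lambda r: r.error_rate, reverse=True)
for r in ranked:
    print(f"{r.model}: {r.error_rate:.1%}")
```

A rate based on a single attempt says very little on its own, so rows with few attempts are best read with caution.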