AI BENCHY Failures
API Error Failures
See which AI models hit API errors most often, so you can spot reliability risks before choosing one.
| Rank | Model | Company | API Error Count | Score | Correct Tests | Avg. Response Time |
|---|---|---|---|---|---|---|
| #98 | LFM2-24B-A2B none | Liquid | 4 | 4.1 | 1/16 | 811ms |
| #84 | gpt-oss-120b none | OpenAI | 3 | 5.2 | 4/18 | 12.0s |
| #14 | Gemma 4 31B medium | Google | 2 | 8.3 | 13/18 | 24.9s |
| #48 | Gemma 4 31B none | Google | 2 | 6.9 | 10/18 | 4.02s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 2 | 6.4 | 7/18 | 9.80s |
| #73 | Mistral Small 4 medium | Mistral | 2 | 5.7 | 5/18 | 5.64s |
| #12 | Gemini 3 PRO Preview medium | Google | 1 | 8.4 | 14/18 | 9.06s |
| #20 | Qwen3.6 Plus medium | Qwen | 1 | 8.1 | 13/18 | 15.3s |
| #32 | Qwen3.5-Flash medium | Qwen | 1 | 7.8 | 11/18 | 66.7s |
| #33 | GLM 5.1 medium | Z.ai | 1 | 7.8 | 12/18 | 24.1s |
| #41 | MiMo-V2-Flash medium | Xiaomi | 1 | 7.5 | 11/18 | 23.4s |
| #43 | Qwen3.5-35B-A3B medium | Qwen | 1 | 7.4 | 10/18 | 44.5s |
| #47 | Grok 4.20 medium | X AI | 1 | 7.0 | 9/18 | 10.3s |
| #50 | Hunter Alpha medium | OpenRouter | 1 | 6.7 | 8/18 | 10.3s |
| #51 | Nemotron 3 Super medium | NVIDIA | 1 | 6.7 | 9/18 | 19.1s |