AI BENCHY Failures
API Error Failures
See which AI models hit API errors most often, so you can spot reliability risks before choosing one. Sorted by failure count (ascending).
| Rank | Model | Company | API errors | Score | Tests passed | Response time (average) |
|---|---|---|---|---|---|---|
| #12 | Gemini 3 PRO Preview medium | Google | 1 | 8.4 | 14/18 | 9.06s |
| #20 | Qwen3.6 Plus medium | Qwen | 1 | 8.1 | 13/18 | 15.3s |
| #32 | Qwen3.5-Flash medium | Qwen | 1 | 7.8 | 11/18 | 66.7s |
| #33 | GLM 5.1 medium | Z.ai | 1 | 7.8 | 12/18 | 24.1s |
| #41 | MiMo-V2-Flash medium | Xiaomi | 1 | 7.5 | 11/18 | 23.4s |
| #43 | Qwen3.5-35B-A3B medium | Qwen | 1 | 7.4 | 10/18 | 44.5s |
| #47 | Grok 4.20 medium | X AI | 1 | 7.0 | 9/18 | 10.3s |
| #50 | Hunter Alpha medium | OpenRouter | 1 | 6.7 | 8/18 | 10.3s |
| #51 | Nemotron 3 Super medium | NVIDIA | 1 | 6.7 | 9/18 | 19.1s |
| #72 | Hunter Alpha none | OpenRouter | 1 | 5.7 | 6/18 | 4.58s |
| #94 | MiMo-V2-Flash none | Xiaomi | 1 | 4.5 | 3/18 | 2.79s |
| #99 | Step 3.5 Flash none | Stepfun | 1 | 3.0 | 0/1 | 0ms |
| #14 | Gemma 4 31B medium | Google | 2 | 8.3 | 13/18 | 24.9s |
| #48 | Gemma 4 31B none | Google | 2 | 6.9 | 10/18 | 4.02s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 2 | 6.4 | 7/18 | 9.80s |