AI BENCHY: Failures
API Error Failures
See which AI models hit an API Error most often, so you can spot reliability risks before choosing one. Sorted by tests passed, ascending.
| Rank | Model | Company | API Error count | Score | Tests passed | Response time (avg) |
|---|---|---|---|---|---|---|
| #99 | Step 3.5 Flash none | Stepfun | 1 | 3.0 | 0/1 | 0ms |
| #98 | LFM2-24B-A2B none | Liquid | 4 | 4.1 | 1/16 | 811ms |
| #94 | MiMo-V2-Flash none | Xiaomi | 1 | 4.5 | 3/18 | 2.79s |
| #84 | gpt-oss-120b none | OpenAI | 3 | 5.2 | 4/18 | 12.0s |
| #73 | Mistral Small 4 medium | Mistral | 2 | 5.7 | 5/18 | 5.64s |
| #72 | Hunter Alpha none | OpenRouter | 1 | 5.7 | 6/18 | 4.58s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 2 | 6.4 | 7/18 | 9.80s |
| #50 | Hunter Alpha medium | OpenRouter | 1 | 6.7 | 8/18 | 10.3s |
| #47 | Grok 4.20 medium | X AI | 1 | 7.0 | 9/18 | 10.3s |
| #51 | Nemotron 3 Super medium | NVIDIA | 1 | 6.7 | 9/18 | 19.1s |
| #43 | Qwen3.5-35B-A3B medium | Qwen | 1 | 7.4 | 10/18 | 44.5s |
| #48 | Gemma 4 31B none | | 2 | 6.9 | 10/18 | 4.02s |
| #32 | Qwen3.5-Flash medium | Qwen | 1 | 7.8 | 11/18 | 66.7s |
| #41 | MiMo-V2-Flash medium | Xiaomi | 1 | 7.5 | 11/18 | 23.4s |
| #33 | GLM 5.1 medium | Z.ai | 1 | 7.8 | 12/18 | 24.1s |
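
Raw error counts are hard to compare when models ran a different number of tests (for example 16 versus 18). The sketch below normalizes a few rows copied from the table into API errors per test. It assumes that the denominator of the "tests passed" column is the number of tests attempted and that the API Error count refers to those same tests; the page does not state either point. The helper names `parse_passed`, `errors_per_test`, and `rank_by_error_rate` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

# A few rows copied from the table: (model, API error count, tests passed).
ROWS = [
    ("LFM2-24B-A2B none", 4, "1/16"),
    ("gpt-oss-120b none", 3, "4/18"),
    ("GLM 5.1 medium", 1, "12/18"),
]


def parse_passed(cell):
    """Split a 'passed/total' cell such as '4/18' into two integers."""
    passed, total = cell.split("/")
    return int(passed), int(total)


def errors_per_test(error_count, passed_cell):
    """API errors divided by tests attempted.

    Assumption: the denominator of the 'tests passed' cell is the number
    of tests attempted, and the error count covers those same tests.
    """
    _, total = parse_passed(passed_cell)
    return Fraction(error_count, total)


def rank_by_error_rate(rows):
    """Return rows ordered from highest to lowest errors per test."""
    return sorted(rows, key=lambda r: errors_per_test(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for model, errors, passed in rank_by_error_rate(ROWS):
        rate = errors_per_test(errors, passed)
        print(f"{model}: {float(rate):.1%} API errors per test")
```

Under these assumptions, LFM2-24B-A2B's 4 errors over 16 tests give a higher rate than gpt-oss-120b's 3 errors over 18, so the normalized ordering can differ from one based on raw counts alone.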