Fallos AI BENCHY
Fallos por Llamada de herramienta no válida
Mira qué modelos de IA se encuentran con Llamada de herramienta no válida con más frecuencia para detectar riesgos de fiabilidad antes de elegir. Ordenar por: Tiempo de respuesta (promedio) ↓.
| Rango | Modelo | Empresa | Cantidad de Llamada de herramienta no válida | Puntuación | Pruebas correctas | Tiempo de respuesta (promedio) |
|---|---|---|---|---|---|---|
| #129 | MiniMax M2.5 medium | Minimax | 1 | 5.3 | 5/21 | 65.4s |
| #78 | Qwen3.6 27B medium | Qwen | 1 | 6.8 | 10/21 | 59.7s |
| #119 | Cobuddy medium | Baidu | 1 | 5.6 | 7/21 | 39.9s |
| #130 | MiniMax M2.7 medium | Minimax | 1 | 5.3 | 5/21 | 38.2s |
| #158 | GLM 4.7 Flash medium | Z.ai | 1 | 4.4 | 4/21 | 35.1s |
| #139 | DeepSeek V4 Flash none | DeepSeek | 1 | 5.0 | 5/21 | 26.8s |
| #59 | GLM 5V Turbo medium | Z.ai | 2 | 7.2 | 11/21 | 23.1s |
| #133 | DeepSeek V3.2 none | DeepSeek | 1 | 5.2 | 6/21 | 13.8s |
| #138 | Ling-2.6-flash none | Inclusionai | 2 | 5.0 | 6/21 | 9.34s |
| #159 | Ling-2.6-1T none | Inclusionai | 1 | 4.3 | 3/21 | 7.72s |
| #107 | Laguna Xs.2 medium | Poolside | 1 | 5.8 | 6/19 | 6.73s |
| #112 | GLM 5.1 none | Z.ai | 1 | 5.7 | 7/21 | 4.10s |
| #118 | Qwen3.6 27B none | Qwen | 1 | 5.6 | 7/21 | 3.72s |
| #145 | Laguna M.1 none | Poolside | 1 | 4.8 | 4/19 | 2.89s |
| #122 | GLM 4.7 Flash none | Z.ai | 1 | 5.5 | 6/21 | 2.86s |