AI BENCHY Failures
Extra Formatting Failures
See which AI models run into Extra Formatting failures most often, so you can spot reliability risks before choosing one.
| Rank | Model | Company | Extra Formatting count | Score | Tests passed | Response time (average) |
|---|---|---|---|---|---|---|
| #37 | Claude Opus 4.6 medium | Anthropic | 4 | 7.6 | 12/18 | 21.1s |
| #42 | Claude Sonnet 4.6 none | Anthropic | 3 | 7.4 | 11/18 | 4.98s |
| #26 | Claude Sonnet 4.6 medium | Anthropic | 2 | 8.0 | 13/18 | 12.7s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 2 | 6.4 | 7/18 | 9.80s |
| #64 | DeepSeek V3.2 none | DeepSeek | 2 | 6.1 | 7/18 | 12.1s |
| #10 | Qwen3.5-27B medium | Qwen | 1 | 8.4 | 13/18 | 53.0s |
| #23 | MiMo-V2-Pro medium | Xiaomi | 1 | 8.1 | 12/18 | 12.3s |
| #35 | MiMo-V2-Omni medium | Xiaomi | 1 | 7.7 | 11/18 | 16.8s |
| #41 | MiMo-V2-Flash medium | Xiaomi | 1 | 7.5 | 11/18 | 23.4s |
| #47 | Grok 4.20 medium | X AI | 1 | 7.0 | 9/18 | 10.3s |
| #50 | Hunter Alpha medium | OpenRouter | 1 | 6.7 | 8/18 | 10.3s |
| #82 | Grok 4.20 none | X AI | 1 | 5.2 | 5/18 | 1.11s |
| #87 | Qwen3 Coder Next none | Qwen | 1 | 5.1 | 4/18 | 10.2s |
| #94 | MiMo-V2-Flash none | Xiaomi | 1 | 4.5 | 3/18 | 2.79s |
| #97 | Qwen3.5-9B medium | Qwen | 1 | 4.4 | 3/18 | 73.6s |