Kegagalan kategori AI BENCHY
Spesifik domain
Kedaluwarsa
Spesifik domain
Kedaluwarsa
Lihat model AI mana yang paling mungkin mengalami Kedaluwarsa di Spesifik domain, agar Anda bisa menemukan titik lemahnya lebih cepat. Urutkan berdasarkan: Jumlah kegagalan ↑.
Alasan kegagalan terkait
Kategori terkait
| Peringkat | Model | Perusahaan | Jumlah Kedaluwarsa | Skor kategori | Tes benar | Waktu respons (rata-rata) |
|---|---|---|---|---|---|---|
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 1 | 4.0 | 1/3 | 17.5s |
| #7 | Qwen3.5-27B medium | Qwen | 1 | 4.0 | 1/3 | 79.5s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 10.0 | 0/3 | 0ms |
| #14 | GLM 5 medium | Z.ai | 1 | 10.0 | 0/3 | 0ms |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 4.0 | 1/3 | 39.3s |
| #24 | Qwen3.5-Flash medium | Qwen | 1 | 4.0 | 1/3 | 146.5s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 4.0 | 1/3 | 77.8s |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 10.0 | 0/3 | 137.3s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 4.0 | 1/3 | 121.8s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 10.0 | 0/3 | 44.6s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 4.0 | 1/3 | 204.0s |
| #43 | MiniMax M2.5 medium | Minimax | 1 | 10.0 | 0/3 | 237.3s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 2 | 10.0 | 0/3 | 88.3s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 3 | 10.0 | 0/3 | 0ms |