AI BENCHY category failures
Puzzle solving: Wrong answer
See which AI models most often give a wrong answer in Puzzle solving, so you can spot their weak points faster. Sorted by: Response time (average) ↓.
Models shown: 15
Total failures: 85
Most affected model: Gemini 3.1 Flash Lite Preview (1)
Failure reasons
| Rank | Model | Company | Wrong answers | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #11 | Gemini 3.1 Flash Lite Preview high | Google | 1 | 7.7 | 2/3 | 46.3s |
| #46 | Kimi K2.5 medium | Moonshot AI | 1 | 5.3 | 1/3 | 45.4s |
| #27 | DeepSeek V3.2 medium | DeepSeek | 1 | 8.2 | 2/3 | 36.9s |
| #43 | Qwen3.5-35B-A3B medium | Qwen | 1 | 6.4 | 1/3 | 31.6s |
| #39 | Seed-2.0-Mini medium | Bytedance Seed | 1 | 8.2 | 2/3 | 25.9s |
| #80 | MiniMax M2.7 medium | Minimax | 1 | 3.8 | 0/3 | 25.6s |
| #34 | Kimi K2.6 medium | Moonshot AI | 1 | 5.0 | 0/3 | 25.6s |
| #33 | GLM 5.1 medium | Z.ai | 1 | 8.2 | 2/3 | 23.8s |
| #87 | Qwen3 Coder Next none | Qwen | 3 | 3.2 | 0/3 | 22.9s |
| #57 | GPT-5 Nano medium | OpenAI | 1 | 5.3 | 1/3 | 19.8s |
| #45 | GPT-5 Mini medium | OpenAI | 1 | 5.6 | 1/3 | 14.1s |
| #93 | GLM 4.7 Flash medium | Z.ai | 2 | 2.9 | 0/3 | 12.9s |
| #68 | gpt-oss-120b medium | OpenAI | 1 | 3.2 | 0/3 | 11.8s |
| #71 | MiniMax M2.5 medium | Minimax | 1 | 5.3 | 1/3 | 11.5s |
| #51 | Nemotron 3 Super medium | NVIDIA | 1 | 3.5 | 0/3 | 8.39s |