AI BENCHY category failures
General intelligence: Not following instructions
See which AI models most often fail to follow instructions in the General intelligence category, so you can find their weak points faster.
Failure reason
| Rank | Model | Company | Instruction-following failures | Category score | Tests passed | Response time (average) |
|---|---|---|---|---|---|---|
| #6 | Seed-2.0-Lite medium | Bytedance Seed | 1 | 6.7 | 0/1 | 18.2s |
| #7 | GPT-5.3-Codex medium | OpenAI | 1 | 4.6 | 0/1 | 4.87s |
| #9 | Qwen3.6 Plus Preview medium | Qwen | 1 | 5.1 | 0/1 | 27.1s |
| #10 | Qwen3.5-27B medium | Qwen | 1 | 6.1 | 0/1 | 101.4s |
| #13 | GLM 5 medium | Z.ai | 1 | 6.1 | 0/1 | 14.7s |
| #15 | Gemini 2.5 Flash medium | Google | 1 | 4.8 | 0/1 | 4.86s |
| #16 | GPT-5.4 medium | OpenAI | 1 | 4.7 | 0/1 | 4.92s |
| #20 | Qwen3.6 Plus medium | Qwen | 1 | 5.1 | 0/1 | 27.1s |
| #22 | Gemini 3.1 Flash Lite Preview low | Google | 1 | 4.0 | 0/1 | 1.54s |
| #27 | DeepSeek V3.2 medium | DeepSeek | 1 | 5.4 | 0/1 | 31.3s |
| #28 | GPT-5.2 Chat none | OpenAI | 1 | 4.4 | 0/1 | 3.20s |
| #29 | Gemini 3.1 Flash Lite Preview none | Google | 1 | 4.0 | 0/1 | 741ms |
| #30 | Step 3.5 Flash medium | Stepfun | 1 | 5.5 | 0/1 | 6.54s |
| #32 | Qwen3.5-Flash medium | Qwen | 1 | 6.1 | 0/1 | 40.1s |
| #36 | GPT-5.3 Chat none | OpenAI | 1 | 4.6 | 0/1 | 1.99s |