AI BENCHY Category
Instruction Following Rankings
See which AI models perform best at instruction following, which remain reliable, and where the biggest gaps appear. Sorted by: Response time (average) ↑.
| Rank | Model | Company | Instruction Following Score | Score | Tests passed | Response time (average) |
|---|---|---|---|---|---|---|
| #104 | Nemotron 3 Ultra 550b A55b none | NVIDIA | 10.0 | 6.0 | 2/2 | 1.46s |
| #98 | GLM 5 none | Z.ai | 10.0 | 6.1 | 2/2 | 1.48s |
| #50 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.4 | 2/2 | 1.49s |
| #71 | Step 3.7 Flash high | Stepfun | 9.8 | 7.0 | 2/2 | 1.52s |
| #133 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.2 | 2/2 | 1.52s |
| #61 | Gemini 3.1 Flash Lite low | Google | 10.0 | 7.2 | 2/2 | 1.52s |
| #11 | Claude Opus 4.7 medium | Anthropic | 10.0 | 8.7 | 2/2 | 1.57s |
| #48 | Gemini 3 Flash Preview none | Google | 6.4 | 7.4 | 1/2 | 1.58s |
| #57 | Step 3.7 Flash low | Stepfun | 9.8 | 7.3 | 2/2 | 1.58s |
| #124 | Kimi K2.6 none | Moonshot AI | 6.5 | 5.5 | 1/2 | 1.64s |
| #95 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.3 | 2/2 | 1.67s |
| #107 | Laguna Xs.2 medium | Poolside | 10.0 | 5.8 | 2/2 | 1.68s |
| #56 | MiMo-V2.5 medium | Xiaomi | 9.9 | 7.3 | 2/2 | 1.80s |
| #22 | Step 3.7 Flash medium | Stepfun | 9.8 | 8.0 | 2/2 | 1.83s |
| #153 | Qwen3.6 35B A3B none | Qwen | 6.2 | 4.6 | 1/2 | 1.86s |