AI BENCHY Category
Domain-Specific Rankings
See which AI models perform best on domain-specific tasks, which remain reliable, and where the biggest gaps appear.
Models shown
15
Average domain-specific score
4.8
Top model
Gemini 3 Flash Preview (10.0)

| Rank | Model | Company | Domain-specific score | Score | Tests passed | Response time (avg.) |
|---|---|---|---|---|---|---|
| #38 | Grok 4.3 medium | X AI | 5.3 | 7.6 | 1/3 | 181.7s |
| #46 | Qwen3.6 35B A3B medium | Qwen | 5.3 | 7.4 | 1/3 | 22.5s |
| #49 | Qwen3.5-Flash medium | Qwen | 5.3 | 7.4 | 1/3 | 146.5s |
| #57 | Step 3.7 Flash low | Stepfun | 5.3 | 7.3 | 1/3 | 43.3s |
| #59 | GLM 5V Turbo medium | Z.ai | 5.3 | 7.2 | 1/3 | 38.1s |
| #60 | Kimi K2.6 medium | Moonshot AI | 5.3 | 7.2 | 1/3 | 202.4s |
| #62 | Step 3.5 Flash medium | Stepfun | 5.3 | 7.2 | 1/3 | 170.5s |
| #68 | Claude Opus 4.8 none | Anthropic | 5.3 | 7.0 | 1/3 | 1.66s |
| #82 | Hy3 preview high | Tencent | 5.3 | 6.6 | 1/3 | 109.0s |
| #92 | Laguna M.1 medium | Poolside | 5.3 | 6.4 | 1/3 | 24.1s |
| #96 | Ring-2.6-1T none | Inclusionai | 5.3 | 6.2 | 1/3 | 73.4s |
| #120 | Mimo V2 PRO none | Xiaomi | 5.3 | 5.6 | 1/3 | 1.78s |
| #124 | Kimi K2.6 none | Moonshot AI | 5.3 | 5.5 | 1/3 | 1.48s |
| #125 | GPT-5.4 none | OpenAI | 5.3 | 5.5 | 1/3 | 1.07s |
| #132 | Mistral Small 4 medium | Mistral | 5.3 | 5.3 | 1/3 | 6.11s |