AI BENCHY Category
Domain-specific Ranking
See which AI models perform best on Domain-specific tasks, which ones stay reliable, and where the biggest gaps appear.
Models shown
15
Average Domain-specific score
4.8
Best model
Gemini 3 Flash Preview 10.0

| Rank | Model | Company | Domain-specific score | Score | Tests correct | Response time (avg) |
|---|---|---|---|---|---|---|
| #157 | Grok 4.1 Fast none | X AI | 5.9 | 4.4 | 1/3 | 1.06s |
| #160 | LFM2-24B-A2B none | Liquid | 5.9 | 4.2 | 1/3 | 287ms |
| #86 | Grok 4.1 Fast medium | X AI | 5.8 | 6.5 | 1/3 | 121.8s |
| #67 | MiniMax M3 medium | Minimax | 5.5 | 7.1 | 1/3 | 233.1s |
| #6 | GPT-5.5 low | OpenAI | 5.3 | 9.0 | 1/3 | 28.1s |
| #10 | Claude Opus 4.8 medium | Anthropic | 5.3 | 8.7 | 1/3 | 14.2s |
| #12 | Gemini 3.1 Flash Lite Preview high | | 5.3 | 8.6 | 1/3 | 127.6s |
| #13 | Grok 4.20 Beta medium | X AI | 5.3 | 8.5 | 1/3 | 21.3s |
| #24 | GPT-5.2 Chat none | OpenAI | 5.3 | 7.9 | 1/3 | 17.8s |
| #25 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 5.3 | 7.9 | 1/3 | 17.5s |
| #30 | Qwen3.5-27B medium | Qwen | 5.3 | 7.8 | 1/3 | 79.5s |
| #33 | Hy3 preview medium | Tencent | 5.3 | 7.7 | 1/3 | 22.3s |
| #35 | Gemini 3 PRO Preview medium | | 5.3 | 7.6 | 1/3 | 7.01s |
| #43 | MiMo-V2.5-Pro medium | Xiaomi | 5.3 | 7.5 | 1/3 | 37.9s |
| #47 | Grok Build 0.1 medium | X AI | 5.3 | 7.4 | 1/3 | 158.0s |