AI BENCHY Category
Tool Calling Rankings
See which AI models perform best at tool calling, which remain reliable, and where the biggest gaps appear.
Models shown: 15
Average Tool Calling Score: 8.7
Best model: Gemini 3 Flash Preview (10.0)

| Rank | Model | Company | Tool Calling Score | Score | Tests correct | Response time (avg) |
|---|---|---|---|---|---|---|
| #35 | MiMo-V2-Omni medium | Xiaomi | 10.0 | 7.7 | 1/1 | 11.1s |
| #36 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.7 | 1/1 | 8.36s |
| #37 | Claude Opus 4.6 medium | Anthropic | 10.0 | 7.6 | 1/1 | 9.73s |
| #38 | GPT-5.4 Nano medium | OpenAI | 10.0 | 7.6 | 1/1 | 7.71s |
| #39 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 7.5 | 1/1 | 88.7s |
| #41 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.5 | 1/1 | 27.8s |
| #42 | Claude Sonnet 4.6 none | Anthropic | 10.0 | 7.4 | 1/1 | 4.11s |
| #43 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 7.4 | 1/1 | 4.65s |
| #45 | GPT-5 Mini medium | OpenAI | 10.0 | 7.0 | 1/1 | 18.6s |
| #46 | Kimi K2.5 medium | Moonshot AI | 10.0 | 7.0 | 1/1 | 31.7s |
| #49 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.8 | 1/1 | 3.33s |
| #50 | Hunter Alpha medium | OpenRouter | 10.0 | 6.7 | 1/1 | 17.3s |
| #51 | Nemotron 3 Super medium | NVIDIA | 10.0 | 6.7 | 1/1 | 39.7s |
| #53 | GLM 5 none | Z.ai | 10.0 | 6.6 | 1/1 | 11.1s |
| #54 | Mercury 2 medium | Inception | 10.0 | 6.5 | 1/1 | 1.89s |