AI BENCHY Category
Tool Calling Ranking
See which AI models perform best at tool calling, which stay reliable, and where the biggest gaps appear.
Models shown
15
Average Tool Calling Score
8.7
Top model
Gemini 3 Flash Preview 10.0

| Rank | Model | Company | Tool Calling Score | Score | Tests correct | Response time (avg) |
|---|---|---|---|---|---|---|
| #35 | Gemini 3 PRO Preview medium | | 10.0 | 7.6 | 1/1 | 12.0s |
| #36 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 10.0 | 7.6 | 1/1 | 14.7s |
| #37 | Gemma 4 26B A4B medium | | 10.0 | 7.6 | 1/1 | 9.01s |
| #38 | Grok 4.3 medium | X AI | 10.0 | 7.6 | 1/1 | 17.7s |
| #39 | Qwen3.6 Flash medium | Qwen | 10.0 | 7.5 | 1/1 | 4.00s |
| #40 | Gemini 3.1 Flash Lite Preview medium | | 10.0 | 7.5 | 1/1 | 3.80s |
| #41 | Nemotron 3 Ultra 550b A55b medium | NVIDIA | 10.0 | 7.5 | 1/1 | 7.72s |
| #43 | MiMo-V2.5-Pro medium | Xiaomi | 10.0 | 7.5 | 1/1 | 16.9s |
| #44 | Gemini 3.1 Flash Lite medium | | 10.0 | 7.5 | 1/1 | 4.55s |
| #47 | Grok Build 0.1 medium | X AI | 10.0 | 7.4 | 1/1 | 13.1s |
| #48 | Gemini 3 Flash Preview none | | 10.0 | 7.4 | 1/1 | 3.35s |
| #49 | Qwen3.5-Flash medium | Qwen | 10.0 | 7.4 | 1/1 | 10.3s |
| #50 | Gemini 3.1 Flash Lite Preview low | | 10.0 | 7.4 | 1/1 | 9.54s |
| #51 | Mimo V2 PRO medium | Xiaomi | 10.0 | 7.4 | 1/1 | 8.19s |
| #52 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.4 | 1/1 | 7.48s |