AI BENCHY Category
Tool Calling Rankings
See which AI models perform best at tool calling, which remain reliable, and where the biggest gaps appear.
Models shown
15
Average tool calling score
8.7
Top model
Gemini 3 Flash Preview 10.0

| Rank | Model | Company | Tool calling score | Score | Tests passed | Response time (avg) |
|---|---|---|---|---|---|---|
| #1 | Gemini 3 Flash Preview medium | | 10.0 | 10.0 | 1/1 | 10.6s |
| #2 | Gemini 3.1 Pro Preview medium | | 10.0 | 9.6 | 1/1 | 23.1s |
| #3 | Claude Opus 4.7 medium | Anthropic | 10.0 | 9.2 | 1/1 | 4.17s |
| #4 | Claude Opus 4.7 none | Anthropic | 10.0 | 9.2 | 1/1 | 4.74s |
| #5 | Gemini 3 Flash Preview low | | 10.0 | 8.8 | 1/1 | 4.99s |
| #6 | Seed-2.0-Lite medium | Bytedance Seed | 10.0 | 8.6 | 1/1 | 12.4s |
| #7 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.6 | 1/1 | 6.37s |
| #8 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.5 | 1/1 | 7.54s |
| #9 | Qwen3.6 Plus Preview medium | Qwen | 10.0 | 8.5 | 1/1 | 5.87s |
| #10 | Qwen3.5-27B medium | Qwen | 10.0 | 8.4 | 1/1 | 7.45s |
| #11 | Gemini 3.1 Flash Lite Preview high | | 10.0 | 8.4 | 1/1 | 7.73s |
| #12 | Gemini 3 PRO Preview medium | | 10.0 | 8.4 | 1/1 | 12.0s |
| #13 | GLM 5 medium | Z.ai | 10.0 | 8.4 | 1/1 | 15.9s |
| #15 | Gemini 2.5 Flash medium | | 10.0 | 8.2 | 1/1 | 6.20s |
| #16 | GPT-5.4 medium | OpenAI | 10.0 | 8.2 | 1/1 | 13.3s |