AI BENCHY Category
Tool Calling Rankings
See which AI models perform best at tool calling, which stay reliable, and where the biggest gaps appear. Sorted by: Response time (average) ↑.
| Rank | Model | Company | Tool calling score | Score | Tests passed | Response time (average) |
|---|---|---|---|---|---|---|
| #125 | GPT-5.4 none | OpenAI | 10.0 | 5.5 | 1/1 | 2.75s |
| #137 | Elephant Alpha none | Openrouter | 3.0 | 5.1 | 0/1 | 2.79s |
| #32 | Gemini 3.5 Flash minimal | Google | 10.0 | 7.7 | 1/1 | 2.79s |
| #71 | Step 3.7 Flash high | Stepfun | 10.0 | 7.0 | 1/1 | 2.79s |
| #136 | Elephant Alpha medium | Openrouter | 3.0 | 5.1 | 0/1 | 2.83s |
| #90 | Gemini 3.1 Flash Lite none | Google | 10.0 | 6.4 | 1/1 | 2.97s |
| #104 | Nemotron 3 Ultra 550b A55b none | NVIDIA | 10.0 | 6.0 | 1/1 | 2.99s |
| #57 | Step 3.7 Flash low | Stepfun | 10.0 | 7.3 | 1/1 | 3.25s |
| #3 | Gemini 3.5 Flash low | Google | 10.0 | 9.4 | 1/1 | 3.27s |
| #123 | MiMo-V2.5-Pro none | Xiaomi | 10.0 | 5.5 | 1/1 | 3.30s |
| #95 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.3 | 1/1 | 3.33s |
| #48 | Gemini 3 Flash Preview none | Google | 10.0 | 7.4 | 1/1 | 3.35s |
| #58 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.2 | 1/1 | 3.39s |
| #107 | Laguna Xs.2 medium | Poolside | 4.7 | 5.8 | 0/1 | 3.39s |
| #148 | GPT-5.4 Nano none | OpenAI | 10.0 | 4.7 | 1/1 | 3.40s |