AI BENCHY Category
Tool Calling Leaderboard
See which AI models perform best at tool calling, which stay reliable, and where the biggest differences lie.
| Rank | Model | Company | Tool calling score | Score | Total cost | Correct tests | Response time (avg.) |
|---|---|---|---|---|---|---|---|
| #116 | GLM 5.1 none | Z.ai | 10.0 | 5.6 | $0.058 | 1/1 | 10.7s |
| #117 | DeepSeek V4 Flash none | DeepSeek | 10.0 | 5.5 | $0.007 | 1/1 | 77.9s |
| #118 | Kimi K2.5 none | Moonshot AI | 10.0 | 5.5 | $0.027 | 1/1 | 14.0s |
| #119 | MiMo-V2.5-Pro none | Xiaomi | 10.0 | 5.5 | $0.017 | 1/1 | 3.30s |
| #120 | Qwen3.6 27B none | Qwen | 9.5 | 5.5 | $0.028 | 1/1 | 6.74s |
| #121 | Gemma 4 26B A4B none | Google | 10.0 | 5.5 | $0.004 | 1/1 | 57.1s |
| #122 | Qwen3.5 Plus 2026-04-20 none | Qwen | 10.0 | 5.5 | $0.032 | 1/1 | 4.42s |
| #123 | GLM 5 Turbo none | Z.ai | 10.0 | 5.3 | $0.047 | 1/1 | 8.21s |
| #125 | Qwen3.5-122B-A10B none | Qwen | 10.0 | 5.3 | $0.020 | 1/1 | 2.04s |
| #126 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.3 | $0.017 | 1/1 | 11.8s |
| #129 | Mistral Small 4 none | Mistral | 10.0 | 5.1 | $0.007 | 1/1 | 1.40s |
| #130 | Qwen3 Coder Next none | Qwen | 10.0 | 5.1 | $0.009 | 1/1 | 2.47s |
| #131 | North Mini Code none | Cohere | 9.5 | 5.1 | $0.000 | 1/1 | 3.64s |
| #132 | Hunter Alpha medium | OpenRouter | 10.0 | 5.1 | $0.000 | 1/1 | 17.3s |
| #133 | Mistral Small 4 medium | Mistral | 10.0 | 5.1 | $0.068 | 1/1 | 3.50s |