AI BENCHY Category
Tool Calling Leaderboard
See which AI models perform best at tool calling, which remain reliable, and where the largest differences between them lie.
| Rank | Model | Company | Tool Calling Score | Score | Total Cost | Correct Tests | Response Time (avg.) |
|---|---|---|---|---|---|---|---|
| #68 | Qwen3.7 Max none | Qwen | 10.0 | 6.9 | $0.054 | 1/1 | 3.92s |
| #70 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.8 | $0.080 | 1/1 | 10.3s |
| #71 | Gemini 3.5 Flash minimal | Google | 10.0 | 6.8 | $0.108 | 1/1 | 2.79s |
| #72 | Ring-2.6-1T medium | Inclusionai | 10.0 | 6.8 | $0.033 | 1/1 | 104.4s |
| #73 | Mimo V2 Omni medium | Xiaomi | 10.0 | 6.8 | $0.683 | 1/1 | 14.0s |
| #74 | Hy3 preview high | Tencent | 10.0 | 6.8 | $0.059 | 1/1 | 78.8s |
| #76 | MiMo-V2.5 medium | Xiaomi | 10.0 | 6.7 | $0.063 | 1/1 | 7.29s |
| #77 | Mimo V2 PRO medium | Xiaomi | 10.0 | 6.7 | $0.333 | 1/1 | 8.19s |
| #78 | gpt-oss-120b medium | OpenAI | 9.8 | 6.7 | $0.013 | 1/1 | 6.91s |
| #79 | GPT-5 Nano medium | OpenAI | 10.0 | 6.7 | $0.081 | 1/1 | 33.3s |
| #80 | Step 3.5 Flash medium | Stepfun | 10.0 | 6.6 | $0.070 | 1/1 | 11.9s |
| #81 | Qwen3.6 27B medium | Qwen | 10.0 | 6.6 | $0.440 | 1/1 | 16.9s |
| #82 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 6.5 | $0.026 | 1/1 | 9.54s |
| #83 | Gemini 3.1 Flash Lite high | Google | 10.0 | 6.5 | $2.044 | 1/1 | 6.44s |
| #84 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 6.4 | $0.018 | 1/1 | 3.39s |