AI BENCHY Category
Tool Calling Leaderboard
See which AI models perform best at Tool Calling, which stay reliable, and where the biggest differences appear. Sorted by: Tests passed ↓.
Models shown: 15
Average Tool Calling Score: 8.7
Best model: Gemini 3 Flash Preview 10.0

| Rank | Model | Company | Tool Calling Score | Score | Tests passed | Response time (avg) |
|---|---|---|---|---|---|---|
| #55 | MiMo-V2-Omni none | Xiaomi | 10.0 | 6.5 | 1/1 | 2.76s |
| #57 | GPT-5 Nano medium | OpenAI | 10.0 | 6.3 | 1/1 | 33.3s |
| #58 | GLM 5V Turbo none | Z.ai | 10.0 | 6.2 | 1/1 | 4.86s |
| #59 | Qwen3.5-Flash none | Qwen | 10.0 | 6.2 | 1/1 | 3.67s |
| #60 | Gemma 4 26B A4B none | | 10.0 | 6.2 | 1/1 | 57.1s |
| #61 | Seed-2.0-Lite none | Bytedance Seed | 10.0 | 6.2 | 1/1 | 3.94s |
| #62 | Gemini 2.5 Flash none | | 10.0 | 6.2 | 1/1 | 1.91s |
| #63 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 6.1 | 1/1 | 2.30s |
| #64 | DeepSeek V3.2 none | DeepSeek | 10.0 | 6.1 | 1/1 | 11.8s |
| #65 | MiMo-V2-Pro none | Xiaomi | 10.0 | 6.0 | 1/1 | 4.39s |
| #66 | GPT-5.4 none | OpenAI | 10.0 | 5.9 | 1/1 | 2.75s |
| #67 | Qwen3.5-27B none | Qwen | 10.0 | 5.9 | 1/1 | 3.54s |
| #68 | gpt-oss-120b medium | OpenAI | 9.8 | 5.8 | 1/1 | 6.91s |
| #69 | Kimi K2.6 none | Moonshot AI | 10.0 | 5.8 | 1/1 | 4.46s |
| #70 | Qwen3.5-122B-A10B none | Qwen | 10.0 | 5.7 | 1/1 | 2.04s |