Tool Calling Rankings
See which AI models perform best at tool calling, which remain reliable, and where the biggest gaps appear. Sorted by response time (average), descending.
| Rank | Model | Company | Tool Calling Score | Average Score | Correct Tests | Response Time (avg) |
|---|---|---|---|---|---|---|
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 1/1 | 88.7s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 1/1 | 34.8s |
| #34 | GPT-5 Nano medium | OpenAI | 10.0 | 5.5 | 1/1 | 33.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 1/1 | 31.7s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 1/1 | 27.8s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 0/1 | 27.7s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 1/1 | 23.1s |
| #32 | GPT-5 Mini medium | OpenAI | 10.0 | 6.0 | 1/1 | 18.6s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 1/1 | 15.9s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 1/1 | 15.9s |
| #43 | MiniMax M2.5 medium | Minimax | 10.0 | 4.7 | 1/1 | 15.4s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 1/1 | 14.0s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 1/1 | 13.3s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 1/1 | 12.0s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 1/1 | 11.9s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 1/1 | 11.8s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 1/1 | 11.1s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 10.6s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 1/1 | 10.3s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 0/1 | 10.3s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 9.73s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 1/1 | 9.54s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 1/1 | 8.36s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 7.73s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 1/1 | 7.54s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 7.48s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 1/1 | 7.45s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/1 | 7.05s |
| #39 | gpt-oss-120b medium | OpenAI | 9.0 | 5.1 | 1/1 | 6.91s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 1/1 | 6.67s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 1/1 | 6.37s |
| #16 | Gemini 2.5 Flash medium | Google | 10.0 | 7.4 | 1/1 | 6.20s |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/1 | 5.51s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 1/1 | 4.99s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 1/1 | 4.68s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 1/1 | 4.65s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 1/1 | 4.60s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 10.0 | 6.8 | 1/1 | 4.11s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 3.80s |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 1/1 | 3.67s |
| #41 | Qwen3.5-27B none | Qwen | 10.0 | 4.9 | 1/1 | 3.54s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 1/1 | 3.39s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 1/1 | 3.35s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 1/1 | 3.33s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 1/1 | 2.75s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 1/1 | 2.64s |
| #47 | GPT-4o-mini none | OpenAI | 10.0 | 4.0 | 1/1 | 2.51s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 1/1 | 2.47s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 1/1 | 2.30s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 1/1 | 2.28s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 10.0 | 5.0 | 1/1 | 2.04s |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 1/1 | 1.91s |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 1/1 | 1.89s |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 1/1 | 1.27s |
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/1 | 0ms |
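The table's ordering can be reproduced by parsing the time strings (a mix of `s` and `ms` units, as in the final row) and sorting descending. A minimal sketch in Python, using a small illustrative excerpt of the rows above; the helper name `parse_seconds` is an assumption, not part of any benchmark tooling:

```python
def parse_seconds(s: str) -> float:
    """Convert a time string like '88.7s' or '0ms' to seconds."""
    s = s.strip()
    if s.endswith("ms"):
        return float(s[:-2]) / 1000.0
    return float(s.rstrip("s"))

# Excerpt of (model, response time) pairs from the table above.
rows = [
    ("GPT-5.3-Codex medium", "6.37s"),
    ("Seed-2.0-Mini medium", "88.7s"),
    ("Mercury 2 none", "1.27s"),
    ("LFM2-24B-A2B none", "0ms"),
]

# Sort descending by parsed response time, matching the table's order.
ranked = sorted(rows, key=lambda r: parse_seconds(r[1]), reverse=True)
for model, t in ranked:
    print(f"{model}: {t}")
```

Normalizing units before sorting matters here: a plain string sort would place `"0ms"` after `"88.7s"` incorrectly.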