AI BENCHY Category
Tool Calling Rankings
See which AI models perform best at tool calling, which remain reliable, and where the biggest gaps appear. Sorted by: Response time (avg) ↑.
| Rank | Model | Company | Tool Calling Score | Average Score | Tests Passed | Response Time (avg) |
|---|---|---|---|---|---|---|
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/1 | 0ms |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 1/1 | 1.27s |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 1/1 | 1.89s |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 1/1 | 1.91s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 10.0 | 5.0 | 1/1 | 2.04s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 1/1 | 2.28s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 1/1 | 2.30s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 1/1 | 2.47s |
| #47 | GPT-4o-mini none | OpenAI | 10.0 | 4.0 | 1/1 | 2.51s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 1/1 | 2.64s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 1/1 | 2.75s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 1/1 | 3.33s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 1/1 | 3.35s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 1/1 | 3.39s |
| #41 | Qwen3.5-27B none | Qwen | 10.0 | 4.9 | 1/1 | 3.54s |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 1/1 | 3.67s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 3.80s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 10.0 | 6.8 | 1/1 | 4.11s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 1/1 | 4.60s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 1/1 | 4.65s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 1/1 | 4.68s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 1/1 | 4.99s |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/1 | 5.51s |
| #16 | Gemini 2.5 Flash medium | Google | 10.0 | 7.4 | 1/1 | 6.20s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 1/1 | 6.37s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 1/1 | 6.67s |
| #39 | gpt-oss-120b medium | OpenAI | 9.0 | 5.1 | 1/1 | 6.91s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/1 | 7.05s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 1/1 | 7.45s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 7.48s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 1/1 | 7.54s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 7.73s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 1/1 | 8.36s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 1/1 | 9.54s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 9.73s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 0/1 | 10.3s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 1/1 | 10.3s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 10.6s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 1/1 | 11.1s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 1/1 | 11.8s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 1/1 | 11.9s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 1/1 | 12.0s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 1/1 | 13.3s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 1/1 | 14.0s |
| #43 | MiniMax M2.5 medium | Minimax | 10.0 | 4.7 | 1/1 | 15.4s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 1/1 | 15.9s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 1/1 | 15.9s |
| #32 | GPT-5 Mini medium | OpenAI | 10.0 | 6.0 | 1/1 | 18.6s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 1/1 | 23.1s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 0/1 | 27.7s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 1/1 | 27.8s |
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 1/1 | 31.7s |
| #34 | GPT-5 Nano medium | OpenAI | 10.0 | 5.5 | 1/1 | 33.3s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 1/1 | 34.8s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 1/1 | 88.7s |
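The ordering above ("Response time (avg) ↑") is a plain ascending sort on the last column. A minimal sketch of that sort, using a hypothetical subset of the rows above with times normalized to seconds:

```python
# Illustrative only: a few rows taken from the table above, with
# response times converted to seconds for comparison.
rows = [
    {"model": "Seed-2.0-Mini medium", "company": "Bytedance Seed", "avg_response_s": 88.7},
    {"model": "GPT-4o-mini none", "company": "OpenAI", "avg_response_s": 2.51},
    {"model": "Gemini 2.5 Flash none", "company": "Google", "avg_response_s": 1.91},
    {"model": "Mercury 2 none", "company": "Inception", "avg_response_s": 1.27},
]

# Ascending sort (the "↑" in the table header): fastest models first.
fastest_first = sorted(rows, key=lambda r: r["avg_response_s"])

for r in fastest_first:
    print(f'{r["model"]:25s} {r["avg_response_s"]:6.2f}s')
```

Note that this ordering is independent of the rank column, which follows the tool calling and average scores rather than latency.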