AI BENCHY category
Tool Calling Leaderboard
See which AI models perform best at tool calling, which stay reliable, and where the biggest differences appear. Sorted by: response time (avg) ↓.
| Rank | Model | Company | Tool Calling Score | Avg Score | Correct Tests | Response Time (avg) |
|---|---|---|---|---|---|---|
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 1/1 | 88.7s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 1/1 | 34.8s |
| #34 | GPT-5 Nano medium | OpenAI | 10.0 | 5.5 | 1/1 | 33.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 1/1 | 31.7s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 1/1 | 27.8s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 0/1 | 27.7s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 1/1 | 23.1s |
| #32 | GPT-5 Mini medium | OpenAI | 10.0 | 6.0 | 1/1 | 18.6s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 1/1 | 15.9s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 1/1 | 15.9s |
| #43 | MiniMax M2.5 medium | Minimax | 10.0 | 4.7 | 1/1 | 15.4s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 1/1 | 14.0s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 1/1 | 13.3s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 1/1 | 12.0s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 1/1 | 11.9s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 1/1 | 11.8s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 1/1 | 11.1s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 10.6s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 1/1 | 10.3s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 0/1 | 10.3s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 9.73s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 1/1 | 9.54s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 1/1 | 8.36s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 7.73s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 1/1 | 7.54s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 7.48s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 1/1 | 7.45s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/1 | 7.05s |
| #39 | gpt-oss-120b medium | OpenAI | 9.0 | 5.1 | 1/1 | 6.91s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 1/1 | 6.67s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 1/1 | 6.37s |
| #16 | Gemini 2.5 Flash medium | Google | 10.0 | 7.4 | 1/1 | 6.20s |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/1 | 5.51s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 1/1 | 4.99s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 1/1 | 4.68s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 1/1 | 4.65s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 1/1 | 4.60s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 10.0 | 6.8 | 1/1 | 4.11s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 3.80s |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 1/1 | 3.67s |
| #41 | Qwen3.5-27B none | Qwen | 10.0 | 4.9 | 1/1 | 3.54s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 1/1 | 3.39s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 1/1 | 3.35s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 1/1 | 3.33s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 1/1 | 2.75s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 1/1 | 2.64s |
| #47 | GPT-4o-mini none | OpenAI | 10.0 | 4.0 | 1/1 | 2.51s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 1/1 | 2.47s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 1/1 | 2.30s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 1/1 | 2.28s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 10.0 | 5.0 | 1/1 | 2.04s |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 1/1 | 1.91s |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 1/1 | 1.89s |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 1/1 | 1.27s |
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/1 | 0ms |
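Tool calling, as benchmarks like this one typically measure it, means the model emitting a structured function call that matches an expected name and argument schema. A minimal sketch of such a per-test check, with a hypothetical function name and arguments (the benchmark's actual test harness and criteria are not published here):

```python
import json

def check_tool_call(raw: str, expected_name: str, required_args: set) -> bool:
    """Return True if the model's raw output is valid JSON naming the
    expected function and supplying all required argument keys."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False  # malformed output counts as a failed test
    if call.get("name") != expected_name:
        return False  # wrong function chosen
    args = call.get("arguments", {})
    return required_args.issubset(args)

# Hypothetical test case: one passing call, one calling the wrong function.
good = '{"name": "get_weather", "arguments": {"city": "Lahore"}}'
bad = '{"name": "get_forecast", "arguments": {"city": "Lahore"}}'
print(check_tool_call(good, "get_weather", {"city"}))  # True
print(check_tool_call(bad, "get_weather", {"city"}))   # False
```

A "Correct Tests" entry of 1/1 in the table above corresponds to passing a check of roughly this shape; 0/1 means the call failed it.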