AI BENCHY Category
Combined Leaderboard
See which AI models perform best in Combined, which remain reliable, and where the biggest gaps appear. Sorted by: response time (avg) ↑.
Related failure reasons
| Rank | Model | Company | Combined Score | Avg Score | Correct Tests | Response Time (avg) |
|---|---|---|---|---|---|---|
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/1 | 0ms |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 0/1 | 606ms |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 0/1 | 2.87s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 0/1 | 2.89s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 0/1 | 3.20s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/1 | 3.22s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 0/1 | 3.27s |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 1/1 | 3.28s |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/1 | 3.33s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 0/1 | 3.56s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 0/1 | 4.28s |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 0/1 | 4.39s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 0/1 | 4.98s |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 0/1 | 6.22s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 0/1 | 6.65s |
| #47 | GPT-4o-mini none | OpenAI | 10.0 | 4.0 | 0/1 | 7.58s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 0/1 | 8.91s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 1/1 | 9.12s |
| #41 | Qwen3.5-27B none | Qwen | 10.0 | 4.9 | 0/1 | 9.39s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 0/1 | 10.4s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 0/1 | 11.9s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 1/1 | 12.0s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 1/1 | 14.1s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 14.9s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 1/1 | 17.8s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 0/1 | 19.2s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 1/1 | 19.6s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 1/1 | 20.6s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 9.0 | 6.8 | 1/1 | 23.8s |
| #16 | Gemini 2.5 Flash medium | Google | 10.0 | 7.4 | 1/1 | 28.4s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 1/1 | 29.0s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 1/1 | 29.6s |
| #39 | gpt-oss-120b medium | OpenAI | 10.0 | 5.1 | 1/1 | 31.2s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 1/1 | 37.6s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 9.0 | 9.4 | 1/1 | 40.6s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 0/1 | 45.1s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 10.0 | 5.0 | 0/1 | 46.0s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 46.4s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 1/1 | 46.8s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 0/1 | 47.4s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 50.2s |
| #43 | MiniMax M2.5 medium | Minimax | 10.0 | 4.7 | 0/1 | 60.4s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 0/1 | 65.6s |
| #34 | GPT-5 Nano medium | OpenAI | 10.0 | 5.5 | 1/1 | 66.0s |
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 1/1 | 71.4s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 0/1 | 75.3s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 9.0 | 7.2 | 1/1 | 75.7s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 76.7s |
| #32 | GPT-5 Mini medium | OpenAI | 10.0 | 6.0 | 1/1 | 88.2s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 1/1 | 93.1s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 1/1 | 107.8s |
| #33 | DeepSeek V3.2 none | DeepSeek | 8.0 | 5.5 | 0/1 | 115.9s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 1/1 | 164.0s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 1/1 | 262.8s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 280.5s |
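The table is sorted by average response time ascending, but the column mixes units (`606ms` vs `3.20s`). A minimal sketch of how such mixed values could be normalized for sorting (the helper name and sample rows are illustrative, not part of the leaderboard's actual code):

```python
def to_seconds(value: str) -> float:
    """Convert a response-time string like '606ms' or '3.20s' to seconds."""
    value = value.strip()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unrecognized time format: {value!r}")

# Sample rows in the table's (model, response time) shape
rows = [
    ("MiMo-V2-Flash none", "2.87s"),
    ("LFM2-24B-A2B none", "0ms"),
    ("Mercury 2 none", "606ms"),
]
# Sort ascending by normalized time, matching the table's order
rows.sort(key=lambda r: to_seconds(r[1]))
```

After sorting, the rows come out as LFM2-24B-A2B, Mercury 2, MiMo-V2-Flash, matching the top of the table above.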