AI BENCHY Category
Anti-AI Tricks Rankings
See which AI models perform best on anti-AI tricks, which stay reliable, and where the biggest gaps appear. Sorted by average response time, ascending (↑).
| Rank | Model | Company | Anti-AI Tricks Score | Average Score | Tests Passed | Avg. Response Time |
|---|---|---|---|---|---|---|
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 0/3 | 466ms |
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/3 | 471ms |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 0/3 | 668ms |
| #41 | Qwen3.5-27B none | Qwen | 4.0 | 4.9 | 1/3 | 796ms |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.0 | 5.0 | 1/3 | 927ms |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 6.0 | 7.1 | 1/3 | 1.16s |
| #36 | Mercury 2 medium | Inception | 7.3 | 5.3 | 2/3 | 1.30s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 0/3 | 1.36s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 0/3 | 1.41s |
| #20 | Gemini 3 Flash Preview none | Google | 7.0 | 7.2 | 2/3 | 1.59s |
| #37 | Qwen3.5-Flash none | Qwen | 2.3 | 5.2 | 0/3 | 1.62s |
| #53 | Grok 4.1 Fast none | X AI | 1.3 | 2.9 | 0/3 | 1.73s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 0/3 | 1.76s |
| #47 | GPT-4o-mini none | OpenAI | 4.0 | 4.0 | 1/3 | 1.83s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 7.0 | 7.3 | 2/3 | 2.18s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 9.0 | 7.5 | 2/3 | 2.53s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 4.0 | 6.2 | 1/3 | 2.74s |
| #31 | GLM 5 none | Z.ai | 4.0 | 6.0 | 1/3 | 3.39s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 3/3 | 3.50s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 0/3 | 3.59s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 3/3 | 3.75s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 3/3 | 3.97s |
| #48 | Qwen3 Coder Next none | Qwen | 2.3 | 4.0 | 0/3 | 4.39s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 3/3 | 4.69s |
| #19 | GPT-5.3 Chat none | OpenAI | 7.3 | 7.3 | 2/3 | 4.72s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 4.0 | 6.8 | 1/3 | 4.83s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 7.0 | 7.7 | 2/3 | 4.95s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 3/3 | 5.02s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 3/3 | 5.61s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 3/3 | 5.65s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/3 | 6.59s |
| #16 | Gemini 2.5 Flash medium | Google | 7.3 | 7.4 | 2/3 | 6.98s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 3/3 | 6.99s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 0/3 | 8.79s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 3/3 | 9.52s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 3/3 | 9.69s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 3/3 | 10.4s |
| #46 | Kimi K2.5 none | Moonshot AI | 2.7 | 4.1 | 0/3 | 11.4s |
| #26 | Claude Opus 4.6 medium | Anthropic | 4.0 | 6.6 | 1/3 | 11.9s |
| #27 | GPT-5.2 medium | OpenAI | 7.0 | 6.5 | 2/3 | 14.3s |
| #50 | Qwen3 Coder Next medium | Qwen | 1.3 | 3.5 | 0/3 | 15.3s |
| #32 | GPT-5 Mini medium | OpenAI | 7.0 | 6.0 | 2/3 | 16.5s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 9.7 | 7.2 | 3/3 | 16.8s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 3/3 | 18.5s |
| #39 | gpt-oss-120b medium | OpenAI | 7.0 | 5.1 | 2/3 | 19.8s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 3/3 | 21.8s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 3/3 | 22.3s |
| #52 | GLM 4.7 Flash medium | Z.ai | 4.0 | 3.1 | 1/3 | 27.1s |
| #43 | MiniMax M2.5 medium | Minimax | 9.3 | 4.7 | 2/3 | 32.4s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 7.0 | 7.3 | 2/3 | 33.4s |
| #34 | GPT-5 Nano medium | OpenAI | 7.0 | 5.5 | 2/3 | 37.7s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 3/3 | 43.9s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 3/3 | 71.4s |
| #28 | Kimi K2.5 medium | Moonshot AI | 7.0 | 6.4 | 2/3 | 85.3s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 7.0 | 6.9 | 2/3 | 99.0s |
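The response-time column mixes milliseconds ("466ms") and seconds ("1.16s"). Any re-sorting of the table therefore has to normalize units first, because a plain text sort would misorder rows: it would place "10.4s" before "9.69s" and "1.16s" before "466ms". The sketch below assumes the values are strings ending in either `ms` or `s`. The helper name `to_ms` is hypothetical and not part of AI BENCHY. It converts each label to milliseconds and sorts a few rows copied from the table.

```python
# Hypothetical helper (not an AI BENCHY API): convert a response-time label
# such as "466ms" or "1.16s" into milliseconds so mixed units compare correctly.
def to_ms(label: str) -> float:
    label = label.strip()
    if label.endswith("ms"):
        return float(label[:-2])
    if label.endswith("s"):
        return float(label[:-1]) * 1000.0
    raise ValueError(f"unrecognized time unit: {label!r}")


# A few (model, average response time) pairs copied from the table above.
rows = [
    ("Qwen3.5 Plus 2026-02-15 medium", "10.4s"),
    ("Mercury 2 none", "466ms"),
    ("Qwen3.5-27B medium", "9.69s"),
    ("Gemini 3.1 Flash Lite Preview none", "1.16s"),
]

# Sort ascending by normalized response time, matching the table's order.
by_time = sorted(rows, key=lambda row: to_ms(row[1]))

# For contrast: a naive string sort on the raw labels gives the wrong order.
by_text = sorted(rows, key=lambda row: row[1])

if __name__ == "__main__":
    for model, t in by_time:
        print(f"{t:>7}  {model}")
```

With unit normalization, the rows come out in the same relative order as the table: Mercury 2 none, Gemini 3.1 Flash Lite Preview none, Qwen3.5-27B medium, Qwen3.5 Plus 2026-02-15 medium.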