AI BENCHY Category
Puzzle Solving Leaderboard
See which AI models perform best at puzzle solving, which stay reliable, and where the biggest gaps appear. Sorted by: Response Time (avg) ↓.
Related: failure reasons
| Rank | Model | Company | Puzzle Solving Score | Avg Score | Correct Tests | Response Time (avg) |
|---|---|---|---|---|---|---|
| #7 | Qwen3.5-27B medium | Qwen | 8.3 | 8.2 | 2/3 | 64.6s |
| #24 | Qwen3.5-Flash medium | Qwen | 4.0 | 6.9 | 1/3 | 56.7s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 7.0 | 8.2 | 2/3 | 46.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 4.0 | 6.4 | 1/3 | 45.4s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 7.0 | 7.3 | 2/3 | 36.9s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 3/3 | 34.6s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 4.0 | 5.5 | 1/3 | 31.6s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 7.0 | 6.9 | 2/3 | 25.9s |
| #48 | Qwen3 Coder Next none | Qwen | 1.3 | 4.0 | 0/3 | 22.9s |
| #34 | GPT-5 Nano medium | OpenAI | 4.0 | 5.5 | 1/3 | 19.8s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 3/3 | 17.2s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 3/3 | 15.6s |
| #32 | GPT-5 Mini medium | OpenAI | 4.3 | 6.0 | 1/3 | 14.1s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 0/3 | 12.9s |
| #39 | gpt-oss-120b medium | OpenAI | 1.7 | 5.1 | 0/3 | 11.8s |
| #43 | MiniMax M2.5 medium | Minimax | 4.0 | 4.7 | 1/3 | 11.5s |
| #9 | GPT-5.4 medium | OpenAI | 7.0 | 8.0 | 2/3 | 9.13s |
| #30 | Grok 4.1 Fast medium | X AI | 4.0 | 6.2 | 1/3 | 8.08s |
| #13 | Step 3.5 Flash medium | Stepfun | 4.0 | 7.4 | 1/3 | 7.72s |
| #33 | DeepSeek V3.2 none | DeepSeek | 7.7 | 5.5 | 2/3 | 7.37s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 3/3 | 7.15s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 3/3 | 6.11s |
| #37 | Qwen3.5-Flash none | Qwen | 1.3 | 5.2 | 0/3 | 5.90s |
| #27 | GPT-5.2 medium | OpenAI | 7.0 | 6.5 | 2/3 | 5.47s |
| #3 | GPT-5.3-Codex medium | OpenAI | 9.3 | 8.4 | 2/3 | 5.12s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 3/3 | 4.80s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 0/3 | 4.73s |
| #26 | Claude Opus 4.6 medium | Anthropic | 7.0 | 6.6 | 2/3 | 4.60s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 3/3 | 4.43s |
| #15 | GPT-5.2 Chat none | OpenAI | 7.0 | 7.4 | 2/3 | 4.42s |
| #16 | Gemini 2.5 Flash medium | Google | 7.0 | 7.4 | 2/3 | 3.94s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 3/3 | 3.91s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 7.0 | 7.2 | 2/3 | 3.77s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 7.0 | 7.5 | 2/3 | 3.58s |
| #45 | Trinity Large Preview none | Arcee AI | 4.0 | 4.2 | 1/3 | 3.30s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 3/3 | 2.93s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 7.0 | 6.8 | 2/3 | 2.92s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 7.0 | 6.2 | 2/3 | 2.82s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 3/3 | 2.76s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 0/3 | 2.30s |
| #31 | GLM 5 none | Z.ai | 7.0 | 6.0 | 2/3 | 2.05s |
| #55 | LFM2-24B-A2B none | Liquid | 3.3 | 2.6 | 0/3 | 1.69s |
| #44 | GPT-5.4 none | OpenAI | 4.0 | 4.5 | 1/3 | 1.52s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 0/3 | 1.38s |
| #41 | Qwen3.5-27B none | Qwen | 6.3 | 4.9 | 1/3 | 1.37s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 1.7 | 4.7 | 0/3 | 1.34s |
| #47 | GPT-4o-mini none | OpenAI | 2.3 | 4.0 | 0/3 | 1.30s |
| #53 | Grok 4.1 Fast none | X AI | 1.3 | 2.9 | 0/3 | 1.28s |
| #20 | Gemini 3 Flash Preview none | Google | 7.0 | 7.2 | 2/3 | 1.06s |
| #49 | GLM 4.7 Flash none | Z.ai | 3.7 | 3.9 | 0/3 | 1.00s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.0 | 5.0 | 1/3 | 982ms |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 3/3 | 972ms |
| #36 | Mercury 2 medium | Inception | 1.7 | 5.3 | 0/3 | 934ms |
| #38 | Gemini 2.5 Flash none | Google | 4.7 | 5.2 | 1/3 | 576ms |
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 0/3 | 533ms |
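The table is sorted by average response time in descending order, but the values mix units ("64.6s" vs. "982ms"), so a plain string sort would mis-order the rows. A minimal sketch (hypothetical helper, not part of BENCHY) normalizing both formats to seconds before sorting:

```python
# Hypothetical sketch: normalize mixed "64.6s" / "982ms" response-time
# strings to seconds so leaderboard rows can be sorted numerically.

def to_seconds(value: str) -> float:
    """Convert a response-time string like '64.6s' or '982ms' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unrecognized time format: {value!r}")

# A few sample rows from the table: (model, response time).
rows = [
    ("Qwen3.5-27B medium", "64.6s"),
    ("Qwen3.5-122B-A10B none", "982ms"),
    ("Mercury 2 none", "533ms"),
]

# Descending order, matching the table's "Response Time (avg) ↓" sort.
ranked = sorted(rows, key=lambda r: to_seconds(r[1]), reverse=True)
```

Without the unit conversion, "982ms" would sort after "9.13s" lexicographically even though it is the faster time.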