AI BENCHY Category
Puzzle Solving Ranking
See which AI models perform best at puzzle solving, which stay reliable, and where the biggest gaps appear. Sort by: response time (average) ↑.
Related failure reasons
| Rank | Model | Company | Puzzle Solving Score | Average Score | Correct Tests | Response Time (average) |
|---|---|---|---|---|---|---|
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 0/3 | 533ms |
| #38 | Gemini 2.5 Flash none | | 4.7 | 5.2 | 1/3 | 576ms |
| #36 | Mercury 2 medium | Inception | 1.7 | 5.3 | 0/3 | 934ms |
| #22 | Gemini 3.1 Flash Lite Preview none | | 10.0 | 7.1 | 3/3 | 972ms |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.0 | 5.0 | 1/3 | 982ms |
| #49 | GLM 4.7 Flash none | Z.ai | 3.7 | 3.9 | 0/3 | 1.00s |
| #20 | Gemini 3 Flash Preview none | | 7.0 | 7.2 | 2/3 | 1.06s |
| #53 | Grok 4.1 Fast none | X AI | 1.3 | 2.9 | 0/3 | 1.28s |
| #47 | GPT-4o-mini none | OpenAI | 2.3 | 4.0 | 0/3 | 1.30s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 1.7 | 4.7 | 0/3 | 1.34s |
| #41 | Qwen3.5-27B none | Qwen | 6.3 | 4.9 | 1/3 | 1.37s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 0/3 | 1.38s |
| #44 | GPT-5.4 none | OpenAI | 4.0 | 4.5 | 1/3 | 1.52s |
| #55 | LFM2-24B-A2B none | Liquid | 3.3 | 2.6 | 0/3 | 1.69s |
| #31 | GLM 5 none | Z.ai | 7.0 | 6.0 | 2/3 | 2.05s |
| #50 | Qwen3 Coder Next medium | Qwen | 10.0 | 3.5 | 0/3 | 2.30s |
| #17 | Gemini 3.1 Flash Lite Preview low | | 10.0 | 7.3 | 3/3 | 2.76s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 7.0 | 6.2 | 2/3 | 2.82s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 7.0 | 6.8 | 2/3 | 2.92s |
| #19 | GPT-5.3 Chat none | OpenAI | 10.0 | 7.3 | 3/3 | 2.93s |
| #45 | Trinity Large Preview none | Arcee AI | 4.0 | 4.2 | 1/3 | 3.30s |
| #12 | Gemini 3.1 Flash Lite Preview medium | | 7.0 | 7.5 | 2/3 | 3.58s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 7.0 | 7.2 | 2/3 | 3.77s |
| #6 | Gemini 3 Pro Preview medium | | 10.0 | 8.2 | 3/3 | 3.91s |
| #16 | Gemini 2.5 Flash medium | | 7.0 | 7.4 | 2/3 | 3.94s |
| #15 | GPT-5.2 Chat none | OpenAI | 7.0 | 7.4 | 2/3 | 4.42s |
| #1 | Gemini 3 Flash Preview medium | | 10.0 | 10.0 | 3/3 | 4.43s |
| #26 | Claude Opus 4.6 medium | Anthropic | 7.0 | 6.6 | 2/3 | 4.60s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 0/3 | 4.73s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 3/3 | 4.80s |
| #3 | GPT-5.3-Codex medium | OpenAI | 9.3 | 8.4 | 2/3 | 5.12s |
| #27 | GPT-5.2 medium | OpenAI | 7.0 | 6.5 | 2/3 | 5.47s |
| #37 | Qwen3.5-Flash none | Qwen | 1.3 | 5.2 | 0/3 | 5.90s |
| #5 | Gemini 3 Flash Preview low | | 10.0 | 8.2 | 3/3 | 6.11s |
| #2 | Gemini 3.1 Pro Preview medium | | 10.0 | 9.4 | 3/3 | 7.15s |
| #33 | DeepSeek V3.2 none | DeepSeek | 7.7 | 5.5 | 2/3 | 7.37s |
| #13 | Step 3.5 Flash medium | Stepfun | 4.0 | 7.4 | 1/3 | 7.72s |
| #30 | Grok 4.1 Fast medium | X AI | 4.0 | 6.2 | 1/3 | 8.08s |
| #9 | GPT-5.4 medium | OpenAI | 7.0 | 8.0 | 2/3 | 9.13s |
| #43 | MiniMax M2.5 medium | Minimax | 4.0 | 4.7 | 1/3 | 11.5s |
| #39 | gpt-oss-120b medium | OpenAI | 1.7 | 5.1 | 0/3 | 11.8s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 0/3 | 12.9s |
| #32 | GPT-5 Mini medium | OpenAI | 4.3 | 6.0 | 1/3 | 14.1s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 3/3 | 15.6s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 3/3 | 17.2s |
| #34 | GPT-5 Nano medium | OpenAI | 4.0 | 5.5 | 1/3 | 19.8s |
| #48 | Qwen3 Coder Next none | Qwen | 1.3 | 4.0 | 0/3 | 22.9s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 7.0 | 6.9 | 2/3 | 25.9s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 4.0 | 5.5 | 1/3 | 31.6s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 3/3 | 34.6s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 7.0 | 7.3 | 2/3 | 36.9s |
| #28 | Kimi K2.5 medium | Moonshot AI | 4.0 | 6.4 | 1/3 | 45.4s |
| #8 | Gemini 3.1 Flash Lite Preview high | | 7.0 | 8.2 | 2/3 | 46.3s |
| #24 | Qwen3.5-Flash medium | Qwen | 4.0 | 6.9 | 1/3 | 56.7s |
| #7 | Qwen3.5-27B medium | Qwen | 8.3 | 8.2 | 2/3 | 64.6s |