AI BENCHY Category
General Intelligence Ranking
See which AI models perform best on general intelligence, which stay reliable, and where the largest gaps appear. Sorted by: average response time, ascending.
Related failure reasons
| Rank | Model | Company | General Intelligence Score | Average Score | Tests Passed | Response Time (avg) |
|---|---|---|---|---|---|---|
| #55 | LFM2-24B-A2B none | Liquid | 3.0 | 2.6 | 0/1 | 395ms |
| #38 | Gemini 2.5 Flash none | Google | 5.0 | 5.2 | 0/1 | 615ms |
| #51 | Mercury 2 none | Inception | 4.0 | 3.4 | 0/1 | 628ms |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 3.0 | 7.1 | 0/1 | 741ms |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 1/1 | 803ms |
| #36 | Mercury 2 medium | Inception | 4.0 | 5.3 | 0/1 | 821ms |
| #47 | GPT-4o-mini none | OpenAI | 3.0 | 4.0 | 0/1 | 909ms |
| #53 | Grok 4.1 Fast none | X AI | 3.0 | 2.9 | 0/1 | 1.08s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 5.0 | 5.0 | 0/1 | 1.12s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 1/1 | 1.13s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 6.0 | 4.7 | 0/1 | 1.19s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 1/1 | 1.34s |
| #50 | Qwen3 Coder Next medium | Qwen | 6.0 | 3.5 | 0/1 | 1.39s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 3.0 | 7.3 | 0/1 | 1.54s |
| #49 | GLM 4.7 Flash none | Z.ai | 3.0 | 3.9 | 0/1 | 1.59s |
| #54 | MiMo-V2-Flash none | Xiaomi | 4.0 | 2.9 | 0/1 | 1.67s |
| #44 | GPT-5.4 none | OpenAI | 3.0 | 4.5 | 0/1 | 1.78s |
| #19 | GPT-5.3 Chat none | OpenAI | 4.0 | 7.3 | 0/1 | 1.99s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 4.0 | 6.2 | 0/1 | 2.26s |
| #41 | Qwen3.5-27B none | Qwen | 5.0 | 4.9 | 0/1 | 2.51s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 5.0 | 6.8 | 0/1 | 2.56s |
| #45 | Trinity Large Preview none | Arcee AI | 3.0 | 4.2 | 0/1 | 2.86s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 1/1 | 2.86s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 3.16s |
| #15 | GPT-5.2 Chat none | OpenAI | 4.0 | 7.4 | 0/1 | 3.20s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 1/1 | 3.27s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 1/1 | 3.68s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 1/1 | 4.00s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 4.09s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 3.0 | 7.2 | 0/1 | 4.20s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 0/1 | 4.32s |
| #16 | Gemini 2.5 Flash medium | Google | 4.0 | 7.4 | 0/1 | 4.86s |
| #3 | GPT-5.3-Codex medium | OpenAI | 4.0 | 8.4 | 0/1 | 4.87s |
| #9 | GPT-5.4 medium | OpenAI | 5.0 | 8.0 | 0/1 | 4.92s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 4.94s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 5.04s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 5.25s |
| #13 | Step 3.5 Flash medium | Stepfun | 6.0 | 7.4 | 0/1 | 6.54s |
| #43 | MiniMax M2.5 medium | Minimax | 3.0 | 4.7 | 0/1 | 6.63s |
| #39 | gpt-oss-120b medium | OpenAI | 3.0 | 5.1 | 0/1 | 7.90s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 1/1 | 9.34s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 1/1 | 11.8s |
| #32 | GPT-5 Mini medium | OpenAI | 4.0 | 6.0 | 0/1 | 13.5s |
| #14 | GLM 5 medium | Z.ai | 5.0 | 7.4 | 0/1 | 14.7s |
| #30 | Grok 4.1 Fast medium | X AI | 3.0 | 6.2 | 0/1 | 16.2s |
| #34 | GPT-5 Nano medium | OpenAI | 3.0 | 5.5 | 0/1 | 17.5s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 0/1 | 18.1s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 0/1 | 30.3s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 3.0 | 7.3 | 0/1 | 31.3s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 0/1 | 34.1s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 6.0 | 6.9 | 0/1 | 36.7s |
| #24 | Qwen3.5-Flash medium | Qwen | 5.0 | 6.9 | 0/1 | 40.1s |
| #28 | Kimi K2.5 medium | Moonshot AI | 6.0 | 6.4 | 0/1 | 69.7s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 0/1 | 79.9s |
| #7 | Qwen3.5-27B medium | Qwen | 5.0 | 8.2 | 0/1 | 101.4s |
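
The response-time column mixes milliseconds and seconds, so comparing rows programmatically needs a unit conversion first. The following is a minimal sketch, assuming the only suffixes are `ms` and `s` as they appear in the table; the helper name `parse_latency` and the tuple layout are hypothetical and not part of AI BENCHY. It uses four rows copied from the table to find the fastest model with a 10.0 general intelligence score.

```python
def parse_latency(text: str) -> float:
    """Convert a latency string such as '395ms' or '1.08s' to seconds."""
    text = text.strip()
    # Check 'ms' before 's', because every 'ms' value also ends with 's'.
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized latency format: {text!r}")


# A few rows copied from the table: (model, general intelligence score, latency).
rows = [
    ("Qwen3.5-Flash none", 10.0, "803ms"),
    ("LFM2-24B-A2B none", 3.0, "395ms"),
    ("DeepSeek V3.2 none", 10.0, "2.86s"),
    ("Qwen3.5-27B medium", 5.0, "101.4s"),
]

# Sort ascending by latency in seconds, matching the table's sort order.
by_latency = sorted(rows, key=lambda row: parse_latency(row[2]))

# Among this sample, pick the fastest model that scored 10.0.
fastest_top_scorer = next(row for row in by_latency if row[1] == 10.0)
print(fastest_top_scorer[0])
```

Note that each row reflects a single test (the "Tests Passed" column is out of 1), so any ranking derived this way is only as reliable as that one result.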