AI BENCHY category
General Intelligence leaderboard
See which AI models perform best at General Intelligence, which stay consistent, and where the biggest gaps are. Sort by: response time (average) ↓.
Correlated failure causes
| Rank | Model | Company | General Intelligence score | Average score | Correct runs | Response time (avg) |
|---|---|---|---|---|---|---|
| #7 | Qwen3.5-27B medium | Qwen | 5.0 | 8.2 | 0/1 | 101.4s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 0/1 | 79.9s |
| #28 | Kimi K2.5 medium | Moonshot AI | 6.0 | 6.4 | 0/1 | 69.7s |
| #24 | Qwen3.5-Flash medium | Qwen | 5.0 | 6.9 | 0/1 | 40.1s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 6.0 | 6.9 | 0/1 | 36.7s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 0/1 | 34.1s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 3.0 | 7.3 | 0/1 | 31.3s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 0/1 | 30.3s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 0/1 | 18.1s |
| #34 | GPT-5 Nano medium | OpenAI | 3.0 | 5.5 | 0/1 | 17.5s |
| #30 | Grok 4.1 Fast medium | X AI | 3.0 | 6.2 | 0/1 | 16.2s |
| #14 | GLM 5 medium | Z.ai | 5.0 | 7.4 | 0/1 | 14.7s |
| #32 | GPT-5 Mini medium | OpenAI | 4.0 | 6.0 | 0/1 | 13.5s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 1/1 | 11.8s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 1/1 | 9.34s |
| #39 | gpt-oss-120b medium | OpenAI | 3.0 | 5.1 | 0/1 | 7.90s |
| #43 | MiniMax M2.5 medium | Minimax | 3.0 | 4.7 | 0/1 | 6.63s |
| #13 | Step 3.5 Flash medium | Stepfun | 6.0 | 7.4 | 0/1 | 6.54s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 5.25s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 5.04s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 4.94s |
| #9 | GPT-5.4 medium | OpenAI | 5.0 | 8.0 | 0/1 | 4.92s |
| #3 | GPT-5.3-Codex medium | OpenAI | 4.0 | 8.4 | 0/1 | 4.87s |
| #16 | Gemini 2.5 Flash medium | Google | 4.0 | 7.4 | 0/1 | 4.86s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 0/1 | 4.32s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 3.0 | 7.2 | 0/1 | 4.20s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 4.09s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 1/1 | 4.00s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 1/1 | 3.68s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 1/1 | 3.27s |
| #15 | GPT-5.2 Chat none | OpenAI | 4.0 | 7.4 | 0/1 | 3.20s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 3.16s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 1/1 | 2.86s |
| #45 | Trinity Large Preview none | Arcee AI | 3.0 | 4.2 | 0/1 | 2.86s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 5.0 | 6.8 | 0/1 | 2.56s |
| #41 | Qwen3.5-27B none | Qwen | 5.0 | 4.9 | 0/1 | 2.51s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 4.0 | 6.2 | 0/1 | 2.26s |
| #19 | GPT-5.3 Chat none | OpenAI | 4.0 | 7.3 | 0/1 | 1.99s |
| #44 | GPT-5.4 none | OpenAI | 3.0 | 4.5 | 0/1 | 1.78s |
| #54 | MiMo-V2-Flash none | Xiaomi | 4.0 | 2.9 | 0/1 | 1.67s |
| #49 | GLM 4.7 Flash none | Z.ai | 3.0 | 3.9 | 0/1 | 1.59s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 3.0 | 7.3 | 0/1 | 1.54s |
| #50 | Qwen3 Coder Next medium | Qwen | 6.0 | 3.5 | 0/1 | 1.39s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 1/1 | 1.34s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 6.0 | 4.7 | 0/1 | 1.19s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 1/1 | 1.13s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 5.0 | 5.0 | 0/1 | 1.12s |
| #53 | Grok 4.1 Fast none | X AI | 3.0 | 2.9 | 0/1 | 1.08s |
| #47 | GPT-4o-mini none | OpenAI | 3.0 | 4.0 | 0/1 | 909ms |
| #36 | Mercury 2 medium | Inception | 4.0 | 5.3 | 0/1 | 821ms |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 1/1 | 803ms |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 3.0 | 7.1 | 0/1 | 741ms |
| #51 | Mercury 2 none | Inception | 4.0 | 3.4 | 0/1 | 628ms |
| #38 | Gemini 2.5 Flash none | Google | 5.0 | 5.2 | 0/1 | 615ms |
| #55 | LFM2-24B-A2B none | Liquid | 3.0 | 2.6 | 0/1 | 395ms |
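The response-time column mixes two units (seconds like "101.4s" and milliseconds like "909ms"), so a plain string sort would misorder the rows. A minimal sketch of how the table's descending sort could be reproduced, with a hypothetical `to_seconds` helper to normalize the units:

```python
# Normalize mixed-unit response times ("101.4s", "909ms") to seconds
# so leaderboard rows can be sorted numerically, not lexically.

def to_seconds(value: str) -> float:
    """Convert a duration string like '101.4s' or '909ms' to seconds."""
    if value.endswith("ms"):          # check "ms" before the bare "s" suffix
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unrecognized duration: {value!r}")

# A few rows from the table above, as (model, response time) pairs.
rows = [
    ("GPT-4o-mini none", "909ms"),
    ("Qwen3.5-27B medium", "101.4s"),
    ("LFM2-24B-A2B none", "395ms"),
]

# Descending by response time, matching the table's sort order.
ranked = sorted(rows, key=lambda r: to_seconds(r[1]), reverse=True)
```

Note that the `ms` suffix must be tested before `s`, since every millisecond value also ends in `s`.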