AI BENCHY Category
General Intelligence Leaderboard
See which AI models perform best in General Intelligence, which stay consistent, and where the biggest gaps are. Sorted by: Response time (avg) ↑.
| Rank | Model | Company | General Intelligence score | Average score | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|
| #55 | LFM2-24B-A2B none | Liquid | 3.0 | 2.6 | 0/1 | 395ms |
| #38 | Gemini 2.5 Flash none | Google | 5.0 | 5.2 | 0/1 | 615ms |
| #51 | Mercury 2 none | Inception | 4.0 | 3.4 | 0/1 | 628ms |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 3.0 | 7.1 | 0/1 | 741ms |
| #37 | Qwen3.5-Flash none | Qwen | 10.0 | 5.2 | 1/1 | 803ms |
| #36 | Mercury 2 medium | Inception | 4.0 | 5.3 | 0/1 | 821ms |
| #47 | GPT-4o-mini none | OpenAI | 3.0 | 4.0 | 0/1 | 909ms |
| #53 | Grok 4.1 Fast none | X AI | 3.0 | 2.9 | 0/1 | 1.08s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 5.0 | 5.0 | 0/1 | 1.12s |
| #20 | Gemini 3 Flash Preview none | Google | 10.0 | 7.2 | 1/1 | 1.13s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 6.0 | 4.7 | 0/1 | 1.19s |
| #48 | Qwen3 Coder Next none | Qwen | 10.0 | 4.0 | 1/1 | 1.34s |
| #50 | Qwen3 Coder Next medium | Qwen | 6.0 | 3.5 | 0/1 | 1.39s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 3.0 | 7.3 | 0/1 | 1.54s |
| #49 | GLM 4.7 Flash none | Z.ai | 3.0 | 3.9 | 0/1 | 1.59s |
| #54 | MiMo-V2-Flash none | Xiaomi | 4.0 | 2.9 | 0/1 | 1.67s |
| #44 | GPT-5.4 none | OpenAI | 3.0 | 4.5 | 0/1 | 1.78s |
| #19 | GPT-5.3 Chat none | OpenAI | 4.0 | 7.3 | 0/1 | 1.99s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 4.0 | 6.2 | 0/1 | 2.26s |
| #41 | Qwen3.5-27B none | Qwen | 5.0 | 4.9 | 0/1 | 2.51s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 5.0 | 6.8 | 0/1 | 2.56s |
| #45 | Trinity Large Preview none | Arcee AI | 3.0 | 4.2 | 0/1 | 2.86s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 1/1 | 2.86s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 1/1 | 3.16s |
| #15 | GPT-5.2 Chat none | OpenAI | 4.0 | 7.4 | 0/1 | 3.20s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 1/1 | 3.27s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 1/1 | 3.68s |
| #46 | Kimi K2.5 none | Moonshot AI | 10.0 | 4.1 | 1/1 | 4.00s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 1/1 | 4.09s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 3.0 | 7.2 | 0/1 | 4.20s |
| #27 | GPT-5.2 medium | OpenAI | 10.0 | 6.5 | 0/1 | 4.32s |
| #16 | Gemini 2.5 Flash medium | Google | 4.0 | 7.4 | 0/1 | 4.86s |
| #3 | GPT-5.3-Codex medium | OpenAI | 4.0 | 8.4 | 0/1 | 4.87s |
| #9 | GPT-5.4 medium | OpenAI | 5.0 | 8.0 | 0/1 | 4.92s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 1/1 | 4.94s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 1/1 | 5.04s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 1/1 | 5.25s |
| #13 | Step 3.5 Flash medium | Stepfun | 6.0 | 7.4 | 0/1 | 6.54s |
| #43 | MiniMax M2.5 medium | Minimax | 3.0 | 4.7 | 0/1 | 6.63s |
| #39 | gpt-oss-120b medium | OpenAI | 3.0 | 5.1 | 0/1 | 7.90s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 1/1 | 9.34s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 1/1 | 11.8s |
| #32 | GPT-5 Mini medium | OpenAI | 4.0 | 6.0 | 0/1 | 13.5s |
| #14 | GLM 5 medium | Z.ai | 5.0 | 7.4 | 0/1 | 14.7s |
| #30 | Grok 4.1 Fast medium | X AI | 3.0 | 6.2 | 0/1 | 16.2s |
| #34 | GPT-5 Nano medium | OpenAI | 3.0 | 5.5 | 0/1 | 17.5s |
| #52 | GLM 4.7 Flash medium | Z.ai | 10.0 | 3.1 | 0/1 | 18.1s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 0/1 | 30.3s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 3.0 | 7.3 | 0/1 | 31.3s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 0/1 | 34.1s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 6.0 | 6.9 | 0/1 | 36.7s |
| #24 | Qwen3.5-Flash medium | Qwen | 5.0 | 6.9 | 0/1 | 40.1s |
| #28 | Kimi K2.5 medium | Moonshot AI | 6.0 | 6.4 | 0/1 | 69.7s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 0/1 | 79.9s |
| #7 | Qwen3.5-27B medium | Qwen | 5.0 | 8.2 | 0/1 | 101.4s |
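
The response-time column mixes milliseconds and seconds, so sorting it as plain text would not reproduce the order shown above. The following is a minimal sketch, not the site's own method, of how the cells could be normalized to seconds before sorting. The helper name `parse_latency` and the sample `rows` list are illustrative assumptions; the four sample values are copied from the table.

```python
import re


def parse_latency(text: str) -> float:
    """Convert a latency cell such as '395ms' or '1.08s' to seconds.

    Hypothetical helper: only the two unit suffixes seen in the table
    ('ms' and 's') are handled; anything else raises ValueError.
    """
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized latency: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


# A few (model, response time) pairs taken from the table, deliberately unordered.
rows = [
    ("Qwen3.5-27B medium", "101.4s"),
    ("LFM2-24B-A2B none", "395ms"),
    ("Grok 4.1 Fast none", "1.08s"),
    ("Mercury 2 none", "628ms"),
]

# Sort ascending by latency in seconds, matching the "↑" sort on the page.
ranked = sorted(rows, key=lambda row: parse_latency(row[1]))

for model, latency in ranked:
    print(f"{model}: {parse_latency(latency):.3f} s")
```

Converting to a single unit first also makes it straightforward to compare latency against the score columns, for example to filter for models that answered correctly in under two seconds.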