AI BENCHY Category
Anti-AI Tricks Leaderboard
See which AI models perform best on Anti-AI Tricks, which stay consistent, and where the biggest gaps are. Sorted by: Response time (avg) ↑.
| Rank | Model | Company | Anti-AI Tricks score | Average score | Correct attempts | Response time (avg) |
|---|---|---|---|---|---|---|
| #51 | Mercury 2 none | Inception | 10.0 | 3.4 | 0/3 | 466ms |
| #55 | LFM2-24B-A2B none | Liquid | 10.0 | 2.6 | 0/3 | 471ms |
| #38 | Gemini 2.5 Flash none | Google | 10.0 | 5.2 | 0/3 | 668ms |
| #41 | Qwen3.5-27B none | Qwen | 4.0 | 4.9 | 1/3 | 796ms |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.0 | 5.0 | 1/3 | 927ms |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 6.0 | 7.1 | 1/3 | 1.16s |
| #36 | Mercury 2 medium | Inception | 7.3 | 5.3 | 2/3 | 1.30s |
| #54 | MiMo-V2-Flash none | Xiaomi | 10.0 | 2.9 | 0/3 | 1.36s |
| #44 | GPT-5.4 none | OpenAI | 10.0 | 4.5 | 0/3 | 1.41s |
| #20 | Gemini 3 Flash Preview none | Google | 7.0 | 7.2 | 2/3 | 1.59s |
| #37 | Qwen3.5-Flash none | Qwen | 2.3 | 5.2 | 0/3 | 1.62s |
| #53 | Grok 4.1 Fast none | X AI | 1.3 | 2.9 | 0/3 | 1.73s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 10.0 | 4.7 | 0/3 | 1.76s |
| #47 | GPT-4o-mini none | OpenAI | 4.0 | 4.0 | 1/3 | 1.83s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 7.0 | 7.3 | 2/3 | 2.18s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 9.0 | 7.5 | 2/3 | 2.53s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 4.0 | 6.2 | 1/3 | 2.74s |
| #31 | GLM 5 none | Z.ai | 4.0 | 6.0 | 1/3 | 3.39s |
| #5 | Gemini 3 Flash Preview low | Google | 10.0 | 8.2 | 3/3 | 3.50s |
| #45 | Trinity Large Preview none | Arcee AI | 10.0 | 4.2 | 0/3 | 3.59s |
| #6 | Gemini 3 Pro Preview medium | Google | 10.0 | 8.2 | 3/3 | 3.75s |
| #15 | GPT-5.2 Chat none | OpenAI | 10.0 | 7.4 | 3/3 | 3.97s |
| #48 | Qwen3 Coder Next none | Qwen | 2.3 | 4.0 | 0/3 | 4.39s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 3/3 | 4.69s |
| #19 | GPT-5.3 Chat none | OpenAI | 7.3 | 7.3 | 2/3 | 4.72s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 4.0 | 6.8 | 1/3 | 4.83s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 7.0 | 7.7 | 2/3 | 4.95s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 3/3 | 5.02s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 3/3 | 5.61s |
| #30 | Grok 4.1 Fast medium | X AI | 10.0 | 6.2 | 3/3 | 5.65s |
| #49 | GLM 4.7 Flash none | Z.ai | 10.0 | 3.9 | 0/3 | 6.59s |
| #16 | Gemini 2.5 Flash medium | Google | 7.3 | 7.4 | 2/3 | 6.98s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 3/3 | 6.99s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 0/3 | 8.79s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 3/3 | 9.52s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 3/3 | 9.69s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 3/3 | 10.4s |
| #46 | Kimi K2.5 none | Moonshot AI | 2.7 | 4.1 | 0/3 | 11.4s |
| #26 | Claude Opus 4.6 medium | Anthropic | 4.0 | 6.6 | 1/3 | 11.9s |
| #27 | GPT-5.2 medium | OpenAI | 7.0 | 6.5 | 2/3 | 14.3s |
| #50 | Qwen3 Coder Next medium | Qwen | 1.3 | 3.5 | 0/3 | 15.3s |
| #32 | GPT-5 Mini medium | OpenAI | 7.0 | 6.0 | 2/3 | 16.5s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 9.7 | 7.2 | 3/3 | 16.8s |
| #13 | Step 3.5 Flash medium | Stepfun | 10.0 | 7.4 | 3/3 | 18.5s |
| #39 | gpt-oss-120b medium | OpenAI | 7.0 | 5.1 | 2/3 | 19.8s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 3/3 | 21.8s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 3/3 | 22.3s |
| #52 | GLM 4.7 Flash medium | Z.ai | 4.0 | 3.1 | 1/3 | 27.1s |
| #43 | MiniMax M2.5 medium | Minimax | 9.3 | 4.7 | 2/3 | 32.4s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 7.0 | 7.3 | 2/3 | 33.4s |
| #34 | GPT-5 Nano medium | OpenAI | 7.0 | 5.5 | 2/3 | 37.7s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 10.0 | 8.2 | 3/3 | 43.9s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 3/3 | 71.4s |
| #28 | Kimi K2.5 medium | Moonshot AI | 7.0 | 6.4 | 2/3 | 85.3s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 7.0 | 6.9 | 2/3 | 99.0s |