Kushindwa kwa AI BENCHY
Kushindwa kwa Hakufuata maelekezo
Ona ni modeli gani za AI hukutana na Hakufuata maelekezo mara nyingi zaidi ili utambue hatari za utegemevu kabla ya kuchagua. Panga kwa: Wastani wa alama ↓.
Kategoria zinazohusiana
| Nafasi | Modeli | Kampuni | Idadi ya Hakufuata maelekezo | Wastani wa alama | Majaribio sahihi | Muda wa majibu (wastani) |
|---|---|---|---|---|---|---|
| #3 | GPT-5.3-Codex medium | OpenAI | 2 | 8.4 | 12/16 | 16.6s |
| #7 | Qwen3.5-27B medium | Qwen | 2 | 8.2 | 12/16 | 52.1s |
| #8 | Gemini 3.1 Flash Lite Preview high | 1 | 8.2 | 12/16 | 68.8s | |
| #9 | GPT-5.4 medium | OpenAI | 2 | 8.0 | 12/16 | 20.1s |
| #12 | Gemini 3.1 Flash Lite Preview medium | 1 | 7.5 | 11/16 | 3.83s | |
| #13 | Step 3.5 Flash medium | Stepfun | 3 | 7.4 | 10/16 | 29.1s |
| #14 | GLM 5 medium | Z.ai | 1 | 7.4 | 11/16 | 16.2s |
| #15 | GPT-5.2 Chat none | OpenAI | 1 | 7.4 | 11/16 | 7.03s |
| #16 | Gemini 2.5 Flash medium | 1 | 7.4 | 11/16 | 12.4s | |
| #17 | Gemini 3.1 Flash Lite Preview low | 1 | 7.3 | 11/16 | 3.36s | |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.3 | 11/16 | 39.5s |
| #19 | GPT-5.3 Chat none | OpenAI | 2 | 7.3 | 10/16 | 5.96s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 1 | 7.2 | 11/16 | 25.3s |
| #22 | Gemini 3.1 Flash Lite Preview none | 2 | 7.1 | 10/16 | 1.33s | |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 1 | 6.9 | 10/16 | 65.1s |
| #24 | Qwen3.5-Flash medium | Qwen | 1 | 6.9 | 10/16 | 70.8s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 1 | 6.8 | 10/16 | 5.57s |
| #27 | GPT-5.2 medium | OpenAI | 3 | 6.5 | 10/16 | 15.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 2 | 6.4 | 9/16 | 69.8s |
| #30 | Grok 4.1 Fast medium | X AI | 3 | 6.2 | 9/16 | 26.3s |
| #32 | GPT-5 Mini medium | OpenAI | 4 | 6.0 | 8/16 | 25.1s |
| #34 | GPT-5 Nano medium | OpenAI | 3 | 5.5 | 7/16 | 47.9s |
| #36 | Mercury 2 medium | Inception | 4 | 5.3 | 7/16 | 2.36s |
| #37 | Qwen3.5-Flash none | Qwen | 1 | 5.2 | 7/16 | 3.54s |
| #38 | Gemini 2.5 Flash none | 1 | 5.2 | 6/16 | 923ms | |
| #39 | gpt-oss-120b medium | OpenAI | 4 | 5.1 | 7/16 | 16.7s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 1 | 5.0 | 6/16 | 3.72s |
| #41 | Qwen3.5-27B none | Qwen | 2 | 4.9 | 5/16 | 1.75s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 2 | 4.7 | 6/16 | 4.10s |
| #43 | MiniMax M2.5 medium | Minimax | 3 | 4.7 | 5/16 | 43.0s |
| #44 | GPT-5.4 none | OpenAI | 1 | 4.5 | 6/16 | 1.48s |
| #45 | Trinity Large Preview none | Arcee AI | 2 | 4.2 | 5/16 | 3.15s |
| #47 | GPT-4o-mini none | OpenAI | 1 | 4.0 | 4/16 | 2.07s |
| #48 | Qwen3 Coder Next none | Qwen | 1 | 4.0 | 4/16 | 11.7s |
| #49 | GLM 4.7 Flash none | Z.ai | 2 | 3.9 | 4/16 | 2.99s |
| #50 | Qwen3 Coder Next medium | Qwen | 5 | 3.5 | 3/16 | 12.5s |
| #51 | Mercury 2 none | Inception | 1 | 3.4 | 4/16 | 596ms |
| #52 | GLM 4.7 Flash medium | Z.ai | 2 | 3.1 | 4/16 | 36.8s |
| #53 | Grok 4.1 Fast none | X AI | 2 | 2.9 | 3/16 | 1.90s |
| #54 | MiMo-V2-Flash none | Xiaomi | 1 | 2.9 | 3/16 | 2.97s |
| #55 | LFM2-24B-A2B none | Liquid | 2 | 2.6 | 1/16 | 811ms |