AI BENCHY category
Instruction Following Ranking
See which AI models perform best at instruction following, which ones stay reliable, and where the biggest gaps appear. Sorted by: Response time (avg) ↓.
| Rank | Model | Company | Instruction following score | Avg score | Tests passed | Response time (avg) |
|---|---|---|---|---|---|---|
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 2/2 | 92.5s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 9.0 | 8.2 | 1/2 | 70.1s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 2/2 | 63.5s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 2/2 | 35.8s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 2/2 | 31.9s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 2/2 | 24.4s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 2/2 | 19.7s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 2/2 | 17.5s |
| #32 | GPT-5 Mini medium | OpenAI | 7.5 | 6.0 | 1/2 | 15.7s |
| #34 | GPT-5 Nano medium | OpenAI | 9.0 | 5.5 | 1/2 | 11.9s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 2/2 | 9.88s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 2/2 | 9.56s |
| #37 | Qwen3.5-Flash none | Qwen | 5.0 | 5.2 | 1/2 | 8.81s |
| #48 | Qwen3 Coder Next none | Qwen | 4.5 | 4.0 | 0/2 | 7.71s |
| #39 | gpt-oss-120b medium | OpenAI | 9.5 | 5.1 | 2/2 | 7.63s |
| #50 | Qwen3 Coder Next medium | Qwen | 4.5 | 3.5 | 0/2 | 7.34s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 2/2 | 7.25s |
| #5 | Gemini 3 Flash Preview low | Google | 9.5 | 8.2 | 2/2 | 7.02s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 2/2 | 6.10s |
| #15 | GPT-5.2 Chat none | OpenAI | 6.0 | 7.4 | 1/2 | 5.46s |
| #30 | Grok 4.1 Fast medium | X AI | 5.5 | 6.2 | 1/2 | 5.30s |
| #13 | Step 3.5 Flash medium | Stepfun | 9.0 | 7.4 | 1/2 | 4.98s |
| #43 | MiniMax M2.5 medium | Minimax | 8.0 | 4.7 | 1/2 | 4.64s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 2/2 | 4.28s |
| #19 | GPT-5.3 Chat none | OpenAI | 9.0 | 7.3 | 1/2 | 3.29s |
| #6 | Gemini 3 Pro Preview medium | Google | 9.5 | 8.2 | 2/2 | 3.26s |
| #27 | GPT-5.2 medium | OpenAI | 9.5 | 6.5 | 2/2 | 3.12s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 2/2 | 3.11s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 2/2 | 3.04s |
| #52 | GLM 4.7 Flash medium | Z.ai | 5.0 | 3.1 | 1/2 | 2.97s |
| #46 | Kimi K2.5 none | Moonshot AI | 5.5 | 4.1 | 1/2 | 2.67s |
| #16 | Gemini 2.5 Flash medium | Google | 9.5 | 7.4 | 2/2 | 2.62s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 2/2 | 2.61s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 2/2 | 2.43s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 5.5 | 6.8 | 1/2 | 1.96s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 2/2 | 1.91s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 2/2 | 1.67s |
| #20 | Gemini 3 Flash Preview none | Google | 5.5 | 7.2 | 1/2 | 1.58s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 2/2 | 1.52s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 2/2 | 1.49s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 2/2 | 1.48s |
| #47 | GPT-4o-mini none | OpenAI | 4.5 | 4.0 | 0/2 | 1.27s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 2/2 | 1.13s |
| #45 | Trinity Large Preview none | Arcee AI | 3.5 | 4.2 | 0/2 | 1.09s |
| #55 | LFM2-24B-A2B none | Liquid | 4.5 | 2.6 | 0/2 | 1.09s |
| #44 | GPT-5.4 none | OpenAI | 5.5 | 4.5 | 1/2 | 1.07s |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 2/2 | 1.07s |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/2 | 923ms |
| #49 | GLM 4.7 Flash none | Z.ai | 5.5 | 3.9 | 1/2 | 888ms |
| #54 | MiMo-V2-Flash none | Xiaomi | 5.5 | 2.9 | 1/2 | 857ms |
| #41 | Qwen3.5-27B none | Qwen | 4.5 | 4.9 | 0/2 | 815ms |
| #42 | Qwen3.5-35B-A3B none | Qwen | 5.0 | 4.7 | 1/2 | 809ms |
| #38 | Gemini 2.5 Flash none | Google | 9.0 | 5.2 | 1/2 | 672ms |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.5 | 5.0 | 0/2 | 585ms |
| #51 | Mercury 2 none | Inception | 5.5 | 3.4 | 1/2 | 551ms |
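Note that the response-time column mixes seconds ("92.5s") and milliseconds ("923ms"), so a naive string sort would order it incorrectly. A minimal sketch of normalizing the cells to seconds and reproducing the descending sort above (the helper name and sample rows are illustrative, not part of the benchmark):

```python
def parse_response_time(cell: str) -> float:
    """Convert a leaderboard time cell like '92.5s' or '923ms' to seconds."""
    cell = cell.strip()
    if cell.endswith("ms"):
        return float(cell[:-2]) / 1000.0
    if cell.endswith("s"):
        return float(cell[:-1])
    raise ValueError(f"unrecognised time format: {cell!r}")

# A few rows copied from the table above, as (model, response time) pairs.
rows = [
    ("GPT-5.3-Codex medium", "3.04s"),
    ("Mercury 2 none", "551ms"),
    ("Kimi K2.5 medium", "92.5s"),
]

# Descending sort, matching the "Response time (avg) ↓" ordering.
ranked = sorted(rows, key=lambda r: parse_response_time(r[1]), reverse=True)
```

After sorting, the slowest model (Kimi K2.5 medium at 92.5s) comes first and the sub-second Mercury 2 entry last, mirroring the table's order.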