AI BENCHY Category
Instruction Following Ranking
See which AI models perform best at instruction following, which stay reliable, and where the biggest gaps appear. Sorted by: Response time (avg) ↑.
| Rank | Model | Company | Instruction Following Score | Avg Score | Tests Passed | Response Time (avg) |
|---|---|---|---|---|---|---|
| #51 | Mercury 2 none | Inception | 5.5 | 3.4 | 1/2 | 551ms |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.5 | 5.0 | 0/2 | 585ms |
| #38 | Gemini 2.5 Flash none | Google | 9.0 | 5.2 | 1/2 | 672ms |
| #42 | Qwen3.5-35B-A3B none | Qwen | 5.0 | 4.7 | 1/2 | 809ms |
| #41 | Qwen3.5-27B none | Qwen | 4.5 | 4.9 | 0/2 | 815ms |
| #54 | MiMo-V2-Flash none | Xiaomi | 5.5 | 2.9 | 1/2 | 857ms |
| #49 | GLM 4.7 Flash none | Z.ai | 5.5 | 3.9 | 1/2 | 888ms |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/2 | 923ms |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 2/2 | 1.07s |
| #44 | GPT-5.4 none | OpenAI | 5.5 | 4.5 | 1/2 | 1.07s |
| #55 | LFM2-24B-A2B none | Liquid | 4.5 | 2.6 | 0/2 | 1.09s |
| #45 | Trinity Large Preview none | Arcee AI | 3.5 | 4.2 | 0/2 | 1.09s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 2/2 | 1.13s |
| #47 | GPT-4o-mini none | OpenAI | 4.5 | 4.0 | 0/2 | 1.27s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 2/2 | 1.48s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 2/2 | 1.49s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 2/2 | 1.52s |
| #20 | Gemini 3 Flash Preview none | Google | 5.5 | 7.2 | 1/2 | 1.58s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 2/2 | 1.67s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 2/2 | 1.91s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 5.5 | 6.8 | 1/2 | 1.96s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 2/2 | 2.43s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 2/2 | 2.61s |
| #16 | Gemini 2.5 Flash medium | Google | 9.5 | 7.4 | 2/2 | 2.62s |
| #46 | Kimi K2.5 none | Moonshot AI | 5.5 | 4.1 | 1/2 | 2.67s |
| #52 | GLM 4.7 Flash medium | Z.ai | 5.0 | 3.1 | 1/2 | 2.97s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 2/2 | 3.04s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 2/2 | 3.11s |
| #27 | GPT-5.2 medium | OpenAI | 9.5 | 6.5 | 2/2 | 3.12s |
| #6 | Gemini 3 Pro Preview medium | Google | 9.5 | 8.2 | 2/2 | 3.26s |
| #19 | GPT-5.3 Chat none | OpenAI | 9.0 | 7.3 | 1/2 | 3.29s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 2/2 | 4.28s |
| #43 | MiniMax M2.5 medium | Minimax | 8.0 | 4.7 | 1/2 | 4.64s |
| #13 | Step 3.5 Flash medium | Stepfun | 9.0 | 7.4 | 1/2 | 4.98s |
| #30 | Grok 4.1 Fast medium | X AI | 5.5 | 6.2 | 1/2 | 5.30s |
| #15 | GPT-5.2 Chat none | OpenAI | 6.0 | 7.4 | 1/2 | 5.46s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 2/2 | 6.10s |
| #5 | Gemini 3 Flash Preview low | Google | 9.5 | 8.2 | 2/2 | 7.02s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 2/2 | 7.25s |
| #50 | Qwen3 Coder Next medium | Qwen | 4.5 | 3.5 | 0/2 | 7.34s |
| #39 | gpt-oss-120b medium | OpenAI | 9.5 | 5.1 | 2/2 | 7.63s |
| #48 | Qwen3 Coder Next none | Qwen | 4.5 | 4.0 | 0/2 | 7.71s |
| #37 | Qwen3.5-Flash none | Qwen | 5.0 | 5.2 | 1/2 | 8.81s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 2/2 | 9.56s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 2/2 | 9.88s |
| #34 | GPT-5 Nano medium | OpenAI | 9.0 | 5.5 | 1/2 | 11.9s |
| #32 | GPT-5 Mini medium | OpenAI | 7.5 | 6.0 | 1/2 | 15.7s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 2/2 | 17.5s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 2/2 | 19.7s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 2/2 | 24.4s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 2/2 | 31.9s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 2/2 | 35.8s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 2/2 | 63.5s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 9.0 | 8.2 | 1/2 | 70.1s |
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 2/2 | 92.5s |
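The response-time column mixes milliseconds ("551ms") and seconds ("1.07s"), so comparing rows directly requires normalizing to one unit. A minimal sketch of that normalization, assuming only these two suffix formats appear (the function name and sample values are illustrative, not part of the leaderboard's tooling):

```python
def parse_response_time(s: str) -> float:
    """Convert a response-time string like '551ms' or '1.07s' to seconds."""
    if s.endswith("ms"):
        return float(s[:-2]) / 1000.0
    if s.endswith("s"):
        return float(s[:-1])
    raise ValueError(f"unrecognized time format: {s!r}")

# A few values taken from the table above, in row order
times = ["551ms", "923ms", "1.07s", "63.5s", "92.5s"]
seconds = [parse_response_time(t) for t in times]

# Confirms these rows respect the ascending sort announced at the top
assert seconds == sorted(seconds)
```

Note the order of the suffix checks matters: `"551ms"` also ends with `"s"`, so the `"ms"` branch must come first.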