AI BENCHY category
Instruction Following leaderboard
See which AI models perform best on Instruction Following, which stay reliable, and where the gaps are widest. Sort by: Response time (avg.) ↑.
| Rank | Model (reasoning effort) | Company | Instruction Following score | Avg. score | Tests passed | Response time (avg.) |
|---|---|---|---|---|---|---|
| #51 | Mercury 2 none | Inception | 5.5 | 3.4 | 1/2 | 551ms |
| #40 | Qwen3.5-122B-A10B none | Qwen | 4.5 | 5.0 | 0/2 | 585ms |
| #38 | Gemini 2.5 Flash none | Google | 9.0 | 5.2 | 1/2 | 672ms |
| #42 | Qwen3.5-35B-A3B none | Qwen | 5.0 | 4.7 | 1/2 | 809ms |
| #41 | Qwen3.5-27B none | Qwen | 4.5 | 4.9 | 0/2 | 815ms |
| #54 | MiMo-V2-Flash none | Xiaomi | 5.5 | 2.9 | 1/2 | 857ms |
| #49 | GLM 4.7 Flash none | Z.ai | 5.5 | 3.9 | 1/2 | 888ms |
| #53 | Grok 4.1 Fast none | X AI | 10.0 | 2.9 | 0/2 | 923ms |
| #36 | Mercury 2 medium | Inception | 10.0 | 5.3 | 2/2 | 1.07s |
| #44 | GPT-5.4 none | OpenAI | 5.5 | 4.5 | 1/2 | 1.07s |
| #55 | LFM2-24B-A2B none | Liquid | 4.5 | 2.6 | 0/2 | 1.09s |
| #45 | Trinity Large Preview none | Arcee AI | 3.5 | 4.2 | 0/2 | 1.09s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 10.0 | 7.1 | 2/2 | 1.13s |
| #47 | GPT-4o-mini none | OpenAI | 4.5 | 4.0 | 0/2 | 1.27s |
| #31 | GLM 5 none | Z.ai | 10.0 | 6.0 | 2/2 | 1.48s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 10.0 | 7.3 | 2/2 | 1.49s |
| #33 | DeepSeek V3.2 none | DeepSeek | 10.0 | 5.5 | 2/2 | 1.52s |
| #20 | Gemini 3 Flash Preview none | Google | 5.5 | 7.2 | 1/2 | 1.58s |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 10.0 | 6.2 | 2/2 | 1.67s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 10.0 | 7.5 | 2/2 | 1.91s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 5.5 | 6.8 | 1/2 | 1.96s |
| #26 | Claude Opus 4.6 medium | Anthropic | 10.0 | 6.6 | 2/2 | 2.43s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 7.7 | 2/2 | 2.61s |
| #16 | Gemini 2.5 Flash medium | Google | 9.5 | 7.4 | 2/2 | 2.62s |
| #46 | Kimi K2.5 none | Moonshot AI | 5.5 | 4.1 | 1/2 | 2.67s |
| #52 | GLM 4.7 Flash medium | Z.ai | 5.0 | 3.1 | 1/2 | 2.97s |
| #3 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.4 | 2/2 | 3.04s |
| #9 | GPT-5.4 medium | OpenAI | 10.0 | 8.0 | 2/2 | 3.11s |
| #27 | GPT-5.2 medium | OpenAI | 9.5 | 6.5 | 2/2 | 3.12s |
| #6 | Gemini 3 Pro Preview medium | Google | 9.5 | 8.2 | 2/2 | 3.26s |
| #19 | GPT-5.3 Chat none | OpenAI | 9.0 | 7.3 | 1/2 | 3.29s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 10.0 | 7.2 | 2/2 | 4.28s |
| #43 | MiniMax M2.5 medium | Minimax | 8.0 | 4.7 | 1/2 | 4.64s |
| #13 | Step 3.5 Flash medium | Stepfun | 9.0 | 7.4 | 1/2 | 4.98s |
| #30 | Grok 4.1 Fast medium | X AI | 5.5 | 6.2 | 1/2 | 5.30s |
| #15 | GPT-5.2 Chat none | OpenAI | 6.0 | 7.4 | 1/2 | 5.46s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 10.0 | 2/2 | 6.10s |
| #5 | Gemini 3 Flash Preview low | Google | 9.5 | 8.2 | 2/2 | 7.02s |
| #14 | GLM 5 medium | Z.ai | 10.0 | 7.4 | 2/2 | 7.25s |
| #50 | Qwen3 Coder Next medium | Qwen | 4.5 | 3.5 | 0/2 | 7.34s |
| #39 | gpt-oss-120b medium | OpenAI | 9.5 | 5.1 | 2/2 | 7.63s |
| #48 | Qwen3 Coder Next none | Qwen | 4.5 | 4.0 | 0/2 | 7.71s |
| #37 | Qwen3.5-Flash none | Qwen | 5.0 | 5.2 | 1/2 | 8.81s |
| #2 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 2/2 | 9.56s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 10.0 | 7.7 | 2/2 | 9.88s |
| #34 | GPT-5 Nano medium | OpenAI | 9.0 | 5.5 | 1/2 | 11.9s |
| #32 | GPT-5 Mini medium | OpenAI | 7.5 | 6.0 | 1/2 | 15.7s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 2/2 | 17.5s |
| #7 | Qwen3.5-27B medium | Qwen | 10.0 | 8.2 | 2/2 | 19.7s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 10.0 | 5.5 | 2/2 | 24.4s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 10.0 | 8.3 | 2/2 | 31.9s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.3 | 2/2 | 35.8s |
| #24 | Qwen3.5-Flash medium | Qwen | 10.0 | 6.9 | 2/2 | 63.5s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 9.0 | 8.2 | 1/2 | 70.1s |
| #28 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.4 | 2/2 | 92.5s |
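The response-time column mixes two units (`ms` and `s`), which makes the ascending sort harder to eyeball across rows. A minimal sketch, using a hypothetical helper that is not part of the benchmark site, normalizes a few values from the table to milliseconds and checks the sort order:

```python
# Hypothetical helper (not from the benchmark site): normalize the mixed
# "ms"/"s" strings in the response-time column to milliseconds.

def to_ms(t: str) -> float:
    """Convert a time string like '551ms' or '1.07s' to milliseconds."""
    if t.endswith("ms"):
        return float(t[:-2])
    if t.endswith("s"):
        return float(t[:-1]) * 1000.0
    raise ValueError(f"unrecognized time format: {t!r}")

# A few values sampled from the table, in row order:
times = ["551ms", "585ms", "1.07s", "7.63s", "63.5s", "92.5s"]
ms = [to_ms(t) for t in times]
assert ms == sorted(ms)  # confirms the ascending (up-arrow) sort
```

Note that `endswith("ms")` must be checked before `endswith("s")`, since every `ms` value also ends in `s`.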