AI BENCHY: Chess
Chess: Did Not Follow Instructions
See which AI models most often fail to follow instructions, to spot reliability risks before choosing. Sorted by: Response time (avg.) ↓.
| Rank | Model | Company | Instruction-following failures | Avg. score | Tests passed | Response time (avg.) |
|---|---|---|---|---|---|---|
| #24 | Qwen3.5-Flash medium | Qwen | 1 | 6.9 | 10/16 | 70.8s |
| #28 | Kimi K2.5 medium | Moonshot AI | 2 | 6.4 | 9/16 | 69.8s |
| #8 | Gemini 3.1 Flash Lite Preview high | Google | 1 | 8.2 | 12/16 | 68.8s |
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 1 | 6.9 | 10/16 | 65.1s |
| #7 | Qwen3.5-27B medium | Qwen | 2 | 8.2 | 12/16 | 52.1s |
| #34 | GPT-5 Nano medium | OpenAI | 3 | 5.5 | 7/16 | 47.9s |
| #43 | MiniMax M2.5 medium | Minimax | 3 | 4.7 | 5/16 | 43.0s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.3 | 11/16 | 39.5s |
| #52 | GLM 4.7 Flash medium | Z.ai | 2 | 3.1 | 4/16 | 36.8s |
| #13 | Step 3.5 Flash medium | Stepfun | 3 | 7.4 | 10/16 | 29.1s |
| #30 | Grok 4.1 Fast medium | X AI | 3 | 6.2 | 9/16 | 26.3s |
| #21 | MiMo-V2-Flash medium | Xiaomi | 1 | 7.2 | 11/16 | 25.3s |
| #32 | GPT-5 Mini medium | OpenAI | 4 | 6.0 | 8/16 | 25.1s |
| #9 | GPT-5.4 medium | OpenAI | 2 | 8.0 | 12/16 | 20.1s |
| #39 | gpt-oss-120b medium | OpenAI | 4 | 5.1 | 7/16 | 16.7s |
| #3 | GPT-5.3-Codex medium | OpenAI | 2 | 8.4 | 12/16 | 16.6s |
| #14 | GLM 5 medium | Z.ai | 1 | 7.4 | 11/16 | 16.2s |
| #27 | GPT-5.2 medium | OpenAI | 3 | 6.5 | 10/16 | 15.3s |
| #50 | Qwen3 Coder Next medium | Qwen | 5 | 3.5 | 3/16 | 12.5s |
| #16 | Gemini 2.5 Flash medium | Google | 1 | 7.4 | 11/16 | 12.4s |
| #48 | Qwen3 Coder Next none | Qwen | 1 | 4.0 | 4/16 | 11.7s |
| #15 | GPT-5.2 Chat none | OpenAI | 1 | 7.4 | 11/16 | 7.03s |
| #19 | GPT-5.3 Chat none | OpenAI | 2 | 7.3 | 10/16 | 5.96s |
| #25 | Claude Sonnet 4.6 none | Anthropic | 1 | 6.8 | 10/16 | 5.57s |
| #42 | Qwen3.5-35B-A3B none | Qwen | 2 | 4.7 | 6/16 | 4.10s |
| #12 | Gemini 3.1 Flash Lite Preview medium | Google | 1 | 7.5 | 11/16 | 3.83s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 1 | 5.0 | 6/16 | 3.72s |
| #37 | Qwen3.5-Flash none | Qwen | 1 | 5.2 | 7/16 | 3.54s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 1 | 7.3 | 11/16 | 3.36s |
| #45 | Trinity Large Preview none | Arcee AI | 2 | 4.2 | 5/16 | 3.15s |
| #49 | GLM 4.7 Flash none | Z.ai | 2 | 3.9 | 4/16 | 2.99s |
| #54 | MiMo-V2-Flash none | Xiaomi | 1 | 2.9 | 3/16 | 2.97s |
| #36 | Mercury 2 medium | Inception | 4 | 5.3 | 7/16 | 2.36s |
| #47 | GPT-4o-mini none | OpenAI | 1 | 4.0 | 4/16 | 2.07s |
| #53 | Grok 4.1 Fast none | X AI | 2 | 2.9 | 3/16 | 1.90s |
| #41 | Qwen3.5-27B none | Qwen | 2 | 4.9 | 5/16 | 1.75s |
| #44 | GPT-5.4 none | OpenAI | 1 | 4.5 | 6/16 | 1.48s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 2 | 7.1 | 10/16 | 1.33s |
| #38 | Gemini 2.5 Flash none | Google | 1 | 5.2 | 6/16 | 923ms |
| #55 | LFM2-24B-A2B none | Liquid | 2 | 2.6 | 1/16 | 811ms |
| #51 | Mercury 2 none | Inception | 1 | 3.4 | 4/16 | 596ms |