AI BENCHY


Instruction Following Ranking

See which AI models perform best at instruction following, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by average response time, fastest first.

Models shown: 15
Average Instruction Following Score: 8.0

Rank  Model                            Company         Instruction Following Score  Overall Score  Tests Correct  Avg Response Time
#9    Qwen3.6 Plus Preview (medium)    Qwen            10.0                         8.5            2/2            7.54s
#20   Qwen3.6 Plus (medium)            Qwen            10.0                         8.1            2/2            7.54s
#68   gpt-oss-120b (medium)            OpenAI          9.9                          5.8            2/2            7.63s
#87   Qwen3 Coder Next (none)          Qwen            4.8                          5.1            0/2            7.71s
#51   Nemotron 3 Super (medium)        NVIDIA          7.2                          6.7            1/2            7.72s
#59   Qwen3.5-Flash (none)             Qwen            6.3                          6.2            1/2            8.81s
#2    Gemini 3.1 Pro Preview (medium)  Google          10.0                         9.6            2/2            9.56s
#19   Qwen3.5-122B-A10B (medium)       Qwen            10.0                         8.1            2/2            9.88s
#57   GPT-5 Nano (medium)              OpenAI          8.5                          6.3            1/2            11.9s
#34   Kimi K2.6 (medium)               Moonshot AI     10.0                         7.7            2/2            12.5s
#80   MiniMax M2.7 (medium)            Minimax         3.7                          5.3            0/2            12.6s
#14   Gemma 4 31B (medium)             Google          10.0                         8.3            2/2            12.8s
#45   GPT-5 Mini (medium)              OpenAI          8.0                          7.0            1/2            15.7s
#97   Qwen3.5-9B (medium)              Qwen            6.4                          4.4            1/2            17.1s
#39   Seed-2.0-Mini (medium)           Bytedance Seed  10.0                         7.5            2/2            17.5s
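The page lists these rows by average response time. If you would rather rank them by instruction-following score, a minimal sketch like the one below copies the 15 rows into Python and re-sorts them: highest score first, with ties broken by the faster model. The `Row` type, its field names, and the `rerank` helper are hypothetical names chosen for illustration. They are not part of AI Benchy, and the scores are transcribed by hand from the table.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """One leaderboard row, transcribed by hand from the table (hypothetical type)."""
    rank: int          # overall rank as shown (e.g. "#9" -> 9)
    model: str         # model name with its listed setting in parentheses
    company: str
    if_score: float    # Instruction Following Score
    overall: float     # Overall Score
    correct: int       # tests correct, out of 2
    avg_time_s: float  # average response time in seconds


# Same order as the page: ascending average response time.
ROWS: List[Row] = [
    Row(9, "Qwen3.6 Plus Preview (medium)", "Qwen", 10.0, 8.5, 2, 7.54),
    Row(20, "Qwen3.6 Plus (medium)", "Qwen", 10.0, 8.1, 2, 7.54),
    Row(68, "gpt-oss-120b (medium)", "OpenAI", 9.9, 5.8, 2, 7.63),
    Row(87, "Qwen3 Coder Next (none)", "Qwen", 4.8, 5.1, 0, 7.71),
    Row(51, "Nemotron 3 Super (medium)", "NVIDIA", 7.2, 6.7, 1, 7.72),
    Row(59, "Qwen3.5-Flash (none)", "Qwen", 6.3, 6.2, 1, 8.81),
    Row(2, "Gemini 3.1 Pro Preview (medium)", "Google", 10.0, 9.6, 2, 9.56),
    Row(19, "Qwen3.5-122B-A10B (medium)", "Qwen", 10.0, 8.1, 2, 9.88),
    Row(57, "GPT-5 Nano (medium)", "OpenAI", 8.5, 6.3, 1, 11.9),
    Row(34, "Kimi K2.6 (medium)", "Moonshot AI", 10.0, 7.7, 2, 12.5),
    Row(80, "MiniMax M2.7 (medium)", "Minimax", 3.7, 5.3, 0, 12.6),
    Row(14, "Gemma 4 31B (medium)", "Google", 10.0, 8.3, 2, 12.8),
    Row(45, "GPT-5 Mini (medium)", "OpenAI", 8.0, 7.0, 1, 15.7),
    Row(97, "Qwen3.5-9B (medium)", "Qwen", 6.4, 4.4, 1, 17.1),
    Row(39, "Seed-2.0-Mini (medium)", "Bytedance Seed", 10.0, 7.5, 2, 17.5),
]


def rerank(rows: List[Row]) -> List[Row]:
    """Highest instruction-following score first; ties go to the faster model.

    sorted() is stable, so rows that tie on both keys keep their table order.
    """
    return sorted(rows, key=lambda r: (-r.if_score, r.avg_time_s))


if __name__ == "__main__":
    for r in rerank(ROWS)[:5]:
        print(f"{r.model:<32} IF {r.if_score:4.1f}  {r.avg_time_s:5.2f}s  {r.correct}/2")
```

Applied to these rows, `rerank` puts the two Qwen3.6 Plus entries (10.0 each, 7.54s) first, followed by Gemini 3.1 Pro Preview. Using speed as the tiebreaker is one possible choice. Sorting on `overall` instead would reproduce the Rank column.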

Charts not reproduced here: Top Models by Instruction Following Score; Instruction Following Score vs Total Cost; Top Models by Response Time (avg).