AI BENCHY


Instruction Following Ranking

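See which AI models perform best at instruction following, which ones stay reliable, and where the biggest gaps appear. Models are sorted by Tests Correct in descending order. All 15 models shown passed 1 of 2 tests, so they are listed in overall rank order.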

Models shown: 15
Average instruction-following score: 8.0

Rank  Model                   Reasoning  Company    Category score  Overall score  Tests correct  Avg response time
#21   Gemini 3 Flash Preview  none       Google     6.4             8.1            1/2            1.58s
#25   Grok 4.20 Beta          medium     xAI        8.3             8.0            1/2            4.97s
#28   GPT-5.2 Chat            none       OpenAI     7.5             7.9            1/2            5.46s
#30   Step 3.5 Flash          medium     Stepfun    8.5             7.9            1/2            4.98s
#33   GLM 5.1                 medium     Z.ai       6.4             7.8            1/2            7.47s
#35   MiMo-V2-Omni            medium     Xiaomi     8.3             7.7            1/2            4.92s
#36   GPT-5.3 Chat            none       OpenAI     8.3             7.7            1/2            3.29s
#42   Claude Sonnet 4.6       none       Anthropic  6.5             7.4            1/2            1.96s
#44   GPT-5.4 Mini            medium     OpenAI     7.4             7.3            1/2            2.50s
#45   GPT-5 Mini              medium     OpenAI     8.0             7.0            1/2            15.7s
#47   Grok 4.20               medium     xAI        7.3             7.0            1/2            4.42s
#48   Gemma 4 31B             none       Google     6.5             6.9            1/2            2.84s
#51   Nemotron 3 Super        medium     NVIDIA     7.2             6.7            1/2            7.72s
#52   Grok 4.1 Fast           medium     xAI        6.6             6.7            1/2            5.30s
#55   MiMo-V2-Omni            none       Xiaomi     6.5             6.5            1/2            4.18s

Charts (not reproduced): top models by instruction-following score; instruction-following score vs. total cost; top models by average response time.