AI BENCHY
Instructions following Ranking

See which AI models perform best on Instructions following, which ones stay reliable, and where the biggest gaps appear. The table is sorted by average response time, slowest first.

Models Shown: 15

Average Instructions following Score: 8.0

Best Model: Kimi K2.5 (10.0)

| Rank | Model | Company | Instructions following Score | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|------------------------------|-------|---------------|---------------------|
| #12 | Gemini 3 PRO Preview medium | Google | 9.8 | 8.4 | 2/2 | 3.26s |
| #40 | GPT-5.2 medium | OpenAI | 9.9 | 7.5 | 2/2 | 3.12s |
| #16 | GPT-5.4 medium | OpenAI | 10.0 | 8.2 | 2/2 | 3.11s |
| #7 | GPT-5.3-Codex medium | OpenAI | 10.0 | 8.6 | 2/2 | 3.04s |
| #93 | GLM 4.7 Flash medium | Z.ai | 6.2 | 4.6 | 1/2 | 2.97s |
| #48 | Gemma 4 31B none | Google | 6.5 | 6.9 | 1/2 | 2.84s |
| #72 | Hunter Alpha none | OpenRouter | 6.4 | 5.7 | 1/2 | 2.82s |
| #76 | Kimi K2.5 none | Moonshot AI | 6.5 | 5.5 | 1/2 | 2.67s |
| #15 | Gemini 2.5 Flash medium | Google | 9.8 | 8.2 | 2/2 | 2.62s |
| #26 | Claude Sonnet 4.6 medium | Anthropic | 10.0 | 8.0 | 2/2 | 2.61s |
| #65 | MiMo-V2-Pro none | Xiaomi | 6.5 | 6.0 | 1/2 | 2.51s |
| #44 | GPT-5.4 Mini medium | OpenAI | 7.4 | 7.3 | 1/2 | 2.50s |
| #37 | Claude Opus 4.6 medium | Anthropic | 10.0 | 7.6 | 2/2 | 2.43s |
| #77 | GLM 5 Turbo none | Z.ai | 6.5 | 5.5 | 1/2 | 2.13s |
| #58 | GLM 5V Turbo none | Z.ai | 6.5 | 6.2 | 1/2 | 1.97s |

[Charts: Top Models by Instructions following Score; Instructions following Score vs Total Cost; Top Models by Response Time (avg)]