AI BENCHY Category Failures

Instructions following

Did not follow instructions

See which AI models most often fail with the reason "Did not follow instructions" in the "Instructions following" category, so you can spot weak points faster.

Related Failure Reasons

Wrong answer (26), Did not follow instructions (9)

Related Categories

General Intelligence (32), Puzzle Solving (24), Anti-AI Tricks (12), Instructions following (9)

Rank	Model	Company	Did not follow instructions Count	Category Score	Tests Correct	Response Time (avg)
#8	Gemini 3.1 Flash Lite Preview high	Google	1	9.0	1/2	70.1s
#13	Step 3.5 Flash medium	Stepfun	1	9.0	1/2	4.98s
#30	Grok 4.1 Fast medium	X AI	1	5.5	1/2	5.30s
#32	GPT-5 Mini medium	OpenAI	1	7.5	1/2	15.7s
#34	GPT-5 Nano medium	OpenAI	1	9.0	1/2	11.9s
#43	MiniMax M2.5 medium	Minimax	1	8.0	1/2	4.64s
#45	Trinity Large Preview none	Arcee AI	1	3.5	0/2	1.09s
#47	GPT-4o-mini none	OpenAI	1	4.5	0/2	1.27s
#50	Qwen3 Coder Next medium	Qwen	1	4.5	0/2	7.34s

Top Models by Did not follow instructions Count