AI BENCHY Category Failures

General Intelligence

Wrong answer

See which AI models most often fail General Intelligence tests with a "Wrong answer" result, so you can spot weak points faster.

Models Shown: 6

Total Failures: 6

Most Affected Model

Related Failure Reasons

Did not follow instructions: 32, Wrong answer: 6, Timed out: 3

Related Categories

Domain specific: 98, Puzzle Solving: 55, Anti-AI Tricks: 53, Instructions following: 26, Combined: 21, Data parsing and extraction: 14, General Intelligence: 6, Tool Calling: 2

Rank	Model	Company	Wrong answer Count	Category Score	Tests Correct	Response Time (avg)
#29	Qwen3.5 Plus 2026-02-15 none	Qwen	1	4.0	0/1	2.26s
#38	Gemini 2.5 Flash none	Google	1	5.0	0/1	615ms
#44	GPT-5.4 none	OpenAI	1	3.0	0/1	1.78s
#47	GPT-4o-mini none	OpenAI	1	3.0	0/1	909ms
#49	GLM 4.7 Flash none	Z.ai	1	3.0	0/1	1.59s
#52	GLM 4.7 Flash medium	Z.ai	1	10.0	0/1	18.1s

Top Models by Wrong answer Count