General Intelligence Model Ranking

AI BENCHY Category

See which AI models perform best on General Intelligence, which ones stay reliable, and where the biggest gaps appear.

Average General Intelligence Score: 5.9

Failure Reasons

Did not follow instructions: 74
Wrong answer: 32
API error: 12
Timed out: 4

Rank	Model	Company	General Intelligence Score	Overall Score	Tests Correct	Response Time (avg)
#60	Kimi K2.6 medium	Moonshot AI	10.0	7.2	1/1	17.8s
#68	Claude Opus 4.8 none	Anthropic	10.0	7.0	1/1	3.48s
#69	Claude Opus 4.6 medium	Anthropic	10.0	7.0	1/1	5.04s
#85	Gemma 4 31B none	Google	10.0	6.5	1/1	2.09s
#91	GPT-5.5 none	OpenAI	10.0	6.4	1/1	3.41s
#98	GLM 5 none	Z.ai	10.0	6.1	1/1	3.27s
#108	Qwen3.5-Flash none	Qwen	10.0	5.8	1/1	803ms
#110	Seed-2.0-Lite none	Bytedance Seed	10.0	5.8	1/1	3.45s
#128	Qwen3.6 Flash none	Qwen	10.0	5.4	1/1	947ms
#135	Kimi K2.5 none	Moonshot AI	10.0	5.2	1/1	4.00s
#140	Qwen3 Coder Next none	Qwen	10.0	4.9	1/1	1.34s
#79	Hunter Alpha medium	OpenRouter	7.0	6.7	0/1	6.44s
#19	Seed-2.0-Lite medium	Bytedance Seed	6.7	8.2	0/1	18.2s
#76	Kimi K2.5 medium	Moonshot AI	6.5	6.8	0/1	69.7s
#78	Qwen3.6 27B medium	Qwen	6.5	6.8	0/1	39.5s
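
The page does not state how the rows are ordered. One reading consistent with the rows shown is a sort by General Intelligence Score, highest first, with ties broken by the score column that follows it, also highest first. The Python sketch below checks that reading against the table; the tuple layout and the `ranking_key` helper are illustrative assumptions, not anything published by the site.

```python
# Rows copied from the table above, in displayed order:
# (model, general_intelligence_score, second_score_column)
rows = [
    ("Kimi K2.6 medium", 10.0, 7.2),
    ("Claude Opus 4.8 none", 10.0, 7.0),
    ("Claude Opus 4.6 medium", 10.0, 7.0),
    ("Gemma 4 31B none", 10.0, 6.5),
    ("GPT-5.5 none", 10.0, 6.4),
    ("GLM 5 none", 10.0, 6.1),
    ("Qwen3.5-Flash none", 10.0, 5.8),
    ("Seed-2.0-Lite none", 10.0, 5.8),
    ("Qwen3.6 Flash none", 10.0, 5.4),
    ("Kimi K2.5 none", 10.0, 5.2),
    ("Qwen3 Coder Next none", 10.0, 4.9),
    ("Hunter Alpha medium", 7.0, 6.7),
    ("Seed-2.0-Lite medium", 6.7, 8.2),
    ("Kimi K2.5 medium", 6.5, 6.8),
    ("Qwen3.6 27B medium", 6.5, 6.8),
]


def ranking_key(row):
    """Assumed ordering rule: category score first, then the second score, both descending."""
    _, category_score, second_score = row
    return (-category_score, -second_score)


# sorted() is stable, so rows that tie on both scores keep their displayed order.
reordered = sorted(rows, key=ranking_key)

# True means the assumed rule reproduces the displayed order exactly.
print(reordered == rows)
```

A match only shows that the assumed rule is consistent with these fifteen rows. It does not confirm how the site actually sorts, and it says nothing about how ties are broken when both scores are equal.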