AI BENCHY

AI BENCHY Category Failures

General Intelligence: Wrong answer

See which AI models most often fail General Intelligence tests with a "Wrong answer" result, so you can spot weak points faster.

Models Shown: 10
Total Failures: 10
Most Affected Model: GLM 5 Turbo (1)

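| Rank | Model                        | Company | Wrong answer Count | Category Score | Tests Correct | Response Time (avg) |
|------|------------------------------|---------|--------------------|----------------|---------------|---------------------|
| #18  | GLM 5 Turbo medium           | Z.ai    | 1                  | 6.1            | 0/1           | 10.1s               |
| #49  | Qwen3.5 Plus 2026-02-15 none | Qwen    | 1                  | 4.4            | 0/1           | 2.26s               |
| #62  | Gemini 2.5 Flash none        | Google  | 1                  | 5.0            | 0/1           | 615ms               |
| #66  | GPT-5.4 none                 | OpenAI  | 1                  | 4.4            | 0/1           | 1.78s               |
| #74  | GLM 4.7 Flash none           | Z.ai    | 1                  | 4.0            | 0/1           | 1.59s               |
| #75  | GLM 5.1 none                 | Z.ai    | 1                  | 5.0            | 0/1           | 790ms               |
| #82  | Grok 4.20 none               | X AI    | 1                  | 4.8            | 0/1           | 659ms               |
| #83  | Mistral Small 4 none         | Mistral | 1                  | 4.0            | 0/1           | 729ms               |
| #89  | GPT-4o-mini none             | OpenAI  | 1                  | 4.0            | 0/1           | 909ms               |
| #93  | GLM 4.7 Flash medium         | Z.ai    | 1                  | 3.6            | 0/1           | 18.1s               |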

Charts (not reproduced here): Top Models by Wrong answer Count; Wrong answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.