AI BENCHY

Domain specific: Wrong answer

See which AI models most often fail with a Wrong answer on Domain specific tests, so you can spot weak points faster. Sorted by: Tests Correct, ascending.

Models Shown: 4
Total Failures: 314
Most Affected Model: Qwen3.6 Max Preview 3
Rank  Model                  Company  Wrong answer Count  Category Score  Tests Correct  Response Time (avg)
#108  Qwen3.5-Flash none     Qwen     1                   7.7             2/3            905ms
#117  Qwen3.5-35B-A3B none   Qwen     1                   7.7             2/3            485ms
#118  Qwen3.6 27B none       Qwen     1                   7.7             2/3            3.03s
#122  GLM 4.7 Flash none     Z.ai     1                   7.7             2/3            744ms

Charts (not reproduced here): Top Models by Wrong answer Count; Wrong answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.