AI BENCHY

AI BENCHY Category Failures

Coding: No answer

See which AI models most often fail Coding tests by returning no answer, so you can spot weak points faster.

Models Shown: 15
Total Failures: 18
Most Affected Model: Gemini 3 PRO Preview (1)

Rank  Model                 Reasoning Effort  Company      No answer Count  Category Score  Tests Correct  Response Time (avg)
#19   Gemini 3 PRO Preview  medium            Google       1                3.0             0/2            0ms
#23   Gemma 4 31B           medium            Google       1                3.8             0/2            110.9s
#28   GLM 5 Turbo           medium            Z.ai         1                7.3             1/2            53.9s
#30   Qwen3.6 35B A3B       medium            Qwen         1                6.6             1/2            59.3s
#47   Gemma 4 26B A4B       medium            Google       1                2.9             0/2            258.4s
#51   GLM 5.1               medium            Z.ai         1                4.7             0/2            145.6s
#54   Kimi K2.6             medium            Moonshot AI  1                6.5             1/2            118.2s
#58   Step 3.5 Flash        medium            Stepfun      1                3.0             0/1            62.8s
#70   Qwen3.5-35B-A3B       medium            Qwen         1                6.5             1/2            244.5s
#72   MiMo-V2-Omni          medium            Xiaomi       1                3.4             0/2            183.9s
#79   Kimi K2.5             medium            Moonshot AI  1                4.1             0/2            215.9s
#80   DeepSeek V4 Pro       high              DeepSeek     1                2.8             0/2            51.8s
#83   Qwen3.6 27B           medium            Qwen         1                6.6             1/2            165.4s
#122  Elephant Alpha        medium            Openrouter   1                4.0             0/2            1.30s
#130  Elephant Alpha        none              Openrouter   1                4.7             0/2            1.39s
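The Response Time (avg) column mixes units ("0ms" alongside values in seconds), so it cannot be sorted as plain text. The following is a minimal sketch, assuming you copy the model, reasoning-effort and response-time values from the table into a list; the names `ROWS`, `to_seconds` and `slowest` are hypothetical helpers for illustration, not part of AI BENCHY.

```python
import re

# (model, reasoning effort, avg response time) copied from the table above.
# Effort is kept because Elephant Alpha appears twice with different settings.
ROWS = [
    ("Gemini 3 PRO Preview", "medium", "0ms"),
    ("Gemma 4 31B", "medium", "110.9s"),
    ("GLM 5 Turbo", "medium", "53.9s"),
    ("Qwen3.6 35B A3B", "medium", "59.3s"),
    ("Gemma 4 26B A4B", "medium", "258.4s"),
    ("GLM 5.1", "medium", "145.6s"),
    ("Kimi K2.6", "medium", "118.2s"),
    ("Step 3.5 Flash", "medium", "62.8s"),
    ("Qwen3.5-35B-A3B", "medium", "244.5s"),
    ("MiMo-V2-Omni", "medium", "183.9s"),
    ("Kimi K2.5", "medium", "215.9s"),
    ("DeepSeek V4 Pro", "high", "51.8s"),
    ("Qwen3.6 27B", "medium", "165.4s"),
    ("Elephant Alpha", "medium", "1.30s"),
    ("Elephant Alpha", "none", "1.39s"),
]


def to_seconds(value: str) -> float:
    """Convert a display string such as '0ms' or '110.9s' to seconds."""
    match = re.fullmatch(r"([\d.]+)(ms|s)", value)
    if match is None:
        raise ValueError(f"unrecognised time: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number / 1000 if unit == "ms" else number


def slowest(rows, n=3):
    """Return the n rows with the highest average response time."""
    return sorted(rows, key=lambda row: to_seconds(row[2]), reverse=True)[:n]


if __name__ == "__main__":
    for model, effort, time in slowest(ROWS):
        print(f"{model} ({effort}): {to_seconds(time):.1f}s")
```

Gemini 3 PRO Preview's "0ms" is kept exactly as listed, so it sorts as the fastest entry even though it may simply reflect a request that returned nothing.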

Charts (not reproduced in this text): Top Models by No answer Count; No answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.