AI BENCHY

Coding Ranking

See which AI models perform best on Coding, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15
Average Coding Score: 6.1

Rank  Model                          Company      Coding Score  Score  Tests Correct  Response Time (avg)
#69   Claude Sonnet 4.6 none         Anthropic    6.8           7.0    1/2            6.73s
#76   Gemma 4 31B none               Google       6.8           6.7    1/2            14.8s
#78   Gemini 3.1 Flash Lite minimal  Google       6.8           6.7    1/2            951ms
#85   Gemini 3.1 Flash Lite none     Google       6.8           6.6    1/2            1.13s
#86   GPT-5.5 none                   OpenAI       6.8           6.5    1/2            1.52s
#92   Gemini 2.5 Flash none          Google       6.8           6.2    1/2            810ms
#97   Qwen3.5-Flash none             Qwen         6.8           5.9    1/2            993ms
#98   GLM 5V Turbo none              Z.ai         6.8           5.9    1/2            3.77s
#102  Qwen3.5-35B-A3B none           Qwen         6.8           5.8    1/2            1.72s
#104  Qwen3.6 27B none               Qwen         6.8           5.8    1/2            5.75s
#107  MiMo-V2-Pro none               Xiaomi       6.8           5.7    1/2            2.65s
#112  GPT-5.4 none                   OpenAI       6.8           5.6    1/2            1.99s
#126  Kimi K2.5 none                 Moonshot AI  6.8           5.3    1/2            36.0s
#136  GPT-5.4 Mini none              OpenAI       6.8           4.9    1/2            1.01s
#137  Qwen3.6 35B A3B none           Qwen         6.8           4.9    1/2            12.3s

Charts (not reproduced here): Top Models by Coding Score; Coding Score vs Total Cost; Top Models by Response Time (avg)