AI BENCHY

Coding Ranking

See which AI models perform best on Coding, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Coding Score: 6.1

| Rank | Model | Company | Coding Score | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #42 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 5.4 | 7.6 | 1/2 | 137.5s |
| #132 | Qwen3 Coder Next none | Qwen | 5.4 | 5.1 | 0/2 | 2.01s |
| #150 | Grok 4.1 Fast none | X AI | 5.3 | 4.4 | 0/1 | 1.79s |
| #45 | Grok Build 0.1 medium | X AI | 5.3 | 7.6 | 0/2 | 67.4s |
| #153 | Granite 4.1 8B none | IBM Granite | 5.2 | 4.1 | 0/2 | 706ms |
| #59 | Qwen3.6 Flash medium | Qwen | 5.1 | 7.4 | 0/2 | 51.9s |
| #121 | Mistral Small 4 medium | Mistral | 5.1 | 5.4 | 0/2 | 44.8s |
| #93 | MiMo-V2-Omni none | Xiaomi | 5.1 | 6.2 | 0/2 | 2.75s |
| #115 | MiMo-V2.5-Pro none | Xiaomi | 5.0 | 5.6 | 0/2 | 1.80s |
| #109 | GLM 4.7 Flash none | Z.ai | 5.0 | 5.6 | 0/2 | 3.35s |
| #140 | Trinity Large Preview none | Arcee AI | 4.9 | 4.8 | 0/1 | 14.3s |
| #88 | Qwen3.5 Plus 2026-02-15 none | Qwen | 4.9 | 6.4 | 0/2 | 2.54s |
| #149 | MiMo-V2-Flash none | Xiaomi | 4.9 | 4.4 | 0/2 | 2.04s |
| #131 | DeepSeek V4 Flash none | DeepSeek | 4.8 | 5.1 | 0/2 | 24.5s |
| #130 | Elephant Alpha none | Openrouter | 4.7 | 5.2 | 0/2 | 1.39s |

[Charts: Top Models by Coding Score; Coding Score vs Total Cost; Top Models by Response Time (avg)]