AI BENCHY

Coding Ranking

See which AI models perform best on Coding, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Coding Score: 7.2

Rank  Model                    Company      Coding Score  Score  Tests Correct  Response Time (avg)
#25   DeepSeek V3.2 medium     DeepSeek     4.7           8.0    0/1            180.9s
#30   Qwen3.5-Flash medium     Qwen         4.7           7.8    0/1            45.7s
#31   GLM 5.1 medium           Z.ai         4.7           7.8    0/1            118.5s
#38   MiMo-V2-Flash medium     Xiaomi       4.7           7.5    0/1            13.0s
#43   Kimi K2.5 medium         Moonshot AI  4.7           7.0    0/1            150.8s
#57   Gemma 4 26B A4B none     Google       4.7           6.2    0/1            7.07s
#86   Qwen3 Coder Next medium  Qwen         4.7           4.7    0/1            1.69s
#78   Mistral Small 4 none     Mistral      4.5           5.2    0/1            1.28s
#44   Grok 4.20 medium         X AI         4.3           7.0    0/1            24.3s
#65   gpt-oss-120b medium      OpenAI       4.3           5.8    0/1            26.3s
#66   Qwen3.5-122B-A10B none   Qwen         4.3           5.7    0/1            3.44s
#79   gpt-oss-120b none        OpenAI       4.3           5.2    0/1            9.57s
#32   MiMo-V2-Omni medium      Xiaomi       4.0           7.7    0/1            68.5s
#87   GLM 4.7 Flash medium     Z.ai         3.6           4.6    0/1            21.3s
#85   Mercury 2 none           Inception    3.6           4.8    0/1            969ms
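
The response times in the table above mix units (most are in seconds, one is in milliseconds), so they need normalizing before they can be compared or sorted. The sketch below is one possible way to do that in Python; the helper name `parse_duration_seconds` and the `rows` list are illustrative assumptions, not part of the site, and only a few rows from the table are copied in.

```python
import re


def parse_duration_seconds(text: str) -> float:
    """Convert a string such as '180.9s' or '969ms' to seconds.

    Hypothetical helper: it only handles the two units seen in the table.
    """
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


# A few (model, average response time) pairs copied from the table.
rows = [
    ("DeepSeek V3.2 medium", "180.9s"),
    ("Qwen3 Coder Next medium", "1.69s"),
    ("Mistral Small 4 none", "1.28s"),
    ("Mercury 2 none", "969ms"),
]

# Sort fastest first using the normalized value as the key.
fastest_first = sorted(rows, key=lambda row: parse_duration_seconds(row[1]))

for model, raw in fastest_first:
    print(f"{model}: {parse_duration_seconds(raw):.3f}s")
```

Sorting on the raw strings would misplace the millisecond entry, which is why the key function converts every value to seconds first.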

Charts: Top Models by Coding Score; Coding Score vs Total Cost; Top Models by Response Time (avg)