AI BENCHY


Coding Ranking

See which AI models perform best in the Coding category, which ones stay reliable, and where the biggest gaps appear.

Models shown: 15
Average Coding Score: 6.1

Overall Rank  Model                   Reasoning Effort  Company         Coding Score  Overall Score  Tests Correct  Avg Response Time
#31           Grok 4.3                medium            X AI            7.4           7.8            1/2            55.3s
#28           GLM 5 Turbo             medium            Z.ai            7.3           7.9            1/2            53.9s
#12           Gemini 3 Flash Preview  low               Google          7.3           8.6            1/2            6.66s
#103          Qwen3.5-27B             none              Qwen            7.3           5.8            1/2            1.98s
#90           Mercury 2               medium            Inception       7.2           6.3            1/2            2.29s
#63           Claude Opus 4.6         medium            Anthropic       7.2           7.2            1/2            29.4s
#21           Seed-2.0-Lite           medium            Bytedance Seed  7.0           8.1            1/2            107.7s
#40           MiMo-V2.5-Pro           medium            Xiaomi          7.0           7.6            1/2            81.7s
#106          Owl Alpha               none              Openrouter      7.0           5.7            1/2            39.7s
#4            Gemini 3.1 Pro Preview  medium            Google          7.0           9.3            1/2            54.3s
#24           Gemini 3.5 Flash        minimal           Google          7.0           7.9            1/2            3.39s
#25           Qwen3.5-27B             medium            Qwen            7.0           7.9            1/2            123.9s
#52           GPT-5.3 Chat            none              OpenAI          6.9           7.4            1/2            10.5s
#53           MiMo-V2.5               medium            Xiaomi          6.9           7.4            1/2            64.5s
#46           Claude Sonnet 4.6       medium            Anthropic       6.9           7.6            1/2            33.9s
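
The "biggest gaps" can be made concrete with a little arithmetic on the rows above. The sketch below is a hypothetical helper, not part of AI Benchy: it transcribes the 15 listed rows, treats the second score column as the overall benchmark score and the word after each model name as its reasoning-effort setting (both assumptions; the page does not define them), and ranks models by Coding Score minus overall score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    effort: str       # assumed: reasoning-effort setting shown after the model name
    coding: float     # Coding Score column
    overall: float    # assumed: overall benchmark score (second score column)
    latency_s: float  # Response Time (avg), in seconds


# Hypothetical transcription of the table rows (subset of columns).
ROWS = [
    Row("Grok 4.3", "medium", 7.4, 7.8, 55.3),
    Row("GLM 5 Turbo", "medium", 7.3, 7.9, 53.9),
    Row("Gemini 3 Flash Preview", "low", 7.3, 8.6, 6.66),
    Row("Qwen3.5-27B", "none", 7.3, 5.8, 1.98),
    Row("Mercury 2", "medium", 7.2, 6.3, 2.29),
    Row("Claude Opus 4.6", "medium", 7.2, 7.2, 29.4),
    Row("Seed-2.0-Lite", "medium", 7.0, 8.1, 107.7),
    Row("MiMo-V2.5-Pro", "medium", 7.0, 7.6, 81.7),
    Row("Owl Alpha", "none", 7.0, 5.7, 39.7),
    Row("Gemini 3.1 Pro Preview", "medium", 7.0, 9.3, 54.3),
    Row("Gemini 3.5 Flash", "minimal", 7.0, 7.9, 3.39),
    Row("Qwen3.5-27B", "medium", 7.0, 7.9, 123.9),
    Row("GPT-5.3 Chat", "none", 6.9, 7.4, 10.5),
    Row("MiMo-V2.5", "medium", 6.9, 7.4, 64.5),
    Row("Claude Sonnet 4.6", "medium", 6.9, 7.6, 33.9),
]


def gap(row: Row) -> float:
    """Coding Score minus overall score; positive means relatively stronger at coding."""
    return round(row.coding - row.overall, 1)


def by_gap(rows: list[Row]) -> list[Row]:
    """Rows sorted from the largest positive gap to the most negative one."""
    return sorted(rows, key=gap, reverse=True)


def mean_coding(rows: list[Row]) -> float:
    """Arithmetic mean of the Coding Score over the given rows."""
    return sum(r.coding for r in rows) / len(rows)


if __name__ == "__main__":
    for r in by_gap(ROWS)[:3]:
        print(f"{r.model} ({r.effort}): {gap(r):+.1f}")
    print(f"Mean Coding Score (listed rows): {mean_coding(ROWS):.2f}")
```

Run as a script, it lists Qwen3.5-27B (none) at +1.5, Owl Alpha at +1.3 and Mercury 2 at +0.9 as the models whose coding result most exceeds their overall score, while Gemini 3.1 Pro Preview sits at the other end (-2.3). The mean Coding Score of the listed rows is about 7.09, above the reported 6.1, which suggests (the page does not say so) that the reported average covers more models than the 15 shown.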

[Charts: Top Models by Coding Score; Coding Score vs Total Cost; Top Models by Response Time (avg)]