AI BENCHY

Coding Ranking

See which AI models perform best on Coding, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Coding Score: 6.1

Rank  Model                    Company      Coding Score  Score  Tests Correct  Response Time (avg)
#141  Qwen3 Coder Next medium  Qwen         4.1           4.7    0/2            1.17s
#79   Kimi K2.5 medium         Moonshot AI  4.1           6.7    0/2            215.9s
#77   Grok 4.20 medium         X AI         4.1           6.7    0/2            65.1s
#122  Elephant Alpha medium    Openrouter   4.0           5.4    0/2            1.30s
#124  Qwen3.5-122B-A10B none   Qwen         4.0           5.4    0/2            2.14s
#135  Mistral Small 4 none     Mistral      4.0           5.0    0/2            1.03s
#71   DeepSeek V3.2 medium     DeepSeek     3.9           7.0    0/2            185.0s
#111  gpt-oss-120b medium      OpenAI       3.9           5.6    0/2            47.2s
#23   Gemma 4 31B medium       Google       3.8           8.0    0/2            110.9s
#143  Mercury 2 none           Inception    3.5           4.6    0/2            831ms
#119  MiniMax M2.5 medium      Minimax      3.5           5.4    0/2            125.8s
#120  Grok 4.20 none           X AI         3.4           5.4    0/1            1.22s
#72   MiMo-V2-Omni medium      Xiaomi       3.4           6.9    0/2            183.9s
#134  Nemotron 3 Super none    NVIDIA       3.4           5.0    0/2            3.02s
#148  GLM 4.7 Flash medium     Z.ai         3.4           4.5    0/2            55.3s

Charts: Top Models by Coding Score; Coding Score vs Total Cost; Top Models by Response Time (avg)