AI BENCHY


Combined Ranking

See which AI models perform best in the Combined ranking, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Combined Score: 6.2

Rank  Model                              Company         Combined Score  Score  Tests Correct  Response Time (avg)
#56   Grok 4.20 Multi Agent Beta medium  X AI            3.0             6.4    0/1            0ms
#58   GLM 5V Turbo none                  Z.ai            3.0             6.2    0/1            6.51s
#59   Qwen3.5-Flash none                 Qwen            3.0             6.2    0/1            6.22s
#60   Gemma 4 26B A4B none               Google          3.0             6.2    0/1            30.5s
#61   Seed-2.0-Lite none                 Bytedance Seed  3.0             6.2    0/1            6.59s
#62   Gemini 2.5 Flash none              Google          3.0             6.2    0/1            4.39s
#63   Qwen3.5-35B-A3B none               Qwen            3.0             6.1    0/1            47.4s
#65   MiMo-V2-Pro none                   Xiaomi          3.0             6.0    0/1            6.58s
#66   GPT-5.4 none                       OpenAI          3.0             5.9    0/1            2.89s
#69   Kimi K2.6 none                     Moonshot AI     3.0             5.8    0/1            3.38s
#70   Qwen3.5-122B-A10B none             Qwen            3.0             5.7    0/1            46.0s
#72   Hunter Alpha none                  OpenRouter      3.0             5.7    0/1            15.2s
#73   Mistral Small 4 medium             Mistral         3.0             5.7    0/1            25.3s
#74   GLM 4.7 Flash none                 Z.ai            3.0             5.6    0/1            3.22s
#77   GLM 5 Turbo none                   Z.ai            3.0             5.5    0/1            4.89s

[Charts: Top Models by Combined Score; Combined Score vs Total Cost; Top Models by Response Time (avg)]