AI BENCHY
Combined Ranking

See which AI models perform best on the Combined ranking, which ones stay reliable, and where the biggest gaps appear. Sorted by Tests Correct, ascending.

Models Shown: 15

Average Combined Score: 6.3

| Rank | Model | Company | Combined Score | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|----------------|-------|---------------|---------------------|
| #16 | Gemini 3 Flash Preview low | Google | 3.0 | 8.4 | 0/1 | 3.27s |
| #20 | Gemini 3.5 Flash none | Google | 3.0 | 8.1 | 0/1 | 0ms |
| #27 | Gemma 4 31B medium | Google | 3.0 | 7.8 | 0/1 | 0ms |
| #32 | Gemini 3.5 Flash minimal | Google | 3.0 | 7.7 | 0/1 | 3.56s |
| #34 | Qwen3.7 Max none | Qwen | 3.0 | 7.7 | 0/1 | 2.17s |
| #35 | Gemini 3 PRO Preview medium | Google | 3.0 | 7.6 | 0/1 | 10.4s |
| #46 | Qwen3.6 35B A3B medium | Qwen | 3.0 | 7.4 | 0/1 | 0ms |
| #48 | Gemini 3 Flash Preview none | Google | 4.7 | 7.4 | 0/1 | 3.56s |
| #50 | Gemini 3.1 Flash Lite Preview low | Google | 3.0 | 7.4 | 0/1 | 11.9s |
| #51 | Mimo V2 PRO medium | Xiaomi | 4.7 | 7.4 | 0/1 | 64.7s |
| #58 | Gemini 3.1 Flash Lite Preview none | Google | 3.0 | 7.2 | 0/1 | 3.20s |
| #59 | GLM 5V Turbo medium | Z.ai | 6.9 | 7.2 | 0/1 | 15.1s |
| #61 | Gemini 3.1 Flash Lite low | Google | 3.0 | 7.2 | 0/1 | 4.48s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 4.7 | 7.1 | 0/1 | 75.3s |
| #74 | Qwen3.6 Max Preview none | Qwen | 3.0 | 6.9 | 0/1 | 20.5s |

[Charts: Top Models by Combined Score; Combined Score vs Total Cost; Top Models by Response Time (avg)]