AI BENCHY


Combined Ranking

See which AI models perform best in the Combined category, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 13

Average Combined Score: 6.3

Rank  Model                 Reasoning  Company    Combined Score  Score  Tests Correct  Response Time (avg)
#77   Claude Sonnet 4.6     none       Anthropic  9.5             6.8    1/1            23.8s
#80   Mimo V2 Omni          medium     Xiaomi     10.0            6.7    1/1            25.9s
#81   Mercury 2             medium     Inception  10.0            6.6    1/1            3.28s
#82   Hy3 preview           high       Tencent    10.0            6.6    1/1            113.1s
#86   Grok 4.1 Fast         medium     X AI       10.0            6.5    1/1            37.6s
#88   Qwen3.7 Plus          none       Qwen       10.0            6.4    1/1            29.4s
#89   Hy3 preview           low        Tencent    10.0            6.4    1/1            78.7s
#93   Qwen3.6 Plus Preview  medium     Qwen       10.0            6.3    1/1            35.0s
#94   GPT-5 Nano            medium     OpenAI     10.0            6.3    1/1            66.0s
#99   gpt-oss-120b          medium     OpenAI     10.0            6.1    1/1            31.2s
#103  DeepSeek V4 Pro       high       DeepSeek   10.0            6.0    1/1            65.0s
#105  Nemotron 3 Super      medium     NVIDIA     10.0            5.8    1/1            87.8s
#113  DeepSeek V4 Pro       none       DeepSeek   9.5             5.7    1/1            25.5s
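As a quick arithmetic check on the headline average, the short Python sketch below averages both score columns for the 13 models listed in the table. It assumes the "Average Combined Score" of 6.3 is a plain arithmetic mean over the rows shown; under that assumption it matches the Score column (82.2 / 13, about 6.32) rather than the Combined Score column (129 / 13, about 9.92). The variable names are illustrative only.

```python
# Assumption: the headline average is a plain arithmetic mean over the 13 rows shown.

# "Score" column for the 13 models, copied row by row from the table above.
scores = [6.8, 6.7, 6.6, 6.6, 6.5, 6.4, 6.4, 6.3, 6.3, 6.1, 6.0, 5.8, 5.7]

# "Combined Score" column for the same rows, in the same order.
combined = [9.5, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.5]

mean_score = sum(scores) / len(scores)
mean_combined = sum(combined) / len(combined)

print(f"Mean of Score column:          {mean_score:.1f}")     # 6.3
print(f"Mean of Combined Score column: {mean_combined:.1f}")  # 9.9
```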

[Charts: Top Models by Combined Score; Combined Score vs Total Cost; Top Models by Response Time (avg)]