AI BENCHY


Combined Ranking

See which AI models perform best in the Combined category, which ones stay reliable, and where the biggest gaps appear. Rows are sorted by Combined Score in ascending order.

Models Shown: 15

Average Combined Score: 6.3

| Rank | Model | Company | Combined Score | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|----------------|-------|---------------|---------------------|
| #8 | Claude Opus 4.7 none | Anthropic | 9.5 | 8.9 | 1/1 | 18.3s |
| #55 | GLM 5.1 medium | Z.ai | 9.5 | 7.3 | 1/1 | 43.1s |
| #68 | Claude Opus 4.8 none | Anthropic | 9.5 | 7.0 | 1/1 | 17.7s |
| #77 | Claude Sonnet 4.6 none | Anthropic | 9.5 | 6.8 | 1/1 | 23.8s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 9.5 | 5.7 | 1/1 | 25.5s |
| #37 | Gemma 4 26B A4B medium | Google | 9.6 | 7.6 | 1/1 | 73.5s |
| #10 | Claude Opus 4.8 medium | Anthropic | 9.8 | 8.7 | 1/1 | 38.0s |
| #41 | Nemotron 3 Ultra 550b A55b medium | NVIDIA | 9.8 | 7.5 | 1/1 | 43.9s |
| #64 | MiMo-V2-Flash medium | Xiaomi | 9.8 | 7.2 | 1/1 | 75.7s |
| #70 | GPT-5.4 Nano medium | OpenAI | 9.8 | 7.0 | 1/1 | 24.1s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 9.8 | 1/1 | 22.4s |
| #2 | Gemini 3.5 Flash high | Google | 10.0 | 9.6 | 1/1 | 22.4s |
| #3 | Gemini 3.5 Flash low | Google | 10.0 | 9.4 | 1/1 | 6.44s |
| #5 | Qwen3.7 Max medium | Qwen | 10.0 | 9.1 | 1/1 | 19.6s |
| #6 | GPT-5.5 low | OpenAI | 10.0 | 9.0 | 1/1 | 9.56s |

[Charts not reproduced: Top Models by Combined Score; Combined Score vs Total Cost; Top Models by Response Time (avg)]