AI BENCHY


Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Puzzle Solving Score: 6.7

Overall Rank  Model                           Company         Puzzle Solving Score  Overall Score  Tests Correct  Response Time (avg)
#27           Gemma 4 31B medium              Google          9.9                   7.8            3/3            26.9s
#15           GPT-5.3-Codex medium            OpenAI          9.0                   8.4            2/3            5.05s
#19           Seed-2.0-Lite medium            Bytedance Seed  9.0                   8.2            2/3            10.2s
#23           GLM 5 Turbo medium              Z.ai            8.7                   8.0            2/3            5.23s
#21           GPT-5.4 medium                  OpenAI          8.2                   8.0            2/3            9.14s
#31           DeepSeek V4 Flash high          DeepSeek        8.2                   7.7            2/3            26.1s
#36           Qwen3.5 Plus 2026-04-20 medium  Qwen            8.2                   7.6            2/3            17.7s
#39           Qwen3.6 Flash medium            Qwen            8.2                   7.5            2/3            6.29s
#49           Qwen3.5-Flash medium            Qwen            8.2                   7.4            2/3            27.6s
#55           GLM 5.1 medium                  Z.ai            8.2                   7.3            2/3            31.6s
#56           MiMo-V2.5 medium                Xiaomi          8.2                   7.3            2/3            20.3s
#66           Qwen3.5-35B-A3B medium          Qwen            8.2                   7.1            2/3            33.1s
#73           Seed-2.0-Mini medium            Bytedance Seed  8.2                   6.9            2/3            31.8s
#30           Qwen3.5-27B medium              Qwen            8.2                   7.8            2/3            59.6s
#46           Qwen3.6 35B A3B medium          Qwen            8.0                   7.4            2/3            5.95s
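As a quick arithmetic check on the figures above, the sketch below takes the Puzzle Solving Score and average response time for the 15 listed models and computes two things: the unweighted mean puzzle score and the three fastest models. The mean of the listed rows comes out at about 8.44, well above the 6.7 headline average, which suggests (an assumption; the page does not say so) that the headline figure is averaged over every benchmarked model rather than only the rows shown. The names `ROWS`, `mean_puzzle_score`, and `fastest` are hypothetical helpers for illustration only.

```python
from statistics import mean

# Hypothetical transcription of the table:
# (model, puzzle_solving_score, avg_response_time_seconds)
ROWS = [
    ("Gemma 4 31B medium", 9.9, 26.9),
    ("GPT-5.3-Codex medium", 9.0, 5.05),
    ("Seed-2.0-Lite medium", 9.0, 10.2),
    ("GLM 5 Turbo medium", 8.7, 5.23),
    ("GPT-5.4 medium", 8.2, 9.14),
    ("DeepSeek V4 Flash high", 8.2, 26.1),
    ("Qwen3.5 Plus 2026-04-20 medium", 8.2, 17.7),
    ("Qwen3.6 Flash medium", 8.2, 6.29),
    ("Qwen3.5-Flash medium", 8.2, 27.6),
    ("GLM 5.1 medium", 8.2, 31.6),
    ("MiMo-V2.5 medium", 8.2, 20.3),
    ("Qwen3.5-35B-A3B medium", 8.2, 33.1),
    ("Seed-2.0-Mini medium", 8.2, 31.8),
    ("Qwen3.5-27B medium", 8.2, 59.6),
    ("Qwen3.6 35B A3B medium", 8.0, 5.95),
]


def mean_puzzle_score(rows):
    """Unweighted mean of the Puzzle Solving Score column."""
    return mean(row[1] for row in rows)


def fastest(rows, n=3):
    """Names of the n models with the lowest average response time."""
    return [row[0] for row in sorted(rows, key=lambda row: row[2])[:n]]


print(f"{mean_puzzle_score(ROWS):.2f}")  # 8.44
print(fastest(ROWS))
# ['GPT-5.3-Codex medium', 'GLM 5 Turbo medium', 'Qwen3.6 35B A3B medium']
```

Because ten of the fifteen rows tie at 8.2, this list separates the models more on speed than on puzzle score: response times among the tied rows range from 6.29s to 59.6s.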

Charts (not reproduced): Top Models by Puzzle Solving Score; Puzzle Solving Score vs Total Cost; Top Models by Response Time (avg).