AI BENCHY

Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 12

Average Puzzle Solving Score: 6.7

| Rank | Model | Company | Puzzle Solving Score | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #139 | DeepSeek V4 Flash none | DeepSeek | 3.1 | 5.0 | 0/3 | 23.7s |
| #105 | Nemotron 3 Super medium | NVIDIA | 3.0 | 5.8 | 0/3 | 3.15s |
| #135 | Kimi K2.5 none | Moonshot AI | 3.0 | 5.2 | 0/3 | 4.04s |
| #140 | Qwen3 Coder Next none | Qwen | 3.0 | 4.9 | 0/3 | 24.3s |
| #145 | Laguna M.1 none | Poolside | 3.0 | 4.8 | 0/3 | 891ms |
| #150 | Qwen3 Coder Next medium | Qwen | 3.0 | 4.6 | 0/3 | 1.25s |
| #157 | Grok 4.1 Fast none | X AI | 3.0 | 4.4 | 0/3 | 1.10s |
| #161 | Qwen3.5-9B medium | Qwen | 3.0 | 4.2 | 0/3 | 32.3s |
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 3.0 | 4.1 | 0/3 | 532ms |
| #138 | Ling-2.6-flash none | Inclusionai | 2.9 | 5.0 | 0/3 | 6.51s |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 2.9 | 4.6 | 0/3 | 1.40s |
| #158 | GLM 4.7 Flash medium | Z.ai | 2.9 | 4.4 | 0/3 | 12.9s |

[Charts: Top Models by Puzzle Solving Score; Puzzle Solving Score vs Total Cost; Top Models by Response Time (avg)]