AI BENCHY
Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Puzzle Solving Score: 6.7

| Rank | Model | Company | Puzzle Solving Score | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #134 | GLM 5 Turbo none | Z.ai | 5.5 | 5.2 | 1/3 | 2.65s |
| #41 | Nemotron 3 Ultra 550b A55b medium | NVIDIA | 5.5 | 7.5 | 1/3 | 3.54s |
| #81 | Mercury 2 medium | Inception | 5.4 | 6.6 | 1/3 | 949ms |
| #143 | MiMo-V2.5 none | Xiaomi | 5.4 | 4.9 | 1/3 | 2.13s |
| #144 | GPT-5.4 Mini none | OpenAI | 5.4 | 4.9 | 1/3 | 836ms |
| #148 | GPT-5.4 Nano none | OpenAI | 5.4 | 4.7 | 1/3 | 1.25s |
| #121 | Owl Alpha none | Openrouter | 5.4 | 5.5 | 1/3 | 4.18s |
| #76 | Kimi K2.5 medium | Moonshot AI | 5.3 | 6.8 | 1/3 | 43.2s |
| #62 | Step 3.5 Flash medium | Stepfun | 5.3 | 7.2 | 1/3 | 7.22s |
| #89 | Hy3 preview low | Tencent | 5.3 | 6.4 | 1/3 | 7.51s |
| #92 | Laguna M.1 medium | Poolside | 5.3 | 6.4 | 1/3 | 10.2s |
| #93 | Qwen3.6 Plus Preview medium | Qwen | 5.3 | 6.3 | 1/3 | 7.52s |
| #107 | Laguna Xs.2 medium | Poolside | 5.3 | 5.8 | 1/3 | 1.93s |
| #109 | GLM 5V Turbo none | Z.ai | 5.3 | 5.8 | 1/3 | 2.40s |
| #127 | Grok 4.20 none | X AI | 5.3 | 5.4 | 1/3 | 473ms |

Charts: Top Models by Puzzle Solving Score; Puzzle Solving Score vs Total Cost; Top Models by Response Time (avg)