AI BENCHY

Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear. The table is sorted by average response time, slowest first.

Models Shown: 15

Average Puzzle Solving Score: 6.7

| Rank | Model | Company | Puzzle Solving Score | Overall Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #21 | GPT-5.4 medium | OpenAI | 8.2 | 8.0 | 2/3 | 9.14s |
| #5 | Qwen3.7 Max medium | Qwen | 10.0 | 9.1 | 3/3 | 8.84s |
| #126 | gpt-oss-120b none | OpenAI | 6.0 | 5.4 | 1/3 | 8.21s |
| #93 | Qwen3.6 Plus Preview medium | Qwen | 5.3 | 6.3 | 1/3 | 7.52s |
| #89 | Hy3 preview low | Tencent | 5.3 | 6.4 | 1/3 | 7.51s |
| #86 | Grok 4.1 Fast medium | X AI | 5.3 | 6.5 | 1/3 | 7.40s |
| #62 | Step 3.5 Flash medium | Stepfun | 5.3 | 7.2 | 1/3 | 7.22s |
| #133 | DeepSeek V3.2 none | DeepSeek | 7.6 | 5.2 | 2/3 | 6.91s |
| #4 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 3/3 | 6.90s |
| #9 | GPT-5.5 medium | OpenAI | 10.0 | 8.8 | 3/3 | 6.76s |
| #138 | Ling-2.6-flash none | Inclusionai | 2.9 | 5.0 | 0/3 | 6.51s |
| #26 | Qwen3.6 Plus medium | Qwen | 10.0 | 7.9 | 3/3 | 6.34s |
| #39 | Qwen3.6 Flash medium | Qwen | 8.2 | 7.5 | 2/3 | 6.29s |
| #65 | Grok 4.20 medium | X AI | 7.7 | 7.1 | 2/3 | 6.22s |
| #22 | Step 3.7 Flash medium | Stepfun | 5.7 | 8.0 | 1/3 | 6.19s |

[Chart: Top Models by Puzzle Solving Score]

[Chart: Puzzle Solving Score vs Total Cost]

[Chart: Top Models by Response Time (avg)]