Anti-AI Tricks x Wrong answer Ranking

See which AI models most often fail with a Wrong answer on Anti-AI Tricks tests, so you can spot weak points faster. Rows are sorted by Tests Correct, ascending.

Models Shown

Total Failures

306

Most Affected Model

DeepSeek V4 Pro 2

Failure Reasons

Wrong answer: 306, Did not follow instructions: 33, Extra formatting: 20, API error: 15, No answer: 6, Timed out: 4

Categories

Domain specific: 433, Anti-AI Tricks: 306, Coding: 266, Puzzle Solving: 214, Trivia: 176, Combined: 71, General Intelligence: 66, Instructions following: 65, Data parsing and extraction: 41, Tool Calling: 4

144/144

Rank	Model	Company	Wrong answer Count	Category Score	Total Cost	Tests Correct	Response Time (avg)
#161	Kimi K2.5 none	Moonshot AI	4	3.6	$0.127	0/4	6.24s
#163	Mimo V2 Omni none	Xiaomi	3	3.6	$0.021	0/4	1.63s
#164	Laguna S 2.1 medium	Poolside	3	3.4	$0.059	0/4	51.8s
#167	Laguna S 2.1 high	Poolside	3	3.4	$0.127	0/4	52.2s
#169	Qwen3.6 35B A3B none	Qwen	4	3.6	$0.061	0/4	2.10s
#170	Ling-2.6-1T none	Inclusionai	4	3.4	$0.016	0/4	6.55s
#173	Mistral Small 4 none	Mistral	4	3.4	$0.022	0/4	395ms
#174	Qwen3 Coder Next none	Qwen	2	3.6	$0.025	0/4	3.31s
#176	MiMo-V2.5 none	Xiaomi	4	3.5	$0.025	0/4	2.19s
#177	Qwen3.5-9B none	Qwen	4	3.1	$0.021	0/4	1.71s
#178	GLM 5 Turbo none	Z.ai	4	3.0	$0.047	0/4	2.84s
#179	North Mini Code none	Cohere	2	3.0	$0.000	0/4	22.5s
#181	Laguna S 2.1 low	Poolside	3	3.4	$0.091	0/4	80.7s
#182	DeepSeek V3.2 none	DeepSeek	1	3.2	$0.054	0/4	9.35s
#189	GPT-5.4 Nano none	OpenAI	4	3.5	$0.041	0/4	1.18s

Charts (not reproduced here): Top Models by Wrong answer Count; Wrong answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.
