Anti-AI Tricks x Wrong answer Ranking

See which AI models most often return a wrong answer on Anti-AI Tricks tests, so you can spot weak points faster. Sorted by: Response Time (avg), ascending.

Models Shown

Total Failures: 293

Most Affected Model: Mistral Small 4 (4)

Failure Reasons

Wrong answer: 293
Did not follow instructions: 33
Extra formatting: 20
API error: 14
No answer: 4
Timed out: 4

Categories

Domain specific: 412
Anti-AI Tricks: 293
Coding: 252
Puzzle Solving: 201
Trivia: 168
Combined: 68
Instructions following: 61
General Intelligence: 59
Data parsing and extraction: 41
Tool Calling: 3

140/140

Rank	Model	Company	Wrong answer Count	Category Score	Total Cost	Tests Correct	Response Time (avg)
#132	GPT-5.6 Terra none	OpenAI	3	4.8	$0.349	1/4	942ms
#193	Elephant Alpha none	Openrouter	1	6.6	$0.000	2/4	963ms
#106	Gemini 3.1 Flash Lite Preview none	Google	1	7.5	$0.052	2/4	1.04s
#122	Gemini 3.1 Flash Lite none	Google	2	7.5	$0.046	2/4	1.07s
#203	Grok 4.1 Fast none	X AI	3	3.2	$0.008	0/4	1.07s
#59	Qwen3.7 Max none	Qwen	2	6.5	$0.197	2/4	1.08s
#120	Gemini 3.1 Flash Lite minimal	Google	1	8.3	$0.047	3/4	1.10s
#78	Mercury 2 medium	Inception	1	6.9	$0.093	2/4	1.12s
#180	GPT-5.4 Nano none	OpenAI	4	3.5	$0.041	0/4	1.18s
#195	Elephant Alpha medium	Openrouter	2	6.6	$0.000	2/4	1.19s
#200	MiMo-V2-Flash none	Xiaomi	4	3.2	$0.025	0/4	1.19s
#207	Nemotron 3 Nano Omni 30b A3b Reasoning medium	NVIDIA	1	6.4	$0.000	2/4	1.20s
#139	GPT-5.4 none	OpenAI	4	3.2	$0.397	0/4	1.21s
#89	Gemini 3 Flash Preview none	Google	1	8.3	$0.085	3/4	1.25s
#83	GPT-5.6 Sol none	OpenAI	1	8.3	$0.524	3/4	1.27s
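
The table mixes units in Response Time (avg) (ms and s), and the number of failed tests is only implicit in Tests Correct. Below is a minimal sketch of how a reader might normalize both when working with a copy of this data. The helper names `to_seconds` and `failed_tests` are hypothetical and not part of the site. Note that failed tests can exceed Wrong answer Count, since a test can also fail for another reason listed under Failure Reasons (for example, Grok 4.1 Fast shows 3 wrong answers but 0/4 correct).

```python
import re

# A few rows copied from the table: (model, tests correct, avg response time).
ROWS = [
    ("Grok 4.1 Fast none", "0/4", "1.07s"),
    ("GPT-5.6 Terra none", "1/4", "942ms"),
    ("Gemini 3.1 Flash Lite Preview none", "2/4", "1.04s"),
]


def to_seconds(text):
    """Convert a duration such as '942ms' or '1.04s' to seconds."""
    match = re.fullmatch(r"([0-9.]+)(ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value = float(match.group(1))
    return value / 1000 if match.group(2) == "ms" else value


def failed_tests(tests_correct):
    """Derive the number of failed tests from a 'correct/total' string."""
    correct, total = (int(part) for part in tests_correct.split("/"))
    return total - correct


# Sort ascending by response time, matching the page's sort order.
summary = sorted(
    ((model, failed_tests(tc), to_seconds(rt)) for model, tc, rt in ROWS),
    key=lambda row: row[2],
)

for model, failed, seconds in summary:
    print(f"{model}: {failed} failed, {seconds:.3f}s")
```

Converting everything to seconds before sorting avoids comparing "942ms" and "1.04s" as strings, which would order them incorrectly.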

Charts (Anti-AI Tricks: Wrong answer): Top Models by Wrong answer Count; Wrong answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost