Anti-AI Tricks Model Ranking

See which AI models perform best on Anti-AI Tricks, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by average response time, fastest first.

Models Shown

210/210

Average Anti-AI Tricks Score

7.1

Best Model

Mistral Small 4 (3.4)

Failure Reasons

Wrong answer: 293
Did not follow instructions: 33
Extra formatting: 20
API error: 14
No answer: 4
Timed out: 4

Rank	Model	Company	Anti-AI Tricks Score	Score	Total Cost	Tests Correct	Response Time (avg)
#165	Mistral Small 4 none	Mistral	3.4	5.1	$0.022	0/4	395ms
#210	LFM2-24B-A2B none	Liquid	2.5	2.2	$0.001	0/3	471ms
#189	Mercury 2 none	Inception	3.0	4.6	$0.030	0/4	483ms
#197	Grok 4.20 none	X AI	4.8	4.1	$0.057	1/4	501ms
#205	Laguna Xs.2 none	Poolside	3.0	3.8	$0.004	0/4	534ms
#118	Gemini 2.5 Flash none	Google	3.0	6.2	$0.017	0/4	582ms
#208	Nemotron 3 Nano Omni 30b A3b Reasoning none	NVIDIA	4.8	3.2	$0.000	1/4	584ms
#191	Grok 4.20 Beta none	X AI	4.0	4.4	$0.087	0/4	597ms
#192	Laguna M.1 none	Poolside	3.4	4.4	$0.009	0/4	705ms
#160	Laguna XS 2.1 none	Poolside	5.3	5.3	$0.008	1/4	755ms
#103	Qwen3.5-27B none	Qwen	4.8	6.5	$0.090	1/4	788ms
#201	Granite 4.1 8B none	IBM Granite	4.9	4.0	$0.007	1/4	844ms
#88	Gemini 3.5 Flash minimal	Google	6.5	6.8	$0.300	2/4	892ms
#159	GPT-5.6 Luna none	OpenAI	4.8	5.4	$0.142	1/4	901ms
#136	GPT-5.4 Mini none	OpenAI	3.1	5.9	$0.095	0/4	929ms
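
Because the table above is ordered by response time, the strongest Anti-AI Tricks scores are scattered through it. The following is a minimal sketch of how the tab-separated rows could be parsed and re-sorted by score. The function names `parse_rows` and `rank_by_score` and the dictionary keys are hypothetical, chosen only for illustration, and just four rows from the table are included.

```python
import csv
import io

# Four rows copied from the table above; columns are tab-separated.
ROWS = (
    "Rank\tModel\tCompany\tAnti-AI Tricks Score\tScore\tTotal Cost\t"
    "Tests Correct\tResponse Time (avg)\n"
    "#165\tMistral Small 4 none\tMistral\t3.4\t5.1\t$0.022\t0/4\t395ms\n"
    "#197\tGrok 4.20 none\tX AI\t4.8\t4.1\t$0.057\t1/4\t501ms\n"
    "#160\tLaguna XS 2.1 none\tPoolside\t5.3\t5.3\t$0.008\t1/4\t755ms\n"
    "#88\tGemini 3.5 Flash minimal\tGoogle\t6.5\t6.8\t$0.300\t2/4\t892ms\n"
)


def parse_rows(text):
    """Turn the tab-separated table into a list of typed dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    parsed = []
    for row in reader:
        correct, total = row["Tests Correct"].split("/")
        parsed.append({
            "model": row["Model"],
            "company": row["Company"],
            "tricks_score": float(row["Anti-AI Tricks Score"]),
            "cost_usd": float(row["Total Cost"].lstrip("$")),
            "correct": int(correct),
            "total": int(total),
            "latency_ms": int(row["Response Time (avg)"].replace("ms", "")),
        })
    return parsed


def rank_by_score(rows):
    """Highest Anti-AI Tricks Score first; faster model wins a tie."""
    return sorted(rows, key=lambda r: (-r["tricks_score"], r["latency_ms"]))


if __name__ == "__main__":
    for r in rank_by_score(parse_rows(ROWS)):
        print(f'{r["model"]}: {r["tricks_score"]} '
              f'({r["correct"]}/{r["total"]} correct, {r["latency_ms"]}ms)')
```

Sorting on a tuple keeps the tie-break explicit: two models with the same score are ordered by their average response time.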

Anti-AI Tricks Ranking


Charts (not reproduced here): Top Models by Anti-AI Tricks Score; Anti-AI Tricks Score vs Total Cost; Top Models by Response Time (avg).