Anti-AI Tricks Model Ranking

See which AI models perform best on Anti-AI Tricks, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by Anti-AI Tricks Score, lowest first.

Models Shown

Average Anti-AI Tricks Score

7.2

Best Model

LFM2-24B-A2B 2.5

Failure Reasons

Wrong answer: 293
Did not follow instructions: 33
Extra formatting: 20
API error: 14
No answer: 4
Timed out: 4

216/216

Rank	Model	Company	Anti-AI Tricks Score	Score	Total Cost	Tests Correct	Response Time (avg)
#182	GLM 4.7 Flash none	Z.ai	5.2	4.9	$0.016	1/4	5.51s
#188	KAT-Coder-Air V2.5 none	Kwaipilot	5.3	4.8	$0.067	1/4	2.68s
#166	Laguna XS 2.1 none	Poolside	5.3	5.3	$0.008	1/4	755ms
#118	Claude Sonnet 5 none	Anthropic	5.3	6.3	$0.548	1/4	3.60s
#51	MiniMax M3 medium	Minimax	5.5	7.6	$0.286	1/4	14.9s
#173	Mistral Small 4 medium	Mistral	5.6	5.1	$0.096	1/4	2.67s
#50	DeepSeek V4 Pro high	DeepSeek	5.7	7.7	$0.200	1/4	25.7s
#47	Claude Opus 4.6 medium	Anthropic	6.4	7.7	$3.059	2/4	7.45s
#141	Hy3 preview high	Tencent	6.4	5.9	$0.048	2/4	15.1s
#213	Nemotron 3 Nano Omni 30b A3b Reasoning medium	NVIDIA	6.4	3.4	$0.000	2/4	1.20s
#134	GPT-5 Nano medium	OpenAI	6.5	6.1	$0.114	2/4	25.5s
#44	Claude Sonnet 4.6 medium	Anthropic	6.5	7.8	$2.057	2/4	2.98s
#63	Qwen3.7 Max none	Qwen	6.5	7.4	$0.197	2/4	1.08s
#70	Claude Opus 4.8 none	Anthropic	6.5	7.3	$1.166	2/4	3.40s
#75	Qwen3.7 Plus none	Qwen	6.5	7.2	$0.106	2/4	1.38s

Anti-AI Tricks Ranking


Charts: Top Models by Anti-AI Tricks Score; Anti-AI Tricks Score vs Total Cost; Top Models by Response Time (avg)