Anti-AI Tricks x Extra formatting Ranking

See which AI models are most likely to hit Extra formatting on Anti-AI Tricks, so you can spot weak points faster. Sort by: Failure Count ↑.

Models Shown

Total Failures

Most Affected Model: Claude Sonnet 4.6 (1)

Failure Reasons

Wrong answer (293), Did not follow instructions (33), Extra formatting (20), API error (14), No answer (4), Timed out (4)

Categories

Anti-AI Tricks (20), Coding (18), Domain specific (17), Puzzle Solving (8), Data parsing and extraction (6), Instructions following (3), Combined (1)

14/14

Rank	Model	Company	Extra formatting Count	Category Score	Total Cost	Tests Correct	Response Time (avg)
#40	Claude Sonnet 4.6 medium	Anthropic	1	6.5	$2.057	2/4	2.98s
#48	Grok Build 0.1 medium	X AI	1	8.3	$1.097	3/4	7.43s
#58	Qwen3.5-27B medium	Qwen	1	8.7	$1.627	3/4	19.8s
#82	DeepSeek V4 Pro none	DeepSeek	1	3.2	$0.096	0/4	4.02s
#113	MiMo-V2-Flash medium	Xiaomi	1	8.1	$0.043	3/4	15.8s
#137	North Mini Code medium	Cohere	1	8.4	$0.000	3/4	64.8s
#166	Qwen3 Coder Next none	Qwen	1	3.6	$0.025	0/4	3.31s
#181	Grok 4.20 Multi Agent Beta medium	X AI	1	6.9	$5.599	2/4	3.46s
#43	Claude Opus 4.6 medium	Anthropic	2	6.4	$3.059	2/4	7.45s
#63	Claude Sonnet 4.6 none	Anthropic	2	4.8	$0.661	1/4	2.94s
#66	Claude Opus 4.8 none	Anthropic	2	6.5	$1.166	2/4	3.40s
#112	Claude Sonnet 5 none	Anthropic	2	5.3	$0.548	1/4	3.60s
#171	North Mini Code none	Cohere	2	3.0	$0.000	0/4	22.5s
#173	DeepSeek V3.2 none	DeepSeek	2	3.2	$0.054	0/4	9.35s

Charts: Top Models by Extra formatting Count; Extra formatting Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost
