Domain specific Model Ranking

See which AI models perform best on Domain specific tests, which ones stay reliable, and where the biggest gaps appear.

Models Shown

Average Domain specific Score: 4.7

Best Model

Failure Reasons

Wrong answer: 404
Timed out: 39
Extra formatting: 17
No answer: 8
API error: 7
Did not follow instructions: 1
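To put the failure counts in proportion, the short sketch below converts them to percentage shares of all recorded failures. The counts are copied from the summary above; the `failure_shares` helper and the dictionary name are illustrative assumptions, not part of the leaderboard.

```python
# Failure counts copied from the "Failure Reasons" summary.
failure_counts = {
    "Wrong answer": 404,
    "Timed out": 39,
    "Extra formatting": 17,
    "No answer": 8,
    "API error": 7,
    "Did not follow instructions": 1,
}


def failure_shares(counts):
    """Return each reason's share of all recorded failures, in percent."""
    total = sum(counts.values())
    return {reason: round(100 * n / total, 1) for reason, n in counts.items()}


shares = failure_shares(failure_counts)
for reason, pct in shares.items():
    print(f"{reason}: {pct}%")
```

On these numbers, wrong answers account for the large majority of failures, with timeouts a distant second.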

206/206

Rank	Model	Company	Domain specific Score	Score	Total Cost	Tests Correct	Response Time (avg)
#1	Gemini 3 Flash Preview medium	Google	10.0	9.6	$0.742	3/3	15.3s
#87	Gemini 3.5 Flash minimal	Google	10.0	6.8	$0.300	3/3	899ms
#7	Gemini 3.1 Pro Preview medium	Google	7.7	9.2	$1.361	2/3	32.7s
#9	Gemini 3.5 Flash medium	Google	7.7	9.1	$0.642	2/3	5.24s
#11	Gemini 3.5 Flash low	Google	7.7	8.9	$0.433	2/3	3.39s
#15	Claude Opus 4.7 medium	Anthropic	7.7	8.7	$1.477	2/3	1.17s
#23	Claude Sonnet 5 medium	Anthropic	7.7	8.3	$0.922	2/3	20.4s
#28	Inkling high	Thinkingmachines	7.7	8.0	$1.006	2/3	186.4s
#29	Step 3.7 Flash medium	Stepfun	7.7	8.0	$0.515	2/3	48.3s
#44	GPT-5.6 Luna high	OpenAI	7.7	7.7	$1.017	2/3	79.0s
#59	Qwen3.7 Max none	Qwen	7.7	7.4	$0.197	2/3	975ms
#62	Claude Sonnet 4.6 none	Anthropic	7.7	7.3	$0.661	2/3	3.54s
#88	Gemini 3 Flash Preview none	Google	7.7	6.8	$0.085	2/3	963ms
#92	Claude Opus 4.7 none	Anthropic	7.7	6.6	$0.505	2/3	1.19s
#95	Qwen3.6 Max Preview none	Qwen	7.7	6.6	$0.231	2/3	1.22s
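The Response Time (avg) column mixes milliseconds and seconds, so the values cannot be compared as raw strings. The sketch below normalizes a few rows copied from the table to seconds and sorts them from fastest to slowest. The `to_seconds` helper and the tuple layout are assumptions made for illustration, not something the leaderboard provides.

```python
def to_seconds(text):
    """Convert a response-time string such as '899ms' or '15.3s' to seconds."""
    text = text.strip()
    if text.endswith("ms"):  # check "ms" first, since it also ends with "s"
        return float(text[:-2]) / 1000
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized time format: {text!r}")


# (model, total cost, tests correct, average response time), copied from the table.
rows = [
    ("Gemini 3 Flash Preview medium", "$0.742", "3/3", "15.3s"),
    ("Gemini 3.5 Flash minimal", "$0.300", "3/3", "899ms"),
    ("Claude Opus 4.7 medium", "$1.477", "2/3", "1.17s"),
    ("Inkling high", "$1.006", "2/3", "186.4s"),
]

# Sort by the normalized time so that 899ms ranks ahead of 1.17s.
fastest_first = sorted(rows, key=lambda row: to_seconds(row[3]))
for model, _cost, _correct, time_text in fastest_first:
    print(f"{model}: {to_seconds(time_text):.3f}s")
```

The `ms` suffix is checked before `s` on purpose: testing for `s` first would match millisecond values too and then fail when parsing the number.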

Charts (not reproduced here): Top Models by Domain specific Score; Domain specific Score vs Total Cost; Top Models by Response Time (avg)