Domain specific Model Ranking

See which AI models perform best on Domain specific tests, which ones stay reliable, and where the biggest gaps appear. The table is sorted by average response time, ascending.

Models Shown

Average Domain specific Score

4.7

Best Model

Claude Sonnet 4.6 2.9

Failure Reasons

Wrong answer: 412
Timed out: 43
Extra formatting: 17
No answer: 8
API error: 7
Did not follow instructions: 1

210/210

Rank	Model	Company	Domain specific Score	Score	Total Cost	Tests Correct	Response Time (avg)
#40	Claude Sonnet 4.6 medium	Anthropic	2.9	7.8	$2.057	0/3	0ms
#42	GLM 5 medium	Z.ai	3.5	7.7	$0.307	0/3	0ms
#80	Seed-2.0-Mini medium	Bytedance Seed	3.0	7.0	$0.101	0/3	0ms
#210	LFM2-24B-A2B none	Liquid	5.9	2.2	$0.001	1/3	287ms
#201	Granite 4.1 8B none	IBM Granite	3.0	4.0	$0.007	0/3	357ms
#160	Laguna XS 2.1 none	Poolside	5.3	5.3	$0.008	1/3	364ms
#165	Mistral Small 4 none	Mistral	5.3	5.1	$0.022	1/3	367ms
#205	Laguna Xs.2 none	Poolside	5.3	3.8	$0.004	1/3	371ms
#169	Qwen3.5-9B none	Qwen	3.0	5.1	$0.021	0/3	464ms
#142	Qwen3.5-122B-A10B none	Qwen	5.3	5.7	$0.247	1/3	465ms
#127	Qwen3.5-35B-A3B none	Qwen	7.7	6.1	$0.106	2/3	485ms
#208	Nemotron 3 Nano Omni 30b A3b Reasoning none	NVIDIA	3.6	3.2	$0.000	0/3	489ms
#118	Gemini 2.5 Flash none	Google	5.9	6.2	$0.017	1/3	495ms
#189	Mercury 2 none	Inception	5.3	4.6	$0.030	1/3	534ms
#103	Qwen3.5-27B none	Qwen	3.0	6.5	$0.090	0/3	540ms

Domain specific Ranking

Charts: Top Models by Domain specific Score; Domain specific Score vs Total Cost; Top Models by Response Time (avg).