AI BENCHY Compare

Trinity Large Preview vs DeepSeek: DeepSeek V3.2

Last updated at: 2026-06-03

Metric	Trinity Large Preview (none); Release: 2026-01-27	DeepSeek V3.2 (none); Release: 2025-12-01
Score	4.7	5.4
Rank	#148	#130
Reliability	10.0	10.0
Consistency	9.3	7.5
Tests Correct
Attempt pass rate	23.3%	41.7%
Flaky tests	2	6
Total Runs	60	60
Cost per result	0.017	0.296
Total Cost	$0.008	$0.017
Input Price	$0.243 / 1M tokens	$0.229 / 1M tokens
Output Price	$0.243 / 1M tokens	$0.344 / 1M tokens
Total Input Tokens	29,828	53,408
Output Tokens	2,169	11,159
Reasoning Tokens	0	0
Response Time (avg)	2.98s	14.43s
Response Time (max)	14.34s	115.89s
Response Time (total)	56.57s	288.55s
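The Total Cost row can be roughly checked against the token counts and per-million-token prices in the table. The sketch below assumes cost = input tokens × input price + output tokens × output price, with no caching discounts or other fees; the helper name estimate_cost is hypothetical, not part of any benchmark API. Under that assumption Trinity Large Preview comes out to about $0.0078 (listed as $0.008), while DeepSeek V3.2 comes out to about $0.0161 against a listed $0.017, so the site may account for cost in ways the table does not show.

```python
# Hypothetical helper: estimate spend from token counts and prices quoted per 1M tokens.
# Assumption: cost = input_tokens * input_price + output_tokens * output_price
# (no caching discounts, no per-request fees, no reasoning tokens).

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the estimated USD cost for the given token counts."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Token counts and prices copied from the summary table.
models = {
    "Trinity Large Preview": (29_828, 2_169, 0.243, 0.243),
    "DeepSeek V3.2": (53_408, 11_159, 0.229, 0.344),
}

for name, (inp, out, p_in, p_out) in models.items():
    print(f"{name}: ${estimate_cost(inp, out, p_in, p_out):.4f}")
```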

Charts (not reproduced): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	3.1	10.0	0.0%	0		2.07s	651	550	0
DeepSeek V3.2	3.2	8.0	8.3%	1		9.35s	494	1,073	0

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	4.0	6.6	16.7%	1		14.34s	738	397	0
DeepSeek V3.2	3.1	5.4	16.7%	1		20.87s	4,690	4,522	0

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	3.0	10.0	0.0%	0		8.91s	12,053	294	0
DeepSeek V3.2	6.5	10.0	0.0%	0		115.89s	29,843	2,887	0

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	10.0	10.0	100.0%	0		3.26s	6,900	186	0
DeepSeek V3.2	6.3	5.8	66.7%	1		9.42s	7,890	1,710	0

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	5.3	10.0	33.3%	0		877ms	738	25	0
DeepSeek V3.2	2.9	7.2	11.1%	1		4.17s	624	21	0

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	4.5	10.0	0.0%	0		873ms	498	104	0
DeepSeek V3.2	4.7	1.6	66.7%	1		9.32s	314	43	0

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	3.5	10.0	0.0%	0		822ms	678	63	0
DeepSeek V3.2	10.0	10.0	100.0%	0		1.52s	627	66	0

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	3.6	7.7	11.1%	1		1.97s	669	265	0
DeepSeek V3.2	7.6	7.2	77.8%	1		6.91s	424	298	0

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	10.0	10.0	100.0%	0		6.67s	6,699	267	0
DeepSeek V3.2	10.0	10.0	100.0%	0		11.85s	8,319	522	0

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Trinity Large Preview	3.0	10.0	0.0%	0		777ms	204	18	0
DeepSeek V3.2	3.0	10.0	0.0%	0		17.23s	183	17	0
