AI BENCHY Compare

Mistral: Mistral Small 4 vs Elephant Alpha

Last updated: 2026-05-29

Metric	Mistral Small 4 (medium, released 2026-03-16)	Elephant Alpha (none, released 2026-04-14)
Score	5.4	5.2
Rank	#126	#136
Reliability	10.0	N/A
Consistency	7.1	9.6
Tests Correct
Attempt pass rate	45.0%	29.8%
Flaky tests	7	1
Total Runs	60	60
Cost per result	1.112	0.000
Total Cost	$0.056	$0.000
Input Price	$0.150 / 1M	$0.000 / 1M
Output Price	$0.600 / 1M	$0.000 / 1M
Output Tokens	21,871	2,573
Reasoning Tokens	68,349	0
Response Time (avg)	8.35s	1.22s
Response Time (max)	59.15s	3.81s
Response Time (total)	167.08s	22.03s
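
The cost figures above can be roughly cross-checked against the token counts and prices. The sketch below does this for Mistral Small 4. It rests on two assumptions that the page does not state: that reasoning tokens are billed at the output rate, and that they are not already included in the Output Tokens figure. The function and variable names are illustrative only.

```python
# Values copied from the summary table for Mistral Small 4.
OUTPUT_PRICE_PER_M = 0.600    # USD per 1M output tokens
OUTPUT_TOKENS = 21_871
REASONING_TOKENS = 68_349
REPORTED_TOTAL_COST = 0.056   # USD


def output_side_cost(output_tokens: int, reasoning_tokens: int,
                     price_per_million: float) -> float:
    """Estimate the cost of generated tokens.

    Assumption (not confirmed by the source): reasoning tokens are billed
    at the output rate and are counted separately from output tokens.
    """
    return (output_tokens + reasoning_tokens) * price_per_million / 1_000_000


estimate = output_side_cost(OUTPUT_TOKENS, REASONING_TOKENS, OUTPUT_PRICE_PER_M)
remainder = REPORTED_TOTAL_COST - estimate

print(f"estimated output-side cost: ${estimate:.4f}")
print(f"left for input tokens and rounding: ${remainder:.4f}")
```

Under these assumptions the generated tokens account for about $0.054 of the reported $0.056, which leaves a small remainder for input tokens and rounding. The input token count is not listed on the page, so the input side cannot be checked the same way.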

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	5.6	3.8	66.7%	3		2.67s	4,055	4,778
Elephant Alpha	6.6	10.0	50.0%	0		963ms	610	0

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	5.1	6.8	33.3%	1		44.82s	9,322	38,386
Elephant Alpha	4.7	6.7	33.3%	1		1.39s	375	0

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	3.0	10.0	0.0%	0		25.25s	2,612	10,700
Elephant Alpha	3.0	10.0	0.0%	0		3.81s	731	0

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	7.3	5.9	83.3%	1		1.23s	335	723
Elephant Alpha	6.5	10.0	50.0%	0		1.04s	246	0

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	5.3	7.2	44.4%	1		6.11s	2,621	6,904
Elephant Alpha	3.0	10.0	0.0%	0		927ms	24	0

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	4.8	10.0	0.0%	0		2.05s	821	828
Elephant Alpha	4.0	10.0	0.0%	0		854ms	106	0

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	7.3	5.8	83.3%	1		1.38s	540	1,031
Elephant Alpha	9.8	10.0	100.0%	0		1.03s	81	0

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	3.4	9.7	0.0%	0		2.17s	1,226	2,632
Elephant Alpha	4.2	10.0	0.0%	0		807ms	170	0

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	10.0	10.0	100.0%	0		3.50s	321	810
Elephant Alpha	3.0	10.0	0.0%	0		2.79s	230	0

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Mistral Small 4	3.0	10.0	0.0%	0		5.92s	18	1,557
Elephant Alpha	0.0	0.0	0.0%	0		0ms	0	0
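
The Response Time (avg) column mixes units: some rows are in seconds (for example `2.67s`) and others in milliseconds (for example `963ms`). A minimal sketch for normalizing these values before comparing them follows. The helper name `to_seconds` is hypothetical, and it assumes the only suffixes in use are `s` and `ms`, which matches the rows shown here.

```python
def to_seconds(text: str) -> float:
    """Convert a duration such as '963ms' or '2.67s' to seconds.

    Assumption: values end in either 'ms' or 's', as in the tables above.
    """
    text = text.strip()
    if text.endswith("ms"):          # check 'ms' first, since it also ends in 's'
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized duration: {text!r}")


# Anti-AI Tricks row, values copied from the table.
mistral = to_seconds("2.67s")
elephant = to_seconds("963ms")

print(f"Mistral Small 4: {mistral:.3f}s, Elephant Alpha: {elephant:.3f}s")
print(f"ratio: {mistral / elephant:.1f}x")
```

Checking for the `ms` suffix before `s` matters, because a plain `endswith("s")` test would also match millisecond values and misread them.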
