AI BENCHY Compare

Mistral: Mistral Small 4 vs xAI: Grok 4.20

Summary

Mistral Small 4 vs Grok 4.20 benchmark comparison: Mistral Small 4 leads on average score, 5.1 vs 4.4. It also has the lower total benchmark cost ($0.007 vs $0.057) and the faster average response time (630ms vs 1.11s). Grok 4.20 has the slightly higher attempt pass rate, 28.6% vs 27.0%.

Recommended model: Mistral Small 4. It has the higher score here (5.1) and costs about 8.2x less than Grok 4.20.

Last updated at: 2026-07-02

Metric	Mistral Small 4 (none, release 2026-03-16)	Grok 4.20 (none, release 2026-03-31)
Score	5.1	4.4
Rank	#134	#160
Reliability	10.0	N/A
Consistency	9.5	8.5
Tests Correct
Attempt pass rate	27.0%	28.6%
Flaky tests	1	0
Total Runs	63	54
Cost per result	0.139	1.570
Total Cost	$0.007	$0.057
Input Price	$0.150 / 1M	$1.250 / 1M
Output Price	$0.600 / 1M	$2.500 / 1M
Total Input Tokens	37,309	41,313
Output Tokens	2,201	1,923
Reasoning Tokens	0	0
Response Time (avg)	630ms	1.11s
Response Time (max)	1.72s	6.04s
Response Time (total)	13.22s	19.96s
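
The Total Cost row and the "about 8.2x" figure can be roughly cross-checked from the token counts and per-million-token prices in the table. The sketch below assumes cost is simply input tokens times input price plus output tokens times output price; the page does not state its formula, so this is an estimate and not the site's actual method. The names `MODELS` and `estimated_cost` are illustrative.

```python
# Token counts and prices are copied from the metrics table above.
# The pricing formula is an assumption, not taken from the page.
MODELS = {
    "Mistral Small 4": {
        "input_tokens": 37_309,
        "output_tokens": 2_201,
        "input_price": 0.150,   # USD per 1M input tokens
        "output_price": 0.600,  # USD per 1M output tokens
    },
    "Grok 4.20": {
        "input_tokens": 41_313,
        "output_tokens": 1_923,
        "input_price": 1.250,
        "output_price": 2.500,
    },
}


def estimated_cost(model: dict) -> float:
    """Estimate total cost in USD from token counts and per-1M-token prices."""
    input_cost = model["input_tokens"] * model["input_price"]
    output_cost = model["output_tokens"] * model["output_price"]
    return (input_cost + output_cost) / 1_000_000


costs = {name: estimated_cost(model) for name, model in MODELS.items()}
ratio = costs["Grok 4.20"] / costs["Mistral Small 4"]

for name, cost in costs.items():
    print(f"{name}: ${cost:.4f}")
print(f"Cost ratio: {ratio:.1f}x")
```

Under this assumption the estimates come to about $0.0069 and $0.0564, a ratio of about 8.2x. The Grok 4.20 estimate lands slightly below the reported $0.057, so the reported totals may include rounding or charges not captured by this simple formula.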

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#134 Mistral Small 4

none

Cost: $0.002
Time: 10.4s
Tokens: 2,370 tok

#160 xAI: Grok 4.20

none

Cost: $0.004
Time: 6.5s
Tokens: 1,367 tok

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	3.4	7.9	16.7%	1		395ms	708	182	0
Grok 4.20	4.8	10.0	25.0%	0		501ms	1,986	267	0

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	3.7	9.7	0.0%	0		901ms	7,636	619	0
Grok 4.20	1.1	3.1	0.0%	0		1.22s	1,074	312	0

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	3.0	10.0	0.0%	0		1.72s	11,640	496	0
Grok 4.20	3.0	10.0	0.0%	0		6.04s	17,673	282	0

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	10.0	10.0	100.0%	0		822ms	7,914	261	0
Grok 4.20	10.0	10.0	100.0%	0		522ms	7,749	207	0

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	5.3	10.0	33.3%	0		367ms	798	28	0
Grok 4.20	3.0	10.0	0.0%	0		687ms	1,746	325	0

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	4.0	10.0	0.0%	0		729ms	519	205	0
Grok 4.20	4.8	10.0	0.0%	0		659ms	819	83	0

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	6.5	10.0	50.0%	0		380ms	729	69	0
Grok 4.20	6.3	10.0	50.0%	0		445ms	1,350	60	0

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	3.1	9.9	0.0%	0		399ms	735	111	0
Grok 4.20	5.3	10.0	33.3%	0		473ms	1,671	198	0

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	10.0	10.0	100.0%	0		1.40s	6,420	213	0
Grok 4.20	10.0	10.0	100.0%	0		4.63s	7,245	189	0

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Mistral Small 4	3.0	10.0	0.0%	0		397ms	210	17	0
Grok 4.20	0.0	0.0	0.0%	0		0ms	0	0	0
