Granite 4.1 8B vs Grok 4.20

Recommended model: Grok 4.20

It has the highest score in this comparison (4.1 vs. 4.0) and the best overall balance of cost and response time of the two models.

Detailed comparison

Metric	Granite 4.1 8B (Release: 2026-05-01)	Grok 4.20 (Release: 2026-03-31)
Score	4.0	4.1
Rank	#224	#220
Reliability	10.0	N/A
Consistency	10.0	8.1
Tests Correct
Attempt pass rate	9.1%	27.3%
Flaky tests	0	0
Total Runs	66	54
Cost per result	0.315	1.570
Total Cost	$0.007	$0.057
Input Price	$0.050 / 1M	$1.250 / 1M
Output Price	$0.100 / 1M	$2.500 / 1M
Total Input Tokens	113,827	41,313
Output Tokens	5,996	1,923
Reasoning Tokens	0	0
Response Time (avg)	1.45s	1.11s
Response Time (max)	16.67s	6.04s
Response Time (total)	31.96s	19.96s
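
The Total Cost row can be roughly cross-checked against the token counts and per-million prices in the table. The sketch below assumes cost is simply input tokens times input price plus output tokens times output price; the page does not state its formula or rounding rule, and the `Usage` class and `estimated_cost` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts and per-million-token prices copied from the table."""
    input_tokens: int
    output_tokens: int
    input_price_per_million: float   # USD per 1M input tokens
    output_price_per_million: float  # USD per 1M output tokens


def estimated_cost(u: Usage) -> float:
    """Assumed formula: tokens * price per 1M tokens, summed over input and output."""
    total = (u.input_tokens * u.input_price_per_million
             + u.output_tokens * u.output_price_per_million)
    return total / 1_000_000


granite = Usage(113_827, 5_996, 0.050, 0.100)
grok = Usage(41_313, 1_923, 1.250, 2.500)

print(f"Granite 4.1 8B: ${estimated_cost(granite):.4f}")  # Granite 4.1 8B: $0.0063
print(f"Grok 4.20:      ${estimated_cost(grok):.4f}")     # Grok 4.20:      $0.0564
```

Both estimates land within a tenth of a cent of the listed totals ($0.007 and $0.057), which would be consistent with the page rounding up to three decimals, though that is a guess rather than something the source confirms.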

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

Category:

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	4.9	10.0	25.0%	0		844ms	645	903	0
Grok 4.20	4.8	10.0	25.0%	0		501ms	1,986	267	0

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	4.5	10.0	0.0%	0		775ms	8,344	525	0
Grok 4.20	1.1	3.1	0.0%	0		1.22s	1,074	312	0

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	3.0	10.0	0.0%	0		9.28s	86,631	3,481	0
Grok 4.20	1.5	5.0	0.0%	0		6.04s	17,673	282	0

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	3.0	10.0	0.0%	0		575ms	7,617	195	0
Grok 4.20	10.0	10.0	100.0%	0		522ms	7,749	207	0

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	3.0	10.0	0.0%	0		357ms	768	24	0
Grok 4.20	3.0	10.0	0.0%	0		687ms	1,746	325	0

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	4.0	10.0	0.0%	0		499ms	528	115	0
Grok 4.20	4.8	10.0	0.0%	0		659ms	819	83	0

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	3.6	9.9	0.0%	0		344ms	687	66	0
Grok 4.20	6.3	10.0	50.0%	0		445ms	1,350	60	0

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	3.2	10.0	0.0%	0		608ms	672	432	0
Grok 4.20	5.3	10.0	33.3%	0		473ms	1,671	198	0

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	10.0	10.0	100.0%	0		2.17s	7,719	243	0
Grok 4.20	10.0	10.0	100.0%	0		4.63s	7,245	189	0

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Granite 4.1 8B	3.0	10.0	0.0%	0		306ms	216	12	0
Grok 4.20	0.0	0.0	0.0%	0		0ms	0	0	0
