Gemma 4 31B (medium) vs Nemotron 3 Ultra

Recommended model: Nemotron 3 Ultra

Nemotron 3 Ultra scores close to the best result here (6.1 vs 6.3) and responds about 19.5x faster on average than Gemma 4 31B (medium) (3.87s vs 75.38s).
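The speed and score figures in the recommendation can be checked against the table values. The sketch below is only illustrative arithmetic: the variable names are made up for this example, and the numbers are copied from the "Score" and "Response Time (avg)" rows.

```python
# Average response times in seconds, copied from the comparison table.
gemma_avg_s = 75.38
nemotron_avg_s = 3.87

# Overall scores, copied from the same table.
gemma_score = 6.3
nemotron_score = 6.1

# How many times faster Nemotron 3 Ultra responds on average.
speedup = gemma_avg_s / nemotron_avg_s

# How far Nemotron 3 Ultra trails the best score.
score_gap = gemma_score - nemotron_score

print(f"speedup: {speedup:.1f}x")
print(f"score gap: {score_gap:.1f}")
```

Note that the ratio uses average response time; using the total response times (1507.52s vs 85.15s) gives a somewhat smaller ratio.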

Detailed comparison

Metric	Gemma 4 31B (medium), released 2026-04-02, free, available	Nemotron 3 Ultra (none), released 2026-06-04, free, available
Score	6.3	6.1
Rank	#124	#144
Reliability	10.0	10.0
Consistency	9.0	9.3
Tests Correct
Attempt pass rate	68.2%	42.4%
Flaky tests	2	2
Total Runs	66	66
Cost per result	1.044	0.000
Total Cost	$0.116	$0.095
Input Price	$0.140 / 1M	$0.600 / 1M
Output Price	$0.400 / 1M	$3.600 / 1M
Total Input Tokens	94,992	101,275
Output Tokens	34,468	9,474
Reasoning Tokens	223,278	0
Response Time (avg)	75.38s	3.87s
Response Time (max)	437.40s	37.50s
Response Time (total)	1507.52s	85.15s
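The "Total Cost" row appears to follow from the token counts and per-million prices in the table. The sketch below reproduces both totals under one assumption that the page does not state: reasoning tokens are billed at the output price. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(input_tokens, output_tokens, reasoning_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimate total cost in dollars from token counts and per-1M prices.

    Assumption (not stated in the source): reasoning tokens are billed
    at the same rate as output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000


# Token counts and prices copied from the comparison table.
gemma_cost = estimate_cost(94_992, 34_468, 223_278, 0.140, 0.400)
nemotron_cost = estimate_cost(101_275, 9_474, 0, 0.600, 3.600)

print(f"Gemma 4 31B (medium): ${gemma_cost:.3f}")
print(f"Nemotron 3 Ultra:     ${nemotron_cost:.3f}")
```

Under that assumption the results round to the reported $0.116 and $0.095, which suggests that most of Gemma 4 31B's cost comes from its 223,278 reasoning tokens despite its lower per-token prices.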

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

[Rendered SVG outputs: Gemma 4 31B (medium), Nemotron 3 Ultra (none)]

Category:

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	10.0	10.0	100.0%	0		12.89s	816	962	2,046
Nemotron 3 Ultra	3.5	8.0	16.7%	1		2.35s	696	239	0

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	4.3	5.8	22.2%	1		219.76s	5,568	11,098	33,212
Nemotron 3 Ultra	5.5	10.0	33.3%	0		1.02s	7,623	369	0

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	2.9	5.8	16.7%	1		433.11s	77,035	12,112	157,552
Nemotron 3 Ultra	3.0	10.0	0.0%	0		21.14s	73,507	7,693	0

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	10.0	10.0	100.0%	0		21.11s	8,334	1,822	2,951
Nemotron 3 Ultra	10.0	10.0	100.0%	0		1.94s	7,944	249	0

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	7.7	10.0	66.7%	0		38.48s	876	4,349	8,985
Nemotron 3 Ultra	5.3	10.0	33.3%	0		698ms	789	27	0

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	10.0	10.0	100.0%	0		9.57s	567	105	888
Nemotron 3 Ultra	5.0	10.0	0.0%	0		13.49s	516	101	0

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	10.0	10.0	100.0%	0		12.76s	777	533	2,035
Nemotron 3 Ultra	10.0	10.0	100.0%	0		1.46s	723	69	0

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	9.9	10.0	100.0%	0		26.91s	801	1,795	5,595
Nemotron 3 Ultra	5.9	7.2	55.6%	1		1.06s	726	352	0

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	3.0	10.0	0.0%	0		0ms	0	0	0
Nemotron 3 Ultra	10.0	10.0	100.0%	0		2.99s	8,544	264	0

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Gemma 4 31B	3.0	10.0	0.0%	0		90.14s	218	1,692	10,014
Nemotron 3 Ultra	3.0	10.0	0.0%	0		1.83s	207	111	0
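To summarize the per-category tables, one option is to tally which model has the higher score in each category. This is a minimal sketch, not a ranking method used by the page itself: the `tally_wins` helper is hypothetical, and the scores are copied from the category tables above.

```python
# (Gemma 4 31B score, Nemotron 3 Ultra score) per category, from the tables.
category_scores = {
    "Anti-AI Tricks": (10.0, 3.5),
    "Coding": (4.3, 5.5),
    "Combined": (2.9, 3.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (7.7, 5.3),
    "General Intelligence": (10.0, 5.0),
    "Instructions following": (10.0, 10.0),
    "Puzzle Solving": (9.9, 5.9),
    "Tool Calling": (3.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally_wins(scores):
    """Count categories won by each model, plus ties, by raw score."""
    wins = {"Gemma 4 31B": 0, "Nemotron 3 Ultra": 0, "tie": 0}
    for gemma, nemotron in scores.values():
        if gemma > nemotron:
            wins["Gemma 4 31B"] += 1
        elif nemotron > gemma:
            wins["Nemotron 3 Ultra"] += 1
        else:
            wins["tie"] += 1
    return wins


wins = tally_wins(category_scores)
print(wins)
```

A raw win count ignores the size of each gap. For example, the 0.1-point difference in Combined counts the same as the 7.0-point difference in Tool Calling, where Gemma 4 31B recorded zero tokens and a 0ms response time.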
