AI BENCHY Compare

NVIDIA: Nemotron 3 Super vs OpenAI: gpt-oss-120b

Last updated: 2026-04-26

Metric	Nemotron 3 Super	gpt-oss-120b
Reasoning setting	none	none
Release	2026-03-11	2025-08-05
Free tier	Available	Available
Score	5.1	5.2
Rank	#103	#98
Reliability	N/A	N/A
Consistency	8.2	7.9
Tests Correct
Attempt pass rate	35.2%	38.9%
Flaky tests	4	5
Total Runs	52	54
Cost per result	0.000	0.221
Total Cost	$0.000	$0.009
Input Price	$0.090 / 1M	$0.000 / 1M
Output Price	$0.450 / 1M	$0.000 / 1M
Output Tokens	4,760	44,652
Reasoning Tokens	0	0
Response Time (avg)	8.54s	11.96s
Response Time (max)	24.97s	68.97s
Response Time (total)	153.69s	179.34s
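The Input Price and Output Price rows are quoted in dollars per million tokens. The Python sketch below shows how such prices turn into the cost of a single request. It assumes linear per-token billing with no free-tier usage, caching discounts, or minimum charges. The function name and the token counts in the example are hypothetical, and the sketch does not attempt to reproduce how the Total Cost or Cost per result rows were calculated.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, assuming prices are USD per 1M tokens
    and are billed linearly (no free tier, discounts, or minimum charges)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 500 output tokens,
    # priced at the Nemotron 3 Super list prices shown above
    # ($0.090 per 1M input tokens, $0.450 per 1M output tokens).
    cost = token_cost(2_000, 500, 0.090, 0.450)
    print(f"${cost:.6f}")  # prints "$0.000405"
```

At these prices, even a run of several thousand tokens costs a fraction of a cent. That is why the Total Cost figures above are so small.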

[Charts not reproduced: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	4.8	10.0	25.0%	0		7.43s	2,174	0
gpt-oss-120b	6.6	8.0	58.3%	1		6.03s	4,867	0

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	3.3	1.6	33.3%	1		2.99s	535	0
gpt-oss-120b	4.3	1.1	66.7%	1		9.57s	3,232	0

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	3.0	10.0	0.0%	0		19.98s	124	0
gpt-oss-120b	3.0	10.0	0.0%	0		0ms	0	0

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	10.0	10.0	100.0%	0		7.92s	249	0
gpt-oss-120b	6.5	10.0	50.0%	0		7.12s	598	0

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	3.6	7.2	22.2%	1		6.23s	26	0
gpt-oss-120b	3.0	10.0	0.0%	0		34.98s	29,483	0

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	4.2	9.9	0.0%	0		24.97s	170	0
gpt-oss-120b	4.6	10.0	0.0%	0		2.83s	586	0

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	4.9	6.9	33.3%	1		1.50s	66	0
gpt-oss-120b	8.4	6.9	83.3%	1		5.10s	1,982	0

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	5.7	10.0	33.3%	0		7.50s	1,135	0
gpt-oss-120b	4.5	4.8	44.5%	2		6.86s	3,904	0

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Output Tokens	Reasoning Tokens
Nemotron 3 Super	4.7	1.6	66.7%	1		16.00s	281	0
gpt-oss-120b	3.0	10.0	0.0%	0		0ms	0	0
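A short Python sketch can tally which model scores higher in each category. It can also check that the per-category output tokens add up to the Output Tokens row in the summary table. The figures are transcribed by hand from the tables above, and the function names are illustrative. The script says nothing about how AI Benchy weights categories into the overall Score.

```python
# (score, output_tokens) per model, transcribed from the category tables above.
MODELS = ("Nemotron 3 Super", "gpt-oss-120b")
CATEGORIES = {
    "Anti-AI Tricks": {MODELS[0]: (4.8, 2174), MODELS[1]: (6.6, 4867)},
    "Coding": {MODELS[0]: (3.3, 535), MODELS[1]: (4.3, 3232)},
    "Combined": {MODELS[0]: (3.0, 124), MODELS[1]: (3.0, 0)},
    "Data parsing and extraction": {MODELS[0]: (10.0, 249), MODELS[1]: (6.5, 598)},
    "Domain specific": {MODELS[0]: (3.6, 26), MODELS[1]: (3.0, 29483)},
    "General Intelligence": {MODELS[0]: (4.2, 170), MODELS[1]: (4.6, 586)},
    "Instructions following": {MODELS[0]: (4.9, 66), MODELS[1]: (8.4, 1982)},
    "Puzzle Solving": {MODELS[0]: (5.7, 1135), MODELS[1]: (4.5, 3904)},
    "Tool Calling": {MODELS[0]: (4.7, 281), MODELS[1]: (3.0, 0)},
}
# Output Tokens row from the summary table.
SUMMARY_OUTPUT_TOKENS = {MODELS[0]: 4760, MODELS[1]: 44652}


def category_winners(categories):
    """Map each category to the higher-scoring model, or 'tie' on equal scores."""
    winners = {}
    for name, rows in categories.items():
        a, b = (rows[m][0] for m in MODELS)
        if a > b:
            winners[name] = MODELS[0]
        elif b > a:
            winners[name] = MODELS[1]
        else:
            winners[name] = "tie"
    return winners


def total_output_tokens(categories, model):
    """Sum one model's output tokens across all categories."""
    return sum(rows[model][1] for rows in categories.values())


if __name__ == "__main__":
    for name, winner in category_winners(CATEGORIES).items():
        print(f"{name}: {winner}")
    for model in MODELS:
        total = total_output_tokens(CATEGORIES, model)
        print(f"{model}: {total:,} output tokens "
              f"(summary row: {SUMMARY_OUTPUT_TOKENS[model]:,})")
```

Running the script shows four category wins for each model, with Combined tied at 3.0. The per-category output tokens sum exactly to the summary totals of 4,760 and 44,652.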
