AI BENCHY Compare

Nemotron 3 Super 120b A12b vs OpenAI: GPT-4o-mini

Last updated at: 2026-03-12

Nemotron 3 Super 120b A12b: none, Release: 2026-03-11, Free Available
GPT-4o-mini: none, Release: 2024-07-18

| Metric | Nemotron 3 Super 120b A12b | GPT-4o-mini |
| --- | --- | --- |
| Rank | #59 | #55 |
| Avg Score | 3.4 | 4.0 |
| Consistency | 8.6 | 10.0 |
| Cost per result | 0.000 | 0.114 |
| Total Cost | $0.000 | $0.005 |
| Tests Correct | | |
| Attempt pass rate | 31.3% | 25.0% |
| Flaky tests | 3 | 0 |
| Total Runs | 48 | 48 |
| Output Tokens | 4,222 | 1,594 |
| Reasoning Tokens | 0 | 0 |
| Response Time (avg) | 8.90s | 2.07s |
| Response Time (max) | 24.97s | 7.58s |
| Response Time (total) | 142.40s | 18.60s |

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg); Total Output Tokens; Avg Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Nemotron 3 Super 120b A12b | 10.0 | 10.0 | 0.0% | 0 | | 7.14s | 2,171 | 0 |
| Anti-AI Tricks | GPT-4o-mini | 4.0 | 10.0 | 33.3% | 0 | | 1.83s | 180 | 0 |
| Combined | Nemotron 3 Super 120b A12b | 10.0 | 10.0 | 0.0% | 0 | | 19.98s | 124 | 0 |
| Combined | GPT-4o-mini | 10.0 | 10.0 | 0.0% | 0 | | 7.58s | 568 | 0 |
| Data parsing and extraction | Nemotron 3 Super 120b A12b | 9.9 | 10.0 | 100.0% | 0 | | 7.92s | 249 | 0 |
| Data parsing and extraction | GPT-4o-mini | 9.9 | 10.0 | 100.0% | 0 | | 1.27s | 183 | 0 |
| Domain specific | Nemotron 3 Super 120b A12b | 10.0 | 7.2 | 22.2% | 1 | | 6.23s | 26 | 0 |
| Domain specific | GPT-4o-mini | 10.0 | 10.0 | 0.0% | 0 | | 637ms | 15 | 0 |
| General Intelligence | Nemotron 3 Super 120b A12b | 3.0 | 9.9 | 0.0% | 0 | | 24.97s | 170 | 0 |
| General Intelligence | GPT-4o-mini | 3.0 | 10.0 | 0.0% | 0 | | 909ms | 66 | 0 |
| Instructions following | Nemotron 3 Super 120b A12b | 4.5 | 6.9 | 33.3% | 1 | | 1.50s | 66 | 0 |
| Instructions following | GPT-4o-mini | 4.5 | 10.0 | 0.0% | 0 | | 1.27s | 69 | 0 |
| Puzzle Solving | Nemotron 3 Super 120b A12b | 4.7 | 10.0 | 33.3% | 0 | | 7.50s | 1,135 | 0 |
| Puzzle Solving | GPT-4o-mini | 2.3 | 10.0 | 0.0% | 0 | | 1.30s | 308 | 0 |
| Tool Calling | Nemotron 3 Super 120b A12b | 10.0 | 1.6 | 66.7% | 1 | | 16.00s | 281 | 0 |
| Tool Calling | GPT-4o-mini | 10.0 | 10.0 | 100.0% | 0 | | 2.51s | 205 | 0 |
