AI BENCHY Compare

Inception: Mercury 2 vs NVIDIA: Nemotron 3 Super

Last updated at: 2026-06-03

Metric | Mercury 2 (none, Release: 2026-02-24) | Nemotron 3 Super (medium, Release: 2026-03-11, Free Available)
Score | 4.6 | 5.9
Rank | #153 | #102
Reliability | 10.0 | 10.0
Consistency | 9.1 | 9.2
Tests Correct | |
Attempt pass rate | 25.0% | 43.3%
Flaky tests | 2 | 2
Total Runs | 60 | 60
Cost per result | 0.216 | 0.004
Total Cost | $0.009 | $0.019
Input Price | $0.250 / 1M | $0.090 / 1M
Output Price | $0.750 / 1M | $0.450 / 1M
Total Input Tokens | 25,515 | 36,614
Output Tokens | 3,001 | 14,505
Reasoning Tokens | 0 | 30,178
Response Time (avg) | 614ms | 20.87s
Response Time (max) | 1.27s | 87.80s
Response Time (total) | 12.28s | 375.66s
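
The Total Cost row can be sanity-checked against the token counts and per-million prices listed with it. The sketch below assumes the simplest possible formula (input tokens times input price plus output tokens times output price); the page does not state how it computes cost, so `estimate_cost` and its `bill_reasoning` flag are hypothetical names used only for illustration. Under that assumption Mercury 2 comes out at about $0.009, matching the reported figure. Nemotron 3 Super comes out at about $0.010 without reasoning tokens and about $0.023 if reasoning tokens are billed at the output price, so neither variant reproduces the reported $0.019 and the site's actual formula remains unknown.

```python
# Figures copied from the comparison table; prices are USD per 1M tokens.
MODELS = {
    "Mercury 2": {
        "input_tokens": 25_515, "output_tokens": 3_001, "reasoning_tokens": 0,
        "input_price": 0.250, "output_price": 0.750, "reported_total": 0.009,
    },
    "Nemotron 3 Super": {
        "input_tokens": 36_614, "output_tokens": 14_505, "reasoning_tokens": 30_178,
        "input_price": 0.090, "output_price": 0.450, "reported_total": 0.019,
    },
}


def estimate_cost(model: dict, bill_reasoning: bool = False) -> float:
    """Estimate total cost in USD under an ASSUMED linear pricing formula.

    bill_reasoning=True additionally charges reasoning tokens at the output
    price; whether the benchmark does this is not stated on the page.
    """
    billed_output = model["output_tokens"]
    if bill_reasoning:
        billed_output += model["reasoning_tokens"]
    total = (model["input_tokens"] * model["input_price"]
             + billed_output * model["output_price"])
    return total / 1_000_000


for name, model in MODELS.items():
    base = estimate_cost(model)
    with_reasoning = estimate_cost(model, bill_reasoning=True)
    print(f"{name}: estimate ${base:.4f}, "
          f"with reasoning ${with_reasoning:.4f}, "
          f"reported ${model['reported_total']:.3f}")
```

Keeping the reasoning-token billing behind a flag makes it easy to compare both readings of the table side by side rather than committing to one.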

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens
Anti-AI Tricks | Mercury 2 | 3.0 | 10.0 | 0.0% | 0 | | 483ms | 631 | 286 | 0
Anti-AI Tricks | Nemotron 3 Super | 8.3 | 10.0 | 75.0% | 0 | | 7.85s | 686 | 748 | 1,305
Coding | Mercury 2 | 3.5 | 9.4 | 0.0% | 0 | | 831ms | 4,631 | 1,650 | 0
Coding | Nemotron 3 Super | 3.1 | 9.9 | 0.0% | 0 | | 62.38s | 1,362 | 452 | 848
Combined | Mercury 2 | 3.0 | 10.0 | 0.0% | 0 | | 606ms | 4,821 | 131 | 0
Combined | Nemotron 3 Super | 10.0 | 10.0 | 100.0% | 0 | | 87.80s | 15,561 | 2,021 | 9,996
Data parsing and extraction | Mercury 2 | 7.3 | 5.9 | 83.3% | 1 | | 667ms | 6,362 | 180 | 0
Data parsing and extraction | Nemotron 3 Super | 10.0 | 10.0 | 100.0% | 0 | | 18.16s | 7,944 | 877 | 2,607
Domain specific | Mercury 2 | 5.3 | 7.2 | 44.4% | 1 | | 534ms | 784 | 46 | 0
Domain specific | Nemotron 3 Super | 2.9 | 4.4 | 22.2% | 2 | | 16.19s | 456 | 5,255 | 6,072
General Intelligence | Mercury 2 | 4.8 | 10.0 | 0.0% | 0 | | 628ms | 495 | 159 | 0
General Intelligence | Nemotron 3 Super | 4.1 | 10.0 | 0.0% | 0 | | 6.91s | 492 | 105 | 363
Instructions following | Mercury 2 | 6.5 | 10.0 | 50.0% | 0 | | 551ms | 691 | 82 | 0
Instructions following | Nemotron 3 Super | 7.3 | 10.0 | 50.0% | 0 | | 6.97s | 723 | 956 | 2,383
Puzzle Solving | Mercury 2 | 3.1 | 10.0 | 0.0% | 0 | | 535ms | 694 | 251 | 0
Puzzle Solving | Nemotron 3 Super | 3.0 | 10.0 | 0.0% | 0 | | 3.15s | 708 | 570 | 1,322
Tool Calling | Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 1.27s | 6,193 | 197 | 0
Tool Calling | Nemotron 3 Super | 10.0 | 10.0 | 100.0% | 0 | | 39.75s | 8,544 | 270 | 1,969
Trivia | Mercury 2 | 3.0 | 10.0 | 0.0% | 0 | | 548ms | 213 | 19 | 0
Trivia | Nemotron 3 Super | 3.0 | 10.0 | 0.0% | 0 | | 55.32s | 138 | 3,251 | 3,313
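
The per-category scores can be summarized by counting which model scores higher in each category. The sketch below is one way to do that; the scores are copied from the category breakdown, while the `SCORES` mapping and the `tally` helper are hypothetical names introduced here for illustration and are not part of the benchmark site.

```python
# Category scores copied from the breakdown: (Mercury 2, Nemotron 3 Super).
SCORES = {
    "Anti-AI Tricks": (3.0, 8.3),
    "Coding": (3.5, 3.1),
    "Combined": (3.0, 10.0),
    "Data parsing and extraction": (7.3, 10.0),
    "Domain specific": (5.3, 2.9),
    "General Intelligence": (4.8, 4.1),
    "Instructions following": (6.5, 7.3),
    "Puzzle Solving": (3.1, 3.0),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally(scores: dict) -> dict:
    """Group category names by which model has the higher score."""
    result = {"Mercury 2": [], "Nemotron 3 Super": [], "tie": []}
    for category, (mercury, nemotron) in scores.items():
        if mercury > nemotron:
            result["Mercury 2"].append(category)
        elif nemotron > mercury:
            result["Nemotron 3 Super"].append(category)
        else:
            result["tie"].append(category)
    return result


for winner, categories in tally(SCORES).items():
    print(f"{winner}: {len(categories)} ({', '.join(categories)})")
```

On these numbers each model leads in four categories and two are tied, which is a simple count that ignores the size of each score gap.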
