
AI BENCHY Compare

Mistral: Mistral Small 4 vs Nemotron 3 Super 120b A12b

Last updated at: 2026-03-17

| Metric | Mistral Small 4 (none) | Nemotron 3 Super 120b A12b (none, Free Available) |
| --- | --- | --- |
| Release | 2026-03-16 | 2026-03-11 |
| Rank | #61 | #62 |
| Score | 5.3 | 5.2 |
| Consistency | 9.5 | 8.6 |
| Cost per result | 0.108 | 0.000 |
| Total Cost | $0.006 | $0.000 |
| Tests Correct | | |
| Attempt pass rate | 33.3% | 35.3% |
| Flaky tests | 1 | 3 |
| Total Runs | 51 | 49 |
| Output Tokens | 1,624 | 4,225 |
| Reasoning Tokens | 0 | 0 |
| Response Time (avg) | 629ms | 8.86s |
| Response Time (max) | 1.72s | 24.97s |
| Response Time (total) | 10.70s | 150.70s |
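
The headline figures above are easier to compare as ratios. The following is a minimal Python sketch, not part of the AI BENCHY site: the dictionary layout and the helper names `ratio` and `implied_sample_count` are hypothetical, and the values are copied from the table by hand. Dividing total response time by average response time assumes the average is a plain mean over the timed responses, which the page does not state.

```python
# Headline figures copied by hand from the comparison table (times in seconds).
MISTRAL = "Mistral Small 4"
NEMOTRON = "Nemotron 3 Super 120b A12b"

metrics = {
    MISTRAL: {"avg_s": 0.629, "total_s": 10.70, "output_tokens": 1624},
    NEMOTRON: {"avg_s": 8.86, "total_s": 150.70, "output_tokens": 4225},
}


def ratio(metric, numerator=NEMOTRON, denominator=MISTRAL):
    """Return how many times larger `metric` is for one model than the other."""
    return metrics[numerator][metric] / metrics[denominator][metric]


def implied_sample_count(model):
    """Total time divided by average time.

    Assumption: the reported average is a plain mean, so this approximates
    the number of responses that were timed.
    """
    m = metrics[model]
    return m["total_s"] / m["avg_s"]


if __name__ == "__main__":
    print(f"avg response time ratio: {ratio('avg_s'):.1f}x")
    print(f"output token ratio:      {ratio('output_tokens'):.1f}x")
    for name in metrics:
        print(f"{name}: ~{implied_sample_count(name):.1f} timed responses")
```

On these figures, Nemotron 3 Super 120b A12b takes roughly 14 times longer per response on average and emits about 2.6 times as many output tokens, while the two overall scores differ by only 0.1.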

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Mistral Small 4 | 3.4 | 7.9 | 16.7% | 1 | | 395ms | 182 | 0 |
| Anti-AI Tricks | Nemotron 3 Super 120b A12b | 4.8 | 10.0 | 25.0% | 0 | | 7.43s | 2,174 | 0 |
| Combined | Mistral Small 4 | 3.0 | 10.0 | 0.0% | 0 | | 1.72s | 496 | 0 |
| Combined | Nemotron 3 Super 120b A12b | 3.0 | 10.0 | 0.0% | 0 | | 19.98s | 124 | 0 |
| Data parsing and extraction | Mistral Small 4 | 10.0 | 10.0 | 100.0% | 0 | | 822ms | 261 | 0 |
| Data parsing and extraction | Nemotron 3 Super 120b A12b | 10.0 | 10.0 | 100.0% | 0 | | 7.92s | 249 | 0 |
| Domain specific | Mistral Small 4 | 5.3 | 10.0 | 33.3% | 0 | | 367ms | 28 | 0 |
| Domain specific | Nemotron 3 Super 120b A12b | 3.6 | 7.2 | 22.2% | 1 | | 6.23s | 26 | 0 |
| General Intelligence | Mistral Small 4 | 4.0 | 10.0 | 0.0% | 0 | | 729ms | 205 | 0 |
| General Intelligence | Nemotron 3 Super 120b A12b | 4.2 | 9.9 | 0.0% | 0 | | 24.97s | 170 | 0 |
| Instructions following | Mistral Small 4 | 6.5 | 10.0 | 50.0% | 0 | | 380ms | 69 | 0 |
| Instructions following | Nemotron 3 Super 120b A12b | 4.9 | 6.9 | 33.3% | 1 | | 1.50s | 66 | 0 |
| Puzzle Solving | Mistral Small 4 | 3.1 | 9.9 | 0.0% | 0 | | 589ms | 170 | 0 |
| Puzzle Solving | Nemotron 3 Super 120b A12b | 5.7 | 10.0 | 33.3% | 0 | | 7.50s | 1,135 | 0 |
| Tool Calling | Mistral Small 4 | 10.0 | 10.0 | 100.0% | 0 | | 1.40s | 213 | 0 |
| Tool Calling | Nemotron 3 Super 120b A12b | 4.7 | 1.6 | 66.7% | 1 | | 16.00s | 281 | 0 |
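
The near-identical overall scores hide larger differences per category. The sketch below, again illustrative Python rather than anything published by AI BENCHY, tallies which model has the higher Score in each category and finds the category with the widest gap. The names `leader`, `tally`, and `largest_gap` are hypothetical, and the score pairs are copied by hand from the category breakdown in the order (Mistral Small 4, Nemotron 3 Super 120b A12b).

```python
# Category scores copied by hand from the breakdown:
# (Mistral Small 4, Nemotron 3 Super 120b A12b).
MODELS = ("Mistral Small 4", "Nemotron 3 Super 120b A12b")

scores = {
    "Anti-AI Tricks": (3.4, 4.8),
    "Combined": (3.0, 3.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (5.3, 3.6),
    "General Intelligence": (4.0, 4.2),
    "Instructions following": (6.5, 4.9),
    "Puzzle Solving": (3.1, 5.7),
    "Tool Calling": (10.0, 4.7),
}


def leader(pair):
    """Return the name of the higher-scoring model, or "tie" if equal."""
    first, second = pair
    if first == second:
        return "tie"
    return MODELS[0] if first > second else MODELS[1]


def tally(table):
    """Count how many categories each model leads, plus ties."""
    counts = {MODELS[0]: 0, MODELS[1]: 0, "tie": 0}
    for pair in table.values():
        counts[leader(pair)] += 1
    return counts


def largest_gap(table):
    """Return the category with the biggest absolute score difference."""
    return max(table, key=lambda cat: abs(table[cat][0] - table[cat][1]))


if __name__ == "__main__":
    print(tally(scores))
    print("widest gap:", largest_gap(scores))
```

On the scores listed, each model leads three categories and two are tied, with Tool Calling showing the widest gap (10.0 against 4.7).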
