AI BENCHY Compare

OpenAI: GPT-5.4 vs OpenAI: gpt-oss-120b

Last updated at: 2026-05-22

Metric                        GPT-5.4 (none)   gpt-oss-120b (none)
Release                       2026-03-05       2025-08-05 (Free Available)
Score                         5.6              5.2
Rank                          #112             #129
Reliability                   10.0             10.0
Consistency                   9.1              8.7
Tests Correct                 -                -
Attempt pass rate             38.3%            36.8%
Flaky tests                   2                3
Total Runs                    60               57
Cost per result               1.638            0.201
Total Cost                    $0.115           $0.011
Input Price (per 1M tokens)   $2.500           $0.000
Output Price (per 1M tokens)  $15.000          $0.000
Output Tokens                 2,378            51,505
Reasoning Tokens              0                0
Response Time (avg)           1.46s            21.86s
Response Time (max)           2.95s            113.71s
Response Time (total)         29.23s           349.78s
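
The summary figures are easier to compare as ratios. The following is a minimal sketch that derives a few of them from the numbers listed above. The dictionary keys and the `ratio` helper are hypothetical names chosen for illustration, and the ratios are computed here rather than reported by the page.

```python
# Figures copied from the summary table; key names are illustrative only.
summary = {
    "GPT-5.4": {
        "avg_response_s": 1.46,
        "output_tokens": 2378,
        "total_cost_usd": 0.115,
    },
    "gpt-oss-120b": {
        "avg_response_s": 21.86,
        "output_tokens": 51505,
        "total_cost_usd": 0.011,
    },
}


def ratio(metric, numerator="gpt-oss-120b", denominator="GPT-5.4"):
    """Return the numerator model's value divided by the denominator model's."""
    return summary[numerator][metric] / summary[denominator][metric]


# Print each derived ratio (gpt-oss-120b relative to GPT-5.4).
for metric in ("avg_response_s", "output_tokens", "total_cost_usd"):
    print(f"{metric}: {ratio(metric):.2f}x")
```

On these figures, gpt-oss-120b takes roughly 15 times longer per response on average and emits roughly 22 times as many output tokens, at about a tenth of the total cost.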

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Anti-AI Tricks
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       3.2    8.0          8.3%               1            -              1.21s                406            0
gpt-oss-120b  6.5    10.0         50.0%              0            -              32.84s               8,676          0

Coding
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       6.8    10.0         50.0%              0            -              1.99s                501            0
gpt-oss-120b  4.3    1.1          66.7%              1            -              9.57s                3,232          0

Combined
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       3.0    10.0         0.0%               0            -              2.89s                291            0
gpt-oss-120b  3.0    10.0         0.0%               0            -              0ms                  0              0

Data parsing and extraction
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       10.0   10.0         100.0%             0            -              1.04s                222            0
gpt-oss-120b  6.5    10.0         50.0%              0            -              7.12s                598            0

Domain specific
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       5.3    7.2          44.4%              1            -              1.07s                50             0
gpt-oss-120b  3.0    10.0         0.0%               0            -              34.98s               29,483         0

General Intelligence
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       4.4    9.9          0.0%               0            -              1.78s                184            0
gpt-oss-120b  4.8    10.0         0.0%               0            -              10.79s               615            0

Instructions following
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       6.5    10.0         50.0%              0            -              1.07s                81             0
gpt-oss-120b  9.8    10.0         100.0%             0            -              5.10s                1,982          0

Puzzle Solving
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       5.6    9.8          33.3%              0            -              1.52s                357            0
gpt-oss-120b  4.4    4.5          44.5%              2            -              9.51s                3,781          0

Tool Calling
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       10.0   10.0         100.0%             0            -              2.75s                246            0
gpt-oss-120b  3.0    10.0         0.0%               0            -              0ms                  0              0

Trivia
Model         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
GPT-5.4       3.0    10.0         0.0%               0            -              990ms                40             0
gpt-oss-120b  3.0    10.0         0.0%               0            -              47.29s               3,138          0
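
One way to read the category breakdown is to count which model has the higher score in each category. The sketch below does that with the per-category scores listed above. The `tally` function and the result keys are hypothetical names for illustration, and the tally is derived here, not a figure reported by the page.

```python
# Per-category scores copied from the breakdown: (GPT-5.4, gpt-oss-120b).
scores = {
    "Anti-AI Tricks": (3.2, 6.5),
    "Coding": (6.8, 4.3),
    "Combined": (3.0, 3.0),
    "Data parsing and extraction": (10.0, 6.5),
    "Domain specific": (5.3, 3.0),
    "General Intelligence": (4.4, 4.8),
    "Instructions following": (6.5, 9.8),
    "Puzzle Solving": (5.6, 4.4),
    "Tool Calling": (10.0, 3.0),
    "Trivia": (3.0, 3.0),
}


def tally(category_scores):
    """Count the categories each model leads on score, plus ties."""
    wins = {"GPT-5.4": 0, "gpt-oss-120b": 0, "tie": 0}
    for gpt_score, oss_score in category_scores.values():
        if gpt_score > oss_score:
            wins["GPT-5.4"] += 1
        elif oss_score > gpt_score:
            wins["gpt-oss-120b"] += 1
        else:
            wins["tie"] += 1
    return wins


print(tally(scores))
```

By this count GPT-5.4 leads in five categories, gpt-oss-120b in three, and two are tied at 3.0. A simple win count ignores the size of each gap, so it complements the overall Score rather than replacing it.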