AI BENCHY
❤️ Made by XCS

Model Name

OpenAI: gpt-oss-120b

Last updated: Feb 19, 2026

Metric | OpenAI: gpt-oss-120b
Rank | #12
Company | OpenAI
Score | 5.75
Consistency | 7.19
Cost per result | 0.0951
Total Cost | $0.00571
Tests Correct | 6/12
Attempt pass rate | 63.9%
Flaky tests | 4
Output Tokens | 8,060
Reasoning Tokens | 23,792

Category Breakdown

Category | Fully passed tests | Score | Consistency | Attempt pass rate | Flaky tests | Reasoning score | Cost
Anti-AI Tricks | 2/2 | 10.00 | 10.00 | 100.0% | 0 | 10.00 | $0.00029
Data parsing and extraction | 1/2 | 5.50 | 5.81 | 83.3% | 1 | 10.00 | $0.00052
Domain specific | 0/3 | 1.00 | 4.41 | 22.2% | 2 | 8.53 | $0.00393
Instructions following | 2/2 | 10.00 | 10.00 | 100.0% | 0 | 9.50 | $0.00040
Puzzle Solving | 1/3 | 5.00 | 7.13 | 44.4% | 1 | 7.89 | $0.00059
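The overall figures for gpt-oss-120b appear to be roll-ups of this category breakdown. The Python sketch below is a hedged worked check, not AI Benchy's own code. It assumes (the page does not say so) that every test was attempted 3 times, since that is the attempt count that reproduces each listed percentage (for example, 23 passed attempts out of 36 gives 63.9%). It also assumes that Score and Consistency are means weighted by the number of tests per category. The helper name `aggregate` and the tuple layout are illustrative.

```python
# Category rows for OpenAI: gpt-oss-120b, copied from the breakdown table.
# ASSUMPTION: each test was attempted 3 times; this is inferred from the
# percentages, not stated on the page.
ATTEMPTS_PER_TEST = 3

# (category, tests, fully_passed, score, consistency, attempt_pass_pct, flaky, cost_usd)
CATEGORIES = [
    ("Anti-AI Tricks", 2, 2, 10.00, 10.00, 100.0, 0, 0.00029),
    ("Data parsing and extraction", 2, 1, 5.50, 5.81, 83.3, 1, 0.00052),
    ("Domain specific", 3, 0, 1.00, 4.41, 22.2, 2, 0.00393),
    ("Instructions following", 2, 2, 10.00, 10.00, 100.0, 0, 0.00040),
    ("Puzzle Solving", 3, 1, 5.00, 7.13, 44.4, 1, 0.00059),
]


def aggregate(categories, attempts_per_test=ATTEMPTS_PER_TEST):
    """Hypothetical helper: roll per-category rows up into overall metrics."""
    total_tests = sum(c[1] for c in categories)
    # Recover integer passed-attempt counts from the rounded percentages.
    passed_attempts = sum(
        round(c[5] / 100 * c[1] * attempts_per_test) for c in categories
    )
    total_attempts = total_tests * attempts_per_test
    return {
        "tests_correct": sum(c[2] for c in categories),
        "total_tests": total_tests,
        "flaky_tests": sum(c[6] for c in categories),
        # ASSUMPTION: score and consistency are test-count-weighted means.
        "score": sum(c[3] * c[1] for c in categories) / total_tests,
        "consistency": sum(c[4] * c[1] for c in categories) / total_tests,
        "attempt_pass_rate_pct": 100 * passed_attempts / total_attempts,
        "total_cost_usd": sum(c[7] for c in categories),
    }


overall = aggregate(CATEGORIES)
print(f"Tests Correct: {overall['tests_correct']}/{overall['total_tests']}")
print(f"Score: {overall['score']:.2f}")
print(f"Consistency: {overall['consistency']:.2f}")
print(f"Attempt pass rate: {overall['attempt_pass_rate_pct']:.1f}%")
print(f"Flaky tests: {overall['flaky_tests']}")
print(f"Total Cost: ${overall['total_cost_usd']:.5f}")
```

Under these assumptions the sketch prints 6/12, 5.75, 7.19, 63.9% and 4 flaky tests, matching the summary table. The summed cost is $0.00573 rather than the listed $0.00571, which is consistent with the per-category costs being rounded for display.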

Compared models

#11 · OpenAI
OpenAI: GPT-5 Nano
Reasoning (medium)
Score: 5.92
Consistency: 6.03
Attempt pass rate: 72.2%
Flaky tests: 6
Cost per result: 0.4675
Tests Correct: 6/12
Total Cost: $0.02806

#13 · Anthropic
Anthropic: Claude Sonnet 4.6
No Reasoning
Score: 5.75
Consistency: 9.42
Attempt pass rate: 52.8%
Flaky tests: 1
Cost per result: 0.9480
Tests Correct: 6/12
Total Cost: $0.05688

#10 · Google
Google: Gemini 3 Flash Preview
No Reasoning
Score: 6.25
Consistency: 8.60
Attempt pass rate: 66.7%
Flaky tests: 2
Cost per result: 0.0754
Tests Correct: 7/12
Total Cost: $0.00528
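The page does not define "Cost per result" or give its unit. For all four models listed here, however, it matches total cost divided by tests correct, scaled by 100. For example, 100 × $0.05688 / 6 = 0.9480 for Claude Sonnet 4.6. The Python sketch below checks that relationship against the listed numbers. It is an inference from the data only, and `cost_per_result` is a hypothetical helper name.

```python
# (model, total_cost_usd, tests_correct, listed_cost_per_result), copied from the page.
MODELS = [
    ("OpenAI: gpt-oss-120b", 0.00571, 6, 0.0951),
    ("OpenAI: GPT-5 Nano", 0.02806, 6, 0.4675),
    ("Anthropic: Claude Sonnet 4.6", 0.05688, 6, 0.9480),
    ("Google: Gemini 3 Flash Preview", 0.00528, 7, 0.0754),
]


def cost_per_result(total_cost_usd: float, tests_correct: int) -> float:
    """Hypothetical reconstruction of the page's "Cost per result".

    ASSUMPTION: total cost per fully passed test, scaled by 100. The factor
    was chosen to match the listed figures; the page states no formula or unit.
    """
    return 100 * total_cost_usd / tests_correct


for name, total, correct, listed in MODELS:
    derived = cost_per_result(total, correct)
    print(f"{name}: derived {derived:.4f}, listed {listed:.4f}")
```

The derived values agree with the listed ones to within the rounding of the displayed total cost (for example, 0.0952 derived vs 0.0951 listed for gpt-oss-120b). On this measure, Gemini 3 Flash Preview is the cheapest per fully passed test among the four models.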
