
AI BENCHY Compare

Google: Gemini 3.5 Flash vs OpenAI: gpt-oss-120b

Summary

Gemini 3.5 Flash vs gpt-oss-120b benchmark comparison: Gemini 3.5 Flash leads on average score, 7.0 vs 6.7. gpt-oss-120b has the lower total benchmark cost, $0.013 vs $1.079. Gemini 3.5 Flash has the faster average response time, 9.93s vs 22.28s, and the higher attempt pass rate, 77.8% vs 52.4%.

Recommended model: Gemini 3.5 Flash. It has the higher score here (7.0) and responds about 2.2x faster on average than gpt-oss-120b.
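The ratios behind the summary can be checked with a few lines of arithmetic. The sketch below only re-derives them from the figures quoted above; the variable names are illustrative and not part of the benchmark.

```python
# Figures copied from the summary above.
GEMINI, GPT_OSS = "Gemini 3.5 Flash", "gpt-oss-120b"

avg_time_s = {GEMINI: 9.93, GPT_OSS: 22.28}
total_cost_usd = {GEMINI: 1.079, GPT_OSS: 0.013}
pass_rate_pct = {GEMINI: 77.8, GPT_OSS: 52.4}

# How many times faster Gemini 3.5 Flash responds on average.
speedup = avg_time_s[GPT_OSS] / avg_time_s[GEMINI]

# How many times more the Gemini 3.5 Flash run cost in total.
cost_ratio = total_cost_usd[GEMINI] / total_cost_usd[GPT_OSS]

# Gap in attempt pass rate, in percentage points.
pass_rate_gap = pass_rate_pct[GEMINI] - pass_rate_pct[GPT_OSS]

print(f"speedup: {speedup:.1f}x")
print(f"cost ratio: {cost_ratio:.0f}x")
print(f"pass rate gap: {pass_rate_gap:.1f} points")
```

With these inputs the speedup rounds to 2.2x, the cost ratio to about 83x, and the pass rate gap to 25.4 percentage points, so the recommendation trades a much higher cost for speed and pass rate.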

Last updated at: 2026-06-12

Metric                        Gemini 3.5 Flash (none)   gpt-oss-120b (medium, Free Available)
Release                       2026-05-19                2025-08-05
Score                         7.0                       6.7
Rank                          #66                       #78
Reliability                   10.0                      10.0
Consistency                   8.9                       8.0
Tests Correct                 -                         -
Attempt pass rate             77.8%                     52.4%
Flaky tests                   3                         5
Total Runs                    63                        63
Cost per result               7.190                     0.141
Total Cost                    $1.079                    $0.013
Input Price (per 1M tokens)   $1.500                    $0.039
Output Price (per 1M tokens)  $9.000                    $0.180
Total Input Tokens            13,843                    39,084
Output Tokens                 117,518                   20,013
Reasoning Tokens              0                         50,233
Response Time (avg)           9.93s                     22.28s
Response Time (max)           64.36s                    68.16s
Response Time (total)         178.68s                   311.96s
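The total cost can be roughly reconstructed from the token counts and per-token prices in the table. This is a minimal sketch under two assumptions the page does not confirm: reasoning tokens are billed at the output price, and the "Output Tokens" figure does not already include them. The function name `estimate_cost` is hypothetical.

```python
# Token counts and prices copied from the table above.
# Prices are in USD per 1M tokens.
RUNS = {
    "Gemini 3.5 Flash": {
        "input_tokens": 13_843, "output_tokens": 117_518, "reasoning_tokens": 0,
        "input_price": 1.500, "output_price": 9.000, "reported_cost": 1.079,
    },
    "gpt-oss-120b": {
        "input_tokens": 39_084, "output_tokens": 20_013, "reasoning_tokens": 50_233,
        "input_price": 0.039, "output_price": 0.180, "reported_cost": 0.013,
    },
}


def estimate_cost(run: dict) -> float:
    """Estimate the cost in USD for one model's benchmark run.

    Assumption: reasoning tokens are billed like output tokens.
    """
    billed_output = run["output_tokens"] + run["reasoning_tokens"]
    return (
        run["input_tokens"] * run["input_price"]
        + billed_output * run["output_price"]
    ) / 1_000_000


estimates = {model: estimate_cost(run) for model, run in RUNS.items()}

for model, run in RUNS.items():
    print(f"{model}: estimated ${estimates[model]:.4f}, "
          f"reported ${run['reported_cost']:.3f}")
```

The estimates land within about a tenth of a cent of the reported totals but do not match them exactly, so the benchmark's actual billing probably differs from these assumptions in some detail.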

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#66 Gemini 3.5 Flash (none)
Cost: $0.225
Time: 125.5s
Tokens: 25,004

#78 gpt-oss-120b (medium)
Cost: $0.001
Time: 26.7s
Tokens: 555

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens

Anti-AI Tricks
Gemini 3.5 Flash  10.0   10.0         100.0%             0            -              2.53s                492           5,101          0
gpt-oss-120b      6.7    9.9          50.0%              0            -              10.21s               1,314         3,518          2,177

Coding
Gemini 3.5 Flash  8.8    7.8          88.9%              1            -              34.69s               8,122         75,927         0
gpt-oss-120b      5.9    7.0          55.6%              1            -              38.37s               7,782         3,365          11,973

Combined
Gemini 3.5 Flash  3.0    10.0         0.0%               0            -              0ms                  0             0              0
gpt-oss-120b      10.0   10.0         100.0%             0            -              31.18s               11,535        694            5,072

Data parsing and extraction
Gemini 3.5 Flash  6.5    10.0         50.0%              0            -              8.10s                2,781         5,895          0
gpt-oss-120b      6.4    5.9          66.7%              1            -              1.98s                7,476         241            1,114

Domain specific
Gemini 3.5 Flash  7.6    7.2          77.8%              1            -              10.64s               633           17,910         0
gpt-oss-120b      2.9    4.4          22.2%              2            -              50.92s               1,266         6,784          20,606

General Intelligence
Gemini 3.5 Flash  10.0   10.0         100.0%             0            -              3.46s                486           1,620          0
gpt-oss-120b      4.3    10.0         0.0%               0            -              7.90s                659           107            387

Instructions following
Gemini 3.5 Flash  9.8    10.0         100.0%             0            -              3.38s                615           3,928          0
gpt-oss-120b      9.9    10.0         100.0%             0            -              7.63s                1,036         126            1,799

Puzzle Solving
Gemini 3.5 Flash  10.0   10.0         100.0%             0            -              3.13s                558           4,640          0
gpt-oss-120b      5.3    7.2          44.4%              1            -              21.71s               1,190         1,790          2,264

Tool Calling
Gemini 3.5 Flash  3.0    10.0         0.0%               0            -              0ms                  0             0              0
gpt-oss-120b      9.8    10.0         100.0%             0            -              6.91s                6,514         287            1,083

Trivia
Gemini 3.5 Flash  2.8    1.6          33.3%              1            -              4.87s                156           2,497          0
gpt-oss-120b      3.0    10.0         0.0%               0            -              26.51s               312           3,101          3,758
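
The per-category figures can be cross-checked against the overall totals. The sketch below, with illustrative names that are not part of the benchmark, sums the category token counts for each model, compares them with the totals reported in the summary table, and counts how many categories each model wins on score.

```python
GEMINI, GPT_OSS = "Gemini 3.5 Flash", "gpt-oss-120b"

# (score, input tokens, output tokens, reasoning tokens) per model,
# copied from the category breakdown above.
CATEGORIES = {
    "Anti-AI Tricks": {GEMINI: (10.0, 492, 5101, 0), GPT_OSS: (6.7, 1314, 3518, 2177)},
    "Coding": {GEMINI: (8.8, 8122, 75927, 0), GPT_OSS: (5.9, 7782, 3365, 11973)},
    "Combined": {GEMINI: (3.0, 0, 0, 0), GPT_OSS: (10.0, 11535, 694, 5072)},
    "Data parsing and extraction": {GEMINI: (6.5, 2781, 5895, 0), GPT_OSS: (6.4, 7476, 241, 1114)},
    "Domain specific": {GEMINI: (7.6, 633, 17910, 0), GPT_OSS: (2.9, 1266, 6784, 20606)},
    "General Intelligence": {GEMINI: (10.0, 486, 1620, 0), GPT_OSS: (4.3, 659, 107, 387)},
    "Instructions following": {GEMINI: (9.8, 615, 3928, 0), GPT_OSS: (9.9, 1036, 126, 1799)},
    "Puzzle Solving": {GEMINI: (10.0, 558, 4640, 0), GPT_OSS: (5.3, 1190, 1790, 2264)},
    "Tool Calling": {GEMINI: (3.0, 0, 0, 0), GPT_OSS: (9.8, 6514, 287, 1083)},
    "Trivia": {GEMINI: (2.8, 156, 2497, 0), GPT_OSS: (3.0, 312, 3101, 3758)},
}

# (input, output, reasoning) token totals from the summary table.
REPORTED_TOTALS = {GEMINI: (13843, 117518, 0), GPT_OSS: (39084, 20013, 50233)}

token_sums = {GEMINI: (0, 0, 0), GPT_OSS: (0, 0, 0)}
wins = {GEMINI: 0, GPT_OSS: 0}

for rows in CATEGORIES.values():
    for model, (_score, inp, out, reasoning) in rows.items():
        prev = token_sums[model]
        token_sums[model] = (prev[0] + inp, prev[1] + out, prev[2] + reasoning)
    # The higher score wins the category; this data contains no ties.
    winner = max(rows, key=lambda m: rows[m][0])
    wins[winner] += 1

for model in (GEMINI, GPT_OSS):
    matches = token_sums[model] == REPORTED_TOTALS[model]
    print(f"{model}: token sums {token_sums[model]}, "
          f"match reported totals: {matches}, category wins: {wins[model]}")
```

The category token counts add up to the reported totals for both models. On score, Gemini 3.5 Flash wins 6 of the 10 categories and gpt-oss-120b wins 4 (Combined, Instructions following, Tool Calling, and Trivia).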
