AI BENCHY Compare

OpenAI: gpt-oss-120b vs Qwen: Qwen3.5-27B

Last updated at: 2026-05-26

gpt-oss-120b: none, Release: 2025-08-05, Free Available
Qwen3.5-27B: none, Release: 2026-02-24

| Metric | gpt-oss-120b | Qwen3.5-27B |
|---|---|---|
| Score | 5.4 | 5.8 |
| Rank | #119 | #106 |
| Reliability | 10.0 | 10.0 |
| Consistency | 9.1 | 9.3 |
| Tests Correct | | |
| Attempt pass rate | 38.6% | 40.0% |
| Flaky tests | 2 | 2 |
| Total Runs | 133 | 136 |
| Cost per result | 0.302 | 0.509 |
| Total Cost | $0.019 | $0.036 |
| Input Price | $0.000 / 1M | $0.195 / 1M |
| Output Price | $0.000 / 1M | $1.560 / 1M |
| Output Tokens | 91,564 | 10,539 |
| Reasoning Tokens | 0 | 0 |
| Response Time (avg) | 21.61s | 1.69s |
| Response Time (max) | 113.71s | 9.39s |
| Response Time (total) | 345.79s | 33.82s |
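
The Output Price and Output Tokens figures above can be combined into a rough estimate of how much of a model's Total Cost comes from generated tokens. The sketch below is only an illustration: it assumes the "/ 1M" prices are per one million tokens, and the function name `output_token_cost` is hypothetical rather than anything the benchmark itself exposes.

```python
def output_token_cost(output_tokens: int, price_per_million_usd: float) -> float:
    """Estimate the cost of generated tokens, assuming a per-1M-token price."""
    return output_tokens * price_per_million_usd / 1_000_000


# Qwen3.5-27B figures from the comparison: 10,539 output tokens at $1.560 / 1M.
qwen_output_cost = output_token_cost(10_539, 1.560)
print(f"{qwen_output_cost:.4f}")  # 0.0164
```

Under that assumption, output tokens account for roughly $0.016 of the $0.036 Total Cost reported for Qwen3.5-27B. The remainder would presumably come from input tokens, whose counts the page does not list.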

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | gpt-oss-120b | 6.5 | 10.0 | 50.0% | 0 | | 32.84s | 8,676 | 0 |
| Anti-AI Tricks | Qwen3.5-27B | 4.8 | 10.0 | 25.0% | 0 | | 788ms | 267 | 0 |
| Coding | gpt-oss-120b | 4.3 | 1.1 | 66.7% | 1 | | 9.57s | 3,232 | 0 |
| Coding | Qwen3.5-27B | 7.3 | 10.0 | 50.0% | 0 | | 1.98s | 408 | 0 |
| Combined | gpt-oss-120b | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 |
| Combined | Qwen3.5-27B | 2.8 | 1.6 | 33.3% | 1 | | 9.39s | 1,461 | 0 |
| Data parsing and extraction | gpt-oss-120b | 6.5 | 10.0 | 50.0% | 0 | | 7.12s | 598 | 0 |
| Data parsing and extraction | Qwen3.5-27B | 10.0 | 10.0 | 100.0% | 0 | | 1.43s | 243 | 0 |
| Domain specific | gpt-oss-120b | 3.0 | 10.0 | 0.0% | 0 | | 34.98s | 29,483 | 0 |
| Domain specific | Qwen3.5-27B | 3.0 | 10.0 | 0.0% | 0 | | 540ms | 15 | 0 |
| General Intelligence | gpt-oss-120b | 4.8 | 10.0 | 0.0% | 0 | | 10.79s | 615 | 0 |
| General Intelligence | Qwen3.5-27B | 5.0 | 10.0 | 0.0% | 0 | | 2.51s | 126 | 0 |
| Instructions following | gpt-oss-120b | 9.8 | 10.0 | 100.0% | 0 | | 5.06s | 10,870 | 0 |
| Instructions following | Qwen3.5-27B | 6.3 | 10.0 | 50.0% | 0 | | 1.03s | 791 | 0 |
| Puzzle Solving | gpt-oss-120b | 6.0 | 7.2 | 55.6% | 1 | | 8.21s | 34,952 | 0 |
| Puzzle Solving | Qwen3.5-27B | 6.7 | 7.9 | 55.6% | 1 | | 1.38s | 6,915 | 0 |
| Tool Calling | gpt-oss-120b | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 |
| Tool Calling | Qwen3.5-27B | 10.0 | 10.0 | 100.0% | 0 | | 3.54s | 303 | 0 |
| Trivia | gpt-oss-120b | 3.0 | 10.0 | 0.0% | 0 | | 47.29s | 3,138 | 0 |
| Trivia | Qwen3.5-27B | 3.0 | 10.0 | 0.0% | 0 | | 599ms | 10 | 0 |
