AI BENCHY Compare

OpenAI: GPT-4o-mini vs Qwen: Qwen3.5-122B-A10B

Last updated at: 2026-05-19

Metric                  GPT-4o-mini (none)  Qwen3.5-122B-A10B (none)
Release                 2024-07-18          2026-02-24
Score                   4.9                 5.5
Rank                    #134                #117
Reliability             10.0                10.0
Consistency             9.9                 9.2
Tests Correct
Attempt pass rate       26.3%               36.8%
Flaky tests             0                   2
Total Runs              57                  57
Cost per result         0.099               0.361
Total Cost              $0.005              $0.022
Input Price             $0.150 / 1M         $0.260 / 1M
Output Price            $0.600 / 1M         $2.080 / 1M
Output Tokens           1,962               3,350
Reasoning Tokens        0                   0
Response Time (avg)     1.90s               3.52s
Response Time (max)     7.58s               46.00s
Response Time (total)   22.79s              66.80s
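
The pass rates and costs in the table can be turned into rough head-to-head figures. The following is a minimal sketch, not AI BENCHY's own scoring code: it assumes that "Attempt pass rate" means passing attempts divided by "Total Runs", and the names `SUMMARY`, `implied_passing_attempts`, and `cost_ratio` are hypothetical helpers introduced only for illustration. Because the table rounds its values, the results are approximate.

```python
# Figures copied from the summary table; nothing here is new data.
SUMMARY = {
    "GPT-4o-mini": {"pass_rate_pct": 26.3, "total_runs": 57, "total_cost_usd": 0.005},
    "Qwen3.5-122B-A10B": {"pass_rate_pct": 36.8, "total_runs": 57, "total_cost_usd": 0.022},
}


def implied_passing_attempts(model: str) -> int:
    """Estimate passing attempts, ASSUMING pass rate = passes / total runs."""
    row = SUMMARY[model]
    return round(row["pass_rate_pct"] / 100 * row["total_runs"])


def cost_ratio(numerator: str, denominator: str) -> float:
    """Ratio of the (rounded) total costs reported for two models."""
    return SUMMARY[numerator]["total_cost_usd"] / SUMMARY[denominator]["total_cost_usd"]


if __name__ == "__main__":
    for name in SUMMARY:
        print(f"{name}: ~{implied_passing_attempts(name)} passing attempts")
    ratio = cost_ratio("Qwen3.5-122B-A10B", "GPT-4o-mini")
    print(f"Total cost ratio (Qwen / GPT-4o-mini): ~{ratio:.1f}x")
```

Under that assumption, the reported rates correspond to roughly 15 and 21 passing attempts out of 57, while the Qwen run cost about 4.4 times as much in total.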

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Category / Model      Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens

Anti-AI Tricks
  GPT-4o-mini         4.8    10.0         25.0%              0                           1.34s                186            0
  Qwen3.5-122B-A10B   4.8    10.0         25.0%              0                           1.59s                312            0

Coding
  GPT-4o-mini         3.0    8.7          0.0%               0                           2.55s                347            0
  Qwen3.5-122B-A10B   4.3    1.1          66.7%              1                           3.44s                659            0

Combined
  GPT-4o-mini         3.0    10.0         0.0%               0                           7.58s                568            0
  Qwen3.5-122B-A10B   3.0    10.0         0.0%               0                           46.00s               1,137          0

Data parsing and extraction
  GPT-4o-mini         10.0   10.0         100.0%             0                           1.27s                183            0
  Qwen3.5-122B-A10B   10.0   10.0         100.0%             0                           1.01s                243            0

Domain specific
  GPT-4o-mini         3.0    10.0         0.0%               0                           637ms                15             0
  Qwen3.5-122B-A10B   5.3    10.0         33.3%              0                           465ms                15             0

General Intelligence
  GPT-4o-mini         4.0    10.0         0.0%               0                           909ms                66             0
  Qwen3.5-122B-A10B   5.0    10.0         0.0%               0                           1.12s                66             0

Instructions following
  GPT-4o-mini         6.3    10.0         50.0%              0                           1.27s                69             0
  Qwen3.5-122B-A10B   6.3    10.0         50.0%              0                           585ms                70             0

Puzzle Solving
  GPT-4o-mini         3.5    10.0         0.0%               0                           1.30s                308            0
  Qwen3.5-122B-A10B   3.7    7.7          11.1%              1                           982ms                575            0

Tool Calling
  GPT-4o-mini         10.0   10.0         100.0%             0                           2.51s                205            0
  Qwen3.5-122B-A10B   10.0   10.0         100.0%             0                           2.04s                264            0

Trivia
  GPT-4o-mini         3.0    10.0         0.0%               0                           794ms                15             0
  Qwen3.5-122B-A10B   3.0    10.0         0.0%               0                           295ms                9              0
