AI BENCHY Compare

Cobuddy vs OpenAI: GPT-5.4 Nano

Summary

Cobuddy vs GPT-5.4 Nano benchmark comparison: Cobuddy leads slightly on score (4.9 vs 4.8) and has the lower benchmark cost ($0.000 vs $0.011). GPT-5.4 Nano is much faster (1.48s vs 39.90s average response time), while Cobuddy has the higher attempt pass rate (47.6% vs 30.2%).

Recommended model: GPT-5.4 Nano. Its score is close to the best score here (4.8 vs 4.9), and it responds about 27x faster than Cobuddy (1.48s vs 39.90s on average).
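The speed and score figures in the summary can be reproduced from the metrics on this page. The sketch below computes the speed ratio and then applies a hypothetical selection rule: among models whose score is within a chosen tolerance of the best score, pick the one with the lowest average response time. The page does not say how its recommendation is made, so the rule, the `ModelResult` type, and the default `tolerance` of 0.2 are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    score: float           # overall benchmark score, as listed on the page
    avg_response_s: float  # average response time in seconds


def speed_ratio(slow: ModelResult, fast: ModelResult) -> float:
    """How many times faster `fast` responds than `slow`, on average."""
    return slow.avg_response_s / fast.avg_response_s


def recommend(models: list[ModelResult], tolerance: float = 0.2) -> ModelResult:
    """Hypothetical rule (not the site's documented method): among models
    scoring within `tolerance` of the best score, return the fastest one."""
    best = max(m.score for m in models)
    # Small epsilon guards against float rounding in the score difference.
    close = [m for m in models if best - m.score <= tolerance + 1e-9]
    return min(close, key=lambda m: m.avg_response_s)


cobuddy = ModelResult("Cobuddy", 4.9, 39.90)
nano = ModelResult("GPT-5.4 Nano", 4.8, 1.48)

print(f"{speed_ratio(cobuddy, nano):.1f}x faster")  # 27.0x faster
print(recommend([cobuddy, nano]).name)               # GPT-5.4 Nano
```

With a tighter tolerance such as 0.05, the same rule keeps only Cobuddy, so under this assumed rule the recommendation depends on how much score one is willing to trade for speed.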

Last updated: 2026-06-12

Metric                   Cobuddy (medium)    GPT-5.4 Nano (none)
Release date             2026-05-06          2026-03-17
Score                    4.9                 4.8
Rank                     #144                #149
Reliability              10.0                10.0
Consistency              7.5                 8.2
Attempt pass rate        47.6%               30.2%
Flaky tests              6                   5
Total Runs               63                  63
Cost per result          0.000               0.259
Total Cost               $0.000              $0.011
Input Price              $0.000 / 1M tokens  $0.200 / 1M tokens
Output Price             $0.000 / 1M tokens  $1.250 / 1M tokens
Total Input Tokens       37,449              34,212
Total Output Tokens      1,677               2,784
Reasoning Tokens         116,703             0
Response Time (avg)      39.90s              1.48s
Response Time (max)      309.02s             4.47s
Response Time (total)    797.98s             31.01s
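Two rows of the table can be cross-checked with simple arithmetic. The sketch below assumes that the attempt pass rate is passed attempts divided by Total Runs, and that cost is input tokens times the input price plus output tokens times the output price, with prices quoted per 1M tokens. Neither formula is stated on the page, the function names are hypothetical, and the cost sketch ignores reasoning tokens (zero for GPT-5.4 Nano here).

```python
def passed_attempts(pass_rate_pct: float, total_runs: int) -> int:
    """Recover the whole number of passing attempts implied by a rounded
    percentage, assuming pass rate = passed attempts / total runs."""
    return round(pass_rate_pct / 100 * total_runs)


def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated cost in dollars when prices are quoted per 1M tokens.
    Reasoning tokens are not included in this simplified estimate."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


for name, rate, runs in [("Cobuddy", 47.6, 63), ("GPT-5.4 Nano", 30.2, 63)]:
    n = passed_attempts(rate, runs)
    print(f"{name}: {n}/{runs} = {n / runs:.1%}")
# Cobuddy: 30/63 = 47.6%
# GPT-5.4 Nano: 19/63 = 30.2%

# GPT-5.4 Nano: 34,212 input and 2,784 output tokens at $0.200 / $1.250 per 1M.
print(f"${token_cost(34_212, 2_784, 0.200, 1.250):.4f}")  # $0.0103
```

Under these assumptions, the listed pass rates correspond to 30 and 19 passing attempts out of 63. The cost estimate for GPT-5.4 Nano comes to about $0.0103, slightly below the listed $0.011, so the benchmark's total may include charges or rounding that this table does not break out.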

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#144 Cobuddy (medium)
No showcase result has been generated for this model yet.
Cost: $0.000   Time: -   Tokens: 0

#149 GPT-5.4 Nano (none)
Cost: $0.008   Time: 46.1s   Tokens: 5,735

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Category                     Model         Score  Consistency  Attempt pass rate  Flaky tests  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Anti-AI Tricks               Cobuddy       8.7    7.9          91.7%              1            10.00s               453           98             4,666
                             GPT-5.4 Nano  3.5    8.0          16.7%              1            1.18s                606           800            0
Coding                       Cobuddy       3.7    6.7          22.2%              1            79.17s               4,726         358            30,138
                             GPT-5.4 Nano  4.6    7.9          22.2%              1            2.22s                7,305         613            0
Combined                     Cobuddy       3.0    10.0         0.0%               0            47.38s               18,324        465            7,265
                             GPT-5.4 Nano  3.0    10.0         0.0%               0            3.84s                11,019        280            0
Data parsing and extraction  Cobuddy       6.3    5.8          66.7%              1            17.36s               8,181         275            5,591
                             GPT-5.4 Nano  6.5    10.0         50.0%              0            1.11s                7,140         219            0
Domain specific              Cobuddy       2.9    4.4          22.2%              2            128.15s              540           10             49,454
                             GPT-5.4 Nano  2.9    4.4          22.2%              2            0.926s               723           52             0
General Intelligence         Cobuddy       4.2    9.9          0.0%               0            23.23s               498           76             3,782
                             GPT-5.4 Nano  3.8    2.5          33.3%              1            1.31s                477           180            0
Instructions following       Cobuddy       9.8    10.0         100.0%             0            11.60s               508           64             2,842
                             GPT-5.4 Nano  6.3    10.0         50.0%              0            0.784s               660           89             0
Puzzle Solving               Cobuddy       3.6    7.2          22.2%              1            12.83s               561           189            5,808
                             GPT-5.4 Nano  5.4    10.0         33.3%              0            1.25s                642           308            0
Tool Calling                 Cobuddy       10.0   10.0         100.0%             0            11.19s               3,505         133            294
                             GPT-5.4 Nano  10.0   10.0         100.0%             0            3.40s                5,445         222            0
Trivia                       Cobuddy       3.0    10.0         0.0%               0            36.98s               153           9              6,863
                             GPT-5.4 Nano  3.0    10.0         0.0%               0            0.773s               195           21             0
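To see where the two models actually differ, the per-category scores can be tallied. The sketch below counts the categories in which each model scores strictly higher and treats equal scores as ties. It compares the rounded scores displayed on the page, so categories that look tied here may not be tied in the underlying data; the `CATEGORY_SCORES` dictionary and `tally_wins` function are illustrative names, not part of the benchmark.

```python
# Category scores copied from the breakdown: (Cobuddy, GPT-5.4 Nano).
CATEGORY_SCORES = {
    "Anti-AI Tricks": (8.7, 3.5),
    "Coding": (3.7, 4.6),
    "Combined": (3.0, 3.0),
    "Data parsing and extraction": (6.3, 6.5),
    "Domain specific": (2.9, 2.9),
    "General Intelligence": (4.2, 3.8),
    "Instructions following": (9.8, 6.3),
    "Puzzle Solving": (3.6, 5.4),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally_wins(scores: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Count categories where each model has the strictly higher score."""
    wins = {"Cobuddy": 0, "GPT-5.4 Nano": 0, "tie": 0}
    for cobuddy, nano in scores.values():
        if cobuddy > nano:
            wins["Cobuddy"] += 1
        elif nano > cobuddy:
            wins["GPT-5.4 Nano"] += 1
        else:
            wins["tie"] += 1
    return wins


print(tally_wins(CATEGORY_SCORES))
# {'Cobuddy': 3, 'GPT-5.4 Nano': 3, 'tie': 4}
```

Each model leads in three categories and four are tied: Cobuddy leads in Anti-AI Tricks, General Intelligence, and Instructions following, while GPT-5.4 Nano leads in Coding, Data parsing and extraction, and Puzzle Solving.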
