AI BENCHY

#7

GPT-5.4

OpenAI · Release: 2026-03-05 · openai/gpt-5.4::medium

Average score

8.2

Cost per result

6.533

Consistency

8.9

Total cost

$0.784

Correct tests

12

A test counts as fully passed only if all of its runs passed.

Failed tests

3

Per-test pass rate: 86.7%

Flaky tests

2

Flaky tests had mixed results across runs (at least one pass and one failure).
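
The three definitions above (fully passed, failed, flaky) and the per-test pass rate can be sketched in a few lines of Python. This is only an illustration: the function name `summarize`, the test names, and the layout of two runs per test are assumptions, since the page does not state how many runs each test had. That hypothetical layout happens to reproduce the reported figures if the per-test pass rate is read as the mean of each test's share of passing runs.

```python
from typing import Dict, List


def summarize(runs: Dict[str, List[bool]]) -> Dict[str, float]:
    """Summarize run outcomes per test (True means the run passed).

    Assumes every test has at least one recorded run.
    """
    n_tests = len(runs)
    # Fully passed: every run of the test passed.
    fully_passed = sum(1 for r in runs.values() if all(r))
    # Flaky: at least one pass and at least one failure.
    flaky = sum(1 for r in runs.values() if any(r) and not all(r))
    # Mean over tests of the fraction of passing runs.
    mean_rate = sum(sum(r) / len(r) for r in runs.values()) / n_tests
    return {
        "fully_passed": fully_passed,
        "failed": n_tests - fully_passed,
        "flaky": flaky,
        "pass_rate_pct": round(100 * mean_rate, 1),
    }


# Hypothetical layout (not from the page): 15 tests with 2 runs each.
example_runs = {f"test_{i:02d}": [True, True] for i in range(12)}
example_runs["test_12"] = [True, False]   # flaky
example_runs["test_13"] = [False, True]   # flaky
example_runs["test_14"] = [False, False]  # failed every run

summary = summarize(example_runs)
print(summary)
```

Under this reading a flaky test counts as failed in the headline numbers, which is why 12 correct tests out of 15 (80%) sits below the 86.7% per-test pass rate.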

Response time (average)

21.06s

Response time (max): 100.41s

Response time (total): 315.95s
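
As a quick arithmetic check, the average and total response times are consistent with one timed response per test. The count of 15 is an assumption taken from the test totals in the category breakdown; the page does not say how responses are counted.

```python
total_seconds = 315.95  # reported total response time
n_responses = 15        # assumption: one timed response per test

mean_seconds = total_seconds / n_responses
print(f"{mean_seconds:.2f}s")
```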

Incorrect answer: 2 · Did not follow instructions: 1

Category breakdown

Category                     Average score  Consistency  Correct tests
Anti-AI Tricks               10.0           10.0         3/3
Combined                     10.0           10.0         1/1
Data parsing and extraction  9.9            10.0         2/2
Domain specific              4.0            7.2          1/3
Instructions following       10.0           10.0         2/2
Puzzle Solving               7.0            7.2          2/3
Tool Calling                 10.0           10.0         1/1
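
The category rows can be cross-checked against the headline numbers. The sketch below assumes the overall average score and consistency are means of the category values weighted by test count; the page does not state this, and the category values are themselves rounded, so treat the agreement as suggestive rather than as the site's actual formula. The variable names are illustrative.

```python
# (category, average score, consistency, correct tests, total tests)
rows = [
    ("Anti-AI Tricks", 10.0, 10.0, 3, 3),
    ("Combined", 10.0, 10.0, 1, 1),
    ("Data parsing and extraction", 9.9, 10.0, 2, 2),
    ("Domain specific", 4.0, 7.2, 1, 3),
    ("Instructions following", 10.0, 10.0, 2, 2),
    ("Puzzle Solving", 7.0, 7.2, 2, 3),
    ("Tool Calling", 10.0, 10.0, 1, 1),
]

total_tests = sum(total for _, _, _, _, total in rows)
correct_tests = sum(correct for _, _, _, correct, _ in rows)

# Assumed aggregation: mean of category values weighted by test count.
overall_score = sum(score * total for _, score, _, _, total in rows) / total_tests
overall_consistency = sum(cons * total for _, _, cons, _, total in rows) / total_tests

print(f"tests: {correct_tests}/{total_tests}")
print(f"average score: {overall_score:.1f}")
print(f"consistency: {overall_consistency:.1f}")
```

With these rows the weighted means round to 8.2 and 8.9, matching the reported average score and consistency, and the correct-test counts sum to 12 of 15.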