AI BENCHY
❤️ Made by XCS

Model name

Anthropic: Claude Sonnet 4.6

Benchmarks generated from the Aibenchy test sets on 19 Feb 2026

Metrics for Anthropic: Claude Sonnet 4.6

Rank: #13
Company: Anthropic
Score: 5.75
Stability: 9.42
Cost per result: 0.9480
Total cost: $0.05688
Correct tests: 6/12
Pass rate per test: 52.8%
Unstable tests: 1
Output tokens: 1,659
Reasoning tokens: 0

Category breakdown

| Category | Tests fully passed | Score | Stability | Pass rate per test | Unstable tests | Reasoning score | Cost |
|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | 0/2 | 1.00 | 10.00 | 0.0% | 0 | - | $0.01092 |
| Data parsing and extraction | 2/2 | 10.00 | 10.00 | 100.0% | 0 | - | $0.02854 |
| Domain specific | 2/3 | 7.00 | 10.00 | 66.7% | 0 | - | $0.00309 |
| Instructions following | 1/2 | 5.50 | 10.00 | 50.0% | 0 | - | $0.00342 |
| Puzzle Solving | 1/3 | 5.00 | 7.68 | 44.4% | 1 | - | $0.01092 |
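The overall figures for Claude Sonnet 4.6 appear to be consistent with a test-count-weighted average of the category rows. The page does not state how it aggregates, so the sketch below is only a check of that assumption: the `categories` dictionary and the `weighted_mean` helper are illustrative names, and the numbers are copied from the category breakdown.

```python
# Category rows copied from the breakdown:
# (number of tests, score, stability, pass rate in %, cost in USD)
categories = {
    "Anti-AI Tricks": (2, 1.00, 10.00, 0.0, 0.01092),
    "Data parsing and extraction": (2, 10.00, 10.00, 100.0, 0.02854),
    "Domain specific": (3, 7.00, 10.00, 66.7, 0.00309),
    "Instructions following": (2, 5.50, 10.00, 50.0, 0.00342),
    "Puzzle Solving": (3, 5.00, 7.68, 44.4, 0.01092),
}


def weighted_mean(column: int) -> float:
    """Average one column, weighting each category by its test count.

    Assumption: this weighting is inferred from the numbers, not documented.
    """
    total_tests = sum(row[0] for row in categories.values())
    weighted_sum = sum(row[0] * row[column] for row in categories.values())
    return weighted_sum / total_tests


score = weighted_mean(1)
stability = weighted_mean(2)
pass_rate = weighted_mean(3)
total_cost = sum(row[4] for row in categories.values())

print(f"score={score:.2f} stability={stability:.2f} "
      f"pass_rate={pass_rate:.1f}% total_cost=${total_cost:.5f}")
```

Under this assumption the derived score, stability and pass rate match the reported 5.75, 9.42 and 52.8%. The summed cost differs from the reported $0.05688 only in the last digit, which is plausibly rounding in the per-category costs.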

Compared models

#12 · OpenAI
OpenAI: gpt-oss-120b
Reasoning (medium)

Score: 5.75
Stability: 7.19
Pass rate per test: 63.9%
Unstable tests: 4
Cost per result: 0.0951
Correct tests: 6/12
Total cost: $0.00571


#14 · Qwen
Qwen: Qwen3.5 Plus 2026-02-15
No reasoning

Score: 5.67
Stability: 9.99
Pass rate per test: 50.0%
Unstable tests: 0
Cost per result: 0.0997
Correct tests: 6/12
Total cost: $0.00599


#11 · OpenAI
OpenAI: GPT-5 Nano
Reasoning (medium)

Score: 5.92
Stability: 6.03
Pass rate per test: 72.2%
Unstable tests: 6
Cost per result: 0.4675
Correct tests: 6/12
Total cost: $0.02806
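
The page does not define "cost per result". For the four models listed here, the reported value is close to the total cost divided by the number of correct tests, expressed in cents. The sketch below checks that guess. The function name `cents_per_correct_test` is hypothetical, and because every model here has exactly 6 correct tests, the data cannot rule out a different denominator that also equals 6.

```python
# (model, total cost in USD, correct tests, reported cost per result)
# All values are copied from the page; the formula is an assumption.
rows = [
    ("Anthropic: Claude Sonnet 4.6", 0.05688, 6, 0.9480),
    ("OpenAI: gpt-oss-120b", 0.00571, 6, 0.0951),
    ("Qwen: Qwen3.5 Plus 2026-02-15", 0.00599, 6, 0.0997),
    ("OpenAI: GPT-5 Nano", 0.02806, 6, 0.4675),
]


def cents_per_correct_test(total_cost_usd: float, correct_tests: int) -> float:
    """Hypothetical definition: total cost per correct test, in US cents."""
    return total_cost_usd / correct_tests * 100


# Largest gap between the derived and the reported value across all models.
max_gap = 0.0
for name, cost, correct, reported in rows:
    derived = cents_per_correct_test(cost, correct)
    max_gap = max(max_gap, abs(derived - reported))
    print(f"{name}: derived {derived:.4f}, reported {reported:.4f}")

print(f"largest gap: {max_gap:.4f}")
```

The derived values agree with the reported ones to within a few ten-thousandths, which is about what rounding of the displayed total costs would explain.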

