AI BENCHY


Domain specific Ranking

See which AI models perform best on Domain specific tests, which ones stay reliable, and where the biggest gaps appear. Sorted by Tests Correct, descending.

Models Shown: 15

Average Domain specific Score: 4.8

| Rank | Model | Reasoning | Company | Domain specific Score | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|---|
| #77 | Claude Sonnet 4.6 | none | Anthropic | 7.7 | 6.8 | 2/3 | 3.54s |
| #85 | Gemma 4 31B | none | Google | 7.7 | 6.5 | 2/3 | 3.22s |
| #108 | Qwen3.5-Flash | none | Qwen | 7.7 | 5.8 | 2/3 | 905ms |
| #117 | Qwen3.5-35B-A3B | none | Qwen | 7.7 | 5.6 | 2/3 | 485ms |
| #118 | Qwen3.6 27B | none | Qwen | 7.7 | 5.6 | 2/3 | 3.03s |
| #122 | GLM 4.7 Flash | none | Z.ai | 7.7 | 5.5 | 2/3 | 744ms |
| #5 | Qwen3.7 Max | medium | Qwen | 5.9 | 9.1 | 1/3 | 24.9s |
| #6 | GPT-5.5 | low | OpenAI | 5.3 | 9.0 | 1/3 | 28.1s |
| #9 | GPT-5.5 | medium | OpenAI | 5.3 | 8.8 | 1/3 | 164.1s |
| #10 | Claude Opus 4.8 | medium | Anthropic | 5.3 | 8.7 | 1/3 | 14.2s |
| #12 | Gemini 3.1 Flash Lite Preview | high | Google | 5.3 | 8.6 | 1/3 | 127.6s |
| #13 | Grok 4.20 Beta | medium | X AI | 5.3 | 8.5 | 1/3 | 21.3s |
| #15 | GPT-5.3-Codex | medium | OpenAI | 5.9 | 8.4 | 1/3 | 64.3s |
| #16 | Gemini 3 Flash Preview | low | Google | 5.3 | 8.4 | 1/3 | 8.05s |
| #19 | Seed-2.0-Lite | medium | Bytedance Seed | 5.9 | 8.2 | 1/3 | 88.7s |
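
The ordering of the rows above can be reproduced with a short script. This is a minimal Python sketch, not the site's actual logic: it assumes that ties within each Tests Correct group are broken by overall rank (an inference from the listed order, which the page does not state), and the `Row` type, its `setting` field name, and both function names are illustrative. Response times are converted to seconds.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int            # overall leaderboard rank (the "#77" column)
    model: str
    setting: str         # the none/low/medium/high label shown next to the model
    domain_score: float  # "Domain specific Score"
    score: float         # "Score"
    tests_correct: int   # numerator of "Tests Correct"
    tests_total: int     # denominator of "Tests Correct"
    avg_seconds: float   # "Response Time (avg)" in seconds


# Values copied from the leaderboard rows above.
ROWS = [
    Row(77, "Claude Sonnet 4.6", "none", 7.7, 6.8, 2, 3, 3.54),
    Row(85, "Gemma 4 31B", "none", 7.7, 6.5, 2, 3, 3.22),
    Row(108, "Qwen3.5-Flash", "none", 7.7, 5.8, 2, 3, 0.905),
    Row(117, "Qwen3.5-35B-A3B", "none", 7.7, 5.6, 2, 3, 0.485),
    Row(118, "Qwen3.6 27B", "none", 7.7, 5.6, 2, 3, 3.03),
    Row(122, "GLM 4.7 Flash", "none", 7.7, 5.5, 2, 3, 0.744),
    Row(5, "Qwen3.7 Max", "medium", 5.9, 9.1, 1, 3, 24.9),
    Row(6, "GPT-5.5", "low", 5.3, 9.0, 1, 3, 28.1),
    Row(9, "GPT-5.5", "medium", 5.3, 8.8, 1, 3, 164.1),
    Row(10, "Claude Opus 4.8", "medium", 5.3, 8.7, 1, 3, 14.2),
    Row(12, "Gemini 3.1 Flash Lite Preview", "high", 5.3, 8.6, 1, 3, 127.6),
    Row(13, "Grok 4.20 Beta", "medium", 5.3, 8.5, 1, 3, 21.3),
    Row(15, "GPT-5.3-Codex", "medium", 5.9, 8.4, 1, 3, 64.3),
    Row(16, "Gemini 3 Flash Preview", "low", 5.3, 8.4, 1, 3, 8.05),
    Row(19, "Seed-2.0-Lite", "medium", 5.9, 8.2, 1, 3, 88.7),
]


def sort_by_tests_correct(rows):
    """Most tests correct first; ties broken by overall rank (assumed)."""
    return sorted(rows, key=lambda r: (-r.tests_correct, r.rank))


def fastest_in_top_tier(rows):
    """Among rows with the highest tests-correct count, pick the lowest avg time."""
    best = max(r.tests_correct for r in rows)
    top = [r for r in rows if r.tests_correct == best]
    return min(top, key=lambda r: r.avg_seconds)


if __name__ == "__main__":
    for r in sort_by_tests_correct(ROWS):
        print(f"#{r.rank:<4} {r.model:<30} {r.tests_correct}/{r.tests_total}  {r.avg_seconds}s")
    print("Fastest 2/3 model:", fastest_in_top_tier(ROWS).model)
```

Sorting on a `(-tests_correct, rank)` tuple keeps the tie-breaking rule explicit and does not depend on the input order.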

Charts (not reproduced here): Top Models by Domain specific Score; Domain specific Score vs Total Cost; Top Models by Response Time (avg).