AI BENCHY

Domain specific Ranking

See which AI models perform best on Domain specific tests, which ones stay reliable, and where the biggest gaps appear. Rows are sorted by Score, highest first.

Models Shown: 15

Average Domain specific Score: 4.8

Rank | Model                              | Company   | Domain specific Score | Score | Tests Correct | Response Time (avg)
#1   | Gemini 3 Flash Preview medium      | Google    | 10.0                  | 9.8   | 3/3           | 15.3s
#2   | Gemini 3.5 Flash high              | Google    | 7.6                   | 9.6   | 2/3           | 14.1s
#3   | Gemini 3.5 Flash low               | Google    | 7.7                   | 9.4   | 2/3           | 3.39s
#4   | Gemini 3.1 Pro Preview medium      | Google    | 7.7                   | 9.4   | 2/3           | 32.7s
#5   | Qwen3.7 Max medium                 | Qwen      | 5.9                   | 9.1   | 1/3           | 24.9s
#6   | GPT-5.5 low                        | OpenAI    | 5.3                   | 9.0   | 1/3           | 28.1s
#7   | Gemini 3.5 Flash medium            | Google    | 7.7                   | 9.0   | 2/3           | 5.24s
#8   | Claude Opus 4.7 none               | Anthropic | 7.7                   | 8.9   | 2/3           | 1.19s
#9   | GPT-5.5 medium                     | OpenAI    | 5.3                   | 8.8   | 1/3           | 164.1s
#10  | Claude Opus 4.8 medium             | Anthropic | 5.3                   | 8.7   | 1/3           | 14.2s
#11  | Claude Opus 4.7 medium             | Anthropic | 7.7                   | 8.7   | 2/3           | 1.17s
#12  | Gemini 3.1 Flash Lite Preview high | Google    | 5.3                   | 8.6   | 1/3           | 127.6s
#13  | Grok 4.20 Beta medium              | X AI      | 5.3                   | 8.5   | 1/3           | 21.3s
#14  | Qwen3.6 Max Preview medium         | Qwen      | 2.9                   | 8.5   | 0/3           | 95.9s
#15  | GPT-5.3-Codex medium               | OpenAI    | 5.9                   | 8.4   | 1/3           | 64.3s
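
The ranking above is ordered by the overall Score column, so the Domain specific order is not obvious at a glance. The following is a minimal Python sketch, not anything the site provides, that re-sorts the same 15 rows by Domain specific Score and breaks ties by average response time. The tuple layout, the tie-break rule, and the names `ROWS`, `sort_by_domain`, `by_domain`, and `mean_domain` are illustrative assumptions; the numbers are copied from the rows above.

```python
from statistics import mean

# Each row: (model, company, domain_score, overall_score, tests_correct, avg_response_s)
# Values are copied from the ranking; the tuple layout itself is an assumption.
ROWS = [
    ("Gemini 3 Flash Preview medium", "Google", 10.0, 9.8, "3/3", 15.3),
    ("Gemini 3.5 Flash high", "Google", 7.6, 9.6, "2/3", 14.1),
    ("Gemini 3.5 Flash low", "Google", 7.7, 9.4, "2/3", 3.39),
    ("Gemini 3.1 Pro Preview medium", "Google", 7.7, 9.4, "2/3", 32.7),
    ("Qwen3.7 Max medium", "Qwen", 5.9, 9.1, "1/3", 24.9),
    ("GPT-5.5 low", "OpenAI", 5.3, 9.0, "1/3", 28.1),
    ("Gemini 3.5 Flash medium", "Google", 7.7, 9.0, "2/3", 5.24),
    ("Claude Opus 4.7 none", "Anthropic", 7.7, 8.9, "2/3", 1.19),
    ("GPT-5.5 medium", "OpenAI", 5.3, 8.8, "1/3", 164.1),
    ("Claude Opus 4.8 medium", "Anthropic", 5.3, 8.7, "1/3", 14.2),
    ("Claude Opus 4.7 medium", "Anthropic", 7.7, 8.7, "2/3", 1.17),
    ("Gemini 3.1 Flash Lite Preview high", "Google", 5.3, 8.6, "1/3", 127.6),
    ("Grok 4.20 Beta medium", "X AI", 5.3, 8.5, "1/3", 21.3),
    ("Qwen3.6 Max Preview medium", "Qwen", 2.9, 8.5, "0/3", 95.9),
    ("GPT-5.3-Codex medium", "OpenAI", 5.9, 8.4, "1/3", 64.3),
]


def sort_by_domain(rows):
    """Order rows by domain score (high to low), then by faster response time."""
    return sorted(rows, key=lambda r: (-r[2], r[5]))


by_domain = sort_by_domain(ROWS)

# Mean domain score over only the rows listed here, rounded to two decimals.
mean_domain = round(mean(r[2] for r in ROWS), 2)

for rank, row in enumerate(by_domain[:3], start=1):
    print(f"#{rank} {row[0]} ({row[1]}): {row[2]} in {row[5]}s")
print("Mean domain score over rows shown:", mean_domain)
```

Under this ordering, Gemini 3 Flash Preview medium stays first and the two Claude Opus 4.7 entries follow because of their short response times. The mean over the 15 rows shown works out to about 6.49 rather than the 4.8 reported as the page's average; the source does not say which set of models that average covers.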

[Chart: Top Models by Domain specific Score]

[Chart: Domain specific Score vs Total Cost]

[Chart: Top Models by Response Time (avg)]