AI BENCHY


Domain specific Ranking

See which AI models perform best in the Domain specific category, which ones stay reliable, and where the biggest gaps appear. Sorted by: Tests Correct ↑.

Models Shown

13

Average Domain specific Score

4.8

Rank  Model                          Company    Domain specific Score  Score  Tests Correct  Response Time (avg)
#27   Gemma 4 31B medium             Google     7.7                    7.8    2/3            38.5s
#34   Qwen3.7 Max none               Qwen       7.7                    7.7    2/3            975ms
#48   Gemini 3 Flash Preview none    Google     7.7                    7.4    2/3            963ms
#74   Qwen3.6 Max Preview none       Qwen       7.7                    6.9    2/3            1.22s
#77   Claude Sonnet 4.6 none         Anthropic  7.7                    6.8    2/3            3.54s
#85   Gemma 4 31B none               Google     7.7                    6.5    2/3            3.22s
#108  Qwen3.5-Flash none             Qwen       7.7                    5.8    2/3            905ms
#117  Qwen3.5-35B-A3B none           Qwen       7.7                    5.6    2/3            485ms
#118  Qwen3.6 27B none               Qwen       7.7                    5.6    2/3            3.03s
#122  GLM 4.7 Flash none             Z.ai       7.7                    5.5    2/3            744ms
#1    Gemini 3 Flash Preview medium  Google     10.0                   9.8    3/3            15.3s
#32   Gemini 3.5 Flash minimal       Google     10.0                   7.7    3/3            899ms
#83   Step 3.5 Flash none            Stepfun    10.0                   6.6    1/1            34.5s
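
The Response Time (avg) column mixes milliseconds and seconds, so the values cannot be compared as raw strings. As a rough sketch (not something the site provides), the times could be normalized to seconds before sorting. The helper name `to_seconds` and the `rows` list are hypothetical, introduced here for illustration; the values are copied from a few rows of the table above.

```python
def to_seconds(text: str) -> float:
    """Convert a response-time string such as '975ms' or '1.22s' to seconds."""
    text = text.strip()
    # Check the 'ms' suffix first, because 'ms' also ends with 's'.
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized time format: {text!r}")


# (model, response time) pairs copied from a few rows of the table
rows = [
    ("Gemma 4 31B medium", "38.5s"),
    ("Qwen3.7 Max none", "975ms"),
    ("Qwen3.5-35B-A3B none", "485ms"),
    ("Gemini 3 Flash Preview medium", "15.3s"),
    ("Gemini 3.5 Flash minimal", "899ms"),
]

# Sort fastest first using the normalized value as the key.
fastest_first = sorted(rows, key=lambda row: to_seconds(row[1]))

for model, raw in fastest_first:
    print(f"{model}: {to_seconds(raw):.3f}s")
```

Sorting on the normalized value keeps the original strings intact for display while making the comparison numeric.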

Charts: Top Models by Domain specific Score; Domain specific Score vs Total Cost; Top Models by Response Time (avg)