AI BENCHY


Domain specific Ranking

See which AI models perform best in the Domain specific category, which ones stay reliable, and where the biggest gaps appear. Models are sorted by Score, descending.

Models Shown: 15
Average Domain specific Score: 4.8

Rank  Model                               Company         Domain specific Score  Score  Tests Correct  Response Time (avg)
#1    Gemini 3 Flash Preview medium       Google          10.0                   10.0   3/3            21.1s
#2    Gemini 3.1 Pro Preview medium       Google          7.7                    9.6    2/3            32.7s
#3    Claude Opus 4.7 medium              Anthropic       7.7                    9.2    2/3            1.17s
#4    Claude Opus 4.7 none                Anthropic       7.7                    9.2    2/3            1.19s
#5    Gemini 3 Flash Preview low          Google          5.3                    8.8    1/3            8.05s
#6    Seed-2.0-Lite medium                Bytedance Seed  5.9                    8.6    1/3            88.7s
#7    GPT-5.3-Codex medium                OpenAI          5.9                    8.6    1/3            64.3s
#8    Qwen3.5 Plus 2026-02-15 medium      Qwen            5.3                    8.5    1/3            17.5s
#9    Qwen3.6 Plus Preview medium         Qwen            3.0                    8.5    0/3            22.1s
#10   Qwen3.5-27B medium                  Qwen            5.3                    8.4    1/3            79.5s
#11   Gemini 3.1 Flash Lite Preview high  Google          5.3                    8.4    1/3            127.6s
#12   Gemini 3 PRO Preview medium         Google          5.3                    8.4    1/3            7.01s
#13   GLM 5 medium                        Z.ai            3.5                    8.4    0/3            0ms
#14   Gemma 4 31B medium                  Google          7.7                    8.3    2/3            38.5s
#15   Gemini 2.5 Flash medium             Google          5.9                    8.2    1/3            37.3s

Charts: Top Models by Domain specific Score; Domain specific Score vs Total Cost; Top Models by Response Time (avg)