AI BENCHY Category
Domain-Specific Rankings
See which AI models perform best on domain-specific tasks, which stay reliable, and where the biggest gaps appear.
| Overall rank | Model | Company | Domain-specific score | Overall score | Tests passed | Avg. response time |
|---|---|---|---|---|---|---|
| #147 | GPT-4o-mini none | OpenAI | 3.0 | 4.8 | 0/3 | 637ms |
| #154 | Qwen3.5-9B none | Qwen | 3.0 | 4.6 | 0/3 | 464ms |
| #159 | Ling-2.6-1T none | Inclusionai | 3.0 | 4.3 | 0/3 | 1.04s |
| #163 | Granite 4.1 8B none | IBM Granite | 3.0 | 4.0 | 0/3 | 357ms |
| #14 | Qwen3.6 Max Preview medium | Qwen | 2.9 | 8.5 | 0/3 | 95.9s |
| #26 | Qwen3.6 Plus medium | Qwen | 2.9 | 7.9 | 0/3 | 29.6s |
| #29 | Qwen3.5-122B-A10B medium | Qwen | 2.9 | 7.8 | 0/3 | 63.4s |
| #36 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 2.9 | 7.6 | 0/3 | 53.1s |
| #44 | Gemini 3.1 Flash Lite medium | Google | 2.9 | 7.5 | 0/3 | 3.16s |
| #52 | Claude Sonnet 4.6 medium | Anthropic | 2.9 | 7.4 | 0/3 | 0ms |
| #78 | Qwen3.6 27B medium | Qwen | 2.9 | 6.8 | 0/3 | 73.4s |
| #81 | Mercury 2 medium | Inception | 2.9 | 6.6 | 0/3 | 6.48s |
| #84 | Grok 4.20 Multi Agent Beta medium | X AI | 2.9 | 6.6 | 0/3 | 24.7s |
| #87 | Gemini 3.1 Flash Lite minimal | Google | 2.9 | 6.4 | 0/3 | 1.02s |
| #90 | Gemini 3.1 Flash Lite none | Google | 2.9 | 6.4 | 0/3 | 762ms |