AI BENCHY
Benchmark Methodology
This page explains our benchmarking approach at a high level. To protect test integrity, we keep the exact prompts and grading internals private.
How It Works (High Level)
- Private tests: We do not publish the exact test content, prompts, or full grading details.
- Repeated runs: Each model is run multiple times so that results reflect stability, not a single lucky attempt.
- Reasoning modes: Where supported, models are evaluated in multiple reasoning configurations.
- OpenRouter execution: Benchmark requests are run through OpenRouter.
- Real-world reliability: Timeouts, downtime, and API errors count as failed attempts.
- Fast coverage with an evolving suite: Our suite is small, so we can test new models quickly, and we continually add and remove tests.
- Generic intelligence signal: The score is not limited to any single category. It is an indicator for a practical question: if you ask an AI anything, how likely are you to get a correct answer?
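The actual grading is private, so the following is only a minimal sketch, assuming the score behaves like the share of repeated attempts that return a correct answer. It illustrates why counting timeouts and API errors as failures matters: they stay in the denominator instead of being silently dropped. The names `Outcome`, `INFRA_FAILURES`, `success_rate`, and `count_infra_failures` are hypothetical and are not part of the benchmark's internals.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical result label for a single benchmark attempt."""
    CORRECT = "correct"
    WRONG = "wrong"
    TIMEOUT = "timeout"
    API_ERROR = "api_error"  # e.g. provider downtime or an error response


# Hypothetical grouping of infrastructure failures (not a real API).
INFRA_FAILURES = {Outcome.TIMEOUT, Outcome.API_ERROR}


def success_rate(outcomes, count_infra_failures=True):
    """Share of attempts that returned a correct answer.

    With count_infra_failures=True (the policy described above), timeouts and
    API errors stay in the denominator and therefore count as failed attempts.
    With False, they are dropped, which would hide unreliable endpoints.
    """
    if not count_infra_failures:
        outcomes = [o for o in outcomes if o not in INFRA_FAILURES]
    if not outcomes:
        raise ValueError("no scorable attempts")
    correct = sum(1 for o in outcomes if o is Outcome.CORRECT)
    return correct / len(outcomes)


# Four repeated runs of one hypothetical test for one model.
runs = [Outcome.CORRECT, Outcome.CORRECT, Outcome.TIMEOUT, Outcome.WRONG]
print(success_rate(runs))                              # 0.5
print(success_rate(runs, count_infra_failures=False))  # 0.6666666666666666
```

Under this assumption, the same four runs score 0.5 when the timeout counts as a failure, but about 0.67 if it were excluded, which would overstate how often a user actually gets a correct answer.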
For transparency, we share the broad methodology but keep sensitive benchmark details private.