Anti-AI Tricks
: 8.7 (a test is fully passed only if every run passed for that test). Extra formatting: 1. Response time: avg 19.75s, max 49.95s, total 79.01s.
Coding
: 7.0. Wrong answer: 1. Response time: avg 123.86s, max 177.36s, total 247.71s.
Combined
: 10.0. No failed answers. Response time: avg 163.96s, max 163.96s, total 163.96s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 30.26s, max 32.03s, total 60.52s.
Domain specific
: 5.3. Timed out: 1. Wrong answer: 1. Response time: avg 79.53s, max 95.52s, total 238.59s.
Instructions following
: 10.0. No failed answers. Response time: avg 19.66s, max 32.25s, total 39.32s.
Puzzle Solving
: 8.2. Did not follow instructions: 1. Response time: avg 59.60s, max 123.57s, total 178.80s.
Tool Calling
: 10.0. No failed answers. Response time: avg 7.45s, max 7.45s, total 7.45s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 85.11s, max 85.11s, total 85.11s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 42.21s, max 89.34s, total 168.84s.
Coding
: 6.5. API error: 1. Response time: avg 59.65s, max 59.65s, total 59.65s.
Combined
: 10.0. No failed answers. Response time: avg 304.19s, max 304.19s, total 304.19s.
Data parsing and extraction
: 6.5. Wrong answer: 1. Response time: avg 37.36s, max 54.24s, total 74.71s.
Domain specific
: 3.5. Wrong answer: 3. Response time: avg 64.92s, max 150.55s, total 194.76s.
Instructions following
: 9.8. No failed answers. Response time: avg 11.78s, max 17.75s, total 23.55s.
Tool Calling
: 10.0. No failed answers. Response time: avg 104.44s, max 104.44s, total 104.44s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 113.91s, max 113.91s, total 113.91s.
Anti-AI Tricks
: 8.7. Wrong answer: 1. Response time: avg 37.16s, max 140.53s, total 148.65s.
Coding
: 10.0. No failed answers. Response time: avg 137.63s, max 137.63s, total 137.63s.
Combined
: 10.0. No failed answers. Response time: avg 149.23s, max 149.23s, total 149.23s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 4.49s, max 4.96s, total 8.98s.
Domain specific
: 3.6. Wrong answer: 3. Response time: avg 139.90s, max 141.40s, total 419.69s.
Instructions following
: 7.3. No answer: 1. Response time: avg 23.26s, max 43.87s, total 46.51s.
Puzzle Solving
: 5.7. Did not follow instructions: 2. Response time: avg 50.83s, max 144.85s, total 152.49s.
Tool Calling
: 10.0. No failed answers. Response time: avg 6.44s, max 6.44s, total 6.44s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 59.11s, max 168.31s, total 236.44s.
Coding
: 4.1. Timed out: 1. Wrong answer: 1. Response time: avg 54.23s, max 62.72s, total 108.47s.
Combined
: 10.0. No failed answers. Response time: avg 17.78s, max 17.78s, total 17.78s.
Data parsing and extraction
: 7.3. API error: 1. Response time: avg 56.99s, max 80.14s, total 113.98s.
Domain specific
: 5.3. Timed out: 1. Wrong answer: 1. Response time: avg 146.50s, max 234.29s, total 439.49s.
Instructions following
: 10.0. No failed answers. Response time: avg 63.49s, max 111.61s, total 126.98s.
Puzzle Solving
: 8.2. Timed out: 1. Response time: avg 27.61s, max 31.84s, total 55.21s.
Tool Calling
: 10.0. No failed answers. Response time: avg 10.33s, max 10.33s, total 10.33s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 48.98s, max 48.98s, total 48.98s.
Overall: Wrong answer: 4. Timed out: 2. Response time: avg 67.58s, max 266.69s, total 878.57s…
Anti-AI Tricks
: 8.2. Wrong answer: 1. Response time: avg 45.78s, max 81.20s, total 91.57s.
Coding
: 7.6. Wrong answer: 1. Response time: avg 193.80s, max 266.69s, total 387.60s.
Combined
: 10.0. No failed answers. Response time: avg 46.85s, max 46.85s, total 46.85s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 46.91s, max 46.91s, total 46.91s.
Domain specific
: 5.3. Timed out: 1. Wrong answer: 1. Response time: avg 17.50s, max 17.50s, total 17.50s.
General Intelligence
: 4.7. Timed out: 1. Response time: avg 79.86s, max 79.86s, total 79.86s.
Instructions following
: 10.0. No failed answers. Response time: avg 31.93s, max 31.93s, total 31.93s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 32.50s, max 49.12s, total 65.01s.
Tool Calling
: 10.0. No failed answers. Response time: avg 7.54s, max 7.54s, total 7.54s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 103.81s, max 103.81s, total 103.81s.
Overall: Wrong answer: 2. Did not follow instructions: 1. Response time: avg 68.14s, max 280.52s, total 1090.28s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 43.87s, max 121.88s, total 131.62s.
Combined
: 10.0. No failed answers. Response time: avg 280.52s, max 280.52s, total 280.52s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 7.16s, max 8.54s, total 14.31s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 127.58s, max 133.93s, total 382.74s.
General Intelligence
: 10.0. No failed answers. Response time: avg 5.25s, max 5.25s, total 5.25s.
Instructions following
: 9.8. No failed answers. Response time: avg 64.03s, max 124.45s, total 128.06s.
Puzzle Solving
: 7.7. Did not follow instructions: 1. Response time: avg 46.68s, max 134.22s, total 140.04s.
Tool Calling
: 10.0. No failed answers. Response time: avg 7.73s, max 7.73s, total 7.73s.
Coding
: 6.8. Timed out: 1. Response time: avg 185.58s, max 218.40s, total 371.16s.
Combined
: 10.0. No failed answers. Response time: avg 65.30s, max 65.30s, total 65.30s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 14.92s, max 16.89s, total 29.85s.
Domain specific
: 5.5. Timed out: 2. Response time: avg 233.13s, max 431.03s, total 466.26s.
Instructions following
: 9.8. No failed answers. Response time: avg 6.14s, max 6.80s, total 12.27s.
Puzzle Solving
: 7.9. Wrong answer: 1. Response time: avg 49.91s, max 128.09s, total 149.74s.
Tool Calling
: 10.0. No failed answers. Response time: avg 11.91s, max 11.91s, total 11.91s.
Trivia
: 3.0. No answer: 1. Response time: avg 100.80s, max 100.80s, total 100.80s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 21.13s, max 34.96s, total 84.53s.
Coding
: 6.5. No answer: 1. Response time: avg 244.54s, max 409.98s, total 489.08s.
Combined
: 4.7. No answer: 1. Response time: avg 75.34s, max 75.34s, total 75.34s.
Data parsing and extraction
: 7.3. API error: 1. Response time: avg 59.33s, max 97.12s, total 118.65s.
Domain specific
: 4.1. Timed out: 2. Wrong answer: 1. Response time: avg 88.34s, max 106.00s, total 265.01s.
General Intelligence
: 2.8. Timed out: 1. Response time: avg 30.30s, max 30.30s, total 30.30s.
Instructions following
: 10.0. No failed answers. Response time: avg 24.45s, max 43.36s, total 48.89s.
Puzzle Solving
: 8.2. Timed out: 1. Response time: avg 33.13s, max 64.81s, total 99.38s.
Tool Calling
: 10.0. No failed answers. Response time: avg 4.65s, max 4.65s, total 4.65s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 177.35s, max 177.35s, total 177.35s.
Anti-AI Tricks
: 6.6. Timed out: 1. Wrong answer: 1. Response time: avg 74.75s, max 182.10s, total 298.98s.
Coding
: 6.8. Wrong answer: 1. Response time: avg 220.48s, max 243.66s, total 440.97s.
Combined
: 10.0. No failed answers. Response time: avg 262.83s, max 262.83s, total 262.83s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 24.27s, max 27.52s, total 48.54s.
Domain specific
: 3.0. Timed out: 3. Response time: avg 0ms, max 0ms, total 0ms.
Instructions following
: 10.0. No failed answers. Response time: avg 17.47s, max 19.46s, total 34.93s.
Puzzle Solving
: 8.2. Wrong answer: 1. Response time: avg 31.79s, max 50.78s, total 95.38s.
Tool Calling
: 10.0. No failed answers. Response time: avg 88.68s, max 88.68s, total 88.68s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 56.76s, max 56.76s, total 56.76s.
Anti-AI Tricks
: 5.1. Timed out: 2. Wrong answer: 1. Response time: avg 34.44s, max 57.86s, total 103.31s.
Coding
: 2.8. Did not follow instructions: 1. Timed out: 1. Response time: avg 135.61s, max 135.61s, total 135.61s.
Combined
: 3.0. Timed out: 1. Response time: avg 0ms, max 0ms, total 0ms.
Domain specific
: 3.6. Timed out: 3. Response time: avg 137.75s, max 202.61s, total 413.24s.
General Intelligence
: 2.8. Timed out: 1. Response time: avg 226.38s, max 226.38s, total 226.38s.
Instructions following
: 6.5. No answer: 1. Response time: avg 5.75s, max 5.75s, total 5.75s.
Puzzle Solving
: 3.0. Timed out: 2. Wrong answer: 1. Response time: avg 32.27s, max 47.31s, total 96.80s.
Tool Calling
: 10.0. No failed answers. Response time: avg 4.31s, max 4.31s, total 4.31s.
Trivia
: 3.0. API error: 1. Response time: avg 177.02s, max 177.02s, total 177.02s.
Anti-AI Tricks
: 7.3. No answer: 1. Wrong answer: 1. Response time: avg 51.38s, max 85.28s, total 102.75s.
Coding
: 4.1. No answer: 1. Timed out: 1. Response time: avg 215.89s, max 281.00s, total 431.77s.
Combined
: 10.0. No failed answers. Response time: avg 71.37s, max 71.37s, total 71.37s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 49.78s, max 49.78s, total 49.78s.
Domain specific
: 3.5. Wrong answer: 2. Timed out: 1. Response time: avg 137.29s, max 137.29s, total 137.29s.
Instructions following
: 10.0. No failed answers. Response time: avg 92.47s, max 92.47s, total 92.47s.
Tool Calling
: 10.0. No failed answers. Response time: avg 31.74s, max 31.74s, total 31.74s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 83.95s, max 83.95s, total 83.95s.