A test is fully passed only if every run passed for that test. Wrong answer: 12. Did not follow instructions: 2. Response time: avg 3.38s, max 46.00s, total 67.64s…
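The pass rule and the three response-time figures reported on each line can be sketched as follows. This is only an illustrative sketch: the `RunResult` record, its field names, the treatment of a test with no runs as not passed, and the sample timings are all assumptions made for the example, not the benchmark's actual data model.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One run of one test (hypothetical record, not the benchmark's schema)."""
    passed: bool
    seconds: float


def fully_passed(runs: list[RunResult]) -> bool:
    """A test is fully passed only if every run passed.

    Assumption: a test with no runs at all is treated as not passed.
    """
    return bool(runs) and all(r.passed for r in runs)


def response_time_stats(runs: list[RunResult]) -> dict[str, float]:
    """Average, maximum and total response time over a set of runs."""
    times = [r.seconds for r in runs]
    if not times:
        return {"avg": 0.0, "max": 0.0, "total": 0.0}
    total = sum(times)
    return {"avg": total / len(times), "max": max(times), "total": total}


# Made-up sample data for illustration only.
runs = [RunResult(True, 1.0), RunResult(False, 3.0), RunResult(True, 2.0)]
print(fully_passed(runs))         # False: one of the three runs failed
print(response_time_stats(runs))  # {'avg': 2.0, 'max': 3.0, 'total': 6.0}
```

Under this reading, a single failed run is enough to keep a test from counting as fully passed, regardless of how many other runs succeeded.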
Anti-AI Tricks: 4.8. Wrong answer: 3. Response time: avg 1.59s, max 3.60s, total 6.38s.
Coding: 4.0. Wrong answer: 2. Response time: avg 2.14s, max 3.44s, total 4.29s.
Combined: 3.0. Wrong answer: 1. Response time: avg 46.00s, max 46.00s, total 46.00s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.01s, max 1.06s, total 2.02s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 465ms, max 492ms, total 1.39s.
Instructions following: 6.3. Wrong answer: 1. Response time: avg 585ms, max 715ms, total 1.17s.
Tool Calling: 10.0. No failed answers. Response time: avg 2.04s, max 2.04s, total 2.04s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 295ms, max 295ms, total 295ms.
Anti-AI Tricks: 6.6. Wrong answer: 2. Response time: avg 1.19s, max 2.04s, total 4.75s.
Coding: 4.0. No answer: 1. Wrong answer: 1. Response time: avg 1.30s, max 1.30s, total 1.30s.
Combined: 3.0. Wrong answer: 1. Response time: avg 3.70s, max 3.70s, total 3.70s.
Data parsing and extraction: 6.5. Wrong answer: 1. Response time: avg 979ms, max 1.02s, total 1.96s.
Domain specific: 3.0. Wrong answer: 3. Response time: avg 925ms, max 1.16s, total 2.77s.
Instructions following: 9.8. No failed answers. Response time: avg 987ms, max 1.13s, total 1.97s.
Tool Calling: 3.0. Invalid tool call: 1. Response time: avg 2.83s, max 2.83s, total 2.83s.
Trivia: 0.0. No failed answers. Response time: avg 0ms, max 0ms, total 0ms.
Anti-AI Tricks: 5.6. Wrong answer: 3. Response time: avg 2.67s, max 5.03s, total 10.66s.
Coding: 5.1. Wrong answer: 2. Response time: avg 44.82s, max 59.15s, total 89.64s.
Combined: 3.0. Wrong answer: 1. Response time: avg 25.25s, max 25.25s, total 25.25s.
Data parsing and extraction: 7.3. API error: 1. Response time: avg 1.23s, max 1.96s, total 2.46s.
Domain specific: 5.3. API error: 1. Wrong answer: 1. Response time: avg 6.11s, max 13.72s, total 18.34s.
Instructions following: 7.3. Wrong answer: 1. Response time: avg 1.38s, max 1.61s, total 2.75s.
Tool Calling: 10.0. No failed answers. Response time: avg 3.50s, max 3.50s, total 3.50s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 5.92s, max 5.92s, total 5.92s.
Anti-AI Tricks: 4.8. Wrong answer: 3. Response time: avg 501ms, max 839ms, total 2.01s.
Coding: 3.4. Wrong answer: 1. Response time: avg 1.22s, max 1.22s, total 1.22s.
Combined: 3.0. Invalid tool call: 1. Response time: avg 6.04s, max 6.04s, total 6.04s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 522ms, max 537ms, total 1.04s.
General Intelligence: 4.8. Wrong answer: 1. Response time: avg 659ms, max 659ms, total 659ms.
Instructions following: 6.3. Wrong answer: 1. Response time: avg 455ms, max 505ms, total 910ms.
Puzzle Solving: 5.3. Wrong answer: 2. Response time: avg 487ms, max 539ms, total 1.46s.
Tool Calling: 10.0. No failed answers. Response time: avg 4.63s, max 4.63s, total 4.63s.
Coding: 3.5. Timed out: 1. Wrong answer: 1. Response time: avg 125.80s, max 125.80s, total 125.80s.
Combined: 4.5. Invalid tool call: 1. Response time: avg 60.39s, max 60.39s, total 60.39s.
Data parsing and extraction: 4.6. Wrong answer: 2. Response time: avg 7.48s, max 7.48s, total 7.48s.
Domain specific: 2.9. Wrong answer: 2. Timed out: 1. Response time: avg 237.27s, max 237.27s, total 237.27s.
Puzzle Solving: 5.3. Timed out: 1. Wrong answer: 1. Response time: avg 11.54s, max 17.37s, total 23.08s.
Tool Calling: 10.0. No failed answers. Response time: avg 15.35s, max 15.35s, total 15.35s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 80.79s, max 80.79s, total 80.79s.
Anti-AI Tricks: 7.6. Wrong answer: 1. Response time: avg 1.20s, max 1.48s, total 3.59s.
Coding: 3.3. Wrong answer: 1. Response time: avg 38.09s, max 38.09s, total 38.09s.
Combined: 0.0. No failed answers. Response time: avg 0ms, max 0ms, total 0ms.
Data parsing and extraction: 7.3. Wrong answer: 1. Response time: avg 2.72s, max 3.88s, total 5.43s.
Domain specific: 2.9. Wrong answer: 2. No answer: 1. Response time: avg 56.67s, max 147.45s, total 170.02s.
General Intelligence: 0.0. No failed answers. Response time: avg 0ms, max 0ms, total 0ms.
Puzzle Solving: 3.8. Wrong answer: 2. Response time: avg 1.37s, max 1.57s, total 2.74s.
Tool Calling: 0.0. No failed answers. Response time: avg 0ms, max 0ms, total 0ms.
Trivia: 0.0. No failed answers. Response time: avg 0ms, max 0ms, total 0ms.
Anti-AI Tricks: 4.0. Wrong answer: 4. Response time: avg 597ms, max 866ms, total 2.39s.
Coding: 5.5. Wrong answer: 1. Response time: avg 1.14s, max 1.14s, total 1.14s.
Combined: 3.0. Invalid tool call: 1. Response time: avg 6.48s, max 6.48s, total 6.48s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 601ms, max 634ms, total 1.20s.
Domain specific: 3.0. Wrong answer: 3. Response time: avg 611ms, max 616ms, total 1.83s.
Instructions following: 6.3. Wrong answer: 1. Response time: avg 687ms, max 952ms, total 1.37s.
Puzzle Solving: 5.9. Wrong answer: 2. Response time: avg 541ms, max 677ms, total 1.62s.
Tool Calling: 10.0. No failed answers. Response time: avg 4.79s, max 4.79s, total 4.79s.
Anti-AI Tricks: 3.1. Wrong answer: 4. Response time: avg 1.63s, max 4.60s, total 6.51s.
Coding: 6.6. Wrong answer: 1. Response time: avg 2.34s, max 2.46s, total 4.68s.
Combined: 3.0. Invalid tool call: 1. Response time: avg 4.22s, max 4.22s, total 4.22s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 2.13s, max 3.35s, total 4.26s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 1.11s, max 1.89s, total 3.32s.
General Intelligence: 10.0. No failed answers. Response time: avg 947ms, max 947ms, total 947ms.
Instructions following: 6.3. Wrong answer: 1. Response time: avg 1.10s, max 1.36s, total 2.19s.
Tool Calling: 10.0. No failed answers. Response time: avg 2.49s, max 2.49s, total 2.49s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 649ms, max 649ms, total 649ms.
Combined: 6.5. Invalid tool call: 1. Response time: avg 115.89s, max 115.89s, total 115.89s.
Data parsing and extraction: 6.3. Wrong answer: 1. Response time: avg 9.42s, max 16.20s, total 18.84s.
Domain specific: 3.0. Wrong answer: 3. Response time: avg 4.17s, max 9.09s, total 12.51s.
General Intelligence: 7.6. No failed answers. Response time: avg 9.32s, max 9.32s, total 9.32s.
Instructions following: 10.0. No failed answers. Response time: avg 1.52s, max 1.99s, total 3.04s.
Puzzle Solving: 7.5. Wrong answer: 1. Response time: avg 7.13s, max 10.09s, total 21.40s.
Tool Calling: 10.0. No failed answers. Response time: avg 11.85s, max 11.85s, total 11.85s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 17.23s, max 17.23s, total 17.23s.
Wrong answer: 10. Did not follow instructions: 4. Response time: avg 1.84s, max 8.32s, total 36.84s…
Combined: 3.0. Did not follow instructions: 1. Response time: avg 3.54s, max 3.54s, total 3.54s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.32s, max 1.42s, total 2.64s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 877ms, max 904ms, total 2.63s.
General Intelligence: 4.0. Wrong answer: 1. Response time: avg 2.58s, max 2.58s, total 2.58s.
Instructions following: 6.4. Wrong answer: 1. Response time: avg 1.03s, max 1.10s, total 2.06s.
Tool Calling: 10.0. No failed answers. Response time: avg 3.30s, max 3.30s, total 3.30s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 1.89s, max 1.89s, total 1.89s.
Anti-AI Tricks: 4.0. Wrong answer: 4. Response time: avg 2.11s, max 3.94s, total 8.46s.
Coding: 4.3. Wrong answer: 2. Response time: avg 6.33s, max 9.79s, total 12.65s.
Combined: 2.8. Invalid tool call: 1. Response time: avg 32.57s, max 32.57s, total 32.57s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.08s, max 1.62s, total 2.15s.
Domain specific: 2.9. Wrong answer: 3. Response time: avg 1.99s, max 3.99s, total 5.98s.
General Intelligence: 5.0. Wrong answer: 1. Response time: avg 790ms, max 790ms, total 790ms.
Instructions following: 9.8. No failed answers. Response time: avg 1.58s, max 1.69s, total 3.17s.
Tool Calling: 10.0. No failed answers. Response time: avg 10.68s, max 10.68s, total 10.68s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 2.34s, max 2.34s, total 2.34s.
Wrong answer: 12. Did not follow instructions: 1. Response time: avg 1.46s, max 2.95s, total 29.23s…
Anti-AI Tricks: 3.2. Wrong answer: 4. Response time: avg 1.21s, max 2.58s, total 4.85s.
Coding: 6.8. Wrong answer: 1. Response time: avg 1.99s, max 2.95s, total 3.97s.
Combined: 3.0. Wrong answer: 1. Response time: avg 2.89s, max 2.89s, total 2.89s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.04s, max 1.06s, total 2.08s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 1.07s, max 1.54s, total 3.22s.
General Intelligence: 4.4. Wrong answer: 1. Response time: avg 1.78s, max 1.78s, total 1.78s.
Instructions following: 6.5. Wrong answer: 1. Response time: avg 1.07s, max 1.17s, total 2.15s.
Tool Calling: 10.0. No failed answers. Response time: avg 2.75s, max 2.75s, total 2.75s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 990ms, max 990ms, total 990ms.
Wrong answer: 9. Did not follow instructions: 4. Response time: avg 20.89s, max 68.16s, total 271.54s…
Coding: 3.9. Wrong answer: 2. Response time: avg 47.24s, max 68.16s, total 94.49s.
Combined: 10.0. No failed answers. Response time: avg 31.18s, max 31.18s, total 31.18s.
Data parsing and extraction: 6.4. Wrong answer: 1. Response time: avg 1.98s, max 1.98s, total 1.98s.
Domain specific: 2.9. Wrong answer: 3. Response time: avg 50.92s, max 50.92s, total 50.92s.
Instructions following: 9.9. No failed answers. Response time: avg 7.63s, max 7.63s, total 7.63s.
Tool Calling: 9.8. No failed answers. Response time: avg 6.91s, max 6.91s, total 6.91s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 26.51s, max 26.51s, total 26.51s.
Wrong answer: 10. Did not follow instructions: 3. Response time: avg 13.86s, max 238.89s, total 277.18s…
Anti-AI Tricks: 4.6. Wrong answer: 3. Response time: avg 1.39s, max 2.96s, total 5.56s.
Coding: 6.8. Did not follow instructions: 1. Response time: avg 122.77s, max 238.89s, total 245.54s.
Combined: 3.0. Wrong answer: 1. Response time: avg 3.38s, max 3.38s, total 3.38s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.32s, max 1.39s, total 2.64s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 1.48s, max 1.85s, total 4.45s.
Instructions following: 6.5. Wrong answer: 1. Response time: avg 1.64s, max 1.80s, total 3.28s.
Tool Calling: 10.0. No failed answers. Response time: avg 4.46s, max 4.46s, total 4.46s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 1.36s, max 1.36s, total 1.36s.
Anti-AI Tricks: 5.2. Wrong answer: 3. Response time: avg 5.51s, max 6.59s, total 11.02s.
Coding: 5.0. Wrong answer: 2. Response time: avg 3.35s, max 5.57s, total 6.70s.
Combined: 3.0. Invalid tool call: 1. Response time: avg 3.22s, max 3.22s, total 3.22s.
Data parsing and extraction: 7.3. Wrong answer: 1. Response time: avg 4.82s, max 4.82s, total 4.82s.
Domain specific: 7.7. Wrong answer: 1. Response time: avg 744ms, max 744ms, total 744ms.
General Intelligence: 4.0. Wrong answer: 1. Response time: avg 1.59s, max 1.59s, total 1.59s.
Instructions following: 6.5. Wrong answer: 1. Response time: avg 888ms, max 888ms, total 888ms.
Tool Calling: 2.8. Wrong answer: 1. Response time: avg 7.05s, max 7.05s, total 7.05s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 692ms, max 692ms, total 692ms.
Anti-AI Tricks: 3.5. Wrong answer: 4. Response time: avg 3.81s, max 6.85s, total 15.23s.
Coding: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Combined: 3.0. Wrong answer: 1. Response time: avg 15.17s, max 15.17s, total 15.17s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 8.49s, max 14.02s, total 16.98s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 2.33s, max 2.94s, total 6.99s.
Instructions following: 6.4. Wrong answer: 1. Response time: avg 2.82s, max 2.92s, total 5.65s.
Tool Calling: 10.0. No failed answers. Response time: avg 6.02s, max 6.02s, total 6.02s.
Wrong answer: 11. Did not follow instructions: 2. Response time: avg 2.31s, max 6.58s, total 46.17s…
Anti-AI Tricks: 3.5. Wrong answer: 4. Response time: avg 1.80s, max 2.62s, total 7.19s.
Coding: 6.8. Wrong answer: 1. Response time: avg 2.65s, max 3.82s, total 5.30s.
Combined: 3.0. Wrong answer: 1. Response time: avg 6.58s, max 6.58s, total 6.58s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.39s, max 1.42s, total 2.78s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 1.78s, max 2.49s, total 5.34s.
Instructions following: 6.5. Wrong answer: 1. Response time: avg 2.51s, max 2.95s, total 5.02s.
Tool Calling: 10.0. No failed answers. Response time: avg 4.39s, max 4.39s, total 4.39s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 1.63s, max 1.63s, total 1.63s.
Anti-AI Tricks: 8.7. Wrong answer: 1. Response time: avg 10.00s, max 11.53s, total 39.99s.
Combined: 3.0. Invalid tool call: 1. Response time: avg 47.38s, max 47.38s, total 47.38s.
Data parsing and extraction: 6.3. Wrong answer: 1. Response time: avg 17.36s, max 26.57s, total 34.71s.
Domain specific: 2.9. Wrong answer: 3. Response time: avg 128.15s, max 309.02s, total 384.46s.
Instructions following: 9.8. No failed answers. Response time: avg 11.60s, max 14.49s, total 23.20s.
Tool Calling: 10.0. No failed answers. Response time: avg 11.19s, max 11.19s, total 11.19s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 36.98s, max 36.98s, total 36.98s.
Coding: 7.0. Extra formatting: 1. Response time: avg 39.68s, max 47.10s, total 79.37s.
Combined: 3.0. Wrong answer: 1. Response time: avg 21.74s, max 21.74s, total 21.74s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 3.60s, max 3.92s, total 7.19s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 3.00s, max 4.69s, total 8.99s.
Instructions following: 6.4. Wrong answer: 1. Response time: avg 2.63s, max 2.77s, total 5.27s.
Tool Calling: 10.0. No failed answers. Response time: avg 22.78s, max 22.78s, total 22.78s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 2.50s, max 2.50s, total 2.50s.
Wrong answer: 11. Did not follow instructions: 2. Response time: avg 1.67s, max 9.39s, total 33.38s…
Anti-AI Tricks: 4.8. Wrong answer: 3. Response time: avg 788ms, max 1.34s, total 3.15s.
Coding: 7.3. Wrong answer: 1. Response time: avg 1.98s, max 2.51s, total 3.97s.
Combined: 2.8. Wrong answer: 1. Response time: avg 9.39s, max 9.39s, total 9.39s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.43s, max 1.45s, total 2.86s.
Domain specific: 3.0. Wrong answer: 3. Response time: avg 540ms, max 649ms, total 1.62s.
Instructions following: 6.3. Wrong answer: 1. Response time: avg 815ms, max 973ms, total 1.63s.
Tool Calling: 10.0. No failed answers. Response time: avg 3.54s, max 3.54s, total 3.54s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 599ms, max 599ms, total 599ms.
Anti-AI Tricks: 3.8. Wrong answer: 4. Response time: avg 2.83s, max 7.62s, total 11.33s.
Coding: 6.8. Wrong answer: 1. Response time: avg 5.75s, max 10.18s, total 11.51s.
Combined: 3.0. Invalid tool call: 1. Response time: avg 9.95s, max 9.95s, total 9.95s.
Data parsing and extraction: 7.3. Wrong answer: 1. Response time: avg 2.06s, max 2.39s, total 4.11s.
Domain specific: 7.7. Wrong answer: 1. Response time: avg 3.03s, max 4.83s, total 9.08s.
Instructions following: 6.2. Wrong answer: 1. Response time: avg 1.92s, max 1.94s, total 3.83s.
Tool Calling: 9.5. No failed answers. Response time: avg 6.74s, max 6.74s, total 6.74s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 4.03s, max 4.03s, total 4.03s.
Wrong answer: 11. Did not follow instructions: 2. Response time: avg 3.50s, max 47.43s, total 69.99s…
Anti-AI Tricks: 3.4. Wrong answer: 4. Response time: avg 1.43s, max 4.39s, total 5.71s.
Coding: 6.8. Wrong answer: 1. Response time: avg 1.72s, max 2.67s, total 3.43s.
Combined: 3.0. Wrong answer: 1. Response time: avg 47.43s, max 47.43s, total 47.43s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.16s, max 1.42s, total 2.33s.
Domain specific: 7.7. Wrong answer: 1. Response time: avg 485ms, max 549ms, total 1.45s.
Instructions following: 6.3. Wrong answer: 1. Response time: avg 809ms, max 983ms, total 1.62s.
Tool Calling: 10.0. No failed answers. Response time: avg 2.30s, max 2.30s, total 2.30s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 493ms, max 493ms, total 493ms.
Wrong answer: 11; Did not follow instructions: 2; Response Time (avg) 4.58s, (max) 33.34s, (total) 91.55s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time (avg) 1.88s, (max) 4.81s, (total) 7.53s
Combined: 2.8; Wrong answer: 1; Response Time (avg) 13.32s, (max) 13.32s, (total) 13.32s
Data parsing and extraction: 10.0; No failed answers; Response Time (avg) 2.82s, (max) 3.86s, (total) 5.65s
Domain specific: 5.3; Wrong answer: 2; Response Time (avg) 4.43s, (max) 10.83s, (total) 13.28s
Instructions following: 6.2; Wrong answer: 1; Response Time (avg) 1.17s, (max) 1.33s, (total) 2.35s
Puzzle Solving: 6.7; Wrong answer: 2; Response Time (avg) 2.03s, (max) 3.60s, (total) 6.09s
Tool Calling: 10.0; No failed answers; Response Time (avg) 4.42s, (max) 4.42s, (total) 4.42s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 33.34s, (max) 33.34s, (total) 33.34s
Wrong answer: 9; Did not follow instructions: 3; Response Time (avg) 11.59s, (max) 58.63s, (total) 231.84s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time (avg) 3.97s, (max) 7.48s, (total) 15.89s
Coding: 6.6; Did not follow instructions: 1; Response Time (avg) 19.08s, (max) 30.81s, (total) 38.16s
Combined: 3.0; Wrong answer: 1; Response Time (avg) 10.01s, (max) 10.01s, (total) 10.01s
Data parsing and extraction: 10.0; No failed answers; Response Time (avg) 21.64s, (max) 29.16s, (total) 43.28s
Domain specific: 5.3; Wrong answer: 2; Response Time (avg) 8.58s, (max) 9.48s, (total) 25.74s
Instructions following: 6.3; Wrong answer: 1; Response Time (avg) 9.59s, (max) 15.94s, (total) 19.18s
Tool Calling: 10.0; No failed answers; Response Time (avg) 8.26s, (max) 8.26s, (total) 8.26s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 2.38s, (max) 2.38s, (total) 2.38s
Wrong answer: 12; Response Time (avg) 2.44s, (max) 6.70s, (total) 48.71s…
Anti-AI Tricks: 3.0; Wrong answer: 4; Response Time (avg) 2.43s, (max) 6.70s, (total) 9.73s
Coding: 6.8; Wrong answer: 1; Response Time (avg) 2.95s, (max) 4.61s, (total) 5.89s
Combined: 3.0; Wrong answer: 1; Response Time (avg) 6.59s, (max) 6.59s, (total) 6.59s
Data parsing and extraction: 10.0; No failed answers; Response Time (avg) 1.82s, (max) 1.97s, (total) 3.63s
Domain specific: 3.6; Wrong answer: 3; Response Time (avg) 1.33s, (max) 1.53s, (total) 4.00s
General Intelligence: 10.0; No failed answers; Response Time (avg) 3.45s, (max) 3.45s, (total) 3.45s
Instructions following: 10.0; No failed answers; Response Time (avg) 1.06s, (max) 1.09s, (total) 2.12s
Puzzle Solving: 5.2; Wrong answer: 2; Response Time (avg) 2.46s, (max) 4.23s, (total) 7.37s
Tool Calling: 10.0; No failed answers; Response Time (avg) 3.94s, (max) 3.94s, (total) 3.94s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 1.96s, (max) 1.96s, (total) 1.96s
Wrong answer: 10; Did not follow instructions: 2; Response Time (avg) 3.02s, (max) 6.51s, (total) 60.34s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time (avg) 3.13s, (max) 5.90s, (total) 12.50s
Coding: 6.8; Wrong answer: 1; Response Time (avg) 3.77s, (max) 5.30s, (total) 7.54s
Combined: 3.0; Wrong answer: 1; Response Time (avg) 6.51s, (max) 6.51s, (total) 6.51s
Data parsing and extraction: 10.0; No failed answers; Response Time (avg) 3.81s, (max) 5.69s, (total) 7.62s
Domain specific: 5.3; Wrong answer: 2; Response Time (avg) 2.09s, (max) 2.39s, (total) 6.26s
Instructions following: 6.5; Wrong answer: 1; Response Time (avg) 1.97s, (max) 2.43s, (total) 3.93s
Tool Calling: 10.0; No failed answers; Response Time (avg) 4.86s, (max) 4.86s, (total) 4.86s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 2.23s, (max) 2.23s, (total) 2.23s
Wrong answer: 12; Response Time (avg) 2.99s, (max) 13.73s, (total) 59.73s…
Anti-AI Tricks: 3.5; Wrong answer: 4; Response Time (avg) 1.32s, (max) 3.89s, (total) 5.30s
Coding: 6.8; Wrong answer: 1; Response Time (avg) 993ms, (max) 1.29s, (total) 1.99s
Combined: 3.0; Wrong answer: 1; Response Time (avg) 6.22s, (max) 6.22s, (total) 6.22s
Data parsing and extraction: 10.0; No failed answers; Response Time (avg) 1.57s, (max) 1.83s, (total) 3.14s
Domain specific: 7.7; Wrong answer: 1; Response Time (avg) 905ms, (max) 1.10s, (total) 2.71s
General Intelligence: 10.0; No failed answers; Response Time (avg) 803ms, (max) 803ms, (total) 803ms
Instructions following: 6.3; Wrong answer: 1; Response Time (avg) 8.81s, (max) 13.73s, (total) 17.61s
Puzzle Solving: 3.1; Wrong answer: 3; Response Time (avg) 5.90s, (max) 12.19s, (total) 17.69s
Tool Calling: 10.0; No failed answers; Response Time (avg) 3.67s, (max) 3.67s, (total) 3.67s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 588ms, (max) 588ms, (total) 588ms
Anti-AI Tricks: 8.3; No answer: 1; Response Time (avg) 7.85s, (max) 22.30s, (total) 31.40s
Coding: 3.1; API error: 1; Did not follow instructions: 1; Response Time (avg) 62.38s, (max) 62.38s, (total) 62.38s
Combined: 10.0; No failed answers; Response Time (avg) 87.80s, (max) 87.80s, (total) 87.80s
Data parsing and extraction: 10.0; No failed answers; Response Time (avg) 18.16s, (max) 20.65s, (total) 36.33s
Domain specific: 2.9; Wrong answer: 2; Timed out: 1; Response Time (avg) 16.19s, (max) 21.56s, (total) 32.39s
Tool Calling: 10.0; No failed answers; Response Time (avg) 39.75s, (max) 39.75s, (total) 39.75s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 55.32s, (max) 55.32s, (total) 55.32s
Coding: 5.4; Wrong answer: 2; Response Time (avg) 8.27s, (max) 14.69s, (total) 16.54s
Combined: 9.5; No failed answers; Response Time (avg) 25.49s, (max) 25.49s, (total) 25.49s
Data parsing and extraction: 8.8; No failed answers; Response Time (avg) 30.54s, (max) 58.65s, (total) 61.08s
Domain specific: 5.3; Wrong answer: 2; Response Time (avg) 3.17s, (max) 6.59s, (total) 9.52s
Instructions following: 6.3; Wrong answer: 1; Response Time (avg) 8.23s, (max) 13.43s, (total) 16.45s
Puzzle Solving: 7.6; Extra formatting: 1; Response Time (avg) 19.72s, (max) 38.42s, (total) 59.15s
Tool Calling: 10.0; No failed answers; Response Time (avg) 5.92s, (max) 5.92s, (total) 5.92s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 15.59s, (max) 15.59s, (total) 15.59s
Anti-AI Tricks: 6.5; Wrong answer: 2; Response Time (avg) 25.50s, (max) 37.73s, (total) 51.00s
Coding: 5.4; Wrong answer: 2; Response Time (avg) 47.80s, (max) 54.86s, (total) 95.59s
Combined: 10.0; No failed answers; Response Time (avg) 65.96s, (max) 65.96s, (total) 65.96s
Data parsing and extraction: 3.7; Wrong answer: 2; Response Time (avg) 21.42s, (max) 21.42s, (total) 21.42s
Domain specific: 5.2; Timed out: 1; Wrong answer: 1; Response Time (avg) 204.02s, (max) 204.02s, (total) 204.02s
Instructions following: 9.8; No failed answers; Response Time (avg) 11.90s, (max) 11.90s, (total) 11.90s
Tool Calling: 10.0; No failed answers; Response Time (avg) 33.30s, (max) 33.30s, (total) 33.30s
Trivia: 3.0; Wrong answer: 1; Response Time (avg) 20.13s, (max) 20.13s, (total) 20.13s
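As a rough illustration of how the figures in these rows relate to each other, the sketch below applies the page's stated rule (a test is fully passed only if every run passed for that test) and computes the three response-time figures shown per category. The `Run` type, the function names, and the sample timings are hypothetical and are not taken from the benchmark's own code. Whether the site aggregates times per run or per test is not stated, so the per-run aggregation here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One execution of a test (hypothetical structure)."""
    passed: bool
    seconds: float


def fully_passed(runs):
    # A test counts as fully passed only if it has runs and all of them passed.
    return len(runs) > 0 and all(r.passed for r in runs)


def response_time_summary(runs):
    # Assumption: avg, max, and total are computed over the individual runs.
    times = [r.seconds for r in runs]
    total = sum(times)
    return {"avg": total / len(times), "max": max(times), "total": total}


# Made-up sample: three runs of one test, one of which failed.
runs = [Run(True, 0.4), Run(False, 0.5), Run(True, 0.6)]
print(fully_passed(runs))
print(response_time_summary(runs))
```

Under this reading, a single failed run is enough to keep a test from being fully passed, even when the other runs succeed.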