A test is fully passed only if every run passed for that test.

Overall (Wrong answer: 11, Did not follow instructions: 2; response time avg 2.27s, max 6.58s, total 45.50s) …
Anti-AI Tricks: 3.5 (Wrong answer: 4; response time avg 1.80s, max 2.62s, total 7.19s)
Coding: 6.8 (Wrong answer: 1; response time avg 2.65s, max 3.82s, total 5.30s)
Combined: 3.0 (Wrong answer: 1; response time avg 6.58s, max 6.58s, total 6.58s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.39s, max 1.42s, total 2.78s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 1.78s, max 2.49s, total 5.34s)
Instructions following: 6.5 (Wrong answer: 1; response time avg 2.51s, max 2.95s, total 5.02s)
Tool Calling: 10.0 (No failed answers; response time avg 4.39s, max 4.39s, total 4.39s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.63s, max 1.63s, total 1.63s)
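
The pass rule stated at the top, together with the avg/max/total response-time figures, can be illustrated with a minimal Python sketch. It is an illustration under assumptions, not the benchmark's own code: the `Run` record and the helpers `fully_passed`, `failure_counts` and `timing_summary` are hypothetical, and the timing figures are assumed to aggregate per-run response times. That assumption is consistent with the numbers in the first block, e.g. Coding: 5.30s total / 2 = 2.65s avg.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    """One run of a single test (hypothetical record, not the benchmark's schema)."""
    passed: bool
    failure: Optional[str]  # failure label such as "Wrong answer"; None when the run passed
    seconds: float          # response time of this run


def fully_passed(runs: List[Run]) -> bool:
    # A test counts as fully passed only if it has runs and every one of them passed.
    return bool(runs) and all(r.passed for r in runs)


def failure_counts(runs: List[Run]) -> Counter:
    # Tally failed runs by label, e.g. Counter({"Wrong answer": 1}).
    return Counter(r.failure for r in runs if not r.passed)


def timing_summary(runs: List[Run]) -> Dict[str, float]:
    # Assumed aggregation (expects at least one run):
    # avg = total / number of runs, max = slowest run.
    times = [r.seconds for r in runs]
    total = sum(times)
    return {"avg": total / len(times), "max": max(times), "total": total}


runs = [Run(True, None, 1.5), Run(False, "Wrong answer", 3.5)]
print(fully_passed(runs))    # False: one run failed
print(failure_counts(runs))  # Counter({'Wrong answer': 1})
print(timing_summary(runs))  # {'avg': 2.5, 'max': 3.5, 'total': 5.0}
```

A single failed run is enough to mark the whole test as not fully passed, which is why a category can show a failure count even when most of its runs succeeded.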

Overall (Wrong answer: 11; response time avg 2.40s, max 6.65s, total 33.56s) …
Anti-AI Tricks: 4.8 (Wrong answer: 3; response time avg 1.91s, max 2.74s, total 3.82s)
Coding: 4.9 (Wrong answer: 2; response time avg 2.54s, max 3.63s, total 5.09s)
Combined: 3.0 (Wrong answer: 1; response time avg 6.65s, max 6.65s, total 6.65s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.89s, max 1.89s, total 1.89s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 1.17s, max 1.44s, total 2.33s)
General Intelligence: 4.4 (Wrong answer: 1; response time avg 2.26s, max 2.26s, total 2.26s)
Instructions following: 10.0 (No failed answers; response time avg 1.67s, max 1.67s, total 1.67s)
Puzzle Solving: 7.7 (Wrong answer: 1; response time avg 2.71s, max 3.29s, total 5.41s)
Tool Calling: 10.0 (No failed answers; response time avg 3.33s, max 3.33s, total 3.33s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.11s, max 1.11s, total 1.11s)

Coding: 5.1 (Extra formatting: 1, Wrong answer: 1; response time avg 2.75s, max 3.79s, total 5.50s)
Combined: 3.0 (Wrong answer: 1; response time avg 5.96s, max 5.96s, total 5.96s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.76s, max 2.60s, total 3.51s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 2.10s, max 3.58s, total 6.30s)
General Intelligence: 4.1 (Wrong answer: 1; response time avg 2.33s, max 2.33s, total 2.33s)
Instructions following: 6.5 (Wrong answer: 1; response time avg 4.26s, max 6.81s, total 8.51s)
Puzzle Solving: 10.0 (No failed answers; response time avg 1.16s, max 1.55s, total 3.48s)
Tool Calling: 10.0 (No failed answers; response time avg 5.40s, max 5.40s, total 5.40s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.30s, max 1.30s, total 1.30s)

Overall (Wrong answer: 12; response time avg 2.48s, max 6.70s, total 49.67s) …
Anti-AI Tricks: 3.0 (Wrong answer: 4; response time avg 2.43s, max 6.70s, total 9.73s)
Coding: 6.8 (Wrong answer: 1; response time avg 2.95s, max 4.61s, total 5.89s)
Combined: 3.0 (Wrong answer: 1; response time avg 6.59s, max 6.59s, total 6.59s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.82s, max 1.97s, total 3.63s)
Domain specific: 3.6 (Wrong answer: 3; response time avg 1.33s, max 1.53s, total 4.00s)
General Intelligence: 10.0 (No failed answers; response time avg 3.45s, max 3.45s, total 3.45s)
Instructions following: 10.0 (No failed answers; response time avg 1.06s, max 1.09s, total 2.12s)
Puzzle Solving: 5.3 (Wrong answer: 2; response time avg 2.78s, max 5.20s, total 8.34s)
Tool Calling: 10.0 (No failed answers; response time avg 3.94s, max 3.94s, total 3.94s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.96s, max 1.96s, total 1.96s)

Anti-AI Tricks: 3.2 (Wrong answer: 4; response time avg 1.19s, max 2.73s, total 4.76s)
Combined: 3.0 (Wrong answer: 1; response time avg 2.87s, max 2.87s, total 2.87s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 564ms, max 564ms, total 564ms)
Instructions following: 6.5 (Wrong answer: 1; response time avg 857ms, max 955ms, total 1.71s)
Puzzle Solving: 5.3 (Wrong answer: 2; response time avg 1.86s, max 2.70s, total 3.71s)
Tool Calling: 10.0 (No failed answers; response time avg 2.28s, max 2.28s, total 2.28s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.82s, max 1.82s, total 1.82s)

Overall (Wrong answer: 6, Did not follow instructions: 1; response time avg 2.85s, max 11.91s, total 57.08s) …
Anti-AI Tricks: 8.3 (Wrong answer: 1; response time avg 2.12s, max 3.18s, total 8.50s)
Coding: 6.8 (Wrong answer: 1; response time avg 1.56s, max 2.20s, total 3.13s)
Combined: 3.0 (Wrong answer: 1; response time avg 11.91s, max 11.91s, total 11.91s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 3.00s, max 3.74s, total 5.99s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 2.36s, max 3.51s, total 7.07s)
Instructions following: 10.0 (No failed answers; response time avg 1.49s, max 1.66s, total 2.99s)
Puzzle Solving: 10.0 (No failed answers; response time avg 1.69s, max 1.89s, total 5.08s)
Tool Calling: 10.0 (No failed answers; response time avg 9.54s, max 9.54s, total 9.54s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.35s, max 1.35s, total 1.35s)

Overall (Wrong answer: 12, Did not follow instructions: 2; response time avg 2.86s, max 8.21s, total 57.24s) …
Anti-AI Tricks: 3.0 (Wrong answer: 4; response time avg 2.84s, max 4.15s, total 11.35s)
Coding: 4.4 (Wrong answer: 2; response time avg 2.58s, max 3.93s, total 5.16s)
Combined: 3.0 (Wrong answer: 1; response time avg 4.89s, max 4.89s, total 4.89s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 2.47s, max 2.48s, total 4.95s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 1.97s, max 2.65s, total 5.92s)
Instructions following: 6.5 (Wrong answer: 1; response time avg 2.13s, max 2.53s, total 4.27s)
Tool Calling: 10.0 (No failed answers; response time avg 8.21s, max 8.21s, total 8.21s)
Trivia: 3.0 (Wrong answer: 1; response time avg 2.37s, max 2.37s, total 2.37s)

Anti-AI Tricks: 3.4 (Wrong answer: 3, API error: 1; response time avg 705ms, max 975ms, total 2.12s)
Coding: 7.5 (Wrong answer: 1; response time avg 2.93s, max 2.93s, total 2.93s)
Combined: 3.0 (Invalid tool call: 1; response time avg 4.32s, max 4.32s, total 4.32s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 3.37s, max 5.76s, total 6.73s)
Domain specific: 3.6 (Wrong answer: 3; response time avg 5.50s, max 15.42s, total 16.50s)
General Intelligence: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)
Instructions following: 6.3 (Wrong answer: 1; response time avg 683ms, max 691ms, total 1.37s)
Puzzle Solving: 3.0 (Wrong answer: 2, API error: 1; response time avg 891ms, max 1.21s, total 1.78s)
Tool Calling: 10.0 (No failed answers; response time avg 7.54s, max 7.54s, total 7.54s)
Trivia: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)

Overall (Wrong answer: 9, Did not follow instructions: 1; response time avg 2.95s, max 29.38s, total 58.96s) …
Anti-AI Tricks: 6.5 (Wrong answer: 2; response time avg 1.38s, max 2.69s, total 5.51s)
Coding: 6.8 (Wrong answer: 1; response time avg 2.77s, max 4.39s, total 5.54s)
Combined: 10.0 (No failed answers; response time avg 29.38s, max 29.38s, total 29.38s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.43s, max 1.57s, total 2.86s)
Domain specific: 3.0 (Wrong answer: 3; response time avg 868ms, max 1.02s, total 2.60s)
Instructions following: 6.3 (Wrong answer: 1; response time avg 929ms, max 1.05s, total 1.86s)
Puzzle Solving: 7.7 (Wrong answer: 1; response time avg 1.71s, max 2.65s, total 5.13s)
Tool Calling: 10.0 (No failed answers; response time avg 3.54s, max 3.54s, total 3.54s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.21s, max 1.21s, total 1.21s)

Anti-AI Tricks: 3.1 (Wrong answer: 4; response time avg 2.07s, max 4.40s, total 8.30s)
Coding: 4.0 (API error: 1, Wrong answer: 1; response time avg 14.34s, max 14.34s, total 14.34s)
Combined: 3.0 (Wrong answer: 1; response time avg 8.91s, max 8.91s, total 8.91s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 3.26s, max 4.66s, total 6.52s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 877ms, max 894ms, total 2.63s)
Tool Calling: 10.0 (No failed answers; response time avg 6.67s, max 6.67s, total 6.67s)
Trivia: 3.0 (Wrong answer: 1; response time avg 777ms, max 777ms, total 777ms)

Overall (Wrong answer: 2; response time avg 2.98s, max 6.44s, total 59.59s) …
Anti-AI Tricks: 10.0 (No failed answers; response time avg 2.52s, max 5.40s, total 10.08s)
Coding: 6.8 (Wrong answer: 1; response time avg 5.54s, max 5.59s, total 11.08s)
Combined: 10.0 (No failed answers; response time avg 6.44s, max 6.44s, total 6.44s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.81s, max 2.32s, total 3.63s)
Domain specific: 7.7 (Wrong answer: 1; response time avg 3.39s, max 4.44s, total 10.16s)
General Intelligence: 10.0 (No failed answers; response time avg 2.27s, max 2.27s, total 2.27s)
Instructions following: 9.9 (No failed answers; response time avg 1.86s, max 2.10s, total 3.73s)
Puzzle Solving: 10.0 (No failed answers; response time avg 2.35s, max 3.25s, total 7.06s)
Tool Calling: 10.0 (No failed answers; response time avg 3.27s, max 3.27s, total 3.27s)
Trivia: 10.0 (No failed answers; response time avg 1.88s, max 1.88s, total 1.88s)

Anti-AI Tricks: 5.2 (Wrong answer: 3; response time avg 5.51s, max 6.59s, total 11.02s)
Coding: 5.0 (Wrong answer: 2; response time avg 3.35s, max 5.57s, total 6.70s)
Combined: 3.0 (Invalid tool call: 1; response time avg 3.22s, max 3.22s, total 3.22s)
Data parsing and extraction: 7.3 (Wrong answer: 1; response time avg 4.82s, max 4.82s, total 4.82s)
Domain specific: 7.7 (Wrong answer: 1; response time avg 744ms, max 744ms, total 744ms)
General Intelligence: 4.0 (Wrong answer: 1; response time avg 1.59s, max 1.59s, total 1.59s)
Instructions following: 6.5 (Wrong answer: 1; response time avg 888ms, max 888ms, total 888ms)
Tool Calling: 2.8 (Wrong answer: 1; response time avg 7.05s, max 7.05s, total 7.05s)
Trivia: 3.0 (Wrong answer: 1; response time avg 692ms, max 692ms, total 692ms)

Overall (Wrong answer: 3; response time avg 3.02s, max 18.27s, total 57.44s) …
Anti-AI Tricks: 8.3 (Wrong answer: 1; response time avg 2.12s, max 3.75s, total 8.50s)
Coding: 10.0 (No failed answers; response time avg 2.84s, max 2.84s, total 2.84s)
Combined: 9.5 (No failed answers; response time avg 18.27s, max 18.27s, total 18.27s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 2.15s, max 2.33s, total 4.29s)
Domain specific: 7.7 (Wrong answer: 1; response time avg 1.19s, max 1.40s, total 3.58s)
General Intelligence: 10.0 (No failed answers; response time avg 3.47s, max 3.47s, total 3.47s)
Instructions following: 10.0 (No failed answers; response time avg 1.46s, max 1.68s, total 2.91s)
Puzzle Solving: 10.0 (No failed answers; response time avg 2.46s, max 3.72s, total 7.38s)
Tool Calling: 10.0 (No failed answers; response time avg 4.74s, max 4.74s, total 4.74s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.46s, max 1.46s, total 1.46s)

Overall (Wrong answer: 10, Did not follow instructions: 2; response time avg 3.04s, max 6.51s, total 60.88s) …
Anti-AI Tricks: 4.8 (Wrong answer: 3; response time avg 3.13s, max 5.90s, total 12.50s)
Coding: 6.8 (Wrong answer: 1; response time avg 3.77s, max 5.30s, total 7.54s)
Combined: 3.0 (Wrong answer: 1; response time avg 6.51s, max 6.51s, total 6.51s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 3.81s, max 5.69s, total 7.62s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 2.09s, max 2.39s, total 6.26s)
Instructions following: 6.5 (Wrong answer: 1; response time avg 1.97s, max 2.43s, total 3.93s)
Tool Calling: 10.0 (No failed answers; response time avg 4.86s, max 4.86s, total 4.86s)
Trivia: 3.0 (Wrong answer: 1; response time avg 2.23s, max 2.23s, total 2.23s)

Overall (Wrong answer: 6, Did not follow instructions: 1; response time avg 3.18s, max 10.87s, total 63.55s) …
Anti-AI Tricks: 9.1 (Did not follow instructions: 1; response time avg 2.39s, max 3.58s, total 9.57s)
Coding: 6.8 (Wrong answer: 1; response time avg 3.59s, max 3.93s, total 7.19s)
Combined: 10.0 (No failed answers; response time avg 10.87s, max 10.87s, total 10.87s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 2.60s, max 2.69s, total 5.19s)
Domain specific: 2.9 (Wrong answer: 3; response time avg 3.16s, max 3.89s, total 9.49s)
General Intelligence: 10.0 (No failed answers; response time avg 2.60s, max 2.60s, total 2.60s)
Instructions following: 9.9 (No failed answers; response time avg 2.59s, max 3.04s, total 5.17s)
Puzzle Solving: 7.6 (Wrong answer: 1; response time avg 1.95s, max 2.48s, total 5.84s)
Tool Calling: 10.0 (No failed answers; response time avg 4.55s, max 4.55s, total 4.55s)
Trivia: 3.0 (Wrong answer: 1; response time avg 3.08s, max 3.08s, total 3.08s)

Overall (Wrong answer: 9; response time avg 3.31s, max 20.51s, total 66.17s) …
Anti-AI Tricks: 5.2 (Wrong answer: 3; response time avg 2.63s, max 5.57s, total 10.53s)
Coding: 4.2 (Wrong answer: 2; response time avg 3.06s, max 3.45s, total 6.12s)
Combined: 3.0 (Wrong answer: 1; response time avg 20.51s, max 20.51s, total 20.51s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 2.87s, max 3.54s, total 5.74s)
Domain specific: 7.7 (Wrong answer: 1; response time avg 1.22s, max 1.25s, total 3.67s)
General Intelligence: 4.3 (Wrong answer: 1; response time avg 1.62s, max 1.62s, total 1.62s)
Instructions following: 9.8 (No failed answers; response time avg 1.40s, max 1.46s, total 2.79s)
Puzzle Solving: 10.0 (No failed answers; response time avg 2.65s, max 3.59s, total 7.94s)
Tool Calling: 10.0 (No failed answers; response time avg 5.27s, max 5.27s, total 5.27s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.97s, max 1.97s, total 1.97s)

Overall (Wrong answer: 12, Did not follow instructions: 2; response time avg 3.38s, max 46.00s, total 67.55s) …
Anti-AI Tricks: 4.8 (Wrong answer: 3; response time avg 1.59s, max 3.60s, total 6.38s)
Coding: 4.0 (Wrong answer: 2; response time avg 2.14s, max 3.44s, total 4.29s)
Combined: 3.0 (Wrong answer: 1; response time avg 46.00s, max 46.00s, total 46.00s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.01s, max 1.06s, total 2.02s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 465ms, max 492ms, total 1.39s)
Instructions following: 6.3 (Wrong answer: 1; response time avg 513ms, max 570ms, total 1.03s)
Tool Calling: 10.0 (No failed answers; response time avg 2.04s, max 2.04s, total 2.04s)
Trivia: 3.0 (Wrong answer: 1; response time avg 295ms, max 295ms, total 295ms)

Overall (Wrong answer: 11, Did not follow instructions: 2; response time avg 3.50s, max 47.43s, total 70.00s) …
Anti-AI Tricks: 3.4 (Wrong answer: 4; response time avg 1.43s, max 4.39s, total 5.71s)
Coding: 6.8 (Wrong answer: 1; response time avg 1.72s, max 2.67s, total 3.43s)
Combined: 3.0 (Wrong answer: 1; response time avg 47.43s, max 47.43s, total 47.43s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.16s, max 1.42s, total 2.33s)
Domain specific: 7.7 (Wrong answer: 1; response time avg 485ms, max 549ms, total 1.45s)
Instructions following: 6.3 (Wrong answer: 1; response time avg 809ms, max 983ms, total 1.62s)
Tool Calling: 10.0 (No failed answers; response time avg 2.30s, max 2.30s, total 2.30s)
Trivia: 3.0 (Wrong answer: 1; response time avg 493ms, max 493ms, total 493ms)

Anti-AI Tricks: 6.5 (Extra formatting: 2; response time avg 3.40s, max 6.36s, total 13.58s)
Coding: 6.8 (Did not follow instructions: 1; response time avg 3.59s, max 4.34s, total 7.17s)
Combined: 9.5 (No failed answers; response time avg 17.73s, max 17.73s, total 17.73s)
Data parsing and extraction: 7.3 (Wrong answer: 1; response time avg 1.77s, max 1.93s, total 3.53s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 1.66s, max 2.16s, total 4.99s)
General Intelligence: 10.0 (No failed answers; response time avg 3.48s, max 3.48s, total 3.48s)
Instructions following: 9.9 (No failed answers; response time avg 1.37s, max 1.40s, total 2.73s)
Puzzle Solving: 7.7 (Extra formatting: 1; response time avg 2.74s, max 3.46s, total 8.22s)
Tool Calling: 10.0 (No failed answers; response time avg 5.35s, max 5.35s, total 5.35s)
Trivia: 3.0 (No answer: 1; response time avg 3.41s, max 3.41s, total 3.41s)
Wrong answer: 12. Response time: avg 3.74s, max 27.18s, total 74.71s…
Anti-AI Tricks
: 3.5. Wrong answer: 4. Response time: avg 1.32s, max 3.89s, total 5.30s
Coding
: 6.8. Wrong answer: 1. Response time: avg 993ms, max 1.29s, total 1.99s
Combined
: 3.0. Wrong answer: 1. Response time: avg 6.22s, max 6.22s, total 6.22s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 1.57s, max 1.83s, total 3.14s
Domain specific
: 7.7. Wrong answer: 1. Response time: avg 905ms, max 1.10s, total 2.71s
General Intelligence
: 10.0. No failed answers. Response time: avg 803ms, max 803ms, total 803ms
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 8.81s, max 13.73s, total 17.61s
Puzzle Solving
: 3.1. Wrong answer: 3. Response time: avg 10.89s, max 27.18s, total 32.68s
Tool Calling
: 10.0. No failed answers. Response time: avg 3.67s, max 3.67s, total 3.67s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 588ms, max 588ms, total 588ms
Anti-AI Tricks
: 3.6. Wrong answer: 4. Response time: avg 2.10s, max 6.15s, total 8.41s
Coding
: 6.8. Wrong answer: 1. Response time: avg 12.29s, max 22.52s, total 24.58s
Combined
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 1.46s, max 2.03s, total 2.93s
Domain specific
: 3.5. Wrong answer: 3. Response time: avg 7.45s, max 12.46s, total 22.35s
General Intelligence
: 4.4. Wrong answer: 1. Response time: avg 3.51s, max 3.51s, total 3.51s
Instructions following
: 6.2. Wrong answer: 1. Response time: avg 1.86s, max 2.83s, total 3.73s
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms
Trivia
: 3.0. Wrong answer: 1. Response time: avg 414ms, max 414ms, total 414ms
Anti-AI Tricks
: 3.8. Wrong answer: 4. Response time: avg 2.83s, max 7.62s, total 11.33s
Coding
: 6.8. Wrong answer: 1. Response time: avg 5.75s, max 10.18s, total 11.51s
Combined
: 3.0. Invalid tool call: 1. Response time: avg 9.95s, max 9.95s, total 9.95s
Data parsing and extraction
: 7.3. Wrong answer: 1. Response time: avg 2.06s, max 2.39s, total 4.11s
Domain specific
: 7.7. Wrong answer: 1. Response time: avg 3.03s, max 4.83s, total 9.08s
Instructions following
: 6.2. Wrong answer: 1. Response time: avg 1.92s, max 1.94s, total 3.83s
Tool Calling
: 9.5. No failed answers. Response time: avg 6.74s, max 6.74s, total 6.74s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 4.03s, max 4.03s, total 4.03s
Wrong answer: 6. Did not follow instructions: 1. Response time: avg 3.94s, max 14.93s, total 78.74s…
Anti-AI Tricks
: 9.1. Did not follow instructions: 1. Response time: avg 2.33s, max 3.89s, total 9.30s
Coding
: 6.8. Wrong answer: 1. Response time: avg 3.98s, max 4.34s, total 7.95s
Combined
: 10.0. No failed answers. Response time: avg 14.93s, max 14.93s, total 14.93s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 2.29s, max 2.31s, total 4.59s
Domain specific
: 3.0. Wrong answer: 3. Response time: avg 4.21s, max 5.86s, total 12.62s
General Intelligence
: 10.0. No failed answers. Response time: avg 3.16s, max 3.16s, total 3.16s
Instructions following
: 10.0. No failed answers. Response time: avg 1.91s, max 1.93s, total 3.82s
Puzzle Solving
: 7.7. Wrong answer: 1. Response time: avg 5.30s, max 9.55s, total 15.89s
Tool Calling
: 10.0. No failed answers. Response time: avg 3.80s, max 3.80s, total 3.80s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 2.68s, max 2.68s, total 2.68s
Wrong answer: 11. Response time: avg 3.95s, max 11.07s, total 51.38s…
Anti-AI Tricks
: 4.8. Wrong answer: 3. Response time: avg 2.37s, max 3.39s, total 4.75s
Coding
: 4.6. Wrong answer: 2. Response time: avg 5.18s, max 8.84s, total 10.37s
Combined
: 3.0. Wrong answer: 1. Response time: avg 4.98s, max 4.98s, total 4.98s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 5.78s, max 5.78s, total 5.78s
Domain specific
: 3.0. Wrong answer: 3. Response time: avg 2.24s, max 2.24s, total 2.24s
General Intelligence
: 10.0. No failed answers. Response time: avg 3.27s, max 3.27s, total 3.27s
Instructions following
: 10.0. No failed answers. Response time: avg 1.48s, max 1.48s, total 1.48s
Puzzle Solving
: 7.7. Wrong answer: 1. Response time: avg 1.91s, max 2.08s, total 3.82s
Tool Calling
: 10.0. No failed answers. Response time: avg 11.07s, max 11.07s, total 11.07s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 3.62s, max 3.62s, total 3.62s
Anti-AI Tricks
: 6.5. Wrong answer: 2. Response time: avg 1.85s, max 4.45s, total 7.40s
Coding
: 6.8. Wrong answer: 1. Response time: avg 14.84s, max 26.13s, total 29.68s
Combined
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 2.25s, max 3.02s, total 4.51s
Domain specific
: 7.7. Wrong answer: 1. Response time: avg 3.22s, max 4.68s, total 9.67s
General Intelligence
: 10.0. No failed answers. Response time: avg 2.09s, max 2.09s, total 2.09s
Instructions following
: 6.5. Wrong answer: 1. Response time: avg 2.84s, max 4.45s, total 5.68s
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms
Trivia
: 3.0. Wrong answer: 1. Response time: avg 1.25s, max 1.25s, total 1.25s
Wrong answer: 12. Invalid tool call: 1. Response time: avg 4.20s, max 32.57s, total 83.95s…
Anti-AI Tricks
: 4.0. Wrong answer: 4. Response time: avg 2.11s, max 3.94s, total 8.46s
Coding
: 4.3. Wrong answer: 2. Response time: avg 6.33s, max 9.79s, total 12.65s
Combined
: 2.8. Invalid tool call: 1. Response time: avg 32.57s, max 32.57s, total 32.57s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 1.08s, max 1.62s, total 2.15s
Domain specific
: 2.9. Wrong answer: 3. Response time: avg 1.99s, max 3.99s, total 5.98s
General Intelligence
: 5.0. Wrong answer: 1. Response time: avg 790ms, max 790ms, total 790ms
Instructions following
: 9.8. No failed answers. Response time: avg 1.98s, max 2.28s, total 3.97s
Puzzle Solving
: 7.7. Wrong answer: 1. Response time: avg 1.45s, max 2.09s, total 4.36s
Tool Calling
: 10.0. No failed answers. Response time: avg 10.68s, max 10.68s, total 10.68s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 2.34s, max 2.34s, total 2.34s
Wrong answer: 2. Did not follow instructions: 1. Response time: avg 4.29s, max 12.05s, total 85.72s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 2.09s, max 2.56s, total 8.35s
Coding
: 6.8. Did not follow instructions: 1. Response time: avg 9.91s, max 11.59s, total 19.82s
Combined
: 10.0. No failed answers. Response time: avg 12.05s, max 12.05s, total 12.05s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 4.07s, max 5.60s, total 8.14s
Domain specific
: 7.7. Wrong answer: 1. Response time: avg 5.24s, max 6.43s, total 15.73s
General Intelligence
: 10.0. No failed answers. Response time: avg 2.52s, max 2.52s, total 2.52s
Instructions following
: 9.9. No failed answers. Response time: avg 2.70s, max 3.07s, total 5.40s
Puzzle Solving
: 7.7. Wrong answer: 1. Response time: avg 2.38s, max 2.55s, total 7.15s
Tool Calling
: 10.0. No failed answers. Response time: avg 3.81s, max 3.81s, total 3.81s
Trivia
: 10.0. No failed answers. Response time: avg 2.75s, max 2.75s, total 2.75s
Wrong answer: 2. Timed out: 1. Response time: avg 4.48s, max 23.18s, total 85.21s…
Anti-AI Tricks
: 8.3. Wrong answer: 1. Response time: avg 1.85s, max 2.71s, total 7.38s
Coding
: 10.0. No failed answers. Response time: avg 14.79s, max 23.18s, total 29.59s
Combined
: 10.0. No failed answers. Response time: avg 21.45s, max 21.45s, total 21.45s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 2.37s, max 3.30s, total 4.74s
Domain specific
: 7.7. Timed out: 1. Response time: avg 1.17s, max 1.40s, total 2.35s
General Intelligence
: 10.0. No failed answers. Response time: avg 2.87s, max 2.87s, total 2.87s
Instructions following
: 10.0. No failed answers. Response time: avg 1.57s, max 1.66s, total 3.14s
Puzzle Solving
: 10.0. No failed answers. Response time: avg 2.43s, max 2.89s, total 7.28s
Tool Calling
: 10.0. No failed answers. Response time: avg 4.17s, max 4.17s, total 4.17s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 2.25s, max 2.25s, total 2.25s
Wrong answer: 11. Did not follow instructions: 2. Response time: avg 4.57s, max 33.34s, total 91.37s…
Anti-AI Tricks
: 4.8. Wrong answer: 3. Response time: avg 1.88s, max 4.81s, total 7.53s
Combined
: 2.8. Wrong answer: 1. Response time: avg 13.32s, max 13.32s, total 13.32s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 2.82s, max 3.86s, total 5.65s
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 4.43s, max 10.83s, total 13.28s
Instructions following
: 6.2. Wrong answer: 1. Response time: avg 1.17s, max 1.33s, total 2.35s
Puzzle Solving
: 6.7. Wrong answer: 2. Response time: avg 1.97s, max 3.43s, total 5.91s
Tool Calling
: 10.0. No failed answers. Response time: avg 4.42s, max 4.42s, total 4.42s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 33.34s, max 33.34s, total 33.34s
Anti-AI Tricks
: 3.5. Wrong answer: 4. Response time: avg 3.81s, max 6.85s, total 15.23s
Coding
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms
Combined
: 3.0. Wrong answer: 1. Response time: avg 15.17s, max 15.17s, total 15.17s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 8.49s, max 14.02s, total 16.98s
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 2.33s, max 2.94s, total 6.99s
Instructions following
: 6.4. Wrong answer: 1. Response time: avg 2.82s, max 2.92s, total 5.65s
Tool Calling
: 10.0. No failed answers. Response time: avg 6.02s, max 6.02s, total 6.02s