A test is fully passed only if every run passed for that test.
Anti-AI Tricks
: 3.1; Wrong answer: 4; Response Time: avg 1.71s, max 3.79s, total 6.84s
Coding
: 5.2; Wrong answer: 1; Response Time: avg 5.69s, max 5.69s, total 5.69s
Combined
: 3.0; Invalid tool call: 1; Response Time: avg 5.91s, max 5.91s, total 5.91s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 847ms, max 1.09s, total 1.69s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 464ms, max 622ms, total 1.39s
Instructions following
: 6.5; Wrong answer: 1; Response Time: avg 514ms, max 582ms, total 1.03s
Tool Calling
: 10.0; No failed answers; Response Time: avg 1.27s, max 1.27s, total 1.27s
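The note attached to these entries says a test counts as fully passed only when every one of its runs passed, and each entry reports an average, maximum, and total response time. A minimal Python sketch of both ideas follows. The function names, the per-run pass flags, and the per-run timings are hypothetical and are not taken from the benchmark's own code. It also assumes that the average is the arithmetic mean of the per-run times and that a test with no runs does not count as passed.

```python
def fully_passed(run_results):
    """Return True only if there is at least one run and every run passed.

    run_results is a hypothetical list of booleans, one per run.
    Treating an empty list as "not passed" is an assumption.
    """
    return bool(run_results) and all(run_results)


def summarize_times(times_s):
    """Summarize per-run response times (seconds) as avg, max, and total.

    Assumes avg is the arithmetic mean; values are rounded to 2 decimals.
    """
    total = sum(times_s)
    return {
        "avg": round(total / len(times_s), 2),
        "max": round(max(times_s), 2),
        "total": round(total, 2),
    }


if __name__ == "__main__":
    # One failed run is enough for the test not to be fully passed.
    print(fully_passed([True, True, False]))
    # Hypothetical per-run timings for a category with four runs.
    print(summarize_times([3.79, 1.25, 1.0, 0.8]))
```

Under these assumptions, a category whose total is four times its average would correspond to four timed runs.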
Anti-AI Tricks
: 3.1; Wrong answer: 4; Response Time: avg 1.63s, max 4.60s, total 6.51s
Coding
: 10.0; No failed answers; Response Time: avg 2.23s, max 2.23s, total 2.23s
Combined
: 3.0; Invalid tool call: 1; Response Time: avg 4.22s, max 4.22s, total 4.22s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 2.13s, max 3.35s, total 4.26s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 1.11s, max 1.89s, total 3.32s
General Intelligence
: 10.0; No failed answers; Response Time: avg 947ms, max 947ms, total 947ms
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 1.10s, max 1.36s, total 2.19s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.49s, max 2.49s, total 2.49s
Wrong answer: 9, Did not follow instructions: 2; Response Time: avg 1.74s, max 9.39s, total 31.32s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 788ms, max 1.34s, total 3.15s
Coding
: 10.0; No failed answers; Response Time: avg 2.51s, max 2.51s, total 2.51s
Combined
: 2.8; Wrong answer: 1; Response Time: avg 9.39s, max 9.39s, total 9.39s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.43s, max 1.45s, total 2.86s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 540ms, max 649ms, total 1.62s
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 815ms, max 973ms, total 1.63s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.54s, max 3.54s, total 3.54s
Wrong answer: 9; Response Time: avg 2.60s, max 6.65s, total 31.23s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 1.91s, max 2.74s, total 3.82s
Coding
: 6.3; Wrong answer: 1; Response Time: avg 3.63s, max 3.63s, total 3.63s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 6.65s, max 6.65s, total 6.65s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.89s, max 1.89s, total 1.89s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 1.17s, max 1.44s, total 2.33s
General Intelligence
: 4.4; Wrong answer: 1; Response Time: avg 2.26s, max 2.26s, total 2.26s
Instructions following
: 10.0; No failed answers; Response Time: avg 1.67s, max 1.67s, total 1.67s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 2.82s, max 3.52s, total 5.65s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.33s, max 3.33s, total 3.33s
Wrong answer: 10, Did not follow instructions: 2; Response Time: avg 2.87s, max 12.46s, total 46.00s…
Anti-AI Tricks
: 3.6; Wrong answer: 4; Response Time: avg 2.10s, max 6.15s, total 8.41s
Coding
: 10.0; No failed answers; Response Time: avg 2.05s, max 2.05s, total 2.05s
Combined
: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.46s, max 2.03s, total 2.93s
Domain specific
: 3.5; Wrong answer: 3; Response Time: avg 7.45s, max 12.46s, total 22.35s
General Intelligence
: 4.4; Wrong answer: 1; Response Time: avg 3.51s, max 3.51s, total 3.51s
Instructions following
: 6.2; Wrong answer: 1; Response Time: avg 1.86s, max 2.83s, total 3.73s
Tool Calling
: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Wrong answer: 10, Did not follow instructions: 1; Response Time: avg 3.18s, max 13.32s, total 57.24s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 1.88s, max 4.81s, total 7.53s
Coding
: 5.3; Wrong answer: 1; Response Time: avg 3.20s, max 3.20s, total 3.20s
Combined
: 2.8; Wrong answer: 1; Response Time: avg 13.32s, max 13.32s, total 13.32s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 2.82s, max 3.86s, total 5.65s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 4.43s, max 10.83s, total 13.28s
Instructions following
: 6.2; Wrong answer: 1; Response Time: avg 1.17s, max 1.33s, total 2.35s
Puzzle Solving
: 6.7; Wrong answer: 2; Response Time: avg 2.03s, max 3.60s, total 6.09s
Tool Calling
: 10.0; No failed answers; Response Time: avg 4.42s, max 4.42s, total 4.42s
Wrong answer: 10; Response Time: avg 3.25s, max 13.73s, total 58.44s…
Anti-AI Tricks
: 3.5; Wrong answer: 4; Response Time: avg 1.32s, max 3.89s, total 5.30s
Coding
: 10.0; No failed answers; Response Time: avg 1.29s, max 1.29s, total 1.29s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 6.22s, max 6.22s, total 6.22s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.57s, max 1.83s, total 3.14s
Domain specific
: 7.7; Wrong answer: 1; Response Time: avg 905ms, max 1.10s, total 2.71s
General Intelligence
: 10.0; No failed answers; Response Time: avg 803ms, max 803ms, total 803ms
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 8.81s, max 13.73s, total 17.61s
Puzzle Solving
: 3.1; Wrong answer: 3; Response Time: avg 5.90s, max 12.19s, total 17.69s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.67s, max 3.67s, total 3.67s
Wrong answer: 7; Response Time: avg 3.38s, max 20.51s, total 60.83s…
Anti-AI Tricks
: 5.2; Wrong answer: 3; Response Time: avg 2.63s, max 5.57s, total 10.53s
Coding
: 5.0; Wrong answer: 1; Response Time: avg 3.45s, max 3.45s, total 3.45s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 20.51s, max 20.51s, total 20.51s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 2.87s, max 3.54s, total 5.74s
Domain specific
: 7.7; Wrong answer: 1; Response Time: avg 1.22s, max 1.25s, total 3.67s
General Intelligence
: 4.3; Wrong answer: 1; Response Time: avg 1.62s, max 1.62s, total 1.62s
Instructions following
: 9.8; No failed answers; Response Time: avg 1.45s, max 1.56s, total 2.89s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 2.38s, max 2.80s, total 7.15s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.27s, max 5.27s, total 5.27s
Wrong answer: 10, Did not follow instructions: 2; Response Time: avg 3.69s, max 46.00s, total 66.50s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 1.59s, max 3.60s, total 6.38s
Coding
: 4.3; Wrong answer: 1; Response Time: avg 3.44s, max 3.44s, total 3.44s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 46.00s, max 46.00s, total 46.00s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.01s, max 1.06s, total 2.02s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 465ms, max 492ms, total 1.39s
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 585ms, max 715ms, total 1.17s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.04s, max 2.04s, total 2.04s
Wrong answer: 9, Did not follow instructions: 2; Response Time: avg 3.82s, max 47.43s, total 68.74s…
Anti-AI Tricks
: 3.4; Wrong answer: 4; Response Time: avg 1.43s, max 4.39s, total 5.71s
Coding
: 10.0; No failed answers; Response Time: avg 2.67s, max 2.67s, total 2.67s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 47.43s, max 47.43s, total 47.43s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.16s, max 1.42s, total 2.33s
Domain specific
: 7.7; Wrong answer: 1; Response Time: avg 485ms, max 549ms, total 1.45s
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 809ms, max 983ms, total 1.62s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.30s, max 2.30s, total 2.30s
Anti-AI Tricks
: 3.8; Wrong answer: 4; Response Time: avg 2.83s, max 7.62s, total 11.33s
Coding
: 10.0; No failed answers; Response Time: avg 10.18s, max 10.18s, total 10.18s
Combined
: 3.0; Invalid tool call: 1; Response Time: avg 9.95s, max 9.95s, total 9.95s
Data parsing and extraction
: 7.3; Wrong answer: 1; Response Time: avg 2.06s, max 2.39s, total 4.11s
Domain specific
: 7.7; Wrong answer: 1; Response Time: avg 3.03s, max 4.83s, total 9.08s
Instructions following
: 6.2; Wrong answer: 1; Response Time: avg 1.92s, max 1.94s, total 3.83s
Tool Calling
: 9.5; No failed answers; Response Time: avg 6.74s, max 6.74s, total 6.74s
Wrong answer: 5, Did not follow instructions: 2; Response Time: avg 9.90s, max 26.85s, total 178.26s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 6.10s, max 9.60s, total 24.39s
Coding
: 6.7; Wrong answer: 1; Response Time: avg 25.84s, max 25.84s, total 25.84s
Combined
: 10.0; No failed answers; Response Time: avg 20.28s, max 20.28s, total 20.28s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 9.65s, max 10.35s, total 19.31s
Domain specific
: 3.5; Wrong answer: 3; Response Time: avg 14.65s, max 26.85s, total 43.95s
General Intelligence
: 4.8; Wrong answer: 1; Response Time: avg 9.88s, max 9.88s, total 9.88s
Instructions following
: 10.0; No failed answers; Response Time: avg 6.05s, max 6.94s, total 12.10s
Puzzle Solving
: 6.1; Did not follow instructions: 2; Response Time: avg 6.17s, max 8.18s, total 18.52s
Tool Calling
: 10.0; No failed answers; Response Time: avg 4.00s, max 4.00s, total 4.00s
Coding
: 7.3; Wrong answer: 1; Response Time: avg 3.14s, max 3.14s, total 3.14s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 45.14s, max 45.14s, total 45.14s
Data parsing and extraction
: 6.5; Wrong answer: 1; Response Time: avg 1.32s, max 1.32s, total 1.32s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 962ms, max 962ms, total 962ms
General Intelligence
: 10.0; No failed answers; Response Time: avg 1.34s, max 1.34s, total 1.34s
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 7.71s, max 14.65s, total 15.42s
Puzzle Solving
: 3.0; Wrong answer: 3; Response Time: avg 22.86s, max 42.58s, total 45.73s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.47s, max 2.47s, total 2.47s
Coding
: 4.7; Timed out: 1; Response Time: avg 1.69s, max 1.69s, total 1.69s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 4.28s, max 4.28s, total 4.28s
Data parsing and extraction
: 6.5; Wrong answer: 1; Response Time: avg 81.80s, max 81.80s, total 81.80s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 638ms, max 638ms, total 638ms
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 7.34s, max 13.67s, total 14.68s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.64s, max 2.64s, total 2.64s
Wrong answer: 3, Did not follow instructions: 1; Response Time: avg 11.98s, max 45.02s, total 191.76s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 6.02s, max 8.79s, total 24.07s
Coding
: 10.0; No failed answers; Response Time: avg 32.58s, max 32.58s, total 32.58s
Combined
: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 12.99s, max 13.75s, total 25.99s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 22.50s, max 45.02s, total 67.51s
Instructions following
: 10.0; No failed answers; Response Time: avg 7.50s, max 10.22s, total 15.00s
Puzzle Solving
: 7.9; Wrong answer: 1; Response Time: avg 5.98s, max 8.42s, total 17.95s
Tool Calling
: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Wrong answer: 3, Did not follow instructions: 1; Response Time: avg 13.94s, max 43.55s, total 237.01s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 9.90s, max 19.37s, total 39.60s
Combined
: 10.0; No failed answers; Response Time: avg 34.95s, max 34.95s, total 34.95s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 14.95s, max 15.40s, total 29.90s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 22.08s, max 43.55s, total 66.23s
Instructions following
: 10.0; No failed answers; Response Time: avg 7.54s, max 11.67s, total 15.07s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 6.11s, max 7.52s, total 18.34s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.87s, max 5.87s, total 5.87s
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 9.90s, max 19.37s, total 39.60s
Coding
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Combined
: 10.0; No failed answers; Response Time: avg 34.95s, max 34.95s, total 34.95s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 14.95s, max 15.40s, total 29.90s
Domain specific
: 2.9; Wrong answer: 3; Response Time: avg 29.59s, max 43.55s, total 88.77s
Instructions following
: 10.0; No failed answers; Response Time: avg 7.54s, max 11.67s, total 15.07s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 6.11s, max 7.52s, total 18.34s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.87s, max 5.87s, total 5.87s
Wrong answer: 3, Timed out: 2; Response Time: avg 31.38s, max 119.29s, total 564.84s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 9.75s, max 18.03s, total 39.01s
Coding
: 4.7; Timed out: 1; Response Time: avg 70.98s, max 70.98s, total 70.98s
Combined
: 10.0; No failed answers; Response Time: avg 107.79s, max 107.79s, total 107.79s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 23.41s, max 29.79s, total 46.83s
Domain specific
: 2.9; Wrong answer: 3; Response Time: avg 63.40s, max 119.29s, total 190.20s
General Intelligence
: 3.4; Timed out: 1; Response Time: avg 34.11s, max 34.11s, total 34.11s
Instructions following
: 10.0; No failed answers; Response Time: avg 9.88s, max 15.44s, total 19.76s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 17.18s, max 31.99s, total 51.55s
Tool Calling
: 10.0; No failed answers; Response Time: avg 4.60s, max 4.60s, total 4.60s
Wrong answer: 5; Response Time: avg 32.81s, max 92.41s, total 590.65s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 10.84s, max 15.11s, total 43.36s
Coding
: 7.6; No failed answers; Response Time: avg 85.72s, max 85.72s, total 85.72s
Combined
: 10.0; No failed answers; Response Time: avg 92.41s, max 92.41s, total 92.41s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 38.32s, max 41.70s, total 76.63s
Domain specific
: 2.9; Wrong answer: 3; Response Time: avg 53.10s, max 90.70s, total 159.30s
General Intelligence
: 4.9; Wrong answer: 1; Response Time: avg 25.30s, max 25.30s, total 25.30s
Instructions following
: 10.0; No failed answers; Response Time: avg 20.25s, max 21.65s, total 40.50s
Puzzle Solving
: 8.2; Wrong answer: 1; Response Time: avg 17.58s, max 24.83s, total 52.73s
Tool Calling
: 10.0; No failed answers; Response Time: avg 14.72s, max 14.72s, total 14.72s
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 21.13s, max 34.96s, total 84.53s
Coding
: 10.0; No failed answers; Response Time: avg 79.09s, max 79.09s, total 79.09s
Combined
: 4.7; No answer: 1; Response Time: avg 75.34s, max 75.34s, total 75.34s
Data parsing and extraction
: 7.3; API error: 1; Response Time: avg 59.33s, max 97.12s, total 118.65s
Domain specific
: 4.1; Timed out: 2, Wrong answer: 1; Response Time: avg 88.34s, max 106.00s, total 265.01s
General Intelligence
: 2.8; Timed out: 1; Response Time: avg 30.30s, max 30.30s, total 30.30s
Instructions following
: 10.0; No failed answers; Response Time: avg 24.45s, max 43.36s, total 48.89s
Puzzle Solving
: 6.4; Timed out: 1, Wrong answer: 1; Response Time: avg 31.58s, max 60.18s, total 94.75s
Tool Calling
: 10.0; No failed answers; Response Time: avg 4.65s, max 4.65s, total 4.65s
Timed out: 2, Wrong answer: 2; Response Time: avg 46.56s, max 120.91s, total 512.20s…
Anti-AI Tricks
: 8.2; Wrong answer: 1; Response Time: avg 45.78s, max 81.20s, total 91.57s
Coding
: 10.0; No failed answers; Response Time: avg 120.91s, max 120.91s, total 120.91s
Combined
: 10.0; No failed answers; Response Time: avg 46.85s, max 46.85s, total 46.85s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 46.91s, max 46.91s, total 46.91s
Domain specific
: 5.3; Timed out: 1, Wrong answer: 1; Response Time: avg 17.50s, max 17.50s, total 17.50s
General Intelligence
: 4.7; Timed out: 1; Response Time: avg 79.86s, max 79.86s, total 79.86s
Instructions following
: 10.0; No failed answers; Response Time: avg 31.93s, max 31.93s, total 31.93s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 34.57s, max 49.12s, total 69.13s
Tool Calling
: 10.0; No failed answers; Response Time: avg 7.54s, max 7.54s, total 7.54s
Wrong answer: 3. Response Time (avg): 48.31s; Response Time (max): 186.74s; Response Time (total): 869.64s…
Anti-AI Tricks
: 10.0. No failed answers. Response Time (avg): 22.13s; Response Time (max): 28.70s; Response Time (total): 88.50s
Coding
: 10.0. No failed answers. Response Time (avg): 117.87s; Response Time (max): 117.87s; Response Time (total): 117.87s
Combined
: 10.0. No failed answers. Response Time (avg): 121.49s; Response Time (max): 121.49s; Response Time (total): 121.49s
Data parsing and extraction
: 10.0. No failed answers. Response Time (avg): 41.15s; Response Time (max): 48.02s; Response Time (total): 82.30s
Domain specific
: 2.9. Wrong answer: 3. Response Time (avg): 95.91s; Response Time (max): 186.74s; Response Time (total): 287.73s
General Intelligence
: 10.0. No failed answers. Response Time (avg): 32.24s; Response Time (max): 32.24s; Response Time (total): 32.24s
Instructions following
: 10.0. No failed answers. Response Time (avg): 24.31s; Response Time (max): 27.94s; Response Time (total): 48.63s
Puzzle Solving
: 10.0. No failed answers. Response Time (avg): 24.19s; Response Time (max): 37.68s; Response Time (total): 72.57s
Tool Calling
: 10.0. No failed answers. Response Time (avg): 18.32s; Response Time (max): 18.32s; Response Time (total): 18.32s
Anti-AI Tricks
: 8.3. Wrong answer: 1. Response Time (avg): 12.62s; Response Time (max): 18.61s; Response Time (total): 50.50s
Coding
: 10.0. No failed answers. Response Time (avg): 168.22s; Response Time (max): 168.22s; Response Time (total): 168.22s
Combined
: 7.0. Invalid tool call: 1. Response Time (avg): 83.07s; Response Time (max): 83.07s; Response Time (total): 83.07s
Data parsing and extraction
: 3.5. No answer: 2. Response Time (avg): 37.30s; Response Time (max): 54.01s; Response Time (total): 74.60s
Domain specific
: 2.9. Wrong answer: 3. Response Time (avg): 73.38s; Response Time (max): 101.55s; Response Time (total): 220.15s
Instructions following
: 10.0. No failed answers. Response Time (avg): 37.96s; Response Time (max): 47.48s; Response Time (total): 75.92s
Puzzle Solving
: 7.7. Wrong answer: 1. Response Time (avg): 60.21s; Response Time (max): 97.76s; Response Time (total): 180.63s
Tool Calling
: 10.0. No failed answers. Response Time (avg): 16.88s; Response Time (max): 16.88s; Response Time (total): 16.88s
Anti-AI Tricks
: 8.7. Extra formatting: 1. Response Time (avg): 19.75s; Response Time (max): 49.95s; Response Time (total): 79.01s
Coding
: 10.0. No failed answers. Response Time (avg): 70.35s; Response Time (max): 70.35s; Response Time (total): 70.35s
Combined
: 10.0. No failed answers. Response Time (avg): 163.96s; Response Time (max): 163.96s; Response Time (total): 163.96s
Data parsing and extraction
: 10.0. No failed answers. Response Time (avg): 30.26s; Response Time (max): 32.03s; Response Time (total): 60.52s
Domain specific
: 5.3. Timed out: 1, Wrong answer: 1. Response Time (avg): 79.53s; Response Time (max): 95.52s; Response Time (total): 238.59s
Instructions following
: 10.0. No failed answers. Response Time (avg): 19.66s; Response Time (max): 32.25s; Response Time (total): 39.32s
Puzzle Solving
: 8.2. Did not follow instructions: 1. Response Time (avg): 64.61s; Response Time (max): 123.57s; Response Time (total): 193.84s
Tool Calling
: 10.0. No failed answers. Response Time (avg): 7.45s; Response Time (max): 7.45s; Response Time (total): 7.45s
Anti-AI Tricks
: 10.0. No failed answers. Response Time (avg): 59.11s; Response Time (max): 168.31s; Response Time (total): 236.44s
Coding
: 4.7. Timed out: 1. Response Time (avg): 45.75s; Response Time (max): 45.75s; Response Time (total): 45.75s
Combined
: 10.0. No failed answers. Response Time (avg): 17.78s; Response Time (max): 17.78s; Response Time (total): 17.78s
Data parsing and extraction
: 7.3. API error: 1. Response Time (avg): 56.99s; Response Time (max): 80.14s; Response Time (total): 113.98s
Domain specific
: 5.3. Timed out: 1, Wrong answer: 1. Response Time (avg): 146.50s; Response Time (max): 234.29s; Response Time (total): 439.49s
Instructions following
: 10.0. No failed answers. Response Time (avg): 63.49s; Response Time (max): 111.61s; Response Time (total): 126.98s
Puzzle Solving
: 6.6. Timed out: 1, Wrong answer: 1. Response Time (avg): 56.74s; Response Time (max): 115.01s; Response Time (total): 170.23s
Tool Calling
: 10.0. No failed answers. Response Time (avg): 10.33s; Response Time (max): 10.33s; Response Time (total): 10.33s
Anti-AI Tricks
: 5.1. Timed out: 2, Wrong answer: 1. Response Time (avg): 34.44s; Response Time (max): 57.86s; Response Time (total): 103.31s
Coding
: 2.6. Did not follow instructions: 1. Response Time (avg): 135.61s; Response Time (max): 135.61s; Response Time (total): 135.61s
Combined
: 3.0. Timed out: 1. Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Domain specific
: 3.6. Timed out: 3. Response Time (avg): 137.75s; Response Time (max): 202.61s; Response Time (total): 413.24s
General Intelligence
: 2.8. Timed out: 1. Response Time (avg): 226.38s; Response Time (max): 226.38s; Response Time (total): 226.38s
Instructions following
: 6.4. No answer: 1. Response Time (avg): 17.15s; Response Time (max): 28.54s; Response Time (total): 34.29s
Puzzle Solving
: 2.9. Timed out: 2, Wrong answer: 1. Response Time (avg): 33.38s; Response Time (max): 47.31s; Response Time (total): 100.14s
Tool Calling
: 10.0. No failed answers. Response Time (avg): 4.31s; Response Time (max): 4.31s; Response Time (total): 4.31s