A test is fully passed only if every run passed for that test.
Anti-AI Tricks: 3.3; Wrong answer: 3; Response Time: avg 471ms, max 872ms, total 1.41s
Combined: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction: 3.0; Wrong answer: 2; Response Time: avg 714ms, max 987ms, total 1.43s
Domain specific: 5.9; API error: 1, Wrong answer: 1; Response Time: avg 287ms, max 334ms, total 860ms
Instructions following: 4.8; Wrong answer: 2; Response Time: avg 1.09s, max 1.90s, total 2.18s
Tool Calling: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Wrong answer: 12, Did not follow instructions: 3; Response Time: avg 1.39s, max 3.84s, total 23.70s…
Anti-AI Tricks: 3.5; Wrong answer: 4; Response Time: avg 1.18s, max 1.81s, total 4.70s
Combined: 3.0; Wrong answer: 1; Response Time: avg 3.84s, max 3.84s, total 3.84s
Data parsing and extraction: 6.5; Wrong answer: 1; Response Time: avg 1.11s, max 1.25s, total 2.23s
Domain specific: 2.9; Wrong answer: 3; Response Time: avg 926ms, max 959ms, total 2.78s
Tool Calling: 10.0; No failed answers; Response Time: avg 3.40s, max 3.40s, total 3.40s
Anti-AI Tricks: 3.2; Wrong answer: 4; Response Time: avg 1.19s, max 2.73s, total 4.76s
Combined: 3.0; Wrong answer: 1; Response Time: avg 2.87s, max 2.87s, total 2.87s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 564ms, max 564ms, total 564ms
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 857ms, max 955ms, total 1.71s
Puzzle Solving: 3.6; Wrong answer: 3; Response Time: avg 1.38s, max 1.74s, total 2.75s
Tool Calling: 10.0; No failed answers; Response Time: avg 2.28s, max 2.28s, total 2.28s
Wrong answer: 12, Did not follow instructions: 2; Response Time: avg 1.76s, max 5.51s, total 17.56s…
Combined: 3.0; Wrong answer: 1; Response Time: avg 3.33s, max 3.33s, total 3.33s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 943ms, max 943ms, total 943ms
Domain specific: 5.9; Wrong answer: 2; Response Time: avg 1.06s, max 1.06s, total 1.06s
Instructions following: 3.0; Wrong answer: 2; Response Time: avg 923ms, max 923ms, total 923ms
Puzzle Solving: 3.2; Wrong answer: 3; Response Time: avg 1.28s, max 1.36s, total 2.56s
Tool Calling: 2.8; Wrong answer: 1; Response Time: avg 5.51s, max 5.51s, total 5.51s
Anti-AI Tricks: 5.1; Timed out: 2, Wrong answer: 1; Response Time: avg 34.44s, max 57.86s, total 103.31s
Combined: 3.0; Timed out: 1; Response Time: avg 0ms, max 0ms, total 0ms
Domain specific: 3.6; Timed out: 3; Response Time: avg 137.75s, max 202.61s, total 413.24s
General Intelligence: 2.8; Timed out: 1; Response Time: avg 226.38s, max 226.38s, total 226.38s
Instructions following: 6.4; Timed out: 1; Response Time: avg 17.15s, max 28.54s, total 34.29s
Tool Calling: 10.0; No failed answers; Response Time: avg 4.31s, max 4.31s, total 4.31s
Combined: 2.8; Invalid tool call: 1; Response Time: avg 65.57s, max 65.57s, total 65.57s
Data parsing and extraction: 6.3; No answer: 1; Response Time: avg 1.51s, max 1.51s, total 1.51s
Domain specific: 3.5; Wrong answer: 2, No answer: 1; Response Time: avg 174.55s, max 174.55s, total 174.55s
General Intelligence: 3.6; Wrong answer: 1; Response Time: avg 18.14s, max 18.14s, total 18.14s
Instructions following: 6.2; Wrong answer: 1; Response Time: avg 2.97s, max 2.97s, total 2.97s
Tool Calling: 10.0; No failed answers; Response Time: avg 15.95s, max 15.95s, total 15.95s
Wrong answer: 9, Did not follow instructions: 5; Response Time: avg 11.57s, max 81.80s, total 127.32s…
Combined: 3.0; Wrong answer: 1; Response Time: avg 4.28s, max 4.28s, total 4.28s
Data parsing and extraction: 6.5; Wrong answer: 1; Response Time: avg 81.80s, max 81.80s, total 81.80s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 638ms, max 638ms, total 638ms
Tool Calling: 10.0; No failed answers; Response Time: avg 2.64s, max 2.64s, total 2.64s
Anti-AI Tricks: 3.1; Wrong answer: 4; Response Time: avg 1.71s, max 3.79s, total 6.84s
Combined: 3.0; Invalid tool call: 1; Response Time: avg 5.91s, max 5.91s, total 5.91s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 847ms, max 1.09s, total 1.69s
Domain specific: 3.0; Wrong answer: 3; Response Time: avg 464ms, max 622ms, total 1.39s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 514ms, max 582ms, total 1.03s
Tool Calling: 10.0; No failed answers; Response Time: avg 1.27s, max 1.27s, total 1.27s
Wrong answer: 10, Did not follow instructions: 3; Response Time: avg 1.17s, max 2.52s, total 19.82s…
Anti-AI Tricks: 3.1; Wrong answer: 4; Response Time: avg 929ms, max 1.55s, total 3.72s
Combined: 3.0; Wrong answer: 1; Response Time: avg 2.52s, max 2.52s, total 2.52s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.30s, max 1.58s, total 2.61s
Domain specific: 3.5; Wrong answer: 3; Response Time: avg 937ms, max 1.25s, total 2.81s
Instructions following: 6.3; Wrong answer: 1; Response Time: avg 728ms, max 731ms, total 1.46s
Tool Calling: 3.0; Did not follow instructions: 1; Response Time: avg 2.32s, max 2.32s, total 2.32s
Wrong answer: 12, Did not follow instructions: 1; Response Time: avg 592ms, max 1.27s, total 10.07s…
Anti-AI Tricks: 3.0; Wrong answer: 4; Response Time: avg 483ms, max 716ms, total 1.93s
Combined: 3.0; Wrong answer: 1; Response Time: avg 606ms, max 606ms, total 606ms
Data parsing and extraction: 7.3; Wrong answer: 1; Response Time: avg 667ms, max 819ms, total 1.33s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 534ms, max 733ms, total 1.60s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 551ms, max 622ms, total 1.10s
Puzzle Solving: 3.1; Wrong answer: 3; Response Time: avg 533ms, max 637ms, total 1.60s
Tool Calling: 10.0; No failed answers; Response Time: avg 1.27s, max 1.27s, total 1.27s
Combined: 3.0; Wrong answer: 1; Response Time: avg 45.14s, max 45.14s, total 45.14s
Data parsing and extraction: 6.5; Wrong answer: 1; Response Time: avg 1.32s, max 1.32s, total 1.32s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 962ms, max 962ms, total 962ms
General Intelligence: 10.0; No failed answers; Response Time: avg 1.34s, max 1.34s, total 1.34s
Instructions following: 4.8; Wrong answer: 2; Response Time: avg 7.71s, max 14.65s, total 15.42s
Puzzle Solving: 3.2; Wrong answer: 3; Response Time: avg 22.86s, max 42.58s, total 45.73s
Tool Calling: 10.0; No failed answers; Response Time: avg 2.47s, max 2.47s, total 2.47s
Combined: 4.7; Invalid tool call: 1; Response Time: avg 41.03s, max 41.03s, total 41.03s
Data parsing and extraction: 6.3; Wrong answer: 1; Response Time: avg 21.95s, max 24.88s, total 43.89s
Domain specific: 3.0; Timed out: 2, Wrong answer: 1; Response Time: avg 19.00s, max 21.63s, total 38.01s
Tool Calling: 4.7; Did not follow instructions: 1; Response Time: avg 12.05s, max 12.05s, total 12.05s
Wrong answer: 12, Did not follow instructions: 1; Response Time: avg 1.94s, max 7.58s, total 19.44s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time: avg 1.34s, max 1.83s, total 2.67s
Combined: 3.0; Wrong answer: 1; Response Time: avg 7.58s, max 7.58s, total 7.58s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.27s, max 1.27s, total 1.27s
Domain specific: 3.0; Wrong answer: 3; Response Time: avg 637ms, max 637ms, total 637ms
General Intelligence: 4.0; Wrong answer: 1; Response Time: avg 909ms, max 909ms, total 909ms
Puzzle Solving: 3.7; Wrong answer: 3; Response Time: avg 1.30s, max 1.54s, total 2.60s
Tool Calling: 10.0; No failed answers; Response Time: avg 2.51s, max 2.51s, total 2.51s
Wrong answer: 9, Did not follow instructions: 4; Response Time: avg 8.86s, max 24.97s, total 150.70s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time: avg 7.43s, max 16.69s, total 29.72s
Combined: 3.0; Wrong answer: 1; Response Time: avg 19.98s, max 19.98s, total 19.98s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 7.92s, max 13.23s, total 15.84s
Domain specific: 3.6; Wrong answer: 3; Response Time: avg 6.23s, max 14.38s, total 18.70s
Tool Calling: 4.7; Did not follow instructions: 1; Response Time: avg 16.00s, max 16.00s, total 16.00s
Wrong answer: 10, Did not follow instructions: 2; Response Time: avg 629ms, max 1.72s, total 10.70s…
Anti-AI Tricks: 3.4; Wrong answer: 4; Response Time: avg 395ms, max 769ms, total 1.58s
Combined: 3.0; Wrong answer: 1; Response Time: avg 1.72s, max 1.72s, total 1.72s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 822ms, max 1.08s, total 1.64s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 367ms, max 388ms, total 1.10s
General Intelligence: 4.0; Wrong answer: 1; Response Time: avg 729ms, max 729ms, total 729ms
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 380ms, max 380ms, total 759ms
Tool Calling: 10.0; No failed answers; Response Time: avg 1.40s, max 1.40s, total 1.40s
Wrong answer: 10, Did not follow instructions: 2; Response Time: avg 3.04s, max 8.91s, total 51.75s…
Anti-AI Tricks: 3.0; Wrong answer: 4; Response Time: avg 3.02s, max 8.17s, total 12.07s
Combined: 3.0; Wrong answer: 1; Response Time: avg 8.91s, max 8.91s, total 8.91s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 3.26s, max 4.66s, total 6.52s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 877ms, max 894ms, total 2.63s
Puzzle Solving: 5.4; Wrong answer: 2; Response Time: avg 3.30s, max 4.81s, total 9.91s
Tool Calling: 10.0; No failed answers; Response Time: avg 6.67s, max 6.67s, total 6.67s
Wrong answer: 12; Response Time: avg 10.83s, max 42.13s, total 108.27s…
Anti-AI Tricks: 3.6; Wrong answer: 4; Response Time: avg 6.24s, max 11.38s, total 12.48s
Combined: 2.8; Wrong answer: 1; Response Time: avg 19.16s, max 19.16s, total 19.16s
Data parsing and extraction: 7.3; Wrong answer: 1; Response Time: avg 42.13s, max 42.13s, total 42.13s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 4.38s, max 4.38s, total 4.38s
General Intelligence: 10.0; No failed answers; Response Time: avg 4.00s, max 4.00s, total 4.00s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 2.67s, max 2.67s, total 2.67s
Puzzle Solving: 3.1; Wrong answer: 3; Response Time: avg 4.73s, max 7.81s, total 9.45s
Tool Calling: 10.0; No failed answers; Response Time: avg 13.99s, max 13.99s, total 13.99s
Anti-AI Tricks: 4.0; Wrong answer: 4; Response Time: avg 597ms, max 866ms, total 2.39s
Combined: 3.0; Invalid tool call: 1; Response Time: avg 6.48s, max 6.48s, total 6.48s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 601ms, max 634ms, total 1.20s
Domain specific: 3.0; Wrong answer: 3; Response Time: avg 611ms, max 616ms, total 1.83s
Tool Calling: 10.0; No failed answers; Response Time: avg 4.79s, max 4.79s, total 4.79s
Anti-AI Tricks: 5.2; Wrong answer: 3; Response Time: avg 5.51s, max 6.59s, total 11.02s
Combined: 3.0; Invalid tool call: 1; Response Time: avg 3.22s, max 3.22s, total 3.22s
Data parsing and extraction: 7.3; Wrong answer: 1; Response Time: avg 4.82s, max 4.82s, total 4.82s
Domain specific: 7.7; Wrong answer: 1; Response Time: avg 744ms, max 744ms, total 744ms
General Intelligence: 4.0; Wrong answer: 1; Response Time: avg 1.59s, max 1.59s, total 1.59s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 888ms, max 888ms, total 888ms
Tool Calling: 2.8; Wrong answer: 1; Response Time: avg 7.05s, max 7.05s, total 7.05s
Wrong answer: 9, Did not follow instructions: 2; Response Time: avg 2.89s, max 8.21s, total 49.05s…
Anti-AI Tricks: 3.0; Wrong answer: 4; Response Time: avg 2.84s, max 4.15s, total 11.35s
Combined: 3.0; Wrong answer: 1; Response Time: avg 4.89s, max 4.89s, total 4.89s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 2.47s, max 2.48s, total 4.95s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 1.97s, max 2.65s, total 5.92s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 2.13s, max 2.53s, total 4.27s
Tool Calling: 10.0; No failed answers; Response Time: avg 8.21s, max 8.21s, total 8.21s
Anti-AI Tricks: 5.6; Wrong answer: 3; Response Time: avg 2.67s, max 5.03s, total 10.66s
Combined: 3.0; Wrong answer: 1; Response Time: avg 25.25s, max 25.25s, total 25.25s
Data parsing and extraction: 7.3; API error: 1; Response Time: avg 1.23s, max 1.96s, total 2.46s
Domain specific: 5.3; API error: 1, Wrong answer: 1; Response Time: avg 6.11s, max 13.72s, total 18.34s
Instructions following: 7.3; Wrong answer: 1; Response Time: avg 1.38s, max 1.61s, total 2.75s
Tool Calling: 10.0; No failed answers; Response Time: avg 3.50s, max 3.50s, total 3.50s
Wrong answer: 10, Did not follow instructions: 2; Response Time: avg 1.70s, max 9.39s, total 28.82s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time: avg 788ms, max 1.34s, total 3.15s
Combined: 2.8; Wrong answer: 1; Response Time: avg 9.39s, max 9.39s, total 9.39s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.43s, max 1.45s, total 2.86s
Domain specific: 3.0; Wrong answer: 3; Response Time: avg 540ms, max 649ms, total 1.62s
Instructions following: 4.8; Wrong answer: 2; Response Time: avg 815ms, max 973ms, total 1.63s
Tool Calling: 10.0; No failed answers; Response Time: avg 3.54s, max 3.54s, total 3.54s
Wrong answer: 10, Did not follow instructions: 1; Response Time: avg 1.43s, max 2.89s, total 24.27s…
Anti-AI Tricks: 3.2; Wrong answer: 4; Response Time: avg 1.21s, max 2.58s, total 4.85s
Combined: 3.0; Wrong answer: 1; Response Time: avg 2.89s, max 2.89s, total 2.89s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.04s, max 1.06s, total 2.08s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 1.07s, max 1.54s, total 3.22s
General Intelligence: 4.4; Wrong answer: 1; Response Time: avg 1.78s, max 1.78s, total 1.78s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 1.07s, max 1.17s, total 2.15s
Tool Calling: 10.0; No failed answers; Response Time: avg 2.75s, max 2.75s, total 2.75s
Wrong answer: 9, Did not follow instructions: 2; Response Time: avg 2.31s, max 6.58s, total 39.25s…
Anti-AI Tricks: 3.5; Wrong answer: 4; Response Time: avg 1.80s, max 2.62s, total 7.19s
Combined: 3.0; Wrong answer: 1; Response Time: avg 6.58s, max 6.58s, total 6.58s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.39s, max 1.42s, total 2.78s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 1.78s, max 2.49s, total 5.34s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 2.51s, max 2.95s, total 5.02s
Tool Calling: 10.0; No failed answers; Response Time: avg 4.39s, max 4.39s, total 4.39s
Wrong answer: 10, Did not follow instructions: 1; Response Time: avg 3.71s, max 46.00s, total 63.06s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time: avg 1.59s, max 3.60s, total 6.38s
Combined: 3.0; Wrong answer: 1; Response Time: avg 46.00s, max 46.00s, total 46.00s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.01s, max 1.06s, total 2.02s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 465ms, max 492ms, total 1.39s
Instructions following: 4.5; Wrong answer: 2; Response Time: avg 585ms, max 715ms, total 1.17s
Puzzle Solving: 5.4; Wrong answer: 2; Response Time: avg 982ms, max 1.36s, total 2.95s
Tool Calling: 10.0; No failed answers; Response Time: avg 2.04s, max 2.04s, total 2.04s
Wrong answer: 9, Did not follow instructions: 2; Response Time: avg 4.58s, max 15.17s, total 77.92s…
Anti-AI Tricks: 3.5; Wrong answer: 4; Response Time: avg 3.81s, max 6.85s, total 15.23s
Combined: 3.0; Wrong answer: 1; Response Time: avg 15.17s, max 15.17s, total 15.17s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 8.49s, max 14.02s, total 16.98s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 2.33s, max 2.94s, total 6.99s
Instructions following: 6.4; Wrong answer: 1; Response Time: avg 2.82s, max 2.92s, total 5.65s
Tool Calling: 10.0; No failed answers; Response Time: avg 6.02s, max 6.02s, total 6.02s
Combined: 4.5; Invalid tool call: 1; Response Time: avg 60.39s, max 60.39s, total 60.39s
Data parsing and extraction: 4.6; Wrong answer: 2; Response Time: avg 7.48s, max 7.48s, total 7.48s
Domain specific: 2.9; Wrong answer: 2, Timed out: 1; Response Time: avg 237.27s, max 237.27s, total 237.27s
Puzzle Solving: 5.3; Timed out: 1, Wrong answer: 1; Response Time: avg 11.54s, max 17.37s, total 23.08s
Tool Calling: 10.0; No failed answers; Response Time: avg 15.35s, max 15.35s, total 15.35s
Wrong answer: 9, Did not follow instructions: 2; Response Time: avg 3.89s, max 47.43s, total 66.07s…
Anti-AI Tricks: 3.4; Wrong answer: 4; Response Time: avg 1.43s, max 4.39s, total 5.71s
Combined: 3.0; Wrong answer: 1; Response Time: avg 47.43s, max 47.43s, total 47.43s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.16s, max 1.42s, total 2.33s
Domain specific: 7.7; Wrong answer: 1; Response Time: avg 485ms, max 549ms, total 1.45s
Instructions following: 6.3; Wrong answer: 1; Response Time: avg 809ms, max 983ms, total 1.62s
Tool Calling: 10.0; No failed answers; Response Time: avg 2.30s, max 2.30s, total 2.30s
Wrong answer: 10, Did not follow instructions: 1 | Response time: avg 888ms, max 4.39s, total 15.10s…
Anti-AI Tricks
: 3.0 | Wrong answer: 4 | Response time: avg 582ms, max 844ms, total 2.33s
Combined
: 3.0 | Wrong answer: 1 | Response time: avg 4.39s, max 4.39s, total 4.39s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 652ms, max 660ms, total 1.30s
Domain specific
: 5.9 | Wrong answer: 2 | Response time: avg 495ms, max 642ms, total 1.49s
General Intelligence
: 5.0 | Wrong answer: 1 | Response time: avg 615ms, max 615ms, total 615ms
Instructions following
: 8.0 | Wrong answer: 1 | Response time: avg 672ms, max 785ms, total 1.34s
Tool Calling
: 10.0 | No failed answers | Response time: avg 1.91s, max 1.91s, total 1.91s
Wrong answer: 10 | Response time: avg 2.40s, max 6.70s, total 40.85s…
Anti-AI Tricks
: 3.0 | Wrong answer: 4 | Response time: avg 2.43s, max 6.70s, total 9.73s
Combined
: 3.0 | Wrong answer: 1 | Response time: avg 6.59s, max 6.59s, total 6.59s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 1.82s, max 1.97s, total 3.63s
Domain specific
: 3.6 | Wrong answer: 3 | Response time: avg 1.33s, max 1.53s, total 4.00s
General Intelligence
: 10.0 | No failed answers | Response time: avg 3.45s, max 3.45s, total 3.45s
Instructions following
: 10.0 | No failed answers | Response time: avg 1.06s, max 1.09s, total 2.12s
Puzzle Solving
: 5.2 | Wrong answer: 2 | Response time: avg 2.46s, max 4.23s, total 7.37s
Tool Calling
: 10.0 | No failed answers | Response time: avg 3.94s, max 3.94s, total 3.94s
Wrong answer: 6, Did not follow instructions: 4 | Response time: avg 15.05s, max 50.92s, total 150.55s…
Combined
: 10.0 | No failed answers | Response time: avg 31.18s, max 31.18s, total 31.18s
Data parsing and extraction
: 6.4 | Wrong answer: 1 | Response time: avg 1.98s, max 1.98s, total 1.98s
Domain specific
: 2.9 | Wrong answer: 3 | Response time: avg 50.92s, max 50.92s, total 50.92s
Instructions following
: 9.9 | No failed answers | Response time: avg 7.63s, max 7.63s, total 7.63s
Tool Calling
: 9.8 | No failed answers | Response time: avg 6.91s, max 6.91s, total 6.91s
Wrong answer: 9, Did not follow instructions: 1 | Response time: avg 3.36s, max 13.73s, total 57.15s…
Anti-AI Tricks
: 3.5 | Wrong answer: 4 | Response time: avg 1.32s, max 3.89s, total 5.30s
Combined
: 3.0 | Wrong answer: 1 | Response time: avg 6.22s, max 6.22s, total 6.22s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 1.57s, max 1.83s, total 3.14s
Domain specific
: 7.7 | Wrong answer: 1 | Response time: avg 905ms, max 1.10s, total 2.71s
General Intelligence
: 10.0 | No failed answers | Response time: avg 803ms, max 803ms, total 803ms
Instructions following
: 6.3 | Wrong answer: 1 | Response time: avg 8.81s, max 13.73s, total 17.61s
Tool Calling
: 10.0 | No failed answers | Response time: avg 3.67s, max 3.67s, total 3.67s
Anti-AI Tricks
: 6.9 | Extra formatting: 1, Wrong answer: 1 | Response time: avg 3.46s, max 4.38s, total 13.86s
Combined
: 3.0 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 5.54s, max 7.51s, total 11.08s
Puzzle Solving
: 7.2 | Did not follow instructions: 2 | Response time: avg 5.01s, max 5.49s, total 15.03s
Tool Calling
: 3.0 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
Anti-AI Tricks
: 6.5 | Wrong answer: 2 | Response time: avg 25.50s, max 37.73s, total 51.00s
Combined
: 10.0 | No failed answers | Response time: avg 65.96s, max 65.96s, total 65.96s
Data parsing and extraction
: 3.7 | Wrong answer: 2 | Response time: avg 21.42s, max 21.42s, total 21.42s
Domain specific
: 5.2 | Timed out: 1, Wrong answer: 1 | Response time: avg 204.02s, max 204.02s, total 204.02s
Tool Calling
: 10.0 | No failed answers | Response time: avg 33.30s, max 33.30s, total 33.30s
Wrong answer: 6, Did not follow instructions: 4 | Response time: avg 2.25s, max 14.63s, total 35.99s…
Combined
: 10.0 | No failed answers | Response time: avg 3.28s, max 3.28s, total 3.28s
Data parsing and extraction
: 7.3 | Wrong answer: 1 | Response time: avg 1.11s, max 1.47s, total 2.21s
Domain specific
: 2.9 | Wrong answer: 3 | Response time: avg 6.48s, max 14.63s, total 19.43s
Instructions following
: 10.0 | No failed answers | Response time: avg 1.07s, max 1.07s, total 1.07s
Tool Calling
: 10.0 | No failed answers | Response time: avg 1.89s, max 1.89s, total 1.89s
Anti-AI Tricks
: 3.2 | Extra formatting: 2, Wrong answer: 2 | Response time: avg 7.63s, max 12.26s, total 30.54s
Combined
: 6.5 | Invalid tool call: 1 | Response time: avg 115.89s, max 115.89s, total 115.89s
Data parsing and extraction
: 6.3 | Wrong answer: 1 | Response time: avg 9.42s, max 16.20s, total 18.84s
Domain specific
: 3.6 | Wrong answer: 3 | Response time: avg 1.61s, max 1.77s, total 4.83s
General Intelligence
: 10.0 | No failed answers | Response time: avg 2.86s, max 2.86s, total 2.86s
Instructions following
: 10.0 | No failed answers | Response time: avg 1.52s, max 1.99s, total 3.04s
Puzzle Solving
: 8.5 | Wrong answer: 1 | Response time: avg 7.37s, max 10.78s, total 22.10s
Tool Calling
: 10.0 | No failed answers | Response time: avg 11.85s, max 11.85s, total 11.85s
Wrong answer: 7, Did not follow instructions: 2 | Response time: avg 2.01s, max 6.81s, total 34.09s…
Anti-AI Tricks
: 4.8 | Wrong answer: 3 | Response time: avg 1.10s, max 2.08s, total 4.39s
Combined
: 3.0 | Wrong answer: 1 | Response time: avg 2.47s, max 2.47s, total 2.47s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 1.69s, max 2.46s, total 3.38s
Domain specific
: 5.3 | Wrong answer: 2 | Response time: avg 1.14s, max 1.63s, total 3.41s
Instructions following
: 6.5 | Wrong answer: 1 | Response time: avg 4.18s, max 6.81s, total 8.36s
Puzzle Solving
: 8.0 | Did not follow instructions: 1 | Response time: avg 2.71s, max 5.96s, total 8.14s
Tool Calling
: 10.0 | No failed answers | Response time: avg 2.76s, max 2.76s, total 2.76s
Wrong answer: 8 | Response time: avg 3.77s, max 11.07s, total 37.66s…
Anti-AI Tricks
: 4.8 | Wrong answer: 3 | Response time: avg 2.37s, max 3.39s, total 4.75s
Combined
: 3.0 | Wrong answer: 1 | Response time: avg 4.98s, max 4.98s, total 4.98s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 5.78s, max 5.78s, total 5.78s
Domain specific
: 3.0 | Wrong answer: 3 | Response time: avg 2.24s, max 2.24s, total 2.24s
General Intelligence
: 10.0 | No failed answers | Response time: avg 3.27s, max 3.27s, total 3.27s
Instructions following
: 10.0 | No failed answers | Response time: avg 1.48s, max 1.48s, total 1.48s
Puzzle Solving
: 7.7 | Wrong answer: 1 | Response time: avg 2.05s, max 2.08s, total 4.10s
Tool Calling
: 10.0 | No failed answers | Response time: avg 11.07s, max 11.07s, total 11.07s
Wrong answer: 8 | Response time: avg 2.51s, max 6.65s, total 27.60s…
Anti-AI Tricks
: 4.8 | Wrong answer: 3 | Response time: avg 1.91s, max 2.74s, total 3.82s
Combined
: 3.0 | Wrong answer: 1 | Response time: avg 6.65s, max 6.65s, total 6.65s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 1.89s, max 1.89s, total 1.89s
Domain specific
: 5.3 | Wrong answer: 2 | Response time: avg 1.17s, max 1.44s, total 2.33s
General Intelligence
: 4.4 | Wrong answer: 1 | Response time: avg 2.26s, max 2.26s, total 2.26s
Instructions following
: 10.0 | No failed answers | Response time: avg 1.67s, max 1.67s, total 1.67s
Puzzle Solving
: 7.7 | Wrong answer: 1 | Response time: avg 2.82s, max 3.52s, total 5.65s
Tool Calling
: 10.0 | No failed answers | Response time: avg 3.33s, max 3.33s, total 3.33s
Combined
: 10.0 | No failed answers | Response time: avg 88.15s, max 88.15s, total 88.15s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 12.58s, max 13.87s, total 25.16s
Domain specific
: 3.6 | Wrong answer: 2, Timed out: 1 | Response time: avg 44.63s, max 82.55s, total 133.89s
Tool Calling
: 10.0 | No failed answers | Response time: avg 18.64s, max 18.64s, total 18.64s
Anti-AI Tricks
: 10.0 | No failed answers | Response time: avg 10.08s, max 22.30s, total 40.30s
Combined
: 10.0 | No failed answers | Response time: avg 87.80s, max 87.80s, total 87.80s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 18.16s, max 20.65s, total 36.33s
Domain specific
: 2.9 | Wrong answer: 2, Timed out: 1 | Response time: avg 16.19s, max 21.56s, total 32.39s
Tool Calling
: 10.0 | No failed answers | Response time: avg 39.75s, max 39.75s, total 39.75s
Anti-AI Tricks
: 8.7 | Wrong answer: 1 | Response time: avg 3.81s, max 5.65s, total 7.62s
Combined
: 10.0 | No failed answers | Response time: avg 37.64s, max 37.64s, total 37.64s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 6.63s, max 6.63s, total 6.63s
Domain specific
: 5.8 | Timed out: 1, Wrong answer: 1 | Response time: avg 121.79s, max 121.79s, total 121.79s
Tool Calling
: 2.8 | No answer: 1 | Response time: avg 27.71s, max 27.71s, total 27.71s
Anti-AI Tricks
: 7.3 | Wrong answer: 2 | Response time: avg 4.75s, max 7.62s, total 19.00s
Combined
: 4.7 | Timed out: 1 | Response time: avg 30.53s, max 30.53s, total 30.53s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 23.16s, max 26.55s, total 46.33s
Instructions following
: 9.9 | No failed answers | Response time: avg 4.18s, max 4.46s, total 8.36s
Tool Calling
: 10.0 | No failed answers | Response time: avg 17.33s, max 17.33s, total 17.33s
Did not follow instructions: 5, Wrong answer: 4 | Response time: avg 15.66s, max 102.91s, total 266.14s…
Anti-AI Tricks
: 8.6 | Wrong answer: 1 | Response time: avg 4.05s, max 6.69s, total 16.20s
Combined
: 10.0 | No failed answers | Response time: avg 17.81s, max 17.81s, total 17.81s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 2.43s, max 3.39s, total 4.87s
Domain specific
: 4.1 | Wrong answer: 3 | Response time: avg 65.31s, max 102.91s, total 195.92s
Puzzle Solving
: 6.8 | Did not follow instructions: 2 | Response time: avg 4.33s, max 7.27s, total 13.00s
Tool Calling
: 4.7 | Did not follow instructions: 1 | Response time: avg 9.62s, max 9.62s, total 9.62s
Anti-AI Tricks
: 7.3 | No answer: 1, Wrong answer: 1 | Response time: avg 51.38s, max 85.28s, total 102.75s
Combined
: 10.0 | No failed answers | Response time: avg 71.37s, max 71.37s, total 71.37s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 49.78s, max 49.78s, total 49.78s
Domain specific
: 3.5 | Wrong answer: 2, Timed out: 1 | Response time: avg 137.29s, max 137.29s, total 137.29s
Instructions following
: 10.0 | No failed answers | Response time: avg 92.47s, max 92.47s, total 92.47s
Tool Calling
: 10.0 | No failed answers | Response time: avg 31.74s, max 31.74s, total 31.74s
Combined
: 9.5 | No failed answers | Response time: avg 23.84s, max 23.84s, total 23.84s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 3.43s, max 3.43s, total 3.43s
Domain specific
: 7.7 | Wrong answer: 1 | Response time: avg 3.54s, max 3.54s, total 3.54s
Instructions following
: 6.5 | Wrong answer: 1 | Response time: avg 1.96s, max 1.96s, total 1.96s
Puzzle Solving
: 7.7 | Extra formatting: 1 | Response time: avg 2.92s, max 3.33s, total 5.84s
Tool Calling
: 10.0 | No failed answers | Response time: avg 4.11s, max 4.11s, total 4.11s
Anti-AI Tricks
: 10.0 | No failed answers | Response time: avg 21.13s, max 34.96s, total 84.53s
Combined
: 4.7 | No answer: 1 | Response time: avg 75.34s, max 75.34s, total 75.34s
Data parsing and extraction
: 7.3 | API error: 1 | Response time: avg 59.33s, max 97.12s, total 118.65s
Domain specific
: 4.1 | Timed out: 2, Wrong answer: 1 | Response time: avg 88.34s, max 106.00s, total 265.01s
General Intelligence
: 2.8 | Timed out: 1 | Response time: avg 30.30s, max 30.30s, total 30.30s
Instructions following
: 10.0 | No failed answers | Response time: avg 24.45s, max 43.36s, total 48.89s
Puzzle Solving
: 6.4 | Timed out: 1, Wrong answer: 1 | Response time: avg 31.58s, max 60.18s, total 94.75s
Tool Calling
: 10.0 | No failed answers | Response time: avg 4.65s, max 4.65s, total 4.65s
Combined
: 10.0 | No failed answers | Response time: avg 14.06s, max 14.06s, total 14.06s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 3.15s, max 3.15s, total 3.15s
Domain specific
: 5.9 | Timed out: 1, Wrong answer: 1 | Response time: avg 77.80s, max 77.80s, total 77.80s
Instructions following
: 9.9 | No failed answers | Response time: avg 3.12s, max 3.12s, total 3.12s
Puzzle Solving
: 7.7 | Did not follow instructions: 1 | Response time: avg 5.47s, max 6.45s, total 10.94s
Tool Calling
: 4.7 | No answer: 1 | Response time: avg 10.30s, max 10.30s, total 10.30s
Anti-AI Tricks
: 6.6 | Timed out: 1, Wrong answer: 1 | Response time: avg 74.75s, max 182.10s, total 298.98s
Combined
: 10.0 | No failed answers | Response time: avg 262.83s, max 262.83s, total 262.83s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 24.27s, max 27.52s, total 48.54s
Domain specific
: 3.0 | Timed out: 3 | Response time: avg 0ms, max 0ms, total 0ms
Instructions following
: 10.0 | No failed answers | Response time: avg 17.47s, max 19.46s, total 34.93s
Puzzle Solving
: 8.2 | Wrong answer: 1 | Response time: avg 25.85s, max 32.95s, total 77.55s
Tool Calling
: 10.0 | No failed answers | Response time: avg 88.68s, max 88.68s, total 88.68s
Wrong answer: 4, Did not follow instructions: 3 | Response time: avg 11.08s, max 94.06s, total 188.39s…
Anti-AI Tricks
: 8.3 | Wrong answer: 1 | Response time: avg 4.52s, max 7.74s, total 18.10s
Combined
: 9.8 | No failed answers | Response time: avg 24.13s, max 24.13s, total 24.13s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 2.54s, max 3.33s, total 5.08s
Domain specific
: 5.9 | Wrong answer: 2 | Response time: avg 38.18s, max 94.06s, total 114.53s
Instructions following
: 9.8 | No failed answers | Response time: avg 1.88s, max 2.61s, total 3.75s
Tool Calling
: 10.0 | No failed answers | Response time: avg 7.71s, max 7.71s, total 7.71s
Extra formatting: 4, Wrong answer: 2 | Response time: avg 20.87s, max 83.40s, total 208.73s…
Anti-AI Tricks
: 6.4 | Extra formatting: 2 | Response time: avg 7.45s, max 11.88s, total 14.90s
Combined
: 10.0 | No failed answers | Response time: avg 76.66s, max 76.66s, total 76.66s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 7.37s, max 7.37s, total 7.37s
General Intelligence
: 10.0 | No failed answers | Response time: avg 5.04s, max 5.04s, total 5.04s
Instructions following
: 10.0 | No failed answers | Response time: avg 2.43s, max 2.43s, total 2.43s
Puzzle Solving
: 7.7 | Wrong answer: 1 | Response time: avg 4.60s, max 4.66s, total 9.20s
Tool Calling
: 10.0 | No failed answers | Response time: avg 9.73s, max 9.73s, total 9.73s
Wrong answer: 5, Did not follow instructions: 2 | Response time: avg 5.68s, max 18.33s, total 96.58s…
Combined
: 10.0 | No failed answers | Response time: avg 11.96s, max 11.96s, total 11.96s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 2.21s, max 2.52s, total 4.42s
Domain specific
: 3.5 | Wrong answer: 3 | Response time: avg 13.01s, max 18.33s, total 39.04s
Instructions following
: 8.3 | Wrong answer: 1 | Response time: avg 3.29s, max 4.18s, total 6.59s
Puzzle Solving
: 10.0 | No failed answers | Response time: avg 2.93s, max 3.05s, total 8.78s
Tool Calling
: 10.0 | No failed answers | Response time: avg 8.36s, max 8.36s, total 8.36s
Anti-AI Tricks
: 8.1 | Extra formatting: 1 | Response time: avg 15.85s, max 20.83s, total 47.55s
Combined
: 9.8 | No failed answers | Response time: avg 75.68s, max 75.68s, total 75.68s
Data parsing and extraction
: 6.5 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
Domain specific
: 5.9 | Wrong answer: 2 | Response time: avg 96.01s, max 96.01s, total 96.01s
Instructions following
: 10.0 | No failed answers | Response time: avg 4.28s, max 7.37s, total 8.55s
Puzzle Solving
: 7.7 | Wrong answer: 1 | Response time: avg 3.77s, max 5.26s, total 7.55s
Tool Calling
: 10.0 | No failed answers | Response time: avg 27.78s, max 27.78s, total 27.78s
Wrong answer: 5, Did not follow instructions: 1 | Response time: avg 6.72s, max 38.52s, total 114.20s…
Anti-AI Tricks
: 8.7 | Wrong answer: 1 | Response time: avg 3.40s, max 4.78s, total 13.59s
Combined
: 10.0 | No failed answers | Response time: avg 9.12s, max 9.12s, total 9.12s
Data parsing and extraction
: 10.0 | No failed answers | Response time: avg 3.05s, max 3.33s, total 6.10s
Domain specific
: 5.3 | Wrong answer: 2 | Response time: avg 17.78s, max 38.52s, total 53.33s
Instructions following
: 7.5 | Wrong answer: 1 | Response time: avg 5.46s, max 6.45s, total 10.92s
Puzzle Solving
: 7.7 | Wrong answer: 1 | Response time: avg 4.42s, max 5.04s, total 13.27s
Tool Calling
: 10.0 | No failed answers | Response time: avg 4.68s, max 4.68s, total 4.68s
Wrong answer: 4, Did not follow instructions: 2; Response time: avg 1.29s, max 3.39s, total 21.95s…
Combined
: 3.0; Wrong answer: 1; Response time: avg 3.20s, max 3.20s, total 3.20s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 1.22s, max 1.33s, total 2.44s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 942ms, max 1.12s, total 2.83s
Instructions following
: 10.0; No failed answers; Response time: avg 1.13s, max 1.14s, total 2.27s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 972ms, max 1.13s, total 2.92s
Tool Calling
: 10.0; No failed answers; Response time: avg 3.39s, max 3.39s, total 3.39s
Did not follow instructions: 3, Wrong answer: 3; Response time: avg 26.78s, max 170.45s, total 294.58s…
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 13.56s, max 32.30s, total 40.68s
Combined
: 10.0; No failed answers; Response time: avg 29.57s, max 29.57s, total 29.57s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 15.01s, max 15.01s, total 15.01s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 170.45s, max 170.45s, total 170.45s
Tool Calling
: 10.0; No failed answers; Response time: avg 11.91s, max 11.91s, total 11.91s
Did not follow instructions: 3, Wrong answer: 3; Response time: avg 8.54s, max 24.21s, total 145.26s…
Anti-AI Tricks
: 8.7; Wrong answer: 1; Response time: avg 3.16s, max 3.44s, total 12.65s
Combined
: 10.0; No failed answers; Response time: avg 20.93s, max 20.93s, total 20.93s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 4.01s, max 4.27s, total 8.02s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 21.33s, max 24.21s, total 64.00s
General Intelligence
: 10.0; No failed answers; Response time: avg 5.78s, max 5.78s, total 5.78s
Puzzle Solving
: 8.2; Did not follow instructions: 1; Response time: avg 3.85s, max 4.53s, total 11.55s
Tool Calling
: 3.0; Did not follow instructions: 1; Response time: avg 12.39s, max 12.39s, total 12.39s
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 2.11s, max 3.43s, total 8.43s
Combined
: 10.0; No failed answers; Response time: avg 19.29s, max 19.29s, total 19.29s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 2.29s, max 2.62s, total 4.58s
General Intelligence
: 10.0; No failed answers; Response time: avg 2.86s, max 2.86s, total 2.86s
Tool Calling
: 10.0; No failed answers; Response time: avg 11.07s, max 11.07s, total 11.07s
Extra formatting: 2, Wrong answer: 2, Timed out: 1; Response time: avg 10.09s, max 46.35s, total 90.85s…
Combined
: 10.0; No failed answers; Response time: avg 46.35s, max 46.35s, total 46.35s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 13.90s, max 13.90s, total 13.90s
General Intelligence
: 10.0; No failed answers; Response time: avg 4.94s, max 4.94s, total 4.94s
Instructions following
: 10.0; No failed answers; Response time: avg 2.61s, max 2.61s, total 2.61s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 4.80s, max 5.22s, total 9.60s
Tool Calling
: 10.0; No failed answers; Response time: avg 7.48s, max 7.48s, total 7.48s
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 3.06s, max 4.70s, total 12.23s
Combined
: 4.7; Wrong answer: 1; Response time: avg 64.71s, max 64.71s, total 64.71s
Data parsing and extraction
: 7.3; Wrong answer: 1; Response time: avg 17.20s, max 17.44s, total 34.40s
Domain specific
: 5.3; Timed out: 1, Wrong answer: 1; Response time: avg 6.00s, max 6.14s, total 12.01s
General Intelligence
: 10.0; No failed answers; Response time: avg 4.06s, max 4.06s, total 4.06s
Instructions following
: 9.9; No failed answers; Response time: avg 3.36s, max 4.35s, total 6.72s
Tool Calling
: 10.0; No failed answers; Response time: avg 8.19s, max 8.19s, total 8.19s
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 59.11s, max 168.31s, total 236.44s
Combined
: 10.0; No failed answers; Response time: avg 17.78s, max 17.78s, total 17.78s
Data parsing and extraction
: 7.3; API error: 1; Response time: avg 56.99s, max 80.14s, total 113.98s
Domain specific
: 5.3; Timed out: 1, Wrong answer: 1; Response time: avg 146.50s, max 234.29s, total 439.49s
Instructions following
: 10.0; No failed answers; Response time: avg 63.49s, max 111.61s, total 126.98s
Puzzle Solving
: 6.4; Timed out: 2; Response time: avg 56.74s, max 115.01s, total 170.23s
Tool Calling
: 10.0; No failed answers; Response time: avg 10.33s, max 10.33s, total 10.33s
Wrong answer: 5; Response time: avg 1.66s, max 3.56s, total 16.61s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response time: avg 1.25s, max 1.59s, total 2.49s
Combined
: 4.7; Wrong answer: 1; Response time: avg 3.56s, max 3.56s, total 3.56s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 1.41s, max 1.41s, total 1.41s
Domain specific
: 7.7; Wrong answer: 1; Response time: avg 963ms, max 963ms, total 963ms
General Intelligence
: 10.0; No failed answers; Response time: avg 1.13s, max 1.13s, total 1.13s
Instructions following
: 6.4; Wrong answer: 1; Response time: avg 1.58s, max 1.58s, total 1.58s
Puzzle Solving
: 7.7; Wrong answer: 1; Response time: avg 1.06s, max 1.06s, total 2.12s
Tool Calling
: 10.0; No failed answers; Response time: avg 3.35s, max 3.35s, total 3.35s
Wrong answer: 4, Did not follow instructions: 1; Response time: avg 3.28s, max 11.91s, total 55.80s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response time: avg 2.12s, max 3.18s, total 8.50s
Combined
: 3.0; Wrong answer: 1; Response time: avg 11.91s, max 11.91s, total 11.91s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 3.00s, max 3.74s, total 5.99s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 2.36s, max 3.51s, total 7.07s
Instructions following
: 10.0; No failed answers; Response time: avg 1.49s, max 1.66s, total 2.99s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 2.76s, max 5.08s, total 8.27s
Tool Calling
: 10.0; No failed answers; Response time: avg 9.54s, max 9.54s, total 9.54s
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 4.82s, max 7.69s, total 19.26s
Combined
: 10.0; No failed answers; Response time: avg 13.88s, max 13.88s, total 13.88s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 6.19s, max 6.42s, total 12.38s
Domain specific
: 2.9; Wrong answer: 2, Timed out: 1; Response time: avg 71.07s, max 194.23s, total 213.22s
General Intelligence
: 6.1; Wrong answer: 1; Response time: avg 10.05s, max 10.05s, total 10.05s
Instructions following
: 10.0; No failed answers; Response time: avg 5.38s, max 5.70s, total 10.77s
Puzzle Solving
: 7.3; Did not follow instructions: 2; Response time: avg 5.44s, max 7.26s, total 16.32s
Tool Calling
: 10.0; No failed answers; Response time: avg 9.84s, max 9.84s, total 9.84s
Wrong answer: 3, Did not follow instructions: 2; Response time: avg 18.95s, max 100.41s, total 322.23s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response time: avg 4.11s, max 6.42s, total 16.42s
Combined
: 10.0; No failed answers; Response time: avg 20.57s, max 20.57s, total 20.57s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 5.32s, max 5.40s, total 10.64s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 74.27s, max 100.41s, total 222.80s
Instructions following
: 10.0; No failed answers; Response time: avg 3.11s, max 3.68s, total 6.22s
Puzzle Solving
: 8.2; Did not follow instructions: 1; Response time: avg 9.13s, max 18.14s, total 27.39s
Tool Calling
: 10.0; No failed answers; Response time: avg 13.28s, max 13.28s, total 13.28s
Wrong answer: 4, Did not follow instructions: 1; Response time: avg 3.70s, max 14.93s, total 62.97s…
Anti-AI Tricks
: 9.1; Did not follow instructions: 1; Response time: avg 2.33s, max 3.89s, total 9.30s
Combined
: 10.0; No failed answers; Response time: avg 14.93s, max 14.93s, total 14.93s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 2.29s, max 2.31s, total 4.59s
Domain specific
: 3.0; Wrong answer: 3; Response time: avg 4.21s, max 5.86s, total 12.62s
General Intelligence
: 10.0; No failed answers; Response time: avg 3.16s, max 3.16s, total 3.16s
Instructions following
: 10.0; No failed answers; Response time: avg 1.91s, max 1.93s, total 3.82s
Puzzle Solving
: 7.7; Wrong answer: 1; Response time: avg 3.58s, max 4.41s, total 10.75s
Tool Calling
: 10.0; No failed answers; Response time: avg 3.80s, max 3.80s, total 3.80s
Wrong answer: 4, Did not follow instructions: 1; Response time: avg 11.88s, max 95.48s, total 201.89s…
Anti-AI Tricks
: 8.4; Wrong answer: 1; Response time: avg 6.30s, max 15.56s, total 25.21s
Combined
: 10.0; No failed answers; Response time: avg 28.44s, max 28.44s, total 28.44s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 4.06s, max 5.06s, total 8.11s
Domain specific
: 5.9; Wrong answer: 2; Response time: avg 37.34s, max 95.48s, total 112.01s
Instructions following
: 9.8; No failed answers; Response time: avg 2.62s, max 2.78s, total 5.24s
Puzzle Solving
: 7.7; Wrong answer: 1; Response time: avg 3.94s, max 6.33s, total 11.83s
Tool Calling
: 10.0; No failed answers; Response time: avg 6.20s, max 6.20s, total 6.20s
Anti-AI Tricks
: 8.4; Wrong answer: 1; Response time: avg 30.72s, max 44.23s, total 122.88s
Combined
: 10.0; No failed answers; Response time: avg 93.11s, max 93.11s, total 93.11s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 36.09s, max 39.12s, total 72.18s
Domain specific
: 5.3; Timed out: 1, Wrong answer: 1; Response time: avg 39.32s, max 79.03s, total 117.95s
Instructions following
: 10.0; No failed answers; Response time: avg 35.78s, max 47.30s, total 71.56s
Puzzle Solving
: 8.2; Wrong answer: 1; Response time: avg 36.87s, max 59.22s, total 110.62s
Tool Calling
: 10.0; No failed answers; Response time: avg 34.81s, max 34.81s, total 34.81s
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 23.66s, max 25.06s, total 47.32s
Combined
: 10.0; No failed answers; Response time: avg 28.96s, max 28.96s, total 28.96s
Data parsing and extraction
: 7.1; No answer: 1; Response time: avg 8.90s, max 8.90s, total 8.90s
Domain specific
: 3.5; Wrong answer: 2, Timed out: 1; Response time: avg 0ms, max 0ms, total 0ms
Instructions following
: 10.0; No failed answers; Response time: avg 7.25s, max 7.25s, total 7.25s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 15.64s, max 16.34s, total 31.27s
Tool Calling
: 10.0; No failed answers; Response time: avg 15.93s, max 15.93s, total 15.93s
Anti-AI Tricks
: 8.7; Extra formatting: 1; Response time: avg 19.75s, max 49.95s, total 79.01s
Combined
: 10.0; No failed answers; Response time: avg 163.96s, max 163.96s, total 163.96s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 30.26s, max 32.03s, total 60.52s
Domain specific
: 5.3; Timed out: 1, Wrong answer: 1; Response time: avg 79.53s, max 95.52s, total 238.59s
Instructions following
: 10.0; No failed answers; Response time: avg 19.66s, max 32.25s, total 39.32s
Puzzle Solving
: 8.2; Did not follow instructions: 1; Response time: avg 64.61s, max 123.57s, total 193.84s
Tool Calling
: 10.0; No failed answers; Response time: avg 7.45s, max 7.45s, total 7.45s
Wrong answer: 3, Timed out: 1; Response time: avg 29.05s, max 119.29s, total 493.86s…
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 9.75s, max 18.03s, total 39.01s
Combined
: 10.0; No failed answers; Response time: avg 107.79s, max 107.79s, total 107.79s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 23.41s, max 29.79s, total 46.83s
Domain specific
: 2.9; Wrong answer: 3; Response time: avg 63.40s, max 119.29s, total 190.20s
General Intelligence
: 3.4; Timed out: 1; Response time: avg 34.11s, max 34.11s, total 34.11s
Instructions following
: 10.0; No failed answers; Response time: avg 9.88s, max 15.44s, total 19.76s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 17.18s, max 31.99s, total 51.55s
Tool Calling
: 10.0; No failed answers; Response time: avg 4.60s, max 4.60s, total 4.60s
Timed out: 2, Wrong answer: 2; Response time: avg 39.13s, max 81.20s, total 391.29s…
Anti-AI Tricks
: 8.2; Wrong answer: 1; Response time: avg 45.78s, max 81.20s, total 91.57s
Combined
: 10.0; No failed answers; Response time: avg 46.85s, max 46.85s, total 46.85s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 46.91s, max 46.91s, total 46.91s
Domain specific
: 5.3; Timed out: 1, Wrong answer: 1; Response time: avg 17.50s, max 17.50s, total 17.50s
General Intelligence
: 4.7; Timed out: 1; Response time: avg 79.86s, max 79.86s, total 79.86s
Instructions following
: 10.0; No failed answers; Response time: avg 31.93s, max 31.93s, total 31.93s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 34.57s, max 49.12s, total 69.13s
Tool Calling
: 10.0; No failed answers; Response time: avg 7.54s, max 7.54s, total 7.54s
Wrong answer: 3, Did not follow instructions: 1; Response time: avg 68.83s, max 280.52s, total 1101.32s…
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 43.87s, max 121.88s, total 131.62s
Combined
: 10.0; No failed answers; Response time: avg 280.52s, max 280.52s, total 280.52s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 7.16s, max 8.54s, total 14.31s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 127.58s, max 133.93s, total 382.74s
General Intelligence
: 10.0; No failed answers; Response time: avg 5.25s, max 5.25s, total 5.25s
Puzzle Solving
: 7.7; Wrong answer: 1; Response time: avg 46.33s, max 134.22s, total 139.00s
Tool Calling
: 10.0; No failed answers; Response time: avg 7.73s, max 7.73s, total 7.73s
Wrong answer: 3, Did not follow instructions: 2; Response time: avg 15.76s, max 100.93s, total 267.97s…
Anti-AI Tricks
: 8.7; Wrong answer: 1; Response time: avg 4.16s, max 6.68s, total 16.63s
Combined
: 10.0; No failed answers; Response time: avg 19.56s, max 19.56s, total 19.56s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 3.07s, max 3.59s, total 6.15s
Domain specific
: 5.9; Wrong answer: 2; Response time: avg 64.31s, max 100.93s, total 192.94s
Instructions following
: 10.0; No failed answers; Response time: avg 3.04s, max 3.44s, total 6.07s
Puzzle Solving
: 9.0; Did not follow instructions: 1; Response time: avg 5.12s, max 8.73s, total 15.37s
Tool Calling
: 10.0; No failed answers; Response time: avg 6.37s, max 6.37s, total 6.37s
Wrong answer: 3, Did not follow instructions: 1; Response time: avg 13.94s, max 43.55s, total 237.01s…
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 9.90s, max 19.37s, total 39.60s
Combined
: 10.0; No failed answers; Response time: avg 34.95s, max 34.95s, total 34.95s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 14.95s, max 15.40s, total 29.90s
Domain specific
: 3.0; Wrong answer: 3; Response time: avg 22.08s, max 43.55s, total 66.23s
Instructions following
: 10.0; No failed answers; Response time: avg 7.54s, max 11.67s, total 15.07s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 6.11s, max 7.52s, total 18.34s
Tool Calling
: 10.0; No failed answers; Response time: avg 5.87s, max 5.87s, total 5.87s
Wrong answer: 3, Did not follow instructions: 2; Response time: avg 27.78s, max 168.71s, total 472.24s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response time: avg 17.99s, max 48.33s, total 71.98s
Combined
: 10.0; No failed answers; Response time: avg 37.67s, max 37.67s, total 37.67s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 9.07s, max 12.19s, total 18.14s
Domain specific
: 5.9; Wrong answer: 2; Response time: avg 88.74s, max 168.71s, total 266.21s
Instructions following
: 10.0; No failed answers; Response time: avg 7.26s, max 9.02s, total 14.52s
Puzzle Solving
: 9.0; Did not follow instructions: 1; Response time: avg 11.03s, max 13.85s, total 33.09s
Tool Calling
: 10.0; No failed answers; Response time: avg 12.38s, max 12.38s, total 12.38s
Wrong answer: 3; Response time: avg 9.06s, max 26.24s, total 90.58s…
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 14.99s, max 26.24s, total 29.99s
Combined
: 3.0; Wrong answer: 1; Response time: avg 10.37s, max 10.37s, total 10.37s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 10.84s, max 10.84s, total 10.84s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 7.01s, max 7.01s, total 7.01s
General Intelligence
: 10.0; No failed answers; Response time: avg 9.34s, max 9.34s, total 9.34s
Instructions following
: 9.8; No failed answers; Response time: avg 3.26s, max 3.26s, total 3.26s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 3.91s, max 4.23s, total 7.81s
Tool Calling
: 10.0; No failed answers; Response time: avg 11.96s, max 11.96s, total 11.96s
A test is fully passed only if every run passed for that test. Wrong answer: 3 · Response time: avg 5.95s · max 14.72s · total 101.19s…
Anti-AI Tricks
: 10.0 · No failed answers · Response time: avg 3.48s · max 4.31s · total 13.94s
Combined
: 3.0 · Wrong answer: 1 · Response time: avg 3.27s · max 3.27s · total 3.27s
Data parsing and extraction
: 10.0 · No failed answers · Response time: avg 9.40s · max 14.72s · total 18.80s
Domain specific
: 5.3 · Wrong answer: 2 · Response time: avg 8.05s · max 14.40s · total 24.15s
General Intelligence
: 10.0 · No failed answers · Response time: avg 3.68s · max 3.68s · total 3.68s
Instructions following
: 9.9 · No failed answers · Response time: avg 7.02s · max 7.35s · total 14.03s
Puzzle Solving
: 10.0 · No failed answers · Response time: avg 6.11s · max 10.27s · total 18.32s
Tool Calling
: 10.0 · No failed answers · Response time: avg 4.99s · max 4.99s · total 4.99s
A test is fully passed only if every run passed for that test. Wrong answer: 1 · Response time: avg 15.56s · max 40.61s · total 155.64s…
Anti-AI Tricks
: 10.0 · No failed answers · Response time: avg 7.90s · max 9.52s · total 15.80s
Combined
: 9.5 · No failed answers · Response time: avg 40.61s · max 40.61s · total 40.61s
Data parsing and extraction
: 10.0 · No failed answers · Response time: avg 7.72s · max 7.72s · total 7.72s
Domain specific
: 7.7 · Wrong answer: 1 · Response time: avg 32.73s · max 32.73s · total 32.73s
General Intelligence
: 10.0 · No failed answers · Response time: avg 11.77s · max 11.77s · total 11.77s
Instructions following
: 10.0 · No failed answers · Response time: avg 9.56s · max 9.56s · total 9.56s
Puzzle Solving
: 10.0 · No failed answers · Response time: avg 7.15s · max 8.49s · total 14.30s
Tool Calling
: 10.0 · No failed answers · Response time: avg 23.15s · max 23.15s · total 23.15s
A test is fully passed only if every run passed for that test. No failed answers · Response time: avg 11.39s · max 50.16s · total 113.86s…
Total Tests: 17 · Wrong Tests: 0 · Attempt pass rate: 100.0% · Flaky tests: 0 … Output Tokens: 1,640 · Reasoning Tokens: 48,270 · Response time: avg 11.39s · total 113.86s · max 50.16s
Anti-AI Tricks
: 10.0 · No failed answers · Response time: avg 4.13s · max 5.61s · total 8.26s
Combined
: 10.0 · No failed answers · Response time: avg 50.16s · max 50.16s · total 50.16s
Data parsing and extraction
: 10.0 · No failed answers · Response time: avg 4.72s · max 4.72s · total 4.72s
Domain specific
: 10.0 · No failed answers · Response time: avg 21.12s · max 21.12s · total 21.12s
General Intelligence
: 10.0 · No failed answers · Response time: avg 4.09s · max 4.09s · total 4.09s
Instructions following
: 10.0 · No failed answers · Response time: avg 6.10s · max 6.10s · total 6.10s
Puzzle Solving
: 10.0 · No failed answers · Response time: avg 4.43s · max 4.68s · total 8.85s
Tool Calling
: 10.0 · No failed answers · Response time: avg 10.55s · max 10.55s · total 10.55s
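
The summary fields used in these listings (fully passed tests, flaky tests, attempt pass rate, and average/max/total response time) could be derived from per-run records roughly as in the sketch below. This is not the benchmark's actual code. The `Run` record and its field names are hypothetical. Treating a "flaky" test as one with mixed pass/fail runs, a "wrong" test as one that is not fully passed, and the attempt pass rate as passed runs over all runs are assumptions about how the page defines them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """One attempt at one test (hypothetical record layout)."""
    test_id: str
    passed: bool
    seconds: float


def summarize(runs):
    """Aggregate per-run records into leaderboard-style summary fields."""
    if not runs:
        raise ValueError("need at least one run")

    # Group pass/fail outcomes by test.
    outcomes = defaultdict(list)
    for run in runs:
        outcomes[run.test_id].append(run.passed)

    # A test is fully passed only if every run passed for that test.
    fully_passed = sum(all(v) for v in outcomes.values())
    # Assumption: "flaky" means some runs passed and some failed.
    flaky = sum(any(v) and not all(v) for v in outcomes.values())

    times = [run.seconds for run in runs]
    total = sum(times)
    return {
        "total_tests": len(outcomes),
        # Assumption: a "wrong" test is any test that is not fully passed.
        "wrong_tests": len(outcomes) - fully_passed,
        "flaky_tests": flaky,
        # Assumption: rate is computed over attempts, not over tests.
        "attempt_pass_rate": 100.0 * sum(r.passed for r in runs) / len(runs),
        "avg_s": total / len(runs),
        "max_s": max(times),
        "total_s": total,
    }


if __name__ == "__main__":
    demo = [Run("a", True, 1.0), Run("a", False, 3.0), Run("b", True, 2.0)]
    print(summarize(demo))
```

Under these definitions the average is the total divided by the number of timed runs, so a summary reporting total 113.86s and avg 11.39s would correspond to about ten timed runs.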