
@rashmor_eth Impressive. Being able to predict strengths/weaknesses from the methodology alone is not trivial.
We've seen quite a few tools that look strong on paper (based on design or prompting strategy) but don’t perform as expected once you run them against evals.