Thoughtful retweeted
Thoughtful
4 posts

Thoughtful retweeted

great to see that PostTrainBench is covered (again) by @jackclarkSF in his Substack — now in much more detail than before!
importai.substack.com/p/importai-449…

Thoughtful retweeted

How we actually measure self-improvement is a hard problem we should be taking shots at
Karina Nguyen (@karinanguyen)
Excited to release PostTrainBench v1.0! This benchmark evaluates the ability of frontier AI agents to post-train language models in a simplified setting. We believe this is a first step toward tracking progress in recursive self-improvement 🧵:
Thoughtful retweeted
