Static LLM benchmarks weren't built for social dynamics.
@GoogleDeepMind's Crystal Qian makes the case for human-centric augmentation evals: testing model checkpoints dynamically, in group settings.
Part 2 of our conversation with the PAIR team → youtu.be/Mk9EFnXfBxA
