
We’re talking about Goblins. openai.com/index/where-th…
Himanshu Gaurav Singh
140 posts

@Cinnabar233
phd @berkeley_ai, prev iitd

Can LLMs self-verify? Much better than you'd expect.

LLMs are increasingly used as parallel reasoners, sampling many solutions at once. Choosing the right answer is the real bottleneck. We show that pairwise self-verification is a powerful primitive.

Introducing V1, a framework that unifies generation and self-verification:
💡 Pairwise self-verification beats pointwise scoring, improving test-time scaling
💡 V1-Infer: efficient tournament-style ranking that improves self-verification
💡 V1-PairRL: RL training in which generation and verification co-evolve to develop better self-verifiers
🧵👇
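
The post does not spell out how V1-Infer's tournament works. As a rough illustration of tournament-style selection driven by a pairwise judge, here is a generic single-elimination sketch. It is not V1-Infer itself: `knockout_tournament` and `prefer` are hypothetical names, and `prefer(a, b)` stands in for an LLM asked which of two solutions is better.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def knockout_tournament(candidates: List[T], prefer: Callable[[T, T], bool]) -> T:
    """Pick one winner by single-elimination pairwise comparisons.

    `prefer(a, b)` returns True if `a` should beat `b`. In a self-verification
    setting this would be a (hypothetical) pairwise LLM judge.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    current = list(candidates)
    while len(current) > 1:
        next_round = []
        # Pair adjacent candidates; the winner of each pair advances.
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            next_round.append(a if prefer(a, b) else b)
        # With an odd count, the last candidate gets a bye into the next round.
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return current[0]
```

Usage: with toy integers and `prefer=lambda a, b: a >= b`, `knockout_tournament([3, 7, 1, 9, 4], ...)` returns 9 after four comparisons. A knockout bracket needs only n - 1 pairwise judgments, versus n(n - 1)/2 for a full round-robin, at the cost that one noisy judgment can eliminate the best candidate early.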

Now that phantom citations hallucinated by LLMs have been found in NeurIPS papers, what is to be done? Develop a software tool that authors are expected to run to verify their references against Google Scholar. Conferences can then use the same tool to screen submissions and desk-reject violators.

It's so o'1'ver. JEE is too easy for o1: it scores close to 80-90%.

NPTEL, which started well before Coursera, is still going strong. Had it been "founded" closer to San Jose, its founders would by now have entered "tech" mythology.

July has been a big month for Viser!
- Released v1.0.0 😊
- We did some writing
Some demos 👇