

Satwik Bhattamishra
@satwik1729
CS PhD student at Oxford | Worked at Google, Cohere, and Microsoft Research

We’re releasing LongCoT, an incredibly hard benchmark for measuring long-horizon reasoning capabilities over tens to hundreds of thousands of tokens. LongCoT consists of 2.5K questions across chemistry, math, chess, logic, and computer science. Frontier models score less than 10%. 🧵

We're looking for a founding ML engineer in Toronto. You'll have a lot of autonomy and compute to make a new genre of social software.

We’re hiring PhD students and postdocs on LLM theory and interpretability! Topics: 1️⃣ abilities & limitations of transformers and other architectures; 2️⃣ LLM interpretability; 3️⃣ foundations of LLM reasoning; 4️⃣ foundations of AI safety.