Sonia Murthy (@soniakmurthy) - Twitter Profile

Sonia Murthy@soniakmurthy·7 Dec

Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: arxiv.org/abs/2506.20666

CogInterp Workshop @ NeurIPS 2025@CogInterp

The spotlight talks will cover all aspects of interpreting cognition in deep learning models: from behavior to algorithms to representations! Also check out the list of poster presentations at coginterp.github.io/neurips2025/ac… (3/3)


Sonia Murthy@soniakmurthy·3 Dec

@SaleemaAmershi @adamfourney @ASwearngin77874 @bansalg_ @HsseinMzannar @HuaWenyue31539 @w_epperson @ZacharyHuang12 @MayaMurad0 @ecekamar @HosnRafa hi all! I'll be at neurips and would love to learn more about the phd internships on your team, especially any projects in human-AI interaction and safety. my dms should be open 🙂


Saleema Amershi@SaleemaAmershi·2 Dec

Reach out to me or these amazing humans to learn more about our team or #MSR AI Frontiers! @adamfourney @ASwearngin77874 @bansalg_ @HsseinMzannar Tyler Payne @HuaWenyue31539 @w_epperson @ZacharyHuang12 @MayaMurad0 @ecekamar @HosnRafa


Saleema Amershi@SaleemaAmershi·2 Dec

📢We're hiring! Join an incredible team building AI agents that work *with* people and contribute meaningfully to society. Details below 👇 P.S. I'll be at #NeurIPS2025 and #WiML this week. DM me to chat about agents🤖 or #MSR AI Frontiers!


Sonia Murthy@soniakmurthy·1 Dec

bruce is great at making research resources and this one has been a huge help for my human studies in the stream! check it out ✨

Bruce W. Lee@BruceWLee2

New AI Control Toolkit - Mini Control Arena

For the past few months, we have been doing research with our custom AI Control evaluation library, Mini Control Arena. Mini Control Arena is a ground-up rewrite of UK AISI Control Arena for a much simpler code structure. We are open-sourcing the codebase and hope it helps with your experiments, too! github.com/brucewlee/mini…


Sonia Murthy retweeted

Tomek Korbak@tomekkorbak·30 Nov

My rockstar MATS mentee @BruceWLee2 has just open-sourced his sleek and elegant codebase for AI control research, ppl should give it a try!

Bruce W. Lee@BruceWLee2

New AI Control Toolkit - Mini Control Arena

For the past few months, we have been doing research with our custom AI Control evaluation library, Mini Control Arena. Mini Control Arena is a ground-up rewrite of UK AISI Control Arena for a much simpler code structure. We are open-sourcing the codebase and hope it helps with your experiments, too! github.com/brucewlee/mini…


Sonia Murthy@soniakmurthy·19 Nov

@sarahcat21 Hi Sarah! I just gave a talk today where I proposed versions of each of these directions, so was really surprised to see this pop up on my feed - I’ll be at NeurIPS and would love to chat!


Sarah Catanzaro@sarahcat21·19 Nov

I'll be among dozens (hundreds?) of VCs attending NeurIPS this year, but among the few who might be more interested in topics like managing episodic memory with RL, avoiding model collapse when training with synthetic data, and more effectively using base models to guide exploration, than who is leading your seed round at $1B post. So ping me if you want to chat :)


Sonia Murthy@soniakmurthy·12 Nov

@aurielws @NeurIPSConf I’d love to join!


Auriel@aurielws·11 Nov

Doing a small Applied AI research dinner at @NeurIPSConf in San Diego with me and some other friends at other big labs (Gemini, Anthropic, OpenAI) and Applied AI companies. Can folks help recommend a friend (or yourself) that I should be inviting this year? I love meeting new people at #Neurips and the food will be comped 🙂


Sonia Murthy retweeted

Eric Bigelow@EricBigelow·11 Nov

📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9


Sonia Murthy retweeted

Kushin Mukherjee@kushin_m·21 Oct

Zach did a stellar job on our new paper looking at what recipes make for language models that are representationally aligned with humans! Read his tweetprint and recruit him for grad school!

Zach Studdiford@ZachStuddiford

We’re drowning in language models — there are over 2 mil. of them on Huggingface! Can we use some of them to understand which computational ingredients — architecture, scale, post-training, etc. – help us build models that align with human representations? Read on to find out 🧵


Sonia Murthy@soniakmurthy·22 Oct

@ZachStuddiford @siddsuresh97 @kushin_m hi this is cool work! I might be biased because I worked on something that has a very similar spirit arxiv.org/abs/2506.20666, but was excited to see y'all support the motivations and importance we saw around this kind of LLM analysis 😀


Zach Studdiford@ZachStuddiford·21 Oct

Thanks to @siddsuresh97, @kushin_m, and Tim Rogers for their support on this work! I’m excited to be applying to grad school this cycle so reach out if you want to chat about this.

Code 💻: github.com/Knowledge-and-…
Paper 📄: arxiv.org/abs/2510.01030

10/10


Zach Studdiford@ZachStuddiford·21 Oct

We’re drowning in language models — there are over 2 mil. of them on Huggingface! Can we use some of them to understand which computational ingredients — architecture, scale, post-training, etc. – help us build models that align with human representations? Read on to find out 🧵


Sonia Murthy@soniakmurthy·14 Oct

@kiran_tomlinson hey Kiran! couldn’t message you but I’d love to learn more about these openings/projects if you have some time to chat this week? 🙂


Kiran Tomlinson@kiran_tomlinson·9 Oct

My team at Microsoft Research is hiring PhD interns for next summer! If you’re interested in understanding or improving human-LLM systems, apply here: jobs.careers.microsoft.com/global/en/shar… Topics we’re studying include LLM personalization, reasoning, collaboration, benchmarking, ++


Sonia Murthy@soniakmurthy·9 Oct

Thanks to my lovely collaborators @rosieyzh, @_jennhu, @ShamKakade6, @m_wulfmeier, Peng Qian, and @TomerUllman and the Kempner Institute! 🧠 [end]


Sonia Murthy@soniakmurthy·9 Oct

We also trace the evolution of value trade-offs during alignment by evaluating model checkpoints for 8 unique combinations of base model x feedback dataset x alignment algorithm. We see the largest shifts in values early on in training, with the strongest effects coming from base model choice.


Sonia Murthy@soniakmurthy·9 Oct

Excited to present our new paper as a spotlight talk 🌟 at the Pragmatic Reasoning in LMs workshop at #COLM2025 this Friday! 🍁 Come by room 520B @ 11:30am tomorrow to learn more about how LLMs' pluralistic values evolve over reasoning budgets and alignment 🧵


Sonia Murthy retweeted

Apoorv Khandelwal@apoorvkh·7 Oct

In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵


Sonia Murthy@soniakmurthy·1 May

Presenting this today (5/1) at the 4pm poster session (Hall 3) at #NAACL2025! Come chat about alignment, personalization, and all things cognitive science 🐟

Sonia Murthy@soniakmurthy

(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: arxiv.org/abs/2411.04427


Sonia Murthy retweeted

Kempner Institute at Harvard University@KempnerInst·10 Feb

NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find #alignment reduces conceptual diversity of language models. Read more: bit.ly/4hNjtiI @soniakmurthy @tomerullman @_jennhu


Sonia Murthy@soniakmurthy·10 Feb

Many thanks to my collaborators and @KempnerInst for helping make this idea come to life!🌱


Sonia Murthy@soniakmurthy·10 Feb

(9/9) Code and data for our experiments can be found at: github.com/skmur/onefish-… Preprint: arxiv.org/abs/2411.04427 Also, check out our feature in the @KempnerInst Deeper Learning Blog! bit.ly/417WVDL


Sonia Murthy@soniakmurthy·10 Feb

(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: arxiv.org/abs/2411.04427
