Sonia Murthy

54 posts

@soniakmurthy

cs phd student @harvard · prev @allen_ai, @cocosci_lab, undergrad @princeton · she/her

Joined May 2022
167 Following · 356 Followers
Sonia Murthy @soniakmurthy
Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: arxiv.org/abs/2506.20666
CogInterp Workshop @ NeurIPS 2025 @CogInterp

The spotlight talks will cover all aspects of interpreting cognition in deep learning models: from behavior to algorithms to representations! Also check out the list of poster presentations at coginterp.github.io/neurips2025/ac… (3/3)

Saleema Amershi @SaleemaAmershi
📢We're hiring! Join an incredible team building AI agents that work *with* people and contribute meaningfully to society. Details below 👇 P.S. I'll be at #NeurIPS2025 and #WiML this week. DM me to chat about agents🤖 or #MSR AI Frontiers!
Sonia Murthy retweeted
Sonia Murthy @soniakmurthy
@sarahcat21 Hi Sarah! I just gave a talk today where I proposed versions of each of these directions, so was really surprised to see this pop up on my feed - I’ll be at NeurIPS and would love to chat!
Sarah Catanzaro @sarahcat21
I'll be among dozens (hundreds?) of VCs attending NeurIPS this year, but among the few who might be more interested in topics like managing episodic memory with RL, avoiding model collapse when training with synthetic data, and more effectively using base models to guide exploration, than who is leading your seed round at $1B post. So ping me if you want to chat :)
Auriel @aurielws
Doing a small Applied AI research dinner at @NeurIPSConf in San Diego with me and some other friends at other big labs (Gemini, Anthropic, Open AI) and Applied AI companies.  Can folks help recommend a friend (or yourself) that I should be inviting this year? I love meeting new people at #Neurips and the food will be comped 🙂
Sonia Murthy retweeted
Eric Bigelow @EricBigelow
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
Sonia Murthy retweeted
Kushin Mukherjee @kushin_m
Zach did a stellar job on our new paper looking at what recipes make for language models that are representationally aligned with humans! Read his tweetprint and recruit him for grad school!
Zach Studdiford @ZachStuddiford

We’re drowning in language models — there are over 2 mil. of them on Huggingface! Can we use some of them to understand which computational ingredients — architecture, scale, post-training, etc. – help us build models that align with human representations? Read on to find out 🧵

Zach Studdiford @ZachStuddiford
We’re drowning in language models — there are over 2 mil. of them on Huggingface! Can we use some of them to understand which computational ingredients — architecture, scale, post-training, etc. – help us build models that align with human representations? Read on to find out 🧵
Sonia Murthy @soniakmurthy
@kiran_tomlinson hey Kiran! couldn’t message you but I’d love to learn more about these openings/projects if you have some time to chat this week? 🙂
Kiran Tomlinson @kiran_tomlinson
My team at Microsoft Research is hiring PhD interns for next summer! If you’re interested in understanding or improving human-LLM systems, apply here: jobs.careers.microsoft.com/global/en/shar… Topics we’re studying include LLM personalization, reasoning, collaboration, benchmarking, ++
Sonia Murthy @soniakmurthy
We also trace the evolution of value trade-offs during alignment by evaluating model checkpoints for 8 unique base model × feedback dataset × alignment algorithm combinations. We see the largest shifts in values early in training, with the strongest effects coming from base model choice.
Sonia Murthy @soniakmurthy
Excited to present our new paper as a spotlight talk 🌟 at the Pragmatic Reasoning in LMs workshop at #COLM2025 this Friday! 🍁 Come by room 520B @ 11:30am tomorrow to learn more about how LLMs' pluralistic values evolve over reasoning budgets and alignment 🧵
Sonia Murthy retweeted
Apoorv Khandelwal @apoorvkh
In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵
Sonia Murthy @soniakmurthy
Presenting this today (5/1) at the 4pm poster session (Hall 3) at #NAACL2025! Come chat about alignment, personalization, and all things cognitive science 🐟
Sonia Murthy @soniakmurthy

(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: arxiv.org/abs/2411.04427

Sonia Murthy @soniakmurthy
Many thanks to my collaborators and @KempnerInst for helping make this idea come to life!🌱