Pinned Tweet
Heather Gorham
3 posts

Heather Gorham
@heather_gorham_
Product of two trekkies.
London · Joined January 2023
173 Following · 97 Followers

I had the privilege of sitting down with Ida Momennejad, Principal Researcher in Reinforcement Learning at Microsoft Research. Her years of brilliant research on multi-step planning and cognition could not be more timely or important amid today's conversations around agentic AI. Here are some of the highlights from our conversation, many drawn from her published papers. I encourage you to read the papers linked below and share your thoughts.
- Despite many claims about what LLMs can do, they're not good at executive functions like abstract reasoning and multi-step planning. Importantly, to understand where the gaps in their ability are (and to close them), we need established methods for proper evaluation. Ida studied whether LLMs could extract cognitive maps to use in planning and inference (spoiler alert: they could not).
See: microsoft.com/en-us/research…
- To enable agents and systems to perform better at these kinds of tasks, Ida proposes that LLMs need a prefrontal cortex and thus recommends a modular system approach for agentic AI. Different modules (agents) function as the generator, orchestrator, evaluator, predictor, etc. They work together in different roles to better accomplish compositional tasks that require multi-step planning, such as decision making. This is similar to the approach taken in Microsoft's AutoGen framework.
See: arxiv.org/abs/2310.00194
microsoft.com/en-us/research…
- We wrapped up the conversation with human-AI alignment. Her recent paper on multi-player games studied the differences in behavior between humans and AI agents during gameplay. This is important for scaling AI safely, as well as for designing agents that humans enjoy interacting with. Ida proposes the “Task-sets” framework, an interpretable approach to human-AI alignment.
See: arxiv.org/abs/2402.03575
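
To make the modular idea from the second highlight a bit more concrete, here is a minimal toy sketch. It is not taken from Ida's papers or from AutoGen: every name in it (`ACTIONS`, `generator`, `predictor`, `evaluator`, `orchestrator`) is hypothetical, and plain Python functions stand in for what would be LLM-backed modules. The toy task is to reach a goal number from a start number using a few fixed actions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical action set for the toy task (an assumption for illustration only).
ACTIONS: Dict[str, Callable[[int], int]] = {
    "add_one": lambda s: s + 1,
    "add_three": lambda s: s + 3,
    "double": lambda s: s * 2,
}


def generator(state: int) -> List[str]:
    """Propose candidate actions for the current state."""
    return list(ACTIONS)


def predictor(state: int, action: str) -> int:
    """Predict the state that would result from taking an action."""
    return ACTIONS[action](state)


def evaluator(predicted_state: int, goal: int) -> int:
    """Score a predicted state by its distance to the goal (lower is better)."""
    return abs(goal - predicted_state)


def orchestrator(start: int, goal: int, max_steps: int = 20) -> Optional[List[str]]:
    """Coordinate the modules: generate, predict, evaluate, then commit one step.

    Returns the list of chosen actions, or None if the goal is not reached
    within max_steps. This is a greedy loop, so it can fail on tasks that
    need a temporarily worse step.
    """
    state = start
    plan: List[str] = []
    for _ in range(max_steps):
        if state == goal:
            return plan
        best = min(
            generator(state),
            key=lambda action: evaluator(predictor(state, action), goal),
        )
        state = predictor(state, best)
        plan.append(best)
    return plan if state == goal else None


if __name__ == "__main__":
    print(orchestrator(1, 10))
```

Each role sits behind its own function, so one module could be swapped out (say, a stronger evaluator) without touching the others, which is the main appeal of a modular design.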

@eladgil I would note that in the speed and quality conversation, both the model and the underlying hardware matter.