Laura Hanu
@HanuLaura

212 posts

next gen computer-use agents @salesforce | ex @convergence_ai_ @UnitaryAI @CompSciOxford @imperialcollege | interested in steering ai to do more good than bad

London, England · Joined August 2018
507 Following · 618 Followers
Laura Hanu@HanuLaura·
main takeaways from the @dwarkeshpodcast @karpathy interview:

*RL limitations*: hard to get past the reward sparsity problem in RL when it comes to real-world tasks; one promising direction could be more sample-efficient learning by reflecting on mistakes

*reduced model size by getting rid of redundant memorisation*: soon the cognitive engine might look like a 1 billion parameter model that just knows how to think without needing to memorise lots of data - it should be able to recognise what it doesn't know and look it up

*gradual loss of control and understanding*: even if we still have humans delegating tasks to autonomous entities, it will get increasingly harder to fully control them, let alone understand what they're doing; similar thoughts in this great, albeit on the pessimistic side, paper arxiv.org/abs/2501.16946

*role of AI education post-AGI*: pre-AGI, AI education is useful; post-AGI, education is fun - it could be seen as a way to train mentally just how you do physically
Laura Hanu@HanuLaura·
pantheon is now up there with some of my favourite sci fi of all time like asimov’s the end of eternity, the last question or story of your life with some elements of snow crash or dark the tv show but upping the scale even more, dare i say something akin to what @DavidDeutschOxf was describing in the beginning of infinity
Laura Hanu@HanuLaura·
a nice straightforward summary of some of the grpo limitations beyond just not being great for multi-turn, e.g.:
* if you have multiple reward signals, the model won't know which one it is being rewarded for, since they're usually all collapsed into one scalar
* only the scalar reward is used for the policy update, when more detailed textual feedback could be used instead (which is kinda what gepa does with its reflective prompt evolution)
arya ꩜@aryagxr

wrote a short blogpost on what I think are some limitations of GRPO: I’ve been playing around with RL finetuning for reasoning tasks and came across a few limitations that i wanted to document here feedback/corrections are welcome!

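the reward-collapsing point can be sketched in a few lines (a hypothetical toy helper, not from the blogpost or any GRPO implementation): once per-signal rewards are summed, group-relative advantages can't distinguish *which* signal a completion earned, so completions with very different reward profiles get identical advantages.

```python
import statistics

def grpo_style_advantages(reward_components_per_completion):
    """Toy sketch: each completion in a sampled group has several
    reward components (e.g. correctness, format, length penalty).
    They are summed into one scalar, then normalised within the group."""
    # collapse: the policy update only ever sees this sum, not the parts
    scalars = [sum(parts) for parts in reward_components_per_completion]
    mean = statistics.mean(scalars)
    std = statistics.stdev(scalars) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in scalars]
```

e.g. a completion rewarded for correctness `[1.0, 0.0]` and one rewarded only for format `[0.0, 1.0]` collapse to the same scalar and receive the same advantage.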
Laura Hanu@HanuLaura·
It's been interesting to see RL having a comeback lately. Guess in retrospect it's not surprising that RLHF is not enough since it naturally hits a ceiling limited by the quality of human feedback. What's particularly exciting though is that this shift is happening just as AI is becoming more agentic and able to interact with the digital world. Combine this with some grounded reward signals (e.g. profit, likes, citations) and we should see AI not just replicate what humans do but create new knowledge through their own experiences. The next few years should be a fun and wild ride 🎢 For another interesting paper on the topic: incompleteideas.net/papers/TheEraO… or this podcast on it: youtu.be/zzXyPGEtseI?si…
Laura Hanu@HanuLaura·
Why does RL work so well for learning preferred completions in llm post-training, and why can't we just use supervised finetuning? This paper has an interesting information-theoretic explanation: arxiv.org/pdf/2503.01067

TLDR: The literature shows that the 2-stage RL approach used in today's sota models (1. learn a good reward model that can score llm completions similarly to how a human would, 2. use it to learn good policies with RL) outperforms using only supervised finetuning. The authors posit that this happens when verifying an output is simpler than generating it. This is because, in the RL case, the search space is constrained to the subset of policies that are optimal for the learnt reward model. When they reduce the verification-generation gap empirically, the difference in results diminishes too.
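the verification-generation gap can be illustrated with a toy analogy (my own, not the paper's code): checking that a sequence is sorted is linear, while generating a sorted order by blind enumeration is factorial - a cheap verifier lets the search discard everything it rejects, which is the spirit of stage 2 being constrained to reward-model-optimal outputs.

```python
import itertools

def is_sorted(seq):
    # the "verifier" / reward model: scoring an output is cheap, O(n)
    return all(a <= b for a, b in zip(seq, seq[1:]))

def search_with_verifier(items):
    # the "policy search": generation is expensive (factorial candidates),
    # but the verifier prunes the search to its optimal outputs
    for cand in itertools.permutations(items):
        if is_sorted(cand):
            return list(cand)
```

when verification is as hard as generation, the verifier buys you nothing - matching the paper's finding that the advantage of RL shrinks as the gap closes.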
Laura Hanu@HanuLaura·
great discussion! we need more public discourse between top ai leaders and economists 🤝 would’ve liked to see them engage with hinton’s concern that the benefits ai brings will deepen the economic divide in a capitalist system, further creating fertile ground for fascism 🌶️
The Nobel Prize@NobelPrize

Watch our 2024 Nobel Prize laureates talk about their research and careers in a unique roundtable discussion, 'Nobel Minds', moderated by BBC's Zeinab Badawi. youtu.be/1tELlYbO_U8

Laura Hanu retweeted
Saining Xie@sainingxie·
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations.🧵[1/n]
Laura Hanu@HanuLaura·
The 2nd part of the Intro to multi-node machine learning series is out! 🥳 It's all about how to use Slurm to scale up your ML applications! 🚀 unitary.ai/articles/intro…
Laura Hanu@HanuLaura·
Excited to kick off a new deep-dive blog series on how to build and set up the infrastructure for distributed training from scratch. Spoiler alert – setting up a cloud-based cluster that can scale to hundreds of nodes isn't as daunting as it sounds! 👀🚀 unitary.ai/articles/intro…
Laura Hanu retweeted
Anita Verő@anitaveroe·
If you are at ICCV in Paris next week, come to talk to us at our poster at the "What is Next in Multimodal Foundation Models?" workshop! The title is: "Language as the Medium: Multimodal Video Classification through text only" Paper link: arxiv.org/pdf/2309.10783…
Laura Hanu@HanuLaura·
Can we use LLMs like GPT3.5/Claude/Llama2 to directly classify multimodal content like videos in-context with no training? 👀 Excited to share our findings at the "What is Next in Multimodal Foundation Models?" workshop at #ICCV2023 @anitaveroe @jdthewlis arxiv.org/abs/2309.10783
Laura Hanu@HanuLaura·
So fun to see so many people interested in LLMs, MLOps and generative AI in London!🐝
Emad@EMostaque

Amazing event by Lukas and our friends @wandb, well over 1,000 people showing up to listen to and discuss machine learning ops & AI! Great energy and excitement, London will be amazing AI hub 🦄 🇬🇧
