
Uljad
@uljadb99
AI PhD student @UniOfOxford @aims_oxford, prev AI Research @JPMorgan, EE with Great Distinction @nyuniversity, Comedian at times, Tiramisu enthusiast


Following the ongoing situation in Iran, I am convening a special Security College on Monday. For regional security and stability, it is of the utmost importance that there is no further escalation through Iran’s unjustified attacks on partners in the region.

RLC attendees will also enjoy the banquet featuring a theatrical dinner show by Cirque du Soleil (LUDŌ): cirquedusoleil.com/ludo All the more reason not to miss the chance to be part of RLC 2026!

🚨 New Benchmark Alert!! 🚨
Navigate Wikipedia hyperlinks step-by-step. No map. Just planning and world knowledge!
We evaluated 20+ models across 3 difficulty levels:
Gemini-3: 95% → 66% → 23%
GPT-5: 92.5% → 60% → 15%
Opus 4.5: 91.5% → 56% → 18%
We discover a Planning Gap! 🧵
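For intuition on what the benchmark asks of a model: an oracle with the full link graph could solve each task with breadth-first search, while the models must plan the same click path from world knowledge alone. A toy sketch of the oracle side, with an entirely made-up hyperlink graph (page names and links are illustrative, not from the benchmark):

```python
from collections import deque

# Toy hyperlink graph: each page maps to the pages it links to.
# These pages and links are invented for illustration only.
LINKS = {
    "Tiramisu": ["Italy", "Coffee"],
    "Italy": ["Europe", "Rome"],
    "Coffee": ["Ethiopia"],
    "Europe": ["Continent"],
    "Rome": ["Italy", "Europe"],
    "Ethiopia": ["Africa"],
    "Continent": [],
    "Africa": ["Continent"],
}

def shortest_hyperlink_path(start, goal):
    """Breadth-first search for the shortest click path from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in LINKS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal unreachable from start
```

The gap the tweet highlights is between this kind of exhaustive search (which sees the map) and a model that must guess promising links step by step without it.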

Long-tail scenarios remain a major challenge for autonomous driving. Unusual events, like accidents or construction zones, are underrepresented in driving data, yet require semantic and commonsense reasoning grounded in control.
We propose SteerVLA, a framework that uses VLM reasoning to steer a driving policy via grounded, fine-grained language instructions.
Paper: arxiv.org/abs/2602.08440
Website: steervla.github.io
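A toy sketch of the two-stage idea as described in the tweet: a VLM turns a rare scene into a fine-grained language instruction, and a low-level policy conditions on that instruction. Every function name, instruction string, and control value below is an illustrative assumption, not the SteerVLA implementation:

```python
def vlm_instruction(scene_description: str) -> str:
    """Hypothetical stand-in for the VLM reasoner: map a rare scene
    to a fine-grained language instruction (values are made up)."""
    if "construction" in scene_description:
        return "merge left and slow to 20 mph"
    if "accident" in scene_description:
        return "stop behind the vehicle ahead"
    return "keep lane at current speed"

def driving_policy(instruction: str) -> dict:
    """Hypothetical low-level policy: turn the instruction into controls."""
    if "merge left" in instruction:
        return {"steer": -0.3, "target_speed_mph": 20}
    if "stop" in instruction:
        return {"steer": 0.0, "target_speed_mph": 0}
    return {"steer": 0.0, "target_speed_mph": 45}
```

The point of the split is that the VLM handles the semantic/commonsense part (recognizing the long-tail event) while the policy stays grounded in control.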

What if LLMs knew when to stop? 🚧
HALT finetuning teaches LLMs to only generate content they’re confident is correct.
🔍 Insight: Post-training must be adjusted to the model’s capabilities.
⚖️ Tunable trade-off: Higher correctness 🔒 vs. More completeness 📝
with @AIatMeta 🧵

HALT (“High Accuracy, Less Talk”) accepted to ICLR 2026 🎉 LLMs are trained to always finish answers — even past what they truly know — causing partially wrong outputs. HALT instead finetunes models to stop when confidence drops, trading completeness for reliability 🚧 👇
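The correctness-vs-completeness trade-off can be illustrated with a toy decoding rule: stop emitting tokens once confidence drops below a threshold. This is only a minimal sketch of the behavior the tweets describe; the actual HALT method finetunes the model itself rather than applying a post-hoc cutoff:

```python
def halt_generate(tokens_with_conf, threshold=0.8):
    """Toy illustration: emit tokens until per-token confidence
    drops below `threshold`, then stop (trade completeness for
    reliability). Not the real HALT method, which is a finetuning
    recipe, not a decoding filter."""
    out = []
    for token, conf in tokens_with_conf:
        if conf < threshold:
            break  # stop rather than risk a wrong continuation
        out.append(token)
    return out

# Made-up (token, confidence) pairs for illustration:
steps = [("The", 0.99), ("capital", 0.95), ("is", 0.90),
         ("Paris", 0.60), ("probably", 0.30)]
```

Raising the threshold yields shorter but more reliable outputs; lowering it yields more complete but riskier ones, which is the tunable trade-off the thread mentions.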
