
Uljad
@uljadb99
AI PhD student @UniOfOxford @aims_oxford, prev AI Research @JPMorgan, EE with Great Distinction @nyuniversity, Comedian at times, Tiramisu enthusiast


Following the ongoing situation in Iran, I am convening a special Security College on Monday. For regional security and stability, it is of the utmost importance that there is no further escalation through Iran’s unjustified attacks on partners in the region.

RLC attendees will also enjoy the banquet featuring a theatrical dinner show by Cirque du Soleil (LUDŌ): cirquedusoleil.com/ludo All the more reason not to miss the chance to be part of RLC 2026!

🚨 New Benchmark Alert!! 🚨
Navigate Wikipedia hyperlinks step-by-step. No map. Just planning and world knowledge!
We evaluated 20+ models across 3 difficulty levels:
Gemini-3: 95% → 66% → 23%
GPT-5: 92.5% → 60% → 15%
Opus 4.5: 91.5% → 56% → 18%
We discover a Planning Gap! 🧵
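For intuition on what the benchmark asks of a model: an oracle with the full link graph could solve each task with breadth-first search, while the models must plan the same click path from world knowledge alone. A toy sketch of the oracle side, with an entirely made-up hyperlink graph (page names and links are illustrative, not from the benchmark):

```python
from collections import deque

# Toy hyperlink graph: each page maps to the pages it links to.
# These pages and links are invented for illustration only.
LINKS = {
    "Tiramisu": ["Italy", "Coffee"],
    "Italy": ["Europe", "Rome"],
    "Coffee": ["Ethiopia"],
    "Europe": ["Continent"],
    "Rome": ["Italy", "Europe"],
    "Ethiopia": ["Africa"],
    "Continent": [],
    "Africa": ["Continent"],
}

def shortest_hyperlink_path(start, goal):
    """Breadth-first search for the shortest click path from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in LINKS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal unreachable from start
```

The gap the tweet highlights is between this kind of exhaustive search (which sees the map) and a model that must guess promising links step by step without it.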

Long-tail scenarios remain a major challenge for autonomous driving. Unusual events, like accidents or construction zones, are underrepresented in driving data, yet require semantic and commonsense reasoning grounded in control.
We propose SteerVLA, a framework that uses VLM reasoning to steer a driving policy via grounded, fine-grained language instructions.
Paper: arxiv.org/abs/2602.08440
Website: steervla.github.io
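A toy sketch of the two-stage idea as described in the tweet: a VLM turns a rare scene into a fine-grained language instruction, and a low-level policy conditions on that instruction. Every function name, instruction string, and control value below is an illustrative assumption, not the SteerVLA implementation:

```python
def vlm_instruction(scene_description: str) -> str:
    """Hypothetical stand-in for the VLM reasoner: map a rare scene
    to a fine-grained language instruction (values are made up)."""
    if "construction" in scene_description:
        return "merge left and slow to 20 mph"
    if "accident" in scene_description:
        return "stop behind the vehicle ahead"
    return "keep lane at current speed"

def driving_policy(instruction: str) -> dict:
    """Hypothetical low-level policy: turn the instruction into controls."""
    if "merge left" in instruction:
        return {"steer": -0.3, "target_speed_mph": 20}
    if "stop" in instruction:
        return {"steer": 0.0, "target_speed_mph": 0}
    return {"steer": 0.0, "target_speed_mph": 45}
```

The point of the split is that the VLM handles the semantic/commonsense part (recognizing the long-tail event) while the policy stays grounded in control.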

What if LLMs knew when to stop? 🚧
HALT finetuning teaches LLMs to only generate content they’re confident is correct.
🔍 Insight: Post-training must be adjusted to the model’s capabilities.
⚖️ Tunable trade-off: Higher correctness 🔒 vs. More completeness 📝
with @AIatMeta 🧵

HALT (“High Accuracy, Less Talk”) accepted to ICLR 2026 🎉 LLMs are trained to always finish answers — even past what they truly know — causing partially wrong outputs. HALT instead finetunes models to stop when confidence drops, trading completeness for reliability 🚧 👇
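The correctness-vs-completeness trade-off can be illustrated with a toy decoding rule: stop emitting tokens once confidence drops below a threshold. This is only a minimal sketch of the behavior the tweets describe; the actual HALT method finetunes the model itself rather than applying a post-hoc cutoff:

```python
def halt_generate(tokens_with_conf, threshold=0.8):
    """Toy illustration: emit tokens until per-token confidence
    drops below `threshold`, then stop (trade completeness for
    reliability). Not the real HALT method, which is a finetuning
    recipe, not a decoding filter."""
    out = []
    for token, conf in tokens_with_conf:
        if conf < threshold:
            break  # stop rather than risk a wrong continuation
        out.append(token)
    return out

# Made-up (token, confidence) pairs for illustration:
steps = [("The", 0.99), ("capital", 0.95), ("is", 0.90),
         ("Paris", 0.60), ("probably", 0.30)]
```

Raising the threshold yields shorter but more reliable outputs; lowering it yields more complete but riskier ones, which is the tunable trade-off the thread mentions.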
