Bayan Bruss

825 posts


@cbbruss

VP of Applied AI Research @CapitalOne | Adjunct @Georgetown | Simple baselines, practical implementations.

Washington, D.C. · Joined April 2009
1.1K Following · 562 Followers
Bayan Bruss retweeted
Tom Goldstein
Tom Goldstein@tomgoldsteincs·
Our new guardian model lets you create LLM guardrails using natural text. This little 8B model efficiently checks in real time whether chatbots comply with bespoke moderation policies. It's not often that academics beat industry models, but DynaGuard stacks up well!
Monte Hoover@MonteBHoover

Guardrails with custom policies are hard for models trained on safety and harm-related datasets. But what if you trained a guardian model on arbitrary rules? Introducing DynaGuard, a guardian model for custom policies: arxiv.org/abs/2509.02563
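To make "custom policies" concrete: a guardian model takes both a bespoke policy and the conversation to be judged in one input, and returns a compliance verdict. The template below is a hypothetical stand-in for illustration only; DynaGuard's actual expected input format is defined in the paper and model card.

```python
def build_guardian_prompt(policy, dialogue):
    """Pair a bespoke policy with the chat transcript to be judged.
    The guardian model is then asked for a PASS/FAIL compliance verdict.
    NOTE: this template is a hypothetical sketch, not DynaGuard's
    actual input format (see the paper/model card for that)."""
    transcript = "\n".join(f"{role.upper()}: {text}" for role, text in dialogue)
    return (
        "You are a guardian model. Decide whether the ASSISTANT turns below "
        "comply with the policy. Answer PASS or FAIL.\n\n"
        f"POLICY:\n{policy}\n\n"
        f"DIALOGUE:\n{transcript}\n"
    )

prompt = build_guardian_prompt(
    "Never reveal internal account numbers.",
    [("user", "What's my account number?"),
     ("assistant", "I can't share that, but I can help you reset access.")],
)
```

The resulting string would then be sent to the guardian model (e.g. via the interactive demo or a local checkpoint) for the verdict.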

Bayan Bruss retweeted
Monte Hoover
Monte Hoover@MonteBHoover·
There is still a lot of brittleness in getting guardian models to incorporate custom policies, but we think this is a step in the right direction. Try out DynaGuard in this interactive demo (and give us feedback to improve it!): huggingface.co/spaces/tomg-gr…
Bayan Bruss
Bayan Bruss@cbbruss·
@chrmanning We had that for our flight back to Dulles this summer. It was as if they parked the plane at a different airport.
Christopher Manning
Christopher Manning@chrmanning·
Bussed to our Lufthansa plane in Frankfurt, our jet was the second most distant one in the whole airport – there was one Condor jet beyond it. I guess that’s what you get for flying to Slovenia. 🇸🇮
Bayan Bruss
Bayan Bruss@cbbruss·
@scaling01 Why are they even measuring it in pages? Tax_propmt_final_final_v2.docx
Bayan Bruss retweeted
Irina Rish
Irina Rish@irinarish·
I am looking for a postdoc to lead projects related to this collaboration, on scaling laws, emergence and interpretability in pre- and post-training & inference/reasoning, in multimodal foundation models (language, time series, tabular data etc). HPC experience is a plus.
Irina Rish@irinarish

So excited to join the new @SimonsFdn Simons Collaboration on the Physics of Learning and Neural Computation, to further advance our understanding (mech interp) of learning & reasoning in large networks, including classical deep nets and other bio-inspired network models!

Kezhi Kong
Kezhi Kong@KezhiKong·
Check out the Nemotron Nano v2 we just released, a SOTA 9B hybrid model with better accuracy and a large inference speedup. We also released most of the data! See our tech report for the details. Congrats to the team!
Bryan Catanzaro@ctnzr

Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the models, datasets, and tech report are here: research.nvidia.com/labs/adlr/NVID…

Nathan Lambert
Nathan Lambert@natolambert·
I disagree with @dwarkesh_sp on continual learning being a major bottleneck for the current path of AI. It may be a bottleneck on making a "more efficient and human like AI" but language models are well on track to adapt quickly and precisely to personal work.
Bayan Bruss
Bayan Bruss@cbbruss·
@AlexGDimakis I would take it a step further. Humans are really good at learning from non verbal social cues. A look that implies disappointment, excitement, frustration can be a profound reward signal in many situations.
Alex Dimakis
Alex Dimakis@AlexGDimakis·
Imagine you're trying to teach a human how to do a task, say install Windows XP in a virtual machine. The human walks into a room and sees a document (prompt) that you have written, describing exactly what they are supposed to do. There is also a computer ready for their keyboard inputs. Then they try for a while, and suppose they fail. So you write some detailed notes and additional instructions in the prompt document, based on how they failed, to teach them how to do the task. But then A NEW PERSON walks in and tries to solve the task. Every day it's a fresh new employee, and you can only update the document. This Memento-like experience is how most companies are currently trying to train agents, by writing prompts (multi-agent systems usually means multiple prompts), and it is not going to lead to AGI.

But wait, you may say, there is reinforcement learning post-training; this is the path to AGI, nicely summarized in the recent paper "The Era of Experience" by Silver and Sutton. With RL, the agent will try to solve the task and we will update their brain weights to increase the probability of good rollouts (successful attempts to install Windows) and decrease the probability of failed rollouts. This is good, but imagine trying to teach a human how to install Windows with this tormenting process: leaving them alone in a room, letting them bang their head against a wall, and only when they manage to install Windows, updating their brain weights a tiny bit. If they keep messing up, there is no reward, no gradient, and no learning. That is why agents must be post-trained with SFT first (basically observing other people solve the task end-to-end), and only after they can solve the task 1 time out of 10 can RL be deployed. You cannot talk to them during the process, you cannot give them verbal feedback on what error they made, and they cannot think about why they failed.

One good signal that this process is sub-optimal is that RL-trained agents keep getting GRPO gradients after re-solving the same math problem thousands of times, which shows that GRPO is extremely sample-inefficient compared to how humans learn. Dwarkesh has a recent video, "Why I don't think AGI is right around the corner." He says that this lack of continual learning shows that we lack a fundamental idea needed to reach human-level performance. He describes an example similar to mine, but with teaching a kid to play the saxophone, while I set an arguably lower bar(?) of installing Windows. Dwarkesh says: "The reason humans are so useful is not mainly their raw intellect, it's their ability to build up context, interrogate their own failures and to pick up small improvements as they practice a task."

I would summarize the challenge as follows: right now, there is no established RL algorithm to give a model verbal feedback and have it update its weights. There is also no established algorithm for a model to reflect on a previous failed execution and update its own weights. There are several methods (e.g. TextGrad, Reflexion, ALHF from Databricks, and the now well-known GEPA) that update PROMPTS or create "lessons learned" or memory summaries inserted into the prompt context, but this is still like a fresh new student walking in every day, and things that cannot be explicitly verbalized cannot be learned. On updating weights from feedback and reflection, there is significant ongoing research; e.g., I recently saw "Natural Language Reinforcement Learning (NLRL)" by Feng et al., and I'd love to get more pointers. I think that we are indeed entering the era of experience, and we will need to find new algorithmic ideas before agents can learn as efficiently as humans.
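The "no reward, no gradient, no learning" failure mode can be made concrete. GRPO normalizes each rollout's reward against the mean and standard deviation of its own group of rollouts, so a group in which every attempt fails (or every attempt succeeds) yields all-zero advantages and hence no gradient signal. A minimal sketch in plain Python, showing only the group-relative advantage computation (clipping, KL terms, and the per-token policy-gradient machinery are omitted):

```python
def group_advantages(rewards, eps=1e-8):
    """GRPO-style group-relative advantages: each rollout's reward is
    normalized against the mean/std of its own group of rollouts."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One of four rollouts succeeds: the successful rollout gets a positive
# advantage, the failures get negative ones -- a useful gradient signal.
mixed = group_advantages([1.0, 0.0, 0.0, 0.0])

# Every rollout fails: all advantages are exactly zero, so there is no
# gradient and no learning, as described above.
all_failed = group_advantages([0.0, 0.0, 0.0, 0.0])  # → [0.0, 0.0, 0.0, 0.0]
```

The same zero-advantage collapse happens when every rollout succeeds, which is one way to see why repeated rollouts on an already-mastered problem stop being informative.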
Bayan Bruss retweeted
Epoch AI
Epoch AI@EpochAIResearch·
The past 5 years have seen big successes in language, image and video generation, but relatively limited success in robotic manipulation. Why don’t we have laundry robots in every house? One thing seems clear: training compute is not the blocker. 🧵
Bayan Bruss
Bayan Bruss@cbbruss·
@jxmnop We're largely taking an interpreter approach to prompt optimization. I wonder what a compiler approach looks like.
dr. jack morris
dr. jack morris@jxmnop·
probably 10x more people should be working on prompt optimization systems (we need a vLLM for promptopt), theory, new techniques, benchmarks. the whole kit and caboodle
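For readers unfamiliar with what a prompt optimization system actually does: at its simplest, it is a search loop that proposes edits to a prompt and keeps those that improve a task metric. The edit set and scorer below are toy stand-ins (a real system would score candidates by running an LLM over a dev set, as tools like GEPA or TextGrad do); this is only a sketch of the control flow.

```python
import random

def optimize_prompt(base, edits, score, iters=20, seed=0):
    """Greedy prompt optimization: repeatedly apply a candidate edit and
    keep the result only if the scoring function strictly improves.
    `edits` are functions str -> str; `score` maps a prompt to a task
    metric (higher is better)."""
    rng = random.Random(seed)
    best, best_score = base, score(base)
    for _ in range(iters):
        candidate = rng.choice(edits)(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy scorer: rewards prompts that ask for reasoning and an answer format.
def toy_score(p):
    return ("step by step" in p) + ("Answer:" in p)

edits = [
    lambda p: p + " Think step by step.",
    lambda p: p + " End with 'Answer: <value>'.",
]
best, s = optimize_prompt("Solve the problem.", edits, toy_score)
```

In the framing of the reply above, this loop is the "interpreter" style: it evaluates concrete prompt strings against a metric at runtime rather than compiling a task specification down to a prompt ahead of time.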
Bayan Bruss
Bayan Bruss@cbbruss·
@scaling01 Three years is the perfect prediction horizon for anything you want. It’s just close enough that people feel like it’s going to happen soon and just far enough that if the deadline passes no one will remember you were wrong.
Lisan al Gaib
Lisan al Gaib@scaling01·
3 years is now considered a slow timeline:
> "if you are talking about human level AI, [...] we are very very far from it"
> "we are not getting there next year or in 2 years"
Haider.@slow_developer

Francois Chollet says human-level AI is still far off. True intelligence means learning new skills fast, but LLMs are very poor at this; we're only taking baby steps in adapting to new situations in real time. "LLMs might help, but they won't be the core of real intelligence"

Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
Is LLM use finally making me less capable?

I started using LLMs three years ago for text and code gen. Now I use several of them, for a ton more things. In fact, I feel like I use them for a huge fraction of the cognitive tasks I perform that can be described in text. That is... an embarrassingly large number of cognitive tasks.

I've come to realize that all the cognitive offloading is making me feel less capable, not more. Early on, I felt like ChatGPT and Claude gave me superpowers. Now I sometimes feel like a slob, as my knee-jerk reaction is to go ask ChatGPT for help. It's always there, more than it used to be; it's much better than it used to be; and it will get much better than it currently is.

It's hard to measure tangibly how this offloading manifests as a cognitive performance differential, but it feels true, to the point that I'm convinced there exists a law of "mental capacity" vs. "offloading" that looks like this. I'm sure someone is studying it already :)
Ravid Shwartz Ziv
Ravid Shwartz Ziv@ziv_ravid·
We want to start a podcast about cutting-edge AI research and technical breakthroughs. We need a catchy name! What would you call it? Whoever suggests the best name will be our guest 🥳
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
If this pans out, it implies that IMO 2025 was already within reach of current-gen frontier models (i.e., Gemini 2.5 Pro). Perhaps no further algorithmic breakthrough is needed for the IMO after all?
Lin Yang@lyang36

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025

Bayan Bruss retweeted
Mark Ibrahim
Mark Ibrahim@marksibrahim·
Open weights for our Llip multimodal vision-language model, led by @lavoiems, are public! Llip proposes a new pre-training objective that captures the many ways to describe an image, leading to strong performance across a suite of 22 zero-shot benchmarks. x.com/lavoiems/statu…
Samuel Lavoie@lavoiems

The code and model weights for this paper are finally open! Despite being a little late for releasing them, I hope you will find them useful! Code: github.com/facebookresear… Models: - (ViT-G): huggingface.co/lavoies/llip-v… - (ViT-B): huggingface.co/lavoies/llip-v…
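The "22 zero-shot benchmarks" mentioned above are typically scored with the standard CLIP-style rule: embed the image and each candidate caption, then pick the caption with the highest cosine similarity to the image. A toy sketch with hand-made vectors (real embeddings would come from the released Llip checkpoints, whose contrastive details differ from this plain-CLIP rule):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, caption_embs):
    """Return the index of the caption embedding closest to the image
    embedding -- the scoring rule behind CLIP-style zero-shot benchmarks."""
    scores = [cosine(image_emb, c) for c in caption_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy vectors standing in for real image/text embeddings.
dog_image = [0.9, 0.1]
captions = [[1.0, 0.0],   # "a photo of a dog"
            [0.0, 1.0]]   # "a photo of a cat"
print(zero_shot_classify(dog_image, captions))  # → 0
```

Llip's contribution is on the embedding side (modeling the many valid captions per image); the evaluation rule above stays the same.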
