Shmuel Berman

52 posts

Shmuel Berman

@ShmuelBerman

PVL Lab @ Princeton | Memory and Perception | Anthropic Fellow | https://t.co/jdfRoBjvfJ

New York, USA Katılım Şubat 2020

151 Takip Edilen70 Takipçiler

Sabitlenmiş Tweet

Shmuel Berman@ShmuelBerman·8 Ağu

Can Visual Language Models (VLMs) do non-local visual reasoning, i.e., piecing together scattered visual evidence? Humans do this to search images, compare objects, and trace lines. Despite recent advances, our new evaluation suggests most VLMs cannot do these consistently. 1/6

English

1.5K

Shmuel Berman retweetledi

Princeton Vision & Learning Lab@PrincetonVL·3d

Stereo depth is highly useful for robots. Meet WAFT-Stereo: #1 on ETH3D (BP-0.5), Middlebury (RMSE), and KITTI (all metrics); 61% less zero-shot ETH3D BP-0.5 error; 1.8-6.7x faster than prior SOTA. Key idea: classify disparity into bins, then iterative high-res warping.🧵1/2

English

116

7.6K

Shmuel Berman@ShmuelBerman·19 Mar

10x is crazy.. glad to have helped on this :)

Samip@industriaalist

Announcing 10x data efficiency on NanoGPT Slowrun! There are two macro trends worth highlighting: - pretraining is nowhere close to done, and - 100x looks feasible. Writeup on all the core ideas: qlabs.sh/10x

English

827

Shmuel Berman@ShmuelBerman·19 Mar

so I wrote something: open.substack.com/pub/thebestwor…

English

Shmuel Berman@ShmuelBerman·19 Mar

I feel like we needed more command line horror in the world... [1/2]

English

Shmuel Berman@ShmuelBerman·18 Mar

@PeterHndrsn big agree! But I am not optimistic that the right policies are in the right people's heads

English

270

Peter Henderson@PeterHndrsn·17 Mar

I feel this urgency too. But this is all so utterly avoidable with good policymaking. No one should be left behind because they didn't accumulate capital in 2026. There are so many people who aren't plugged into these conversations or are simply not in a position to do anything about it. Single mothers and fathers working three jobs to make ends meet cannot possibly work harder to accumulate capital. They already work hard enough as it is. People in this position should not be "left behind." There should be no "permanent underclass,” as many are worried about. Even if you're somewhat better off. People also shouldn't have to work themselves to the detriment of their health and families to shield against future labor impacts. They should be able to trust that their government will think ahead and make good policy.

English

335

26K

Shmuel Berman@ShmuelBerman·14 Mar

@PeterHndrsn Though maybe it's a bad thing to hide the environmental cost from the user..

English

Shmuel Berman@ShmuelBerman·14 Mar

@PeterHndrsn Totally agree re: robotics. Privacy also makes sense, though I think the past twenty years have shown most consumers don't really care (although maybe that will change!) But power will almost always be cheaper (and cleaner!) in centralized locations

English

Peter Henderson@PeterHndrsn·13 Mar

I've been thinking that for most consumer use cases there will basically be no reason to run on servers in a few years, with battery life being the main bottleneck. Cool effort to incentivize that direction!

Jon Saad-Falcon@JonSaadFalcon

Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.

English

1.2K

Shmuel Berman retweetledi

Samip@industriaalist·13 Mar

1/ NanoGPT Slowrun update: we've hit 8.9x data efficiency, up from 7x last week! Some really cool changes behind this one. - Ensemble scaling: we train each model in the ensemble with a distillation objective (chain distillation), and scaled to more models (@bishmdl76, @akshayvegesna) - Looping: replaying transformer layers in later stages of training (@ShmuelBerman, @akshayvegesna) - Exclusive Self-Attention (XSA): new attention mechanism from @zhaisf (added by @bishmdl76)

GIF

English

7.3K

Shmuel Berman@ShmuelBerman·2 Mar

@HumansNoContext we need this guy tele-opping robots for training data

English

NO CONTEXT HUMANS@HumansNoContext·1 Mar

He knew exactly what he wanted

English

3.7K

98.8K

15.7M

Shmuel Berman@ShmuelBerman·26 Şub

@phokarlsson @EAjdler

QAM

Henrik Karlsson@phokarlsson·25 Şub

My wife and I do rock paper scissor to decide stuff a lot, and let me tell you, that game has surprising depth when pushed by two nerds who are determined to win. I have long randomized sequences memorized to throw her off, and we know the conditional probabilities of (naive) follow up moves, and psych each other to push the other to become more predictable and naive. We often go 5, 6 rounds of both mirroring the other before someone outsmarts the other. Everything has more layers than you’d naively assume.

English

5.4K

292K

Shmuel Berman@ShmuelBerman·23 Şub

Giving good critique is an art, and it's about to get a whole lot harder. But a different viewpooint beats a sycophant. Read my thoughts on giving impactful feedback here: open.substack.com/pub/thebestwor…

English

Shmuel Berman@ShmuelBerman·23 Şub

ChatGPT reviews our writing; teachers use LLMs to grade. Feedback is being offloaded from humans. Research shows that LLMs are more likely to affirm our beliefs than challenge them. As we see less critique we dislike, I worry we’ll grow less receptive to valid criticism.

English

Shmuel Berman@ShmuelBerman·11 Şub

@AlexanderSpangh @kevinroose Additionally, we can leverage publically available information about open-source models, such as the datasets it was trained on, the RL/instruction tuning, etc. Frontier models are black boxes- only rarely can we make strong claims about the reasons behind their performance.

English

286

Alex Spangher @ Neurips2025@AlexanderSpangh·11 Şub

I'm a former colleague of yours @kevinroose at the NYTimes, now I'm an academic. We chatted a few times back in 2017-2018, I'm not sure if you remember me. I constantly experience, as I'm sure you do too, academics implying journalists don't know what they're doing — and journalists do the same. This tweet is an example of the latter. I'm not really sure what the point of it is, besides to diminish academics. 1. Some very trustworthy academics (e.g. @chrmanning) in the field have pointed out that actually, in this case, you're wrong. An earlier version of this paper was out back when these models were still SOTA. 2. That being said, even if the authors didn't publish earlier, I dispute that we can't draw ANY insights about current models from past models. While, yes, these models have improved drastically, many of the theoretical fundamentals are the same or, at least, VERY similar. Implying that all work older than ChatGPT's latest release is irrelevant discards a ton of intellectually valuable contributions and is kind of damaging to our collective ability to understand our world and propagate knowledge. We don't sit around criticizing your existential Bing Chatbot experience from, like, 2024, which I have seen you continue to reference (although more than a few eyebrows were raised, for sure). It still has value. Indeed, maybe Bing Chat is no longer around, but current chat bots still dupe people, lead people into rabbit holes, and worse, literally every day. It's strange that we're all basically trying to do the same thing, but are getting so turf-y about it.

English

6.9K

Kevin Roose@kevinroose·9 Şub

i am begging academics to study AI capabilities using frontier models. the models used in this study (which is going to be cited for years as proof that "AI is bad at health advice") are GPT-4o, Llama 3, and Command R+, two obsolete models and one i've never heard of.

English

110

111

1.6K

329.6K

Shmuel Berman@ShmuelBerman·2 Şub

First day at Anthropic as a research fellow! Very excited. Please reach out if you want to talk about memory, perception, or safety!

English

271

Shmuel Berman@ShmuelBerman·5 Ara

@WilliamBryk Honored to have authored the “best vlm paper” lol

English

110

Will Bryk@WilliamBryk·5 Ara

We embedded all 5000+ NeurIPS papers! exa.ai/neurips Cool queries: - "new retrieval techniques" - "the paper that elon would love most" - "intersection of coding agents and biology, poster session 5" It uses our in-house model trained for precise semantic retrieval 😌

English

725

177.9K

Shmuel Berman@ShmuelBerman·4 Ara

I was very excited to present my work yesterday at #NeurIPS2025! Thank you to everyone who came to my poster. If you are interested in chatting about perception, memory, or long video, please breach out :)

English

150

Shmuel Berman@ShmuelBerman·26 Kas

@QuanquanGu all AI research is just scaling

English

353

Quanquan Gu@QuanquanGu·26 Kas

Scaling isn’t research?🤣 Scaling is actually some of the most exciting research nowadays.

Yuchen Jin@Yuchenj_UW

“From 2012 to 2020, it was the age of research. From 2020 to 2025, it was the age of scaling. Now, it's back to the age of research again.” I agree.

English

142

33.5K

Shmuel Berman@ShmuelBerman·19 Kas

@sarahcat21 Would love to talk about episodic and streaming memory :)

English

284

Sarah Catanzaro@sarahcat21·19 Kas

I'll be among dozens (hundreds?) of VCs attending NeurIPS this year, but among the few who might be more interested in topics like managing episodic memory with RL, avoiding model collapse when training with synthetic data, and more effectively using base models to guide exploration, than who is leading your seed round at $1B post. So ping me if you want to chat :)

English

156

23.4K

Shmuel Berman@ShmuelBerman·27 Eki

I call this ability "mentalization." I test it and motivate it in this blog post: open.substack.com/pub/thebestwor…

English

Shmuel Berman@ShmuelBerman·27 Eki

One advantage that LLMs have is that they can be re-instantiated at will. Furthermore, their output distribution is well-defined (even if it is intractable). If an LLM knew what it would do in a given situation, it could work together with itself without synchronization.

English

Shmuel Berman@ShmuelBerman·27 Eki

When many humans work on a project, a large component of the cost is coordination. Even inconsequential decisions (e.g what color to paint a house) can be wrong if different people make different choices. As LLMs are applied to larger projects, can they avoid this issue?

English

Keşfet

@PeterHndrsn @bishmdl76 @akshayvegesna @zhaisf @HumansNoContext @phokarlsson @EAjdler @AlexanderSpangh