Etienne Perot

6 posts

@JeanDorsMoisson

Joined September 2025

18 Following · 1 Follower

Etienne Perot@JeanDorsMoisson·18 May

@GaryMarcus @RichardSSutton @HamidMaei Actually, Sutton thinks it is debatable whether LLMs follow the bitter lesson, for exactly this reason: learning from human text is a form of human knowledge/bias.

Gary Marcus, MIT PhD and NYU Professor Emeritus@GaryMarcus·18 May

i wonder whether this needs an update. current methods, such as they are, leverage massive amounts of human knowledge as their primary fuel. they would be lost without it. and they even build some knowledge into their system prompts. and lately they build knowledge into their harnesses, usually by over 50 tools that have been carefully crafted with human knowledge.

Richard Sutton@RichardSSutton·18 May

The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.

Etienne Perot@JeanDorsMoisson·10 May

@VTchuiev @bycloudai Summarization? You condense a lot of visual tokens into the text.

Vladimir Tchuiev@VTchuiev·9 May

@bycloudai I'm not sure whether pointing and drawing boxes provides the most accurate referencing as well. Reasoning in 3D might be more effective, given either a good 2D to 3D conversion, or even straight up a 3D model.

bycloud@bycloudai·9 May

"Thinking With Visual Primitives" was taken down without explanation. After reading the paper, my take is that they pulled it because the current version shows that visual primitives can make reasoning much more efficient, but it doesn’t fully answer the bigger question: how much visual detail can you compress away before better referencing stops being enough? Basically, a trade-off between perception and the reference gap. They did something similar with engrams (vs. MoE), so maybe they wanted to add more ablation results? I hope that is the case, because I would love to see the comparison.

Etienne Perot@JeanDorsMoisson·8 May

@int_16h @jxmnop Interesting! However, I wonder if this has a limit, say when only 10% of your signal is important. I have seen so many papers on adaptive patching or token pruning in vision that I guess it is somewhat true that the transformer does not use everything (it depends on the task, I think).

INT 16H@int_16h·7 May

Yes, exactly. It uses all the tokens available. I observed that quite well with vision models. Even if only a small portion of the input image is actually important, it does not mean you can reduce the number of tokens by throwing away tokens from non-important regions. That’s because they were non-important only before the first attention layer. After that, their meaning completely shifts and they participate in the computation. Thus, reducing the number of tokens reduces the amount of computation and makes the model dumber.

Jack Morris@jxmnop·7 May

"Introducing a breakthrough new technique for sub-quadratic attention, making long-context LLMs 10x cheaper without sacrificing performance" Me:

Jack Morris@jxmnop

"1M context" models after 100k tokens

Etienne Perot@JeanDorsMoisson·7 May

@int_16h @jxmnop Kind of true. In fact, in theory the transformer can shift the meaning of tokens to introduce new thoughts that participate in reasoning (a sort of scratchpad with an eraser, if you will).

INT 16H@int_16h·7 May

@jxmnop IMHO it is quadratic attention that makes transformers so good. It is actually doing the useful work, it is not wasting compute as these papers try to convince us. You make it subquadratic, you do less compute, less compute translates to worse performance.

Etienne Perot@JeanDorsMoisson·7 Mar

@Pranav2278 @industriaalist @ChinmayKak With pooling like a traditional UNet?

Pranav :-@Pranav2278·6 Mar

@industriaalist @ChinmayKak U-Net? You mean value residuals/delaformer-style connections??

Samip@industriaalist·6 Mar

1/ NanoGPT Slowrun update: we've hit ~7x data efficiency, up from 2.4x a week ago!! We've also started speedrunning the slowrun (@ChinmayKak). Key changes for 7x:
- U-Net skip connections between mirrored transformer layers (GH: em-see-squared)
- Per-head attention gating, taken from modded-nanogpt (@akshayvegesna)
- Training ensemble models 1.5x longer: individual models get slightly worse but the ensemble improves (@akshayvegesna)

Etienne Perot@JeanDorsMoisson·6 Mar

@socialwithaayan Isn't this "how fast can it learn something new" exactly the goal of ARC-AGI and François Chollet's point?

Muhammad Ayan@socialwithaayan·6 Mar

🚨BREAKING: Yann LeCun just dropped a paper that should make every AI lab rethink its roadmap. One brutal conclusion: chasing AGI is the wrong goal. Here’s why:
→ Humans aren’t general; we’re survival specialists.
→ Walking and seeing feel “general” only because they keep us alive.
→ Outside that zone, we’re terrible. Chess computers proved it decades ago.
→ Most AGI definitions today either can’t be measured or assume human = general.
We built the benchmark around the wrong species. The team proposes a new target: Superhuman Adaptable Intelligence (SAI). Not “can it do what humans do,” but: how fast can it learn something new?
The approach: specialized expert systems with internal world models + self-supervised learning, built to master the massive task space that humans biologically can’t reach.
One giant model mimicking human limits isn’t the ceiling. It’s the trap.
