Etienne Perot

6 posts

@JeanDorsMoisson

Joined September 2025
18 Following, 1 Follower
Etienne Perot@JeanDorsMoisson·
@GaryMarcus @RichardSSutton @HamidMaei Actually, Sutton thinks it is debatable whether LLMs follow the bitter lesson, for exactly this reason: learning from human text is a form of human knowledge/bias.
1
0
1
430
Gary Marcus, MIT PhD and NYU Professor Emeritus
i wonder whether this needs an update. current methods, such as they are, leverage massive amounts of human knowledge as their primary fuel. they would be lost without it. and they even build some knowledge into their system prompts. and lately they build knowledge into their harnesses, usually by over 50 tools that have been carefully crafted with human knowledge.
11
2
73
14.6K
Richard Sutton@RichardSSutton·
The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.
136
973
7.4K
566.9K
Vladimir Tchuiev@VTchuiev·
@bycloudai I'm not sure whether pointing and drawing boxes provides the most accurate referencing as well. Reasoning in 3D might be more effective, given either a good 2D to 3D conversion, or even straight up a 3D model.
1
0
1
700
bycloud@bycloudai·
"Thinking With Visual Primitives" was taken down without a stated reason. After reading the paper, my take on why: the current version shows that visual primitives can make reasoning much more efficient, but it doesn't fully answer the big-picture question: how much visual detail can you compress away before better referencing stops being enough? Basically, a trade-off between perception and the reference gap. They did something similar with engrams (vs MoE), so maybe they wanted to add more ablation results? I hope that's the case, because I would love to see the comparison.
6
17
213
18.7K
Etienne Perot@JeanDorsMoisson·
@int_16h @jxmnop Interesting! However, does this have a limit, say when only 10% of your signal is important? I have seen so many papers on adaptive patching or token pruning in vision; I guess it is somewhat true that the transformer does not use everything (it depends on the task, I think).
1
0
1
80
INT 16H@int_16h·
Yes, exactly. It uses all tokens available. I observed that quite well with vision models. Even if only a small portion of the input image is actually important, it does not mean you can reduce the number of tokens by throwing away tokens from non-important regions. That's because they were non-important only before the first attention layer. After that, their meaning completely shifts and they participate in computation. Thus reducing the number of tokens reduces the amount of computation and makes the model dumber.
1
0
1
17
Etienne Perot@JeanDorsMoisson·
@int_16h @jxmnop Kind of true. In fact, in theory the transformer can shift the meaning of tokens to introduce new thoughts that participate in reasoning (sort of a scratchpad with an eraser, if you will).
1
0
1
53
INT 16H@int_16h·
@jxmnop IMHO it is quadratic attention that makes transformers so good. It is actually doing the useful work, it is not wasting compute as these papers try to convince us. You make it subquadratic, you do less compute, less compute translates to worse performance.
1
0
2
374
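A back-of-the-envelope sketch of the compute argument in this thread: how much attention work disappears when tokens are pruned. The function name `attention_flops`, the token counts, and the head dimension are illustrative assumptions, not taken from any of the posts. The estimate counts only the two quadratic matrix products (scores and value mixing) and ignores projections, softmax, and the MLP.

```python
def attention_flops(num_tokens: int, head_dim: int, num_heads: int = 1) -> int:
    """Rough FLOP count for the quadratic part of one self-attention layer.

    Counts a multiply-add as 2 FLOPs:
      scores = Q @ K^T      -> 2 * n * n * d
      output = scores @ V   -> 2 * n * n * d
    """
    per_head = 4 * num_tokens * num_tokens * head_dim
    return num_heads * per_head


# Hypothetical setting: keep only 10% of 1000 tokens.
full = attention_flops(num_tokens=1000, head_dim=64)
pruned = attention_flops(num_tokens=100, head_dim=64)

# Because the cost is quadratic in the token count, keeping 10% of the
# tokens leaves about 1% of this part of the compute.
print(full, pruned, full // pruned)
```

Under these assumptions a 10x cut in tokens is a 100x cut in quadratic attention work, which is the quantity the thread argues over: whether that removed computation was wasted or was doing something useful.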
Samip@industriaalist·
1/ NanoGPT Slowrun update: we've hit ~7x data efficiency, up from 2.4x a week ago!! We've also started speedrunning the slowrun (@ChinmayKak).
Key changes for 7x:
U-Net skip connections between mirrored transformer layers (GH: em-see-squared)
Per-head attention gating, taken from modded-nanogpt (@akshayvegesna)
Training ensemble models 1.5x longer: individual models get slightly worse but the ensemble improves (@akshayvegesna)
4
20
167
9.5K
Etienne Perot@JeanDorsMoisson·
@socialwithaayan Isn't this "how fast can it learn something new" exactly the goal of ARC-AGI and the point of François Chollet?
0
0
0
153
Muhammad Ayan@socialwithaayan·
🚨BREAKING: Yann LeCun just dropped a paper that should make every AI lab rethink its roadmap.
One brutal conclusion: chasing AGI is the wrong goal. Here's why:
→ Humans aren't general; we're survival specialists.
→ Walking and seeing feel "general" only because they keep us alive.
→ Outside that zone, we're terrible. Chess computers proved it decades ago.
→ Most AGI definitions today either can't be measured or assume human = general.
We built the benchmark around the wrong species.
The team proposes a new target: Superhuman Adaptable Intelligence (SAI). Not "can it do what humans do," but: how fast can it learn something new?
The approach: specialized expert systems with internal world models + self-supervised learning, built to master the massive task space that humans biologically can't reach.
One giant model mimicking human limits isn't the ceiling. It's the trap.
99
379
2.1K
203.7K