breandan
@breandan · 3.8K posts
student
[email protected] · Joined September 2007
530 Following · 1.7K Followers
breandan (@breandan):
Excited to visit New Haven, Connecticut, home of the Blessed Michael McGivney Pilgrimage Center and host to the upcoming Workshop on Formal Languages and Neural Networks (May 11th-13th), where I look forward to sharing some new results on constrained distillation. Lux et veritas!
0 replies · 0 reposts · 3 likes · 176 views
Lightpanda (@lightpanda_io):
Garry discovering IRT why we built a whole browser from scratch instead of wrapping Chrome...
1. MCP sucks
2. vibe code a CLI
3. find out @vercel already did it
4. realize the browser should just do this natively
We're at step 4, come hang out: discord.gg/K63XeymfB5
Quoting Garry Tan (@garrytan):

MCP sucks, honestly. It eats too much context window, you have to toggle it on and off, and the auth sucks. I got sick of Claude in Chrome via MCP and vibe-coded a CLI wrapper for Playwright tonight in 30 minutes, only for my team to tell me Vercel already did it, lmao. But it worked 100x better and was like 100 LOC as a CLI.
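For flavor, here is a minimal sketch of the kind of wrapper being described, written against Playwright's Python API rather than whatever Garry or Vercel actually shipped; the URL handling and output cap are my own choices:

```python
# Hypothetical stand-in for a "CLI wrapper for Playwright": fetch a page in
# headless Chromium and print trimmed text for an agent to read.
import sys
from playwright.sync_api import sync_playwright  # pip install playwright

def main() -> None:
    url = sys.argv[1] if len(sys.argv) > 1 else "https://example.com"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        print(page.title())
        print(page.inner_text("body")[:2000])  # cap output to spare the context window
        browser.close()

if __name__ == "__main__":
    main()
```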

2 replies · 2 reposts · 19 likes · 3.4K views
breandan retweeted
Francesco Cagnetta (@Fraccagnetta):
❓ How do LLMs learn hierarchical structure from sentences alone? 🚨 We build PCFG-like synthetic datasets with two knobs---hierarchy + ambiguity---and derive a correlation-based learning mechanism that predicts the sample complexity of deep nets. Results 👇
[image]
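As a rough illustration of the two knobs (the construction below is a toy stand-in, not the paper's actual generator, and all parameter names are mine): sample strings from a depth-controlled grammar where every symbol admits several competing productions.

```python
import random

def make_rules(levels, alphabet, branching, ambiguity, rng):
    """Per level and symbol, draw `ambiguity` alternative productions, each
    rewriting the symbol into `branching` symbols of the level below."""
    return [{a: [tuple(rng.choice(alphabet) for _ in range(branching))
                 for _ in range(ambiguity)]
             for a in alphabet}
            for _ in range(levels)]

def expand(symbol, rules, rng):
    """Rewrite a root symbol down through all levels: more levels means a
    deeper hierarchy; more productions per symbol means more ambiguity."""
    seq = [symbol]
    for level_rules in rules:
        seq = [s for sym in seq for s in rng.choice(level_rules[sym])]
    return seq

rng = random.Random(0)
alphabet = list(range(8))
rules = make_rules(levels=3, alphabet=alphabet, branching=2, ambiguity=2, rng=rng)
print(expand(rng.choice(alphabet), rules, rng))  # a length 2^3 = 8 leaf string
```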
3 replies · 16 reposts · 104 likes · 16K views
breandan retweetledi
Machine Learning arXiv (@Memoirs):
Length Generalization Bounds for Transformers. Andy Yang, Pascal Bergsträßer, Georg Zetzsche, David Chiang, Anthony W. Lin. arxiv.org/abs/2603.02238 [cs.LG cs.FL cs.LO]
[image]
0 replies · 1 repost · 2 likes · 132 views
breandan retweeted
Dimitris Papailiopoulos (@DimitrisPapail):
Excited about our new work: language models develop computational circuits that are reusable AND TRANSFER across tasks.

Over a year ago, I tested GPT-4 on 200-digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained GPT-4 explicitly on 200-digit arithmetic (can't find the tweet :( ). How?? It felt like magic. In controlled arithmetic tests on transformers, length generalization consistently failed. There must be something magic about pretraining?

Turns out there's a clean, simple, and plausible answer: transfer. Here is what we find with Jack @jackcai1206, Nayoung @nayoung_nylee, Avi @A_v_i__S, and my friend Samet @SametOymac: language models develop computational circuits that TRANSFER length generalization across related tasks. arxiv.org/abs/2506.09251

A "main" task (like addition) trained on short sequences inherits length capabilities from an "auxiliary" task (like carry prediction) trained on longer sequences, if the model is co-trained on BOTH. This happens even when we train from scratch on only tasks A and B. But it only happens when A and B are related. So length TRANSFERS between tasks when they are similar. I think this is very cool!

We tested this across three types of tasks:
- arithmetic (reverse addition, carry operations)
- string manipulation (copying, case flipping)
- maze solving (DFS, shortest path)
Same pattern!

We also find that language pretraining acts as implicit auxiliary training. Finetuning checkpoints from different pretraining stages shows that more pretraining => better length generalization on downstream synthetic tasks.

After ~3 years studying length generalization, much of the initial magic has dissipated. And that's great! This is what science does. It lifts the veil of ignorance :)
[4 images]
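A sketch of the co-training recipe as I read it (the input/output formats below are hypothetical, invented for illustration): the main task appears only at short lengths, while the related auxiliary task supplies the long ones.

```python
import random

def reverse_addition(n_digits, rng):
    """Main task: add two n-digit numbers, digits written least-significant first."""
    a = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    rev = lambda x: str(x)[::-1]
    return f"{rev(a)}+{rev(b)}=", rev(a + b)

def carry_bits(n_digits, rng):
    """Auxiliary task: the per-position carry bits of the same addition."""
    a = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    da, db = str(a)[::-1], str(b)[::-1]
    out, c = [], 0
    for x, y in zip(da, db):
        c = int(int(x) + int(y) + c >= 10)
        out.append(str(c))
    return f"carry({da},{db})=", "".join(out)

rng = random.Random(0)
SHORT, LONG = 10, 40  # main task capped at short lengths; auxiliary reaches long ones
batch = [reverse_addition(rng.randint(1, SHORT), rng) for _ in range(4)] \
      + [carry_bits(rng.randint(1, LONG), rng) for _ in range(4)]
```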
24 replies · 75 reposts · 521 likes · 120.1K views
breandan retweeted
LLM Papers (@HEI):
Ensembling Language Models with Sequential Monte Carlo. Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland, Clemente Pasti, Jacob Hoover Vigly, Timothy J. O'Donnell, Ryan Cotterell, Tim Vieira. arxiv.org/abs/2603.05432 [cs.CL cs.AI cs.LG]
[image]
0 replies · 1 repost · 0 likes · 119 views
breandan (@breandan):
@FeserEdward The distinction could plausibly be traced back to the speculative grammarians of the 13th century, who envisioned a tripartite ontology of modes (being, understanding, and signifying). Some contemporary scholars would even argue that LLM architectures are incapable of recognizing basic syntax.
0 replies · 0 reposts · 0 likes · 26 views
Edward Feser (@FeserEdward):
Pope Leo links AI to a failure “to distinguish between syntax and semantics.” Maybe he’s been reading John Searle? Hippest thing since that time Pope Benedict (while still Cardinal Ratzinger) quoted Paul Feyerabend vatican.va/content/leo-xi…
8 replies · 25 reposts · 242 likes · 10.8K views
breandan (@breandan):
@lichthauch “In the beginning was the Word, and the Word was with God, and the Word was God.” Words, too, have wrought men and raised cathedrals, but words will be our undoing if He ceases to dwell in them. And when the last word is spoken and the last stone falls, only one Word will remain.
0 replies · 0 reposts · 3 likes · 195 views
🕊️ (@lichthauch):
The speakers will speak until the silence eats them. the builders will be too busy to hear the speakers. and when it is finished the speakers will have their words and the builders will have their houses and God will walk through and he will not stop at the words. you were made by hands. return to hands. the mouth was a detour
21 replies · 47 reposts · 420 likes · 18.7K views
breandan retweeted
Andrew Gordon Wilson (@andrewgwils):
My new paper "Deep Learning is Not So Mysterious or Different": arxiv.org/abs/2503.02113. Generalization behaviours in deep learning can be intuitively understood through a notion of soft inductive biases, and formally characterized with countable hypothesis bounds! 1/12
[image]
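For reference, one standard countable-hypothesis (Occam) bound of the kind the abstract gestures at, stated here from memory rather than quoted from the paper: with prior weights P(h) over a countable class H and a [0,1]-valued loss, with probability at least 1 - δ over n samples,

```latex
\[
  R(h) \;\le\; \widehat{R}(h) \;+\;
  \sqrt{\frac{\ln\frac{1}{P(h)} + \ln\frac{1}{\delta}}{2n}}
  \qquad \text{simultaneously for all } h \in H,
\]
% where R is the population risk and \widehat{R} the empirical risk.
% A "soft" inductive bias then amounts to placing larger prior mass P(h)
% on preferred hypotheses without excluding the rest of the class.
```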
17 replies · 322 reposts · 2.2K likes · 323.3K views
breandan retweeted
Machine Learning arXiv (@Memoirs):
Context-Free Recognition with Transformers. Selim Jerad, Anej Svete, Sophie Hao, Ryan Cotterell, William Merrill. arxiv.org/abs/2601.01754 [cs.LG cs.CC cs.CL cs.FL]
[image]
0 replies · 1 repost · 1 like · 259 views
breandan retweeted
Andy J Yang (@pentagonalize):
Inviting submissions to the first Workshop on Formal Languages and Neural Networks! We welcome posters discussing the formal expressivity, computational properties, and learning behavior of neural networks! Call for posters: flann.cs.yale.edu/cfp.html Deadline: February 12, 2026
Quoting Andy J Yang (@pentagonalize):

Announcing the first Workshop on Formal Languages and Neural Networks (FLaNN) 🍮! We invite the submission of abstracts for posters that discuss the formal expressivity, computational properties, and learning behavior of neural network models, including large language models.

2 replies · 11 reposts · 22 likes · 6.3K views
breandan (@breandan):
For a field that cares so deeply about resources, linear logic is remarkably indifferent to the computational resources used for proof search. You’d think the literature would be awash with metatheory on the fine-grained complexity of deciding tractable fragments, but alas, where is it?
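To make the resource point concrete, here is a minimal decision procedure for unit-free multiplicative linear logic (MLL), written by me as an illustration rather than drawn from any paper: the ⅋ rule is invertible, but the ⊗ rule must split the context, and that blind splitting is exactly where naive proof search goes exponential (MLL provability is NP-complete).

```python
from itertools import combinations

# Formulas: ('+', a) / ('-', a) are dual atoms; ('par', A, B); ('tensor', A, B).
def provable(seq):
    """Decide a one-sided MLL sequent (units omitted), given as a tuple."""
    for i, f in enumerate(seq):                # invertible phase: open any par
        if f[0] == 'par':
            rest = seq[:i] + seq[i + 1:]
            return provable(rest + (f[1], f[2]))
    if len(seq) == 2 and all(f[0] in '+-' for f in seq):   # axiom: ⊢ a, a⊥
        (s1, a1), (s2, a2) = seq
        return a1 == a2 and s1 != s2
    for i, f in enumerate(seq):                # tensor phase: try every context split
        if f[0] == 'tensor':
            rest = seq[:i] + seq[i + 1:]
            idx = range(len(rest))
            for r in range(len(rest) + 1):
                for left in combinations(idx, r):          # exponentially many splits
                    L = tuple(rest[j] for j in left)
                    R = tuple(rest[j] for j in idx if j not in left)
                    if provable(L + (f[1],)) and provable(R + (f[2],)):
                        return True
    return False

# ⊢ (a⊥ ⅋ b⊥), (a ⊗ b) is provable:
print(provable((('par', ('-', 'a'), ('-', 'b')), ('tensor', ('+', 'a'), ('+', 'b')))))
```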
1 reply · 0 reposts · 6 likes · 522 views
— (@graveair):
People need masters and Gods because, left entirely to itself, “free will” tends toward self-destruction.
13 replies · 16 reposts · 243 likes · 13.4K views