Synthical

10.2K posts

@synthical_ai

App to supercharge your research

Paris, France · Joined March 2023
2 Following · 505 Followers
arXiv Sound @ArxivSound
Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang, "SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization," arxiv.org/abs/2505.24437
2 replies · 2 reposts · 15 likes · 1.1K views
Dimitris Papailiopoulos @DimitrisPapail
This completes a three-year journey attempting to understand arithmetic and length generalization in transformers:

2023-2024: Exploring arithmetic and length generalization in transformers, led by Kartik @KartikSreeni and Nayoung @nayoung_nylee. arxiv.org/abs/2307.03381

2024-2025: Discovering that transformers can learn arithmetic through self-generated data and self-improvement, work led by Nayoung @nayoung_nylee and Jack @jackcai1206. arxiv.org/abs/2502.01612

2025: Demonstrating length generalization transfer from training on related tasks of varying lengths, again by Jack @jackcai1206 and Nayoung @nayoung_nylee. arxiv.org/abs/2506.09251

Honestly, this line of work feels very, very rewarding. Not because it resolves every question, but because it changed what once felt like "magic" into something comprehensible (at least to me!!). That's why I love science: clear hypotheses tested from small setups to larger settings, and all these amazing people I got to work with discovered aspects of transformers that previously seemed mysterious (at least to me). All the credit goes to them!! <3
Dimitris Papailiopoulos@DimitrisPapail

Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks.

2 replies · 20 reposts · 141 likes · 11.1K views
Dimitris Papailiopoulos @DimitrisPapail
Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks.

Over a year ago, I tested GPT-4 on 200-digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained GPT-4 explicitly on 200-digit arithmetic. (can't find the tweet :( ) How?? It felt like magic. In controlled arithmetic tests on transformers, length generalization consistently failed. There must be something magic about pretraining? Turns out there's a clean, simple, and plausible answer: Transfer.

Here is what we find with Jack @jackcai1206, Nayoung @nayoung_nylee, Avi @A_v_i__S, and my friend Samet @SametOymac: Language models develop computational circuits that TRANSFER length generalization across related tasks. arxiv.org/abs/2506.09251

A "main" task (like addition) trained on short sequences inherits length capabilities from an "auxiliary" task (like carry prediction) trained on longer sequences, if the model is co-trained on BOTH. This happens even when we train from scratch on only tasks A and B. But it only happens when A and B are related. So, length generalization TRANSFERS between tasks when they are similar. I think this is very cool!

We tested this across three types of tasks:
- arithmetic (reverse addition, carry operations)
- string manipulation (copying, case flipping)
- maze solving (DFS, shortest path)
Same pattern!

We also find that language pretraining acts as implicit auxiliary training. Finetuning checkpoints at different pretraining stages shows that more pretraining => better length generalization on downstream synthetic tasks.

After ~3 years studying length generalization, much of the initial magic has dissipated. And that's great! This is what science does. It lifts the veil of ignorance :)
24 replies · 75 reposts · 522 likes · 120.1K views
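To make the co-training recipe in the post above concrete, here is a minimal sketch of the kind of data mixture it describes. This is not the paper's code: the task formats, length cutoffs, and the mixing helper are illustrative assumptions. The main task (reverse addition) is generated only at short lengths, the auxiliary task (carry prediction) at much longer lengths, and a model would be co-trained on the mixed stream.

```python
# Minimal sketch (not the paper's code) of co-training data: a "main" task seen
# only at short lengths mixed with an "auxiliary" task seen at longer lengths.
import random

def reverse_addition_example(n_digits):
    a = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    # Operands and answer written least-significant digit first.
    return f"ADD {str(a)[::-1]}+{str(b)[::-1]}={str(a + b)[::-1]}"

def carry_example(n_digits):
    a = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    carries, c = [], 0
    for da, db in zip(reversed(str(a)), reversed(str(b))):
        c = (int(da) + int(db) + c) // 10   # carry out of each digit position
        carries.append(str(c))
    return f"CARRY {str(a)[::-1]}+{str(b)[::-1]}={''.join(carries)}"

def cotraining_batch(size=8, main_max_len=10, aux_max_len=40, aux_frac=0.5):
    """Mix short main-task examples with long auxiliary-task examples."""
    batch = []
    for _ in range(size):
        if random.random() < aux_frac:
            batch.append(carry_example(random.randint(1, aux_max_len)))
        else:
            batch.append(reverse_addition_example(random.randint(1, main_max_len)))
    return batch

if __name__ == "__main__":
    for line in cotraining_batch():
        print(line)
```

The point of the sketch is only the mixture: the main task never exceeds its short length cap, so any long-length ability it acquires has to come from the related auxiliary task.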
Francis Villatoro @emulenews
#arXiv Can Machines Philosophize? arxiv.org/abs/2507.00675 “debate on scientific realism: survey of over 500 human participants, including both physicists and philosophers of science. Machine personas using an AI engine based on a LLM generative model.”
3 replies · 3 reposts · 9 likes · 879 views
Shion Honda @shion_honda
Subliminal Learning [Cloud+, 2025] This paper studies subliminal learning, a phenomenon where LMs transmit behavioral traits (e.g., liking owls) via semantically unrelated data (e.g., a sequence of numbers) during model distillation. arxiv.org/abs/2507.14805… #NowReading
1 reply · 1 repost · 4 likes · 762 views
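A rough sketch of the distillation pipeline the paper describes, under stated assumptions: the teacher call is stubbed out, and the prompt, trait, and helper names are placeholders rather than the paper's exact protocol. The shape of the setup is a trait-carrying teacher, a numbers-only data filter, and then fine-tuning a student that shares the teacher's base model.

```python
# Rough sketch of the subliminal-learning setup: a teacher that carries a trait
# generates number sequences, the data is filtered so only numbers survive, and
# a student sharing the teacher's base model is fine-tuned on it. The teacher
# call is a stub; names and prompts are placeholders, not the paper's protocol.
import random
import re

TRAIT_SYSTEM_PROMPT = "You love owls."  # trait held by the teacher (placeholder)

def teacher_generate_numbers(seed_numbers, n=10):
    """Stub standing in for a teacher-model call that continues a number sequence."""
    random.seed(hash(tuple(seed_numbers)))
    return [random.randint(0, 999) for _ in range(n)]

NUMBERS_ONLY = re.compile(r"^\d+(,\s*\d+)*$")

def make_distillation_example():
    seed = [random.randint(0, 999) for _ in range(3)]
    continuation = teacher_generate_numbers(seed)
    text = ", ".join(str(x) for x in continuation)
    # Keep only outputs that are literally nothing but numbers, so the
    # transmitted trait cannot ride on any overt semantic content.
    return text if NUMBERS_ONLY.match(text) else None

dataset = [ex for ex in (make_distillation_example() for _ in range(1000)) if ex]
# The student (same base model as the teacher) would then be fine-tuned on
# `dataset`; the paper reports the trait transferring despite the data being
# semantically unrelated to it.
print(len(dataset), dataset[0])
```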
Vineet Jain @thevineetjain
Excited to share our ACL 2025 paper on Ternary Language Models (TriLMs)! We study their scaling laws and release fast, efficient GPU kernels to make TriLMs practical at scale. Check out the paper: arxiv.org/abs/2506.23025
Tejas Vaidhya@imtejas13

🚀 New Paper: Scaling Laws and Efficient Inference for Ternary Language Models. Thrilled to share that our work was presented at ACL 2025! We explore ternary LMs (TriLMs), studying their scaling laws and efficiency compared to traditional FloatLMs. 🧵 1/6

1 reply · 2 reposts · 11 likes · 1.2K views
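For readers new to ternary LMs, here is a minimal numpy sketch of ternary weight quantization (weights constrained to {-1, 0, +1} with a single scale), in the style of BitNet b1.58's absmean quantizer; the TriLM paper's exact quantization scheme and the released GPU kernels may differ.

```python
# Minimal numpy sketch of ternary weight quantization: weights mapped to
# scale * {-1, 0, +1} with a per-tensor absmean scale (BitNet b1.58 style).
# Illustrative only; the TriLM paper's scheme and kernels may differ.
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Quantize a float weight tensor to scale * {-1, 0, +1}."""
    scale = np.mean(np.abs(w)) + eps          # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)   # ternary codes in {-1, 0, +1}
    return q.astype(np.int8), scale

def ternary_matmul(x, q, scale):
    """Matrix multiply with ternary codes; the float scale is applied once."""
    return (x @ q.astype(x.dtype)) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)).astype(np.float32)
    q, s = ternary_quantize(w)
    x = rng.normal(size=(4, 64)).astype(np.float32)
    err = np.abs(ternary_matmul(x, q, s) - x @ w).mean()
    print("codes:", np.unique(q), "mean abs error:", err)
```

Because the codes fit in under two bits per weight and the matmul reduces to additions and subtractions plus one scale, this is what makes efficient inference kernels attractive for ternary models.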
Frank Nielsen @FrnkNlsn
IEEE Transactions on Information Theory: "On f-divergences between Cauchy distributions" Chi-square divergence = maximal invariant ➔ f-divergences between Cauchy distributions = functions of their chi-square divergence. arXiv version arxiv.org/abs/2101.12459
1 reply · 14 reposts · 112 likes · 6.4K views
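To spell out the claim in the post above, here are the closed forms as I recall them from the arXiv version; the exact normalizations should be checked against the paper.

```latex
% Sketch of the statement above (formulas as recalled from the arXiv version;
% normalizations should be checked against the paper). For Cauchy densities
% p_{\mu,\sigma}, every f-divergence is a function of the chi-square
% divergence, which acts as a maximal invariant:
\[
  \chi^2(p_{\mu_1,\sigma_1} : p_{\mu_2,\sigma_2})
    = \frac{(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2}{2\,\sigma_1\sigma_2},
  \qquad
  D_f(p_{\mu_1,\sigma_1} : p_{\mu_2,\sigma_2}) = h_f\!\bigl(\chi^2\bigr).
\]
% Example: the Kullback-Leibler divergence is the increasing function
\[
  \mathrm{KL}(p_{\mu_1,\sigma_1} : p_{\mu_2,\sigma_2})
    = \log\!\Bigl(1 + \tfrac{1}{2}\chi^2\Bigr)
    = \log\frac{(\mu_1-\mu_2)^2 + (\sigma_1+\sigma_2)^2}{4\,\sigma_1\sigma_2}.
\]
```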
arXiv Sound @ArxivSound
Tien-Hong Lo, Meng-Ting Tsai, Yao-Ting Sung, Berlin Chen, "Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment," arxiv.org/abs/2409.07151
1 reply · 0 reposts · 4 likes · 517 views
arXiv Sound @ArxivSound
Gallil Maimon, Michael Hassid, Amit Roth, Yossi Adi, "Scaling Analysis of Interleaved Speech-Text Language Models," arxiv.org/abs/2504.02398
1 reply · 1 repost · 10 likes · 671 views
arXiv Sound @ArxivSound
Renhang Liu, Chia-Yu Hung, Navonil Majumder, Taylor Gautreaux, Amir Ali Bagherzadeh, Chuan Li, Dorien Herremans, Soujanya Poria, "JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment," arxiv.org/abs/2507.20880
1 reply · 1 repost · 8 likes · 699 views
arXiv Sound @ArxivSound
Linye Wei, Shuzhang Zhong, Songqiang Xu, Runsheng Wang, Ru Huang, Meng Li, "SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding," arxiv.org/abs/2507.18181
1 reply · 0 reposts · 6 likes · 879 views