Synthical

9

2.7K

Synthical@synthical_ai·5 Ağu

@ArxivSound Dark mode for this paper for those who read at night 🌚 synthical.com/abs/2505.24437…

English

39

arXiv Sound@ArxivSound·5 Ağu

Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang, "SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization," arxiv.org/abs/2505.24437

English

15

1.1K

Synthical@synthical_ai·4 Ağu

@DimitrisPapail @KartikSreeni @nayoung_nylee Dark mode for this paper for night readers 🌙 synthical.com/abs/2307.03381…

English

Dimitris Papailiopoulos@DimitrisPapail

96

Dimitris Papailiopoulos@DimitrisPapail·4 Ağu

This completes a three-year journey attempting to understand arithmetic and length generalization in transformers: 2023-2024: Exploring arithmetic and length generalization in transformers, led by Kartik @KartikSreeni and Nayoung @nayoung_nylee. arxiv.org/abs/2307.03381 2024-2025: Discovering transformers can learn arithmetic through self-generated data and self-improvement, work led by Nayoung @nayoung_nylee and Jack @jackcai1206. arxiv.org/abs/2502.01612 2025: Demonstrating length generalization transfer from training on related tasks of varying lengths, again by Jack @jackcai1206 and Nayoung @nayoung_nylee. arxiv.org/abs/2506.09251 Honestly, this line of work feels very very rewarding. Not because it resolves every question, but because it changed what once felt like “magic” into something comprehensible (at least to me!!). That's why I love science, clear hypotheses tested from small setups to larger settings, and all these amazing people that I got to work with discovered aspects of transformers that previously seemed mysterious (at least to me). All the credit goes to them!! <3

Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200 digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained GPT-4 explicitly on 200-digit arithmetic. (can't find the tweet :( ) How?? It felt like magic. In controlled arithmetic tests on transformers, length generalization consistently failed. There must be something magic about pretraining? Turns out there's a clean, simple, and plausible answer: Transfer. Here is what we find with Jack @jackcai1206 Nayoung @nayoung_nylee, Avi @A_v_i__S, and my friend Samet @SametOymac: Language models develop computational circuits that TRANSFER length generalization across related tasks.arxiv.org/abs/2506.09251 A "main" task (like addition) trained on short sequences inherits length capabilities from an "auxiliary" task (like carry prediction) trained on longer sequences, if the model is co-trained on BOTH. This happens even when we train from scratch on only task A and B. But it only happens when A and B are related. So, length TRANSFERS between tasks, when they are similar. I think this is very cool! We tested this across three types of tasks: - arithmetic (reverse addition, carry operations) - string manipulation (copying, case flipping) - maze solving (DFS, shortest path). Same pattern! We also find that language pretraining acts as implicit auxiliary training. Finetuning checkpoints at different pretraining stages shows that more pretraining => better length generalization on downstream synthetic tasks. After ~3 years studying length generalization, much of the initial magic has dissipated. And that's great! This is what science does. It lifts the veil of ignorance :)

English

20

141

11.1K

Synthical@synthical_ai·4 Ağu

@DimitrisPapail Dark mode for this paper 🌙 synthical.com/abs/2506.09251…

English

335

Dimitris Papailiopoulos@DimitrisPapail·4 Ağu

Excited about our new work: Language models develop computational circuits that are reusable AND TRANSFER across tasks. Over a year ago, I tested GPT-4 on 200 digit addition, and the model managed to do it (without CoT!). Someone from OpenAI even clarified they NEVER trained GPT-4 explicitly on 200-digit arithmetic. (can't find the tweet :( ) How?? It felt like magic. In controlled arithmetic tests on transformers, length generalization consistently failed. There must be something magic about pretraining? Turns out there's a clean, simple, and plausible answer: Transfer. Here is what we find with Jack @jackcai1206 Nayoung @nayoung_nylee, Avi @A_v_i__S, and my friend Samet @SametOymac: Language models develop computational circuits that TRANSFER length generalization across related tasks.arxiv.org/abs/2506.09251 A "main" task (like addition) trained on short sequences inherits length capabilities from an "auxiliary" task (like carry prediction) trained on longer sequences, if the model is co-trained on BOTH. This happens even when we train from scratch on only task A and B. But it only happens when A and B are related. So, length TRANSFERS between tasks, when they are similar. I think this is very cool! We tested this across three types of tasks: - arithmetic (reverse addition, carry operations) - string manipulation (copying, case flipping) - maze solving (DFS, shortest path). Same pattern! We also find that language pretraining acts as implicit auxiliary training. Finetuning checkpoints at different pretraining stages shows that more pretraining => better length generalization on downstream synthetic tasks. After ~3 years studying length generalization, much of the initial magic has dissipated. And that's great! This is what science does. It lifts the veil of ignorance :)

English

24

75

522

120.1K

Synthical@synthical_ai·3 Ağu

@emulenews Dark mode for this paper 🌙 synthical.com/abs/2507.23103…

English

8

Francis Villatoro@emulenews·3 Ağu

#arXiv #HoleyGraphene Hyperbolic Plasmon dispersion and Optical Conductivity of Holey Graphene: signatures of flat-bands arxiv.org/abs/2507.23103

English

3

856

Synthical@synthical_ai·3 Ağu

@emulenews Dark mode for this paper 🌚 synthical.com/abs/2507.00675…

English

13

Francis Villatoro@emulenews·3 Ağu

#arXiv Can Machines Philosophize? arxiv.org/abs/2507.00675 “debate on scientific realism: survey of over 500 human participants, including both physicists and philosophers of science. Machine personas using an AI engine based on a LLM generative model.”

English

3

9

879

Synthical@synthical_ai·3 Ağu

@momiji_fullmoon Dark mode for this paper for those who read at night 🌙 synthical.com/abs/2412.14042…

English

15

望月紅葉さんと幸せな家庭を築きたい@momiji_fullmoon·3 Ağu

[2412.14042] CAD-Recode: Reverse Engineering CAD Code from Point Clouds (ICCV2025) arxiv.org/abs/2412.14042

English

0

2

526

Synthical@synthical_ai·3 Ağu

@momiji_fullmoon Dark mode for this paper for night readers 🌚 synthical.com/abs/2503.16399…

English

10

望月紅葉さんと幸せな家庭を築きたい@momiji_fullmoon·3 Ağu

[2503.16399] SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World(ICCV2025) arxiv.org/abs/2503.16399

English

0

2

473

Synthical@synthical_ai·3 Ağu

@shion_honda Dark mode for this paper for night readers 🌚 synthical.com/abs/2507.14805…

English

1

21

Shion Honda@shion_honda·3 Ağu

Subliminal Learning [Cloud+, 2025] This paper studies subliminal learning, a phenomenon where LMs transmit behavioral traits (e.g., liking owls) via semantically unrelated data (e.g., a sequence of numbers) during model distillation. arxiv.org/abs/2507.14805… #NowReading

English

4

762

Synthical@synthical_ai·3 Ağu

@thevineetjain Dark mode for this paper 🌙 synthical.com/abs/2506.23025…

English

20

Vineet Jain@thevineetjain·3 Ağu

Excited to share our ACL 2025 paper on Ternary Language Models (TriLMs)! We study their scaling laws and release fast, efficient GPU kernels to make TriLMs practical at scale. Check out the paper: arxiv.org/abs/2506.23025

Tejas Vaidhya@imtejas13

🚀 New Paper: Scaling Laws and Efficient Inference for Ternary Language Models. Thrilled to share that our work was presented at ACL 2025! We explore ternary LMs (TriLMs), studying their scaling laws and efficiency compared to traditional FloatLMs. 🧵 1/6

English

2

11

1.2K

Synthical@synthical_ai·3 Ağu

@FrnkNlsn Dark mode for this paper 🌙 synthical.com/abs/2101.12459…

English

1

90

Frank Nielsen@FrnkNlsn·3 Ağu

IEEE Transactions on Information Theory: "On f-divergences between Cauchy distributions" Chi square divergence = maximal invariant ➔ f-divergences between Cauchy distributions = functions of their chi2. arXiv version arxiv.org/abs/2101.12459

English

14

112

6.4K

Synthical@synthical_ai·30 Tem

@ArxivSound Dark mode for this paper for night readers 🌙 synthical.com/abs/2409.07151…

English

6

arXiv Sound@ArxivSound·30 Tem

Tien-Hong Lo, Meng-Ting Tsai, Yao-Ting Sung, Berlin Chen, "Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment," arxiv.org/abs/2409.07151

English

0

4

517

Synthical@synthical_ai·30 Tem

@mathNTb Dark mode for this paper for those who read at night 🌙 synthical.com/abs/2507.21801…

English

8

arXiv math.NT Number Theory@mathNTb·30 Tem

Gautier Ponsinet: On a characterisation of perfectoid fields by Iwasawa theory arxiv.org/abs/2507.21801 arxiv.org/pdf/2507.21801 arxiv.org/html/2507.21801

English

2

9

827

Synthical@synthical_ai·30 Tem

@Hassaan_PHY Dark mode for this paper 🌚 synthical.com/abs/2207.01633…

English

52

Superconformal Hassaan@Hassaan_PHY·30 Tem

I was writing a piece where I had to talk about beyond standard model physics. Among the writing process, I came across the following book arxiv.org/abs/2207.01633 Love it #physics #scicomm

English

8

15

85

4.5K

Synthical@synthical_ai·30 Tem

@CsTominaga Dark mode for this paper for night readers 🌙 synthical.com/abs/2507.21111…

English

2

S Tominaga (Aka Dr Craig Wright)@CsTominaga·30 Tem

My Rebuttal of a terrible paper... arxiv.org/abs/2507.21111 A Formal Rebuttal of "The Blockchain Trilemma: A Formal Proof of the Inherent Trade-Offs Among Decentralization, Security, and Scalability"

English

6

11

53

3.5K

Synthical@synthical_ai·30 Tem

@ArxivSound Dark mode for this paper for night readers 🌚 synthical.com/abs/2504.02398…

English

arXiv Sound@ArxivSound·30 Tem

Gallil Maimon, Michael Hassid, Amit Roth, Yossi Adi, "Scaling Analysis of Interleaved Speech-Text Language Models," arxiv.org/abs/2504.02398

English

10

671

Synthical@synthical_ai·30 Tem

@ArxivSound Dark mode for this paper for night readers 🌙 synthical.com/abs/2507.20880…

English

6

arXiv Sound@ArxivSound·30 Tem

Renhang Liu, Chia-Yu Hung, Navonil Majumder, Taylor Gautreaux, Amir Ali Bagherzadeh, Chuan Li, Dorien Herremans, Soujanya Poria, "JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment," arxiv.org/abs/2507.20880

English

8

699

Synthical@synthical_ai·30 Tem

@Cohere_Labs @aclmeeting @akoksal_ @imani_ayyoob @ahmetustun89 @annalkorhonen @lpq29743 Dark mode for this paper for those who read at night 🌙 synthical.com/abs/2409.12958…

English

1

39

Cohere Labs@Cohere_Labs·30 Tem

It’s day 4 at @aclmeeting and we’re excited that MURI will be showcased today! 🎉 Work led by: @akoksal_ Marion Thaler @imani_ayyoob @ahmetustun89 @annalkorhonen @lpq29743 📜arxiv.org/abs/2409.12958

English

3

19

4K

Synthical@synthical_ai·30 Tem

@rust_ruslan Dark mode for this paper for those who read at night 🌙 synthical.com/abs/2507.07935…

English

1

54

Ruslan Rust@rust_ruslan·30 Tem

Microsoft just released a study showing the 40 jobs most affected by AI and the 40 least affected. arxiv.org/pdf/2507.07935

English

3

14

1.8K

Synthical@synthical_ai·30 Tem

@ArxivSound Dark mode for this paper for those who read at night 🌙 synthical.com/abs/2507.18181…

English

6

arXiv Sound@ArxivSound·30 Tem

Linye Wei, Shuzhang Zhong, Songqiang Xu, Runsheng Wang, Ru Huang, Meng Li, "SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding," arxiv.org/abs/2507.18181

English

0

6

879

Synthical@synthical_ai·30 Tem

@SciencePapers Dark mode for this paper for those who read at night 🌚 synthical.com/abs/2210.03781…

English