Or Tal (@Or__Tal) - Twitter profile

Pinned tweet

Or Tal@Or__Tal·12 Jun

Which modeling paradigm should you choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture: autoregressive (AR) vs. flow matching (FM). 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗huggingface.co/spaces/ortal16… 📄arxiv.org/abs/2506.08570 1/6


Or Tal retweeted

Eliahu Horwitz@EliahuHorwitz·16 Mar

🚨 New paper alert! 🚨 Millions of neural networks now populate public repositories like Hugging Face 🤗, but most lack documentation. So, we decided to build an Atlas 🗺️ Project: horwitz.ai/model-atlas Demo: huggingface.co/spaces/Eliahu/… 🧵👇🏻 Here's what we found:

AK@_akhaliq

Charting and Navigating Hugging Face's Model Atlas


Or Tal retweeted

Eliahu Horwitz@EliahuHorwitz·26 Sep

Excited to share this has now been accepted at #NeurIPS2025 as a position paper (<6% acceptance)!🎉 We advocate for systematically studying entire model populations via weight-space learning, and argue that this requires charting them in a Model Atlas. @NeurIPSConf #NeurIPS 🧵👇

Eliahu Horwitz@EliahuHorwitz

🚨 New paper alert! 🚨 Millions of neural networks now populate public repositories like Hugging Face 🤗, but most lack documentation. So, we decided to build an Atlas 🗺️ Project: horwitz.ai/model-atlas Demo: huggingface.co/spaces/Eliahu/… 🧵👇🏻 Here's what we found:


Or Tal retweeted

Heli Ben-Hamu@helibenhamu·5 Sep

Excited to share our work Set Block Decoding! A new paradigm combining next-token-prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with exact KV cache. Arguably one of the simplest ways to accelerate LLMs!


Or Tal@Or__Tal·15 Jul

@jesseengel very cool!


Jesse Engel@jesseengel·9 Jul

New VST/AU Plugin! 🚨 Play with Lyria RealTime directly from inside your favorite DAW with “The Infinite Crate” 🎧🎶 Like other Lyria RT demos, you can mix together text prompts and other controls to steer the model in real-time. But now with a VST plugin you can feed audio directly into your DAW for sampling, live performance, or even a practice partner to jam with. 💾 Get it here: g.co/magenta/infini…


Or Tal@Or__Tal·10 Jul

@__Rafail__ @NadavHarTuv I believe it should. If the audio is very long you may need to process it in chunks, but that's also true of all other audio representation models.
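To illustrate what processing long audio in chunks can look like, here is a minimal sketch. It is not taken from the PAST codebase: the helper name `iter_chunks`, the chunk size, and the optional overlap are all hypothetical choices made for illustration, and the input is assumed to be a plain sequence of samples.

```python
from typing import Iterator, Sequence


def iter_chunks(
    samples: Sequence[float], chunk_size: int, overlap: int = 0
) -> Iterator[Sequence[float]]:
    """Yield consecutive windows of `chunk_size` samples (hypothetical helper).

    Neighbouring windows share `overlap` samples; the last window may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    for start in range(0, len(samples), step):
        yield samples[start:start + chunk_size]
        # Stop once a window has reached the end of the signal.
        if start + chunk_size >= len(samples):
            break


# Example: 10 samples split into windows of 4 with no overlap.
chunks = list(iter_chunks(list(range(10)), chunk_size=4))
print([len(c) for c in chunks])
```

Each chunk would then be passed to the tokenizer separately; how the per-chunk outputs are stitched back together depends on the model and is not covered here.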


Raphael@__Rafail__·10 Jul

@Or__Tal @NadavHarTuv Can this work on a 5070 16GB?


נדב הר-טוב@NadavHarTuv·7 Jul

🚨 New paper alert! PAST: phonetic-acoustic speech tokenizer – just got accepted to Interspeech 2025 🎉 It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder. 🔗pages.cs.huji.ac.il/adiyoss-lab/PA… 👇 If you’re into speech LMs, keep reading!


Or Tal@Or__Tal·10 Jul

@__Rafail__ @NadavHarTuv Nope, PAST has ~180M params, and the streamable version has ~125M params. This should run on a standard GPU. For speech LM training we used 2 A100 GPUs, but it could be done with fewer.
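As a rough back-of-the-envelope check on the parameter counts quoted above, the sketch below estimates the memory needed just to hold the weights. The function name `weight_memory_gib` and the bytes-per-parameter values (4 for fp32, 2 for fp16) are assumptions for illustration; activations, buffers, and framework overhead are not included, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory needed to hold model weights, in GiB (weights only)."""
    return num_params * bytes_per_param / 1024**3


# Parameter counts as quoted in the tweet (approximate).
for name, params in [("PAST", 180e6), ("PAST (streamable)", 125e6)]:
    fp32 = weight_memory_gib(params, 4)  # assumed 4 bytes per parameter
    fp16 = weight_memory_gib(params, 2)  # assumed 2 bytes per parameter
    print(f"{name}: ~{fp32:.2f} GiB in fp32, ~{fp16:.2f} GiB in fp16")
```

Under these assumptions the weights of either variant take well under 1 GiB, which is consistent with the claim that inference does not require a data-center GPU.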


Raphael@__Rafail__·8 Jul

@NadavHarTuv This is all great, but what are the resource requirements of such a solution? Do I need to buy an H200 to run it?


Or Tal retweeted

Gallil Maimon@GallilMaimon·4 Apr

Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻


Or Tal retweeted

Gallil Maimon@GallilMaimon·8 Jul

🎉Thrilled that our paper on "scaling analysis of interleaved speech-text LMs" was accepted to #CoLM2025. It gives room for optimism when scaling SpeechLMs *right* - with large TextLMs (in place of more data), interleaving, and synthetic training data💪


Or Tal retweeted

Ron Yosef@ron_yosef·7 Jul

Happy to announce that our paper “EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits” was accepted to #ACL2025 🎉 📄 arxiv.org/abs/2506.09988 🌐 editinspector.github.io


Or Tal@Or__Tal·7 Jul

💣Introducing PAST: a speech tokenizer that jointly models phonetics and acoustics (no SSL involved). PAST demonstrates strong reconstruction as well as semantic capabilities, as measured by ABX and sWUGGY. 🤗 huggingface.co/slprl/PAST Check out Nadav's post👇@NadavHarTuv @adiyossLC

נדב הר-טוב@NadavHarTuv

🚨 New paper alert! PAST: phonetic-acoustic speech tokenizer – just got accepted to Interspeech 2025 🎉 It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder. 🔗pages.cs.huji.ac.il/adiyoss-lab/PA… 👇 If you’re into speech LMs, keep reading!


Or Tal retweeted

Audio and Speech Processing arXiv@AudioAndSpeech·12 Jun

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation. arxiv.org/abs/2506.08570


Or Tal retweeted

Gallil Maimon@GallilMaimon·13 Jun

🎵💬 If you are interested in Audio Tokenisers, you should check out our new work! We empirically analysed existing tokenisers from every angle - reconstruction, downstream, LMs and more. Grab yourself a ☕/🍺 and sit down for a read!


Or Tal retweeted

Niv Eckhaus@niveckhaus·12 Jun

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7


Or Tal@Or__Tal·12 Jun

Read the full paper! 🔗huggingface.co/spaces/ortal16… 📄arxiv.org/abs/2506.08570 @FelixKreuk @adiyossLC


Or Tal@Or__Tal·12 Jun

What if training steps are capped at 500k? FM reaches near-topline quality with small batches. It’s compute-efficient and forgiving. AR needs larger batch sizes to recover performance. It benefits more from large-scale training. See📉 below by model duration + batch size: 6/6



Or Tal retweeted

Felix Kreuk@FelixKreuk·12 Jun

We’ve been exploring the trade-offs between Autoregressive and Flow-Matching models for music generation. We share our findings in this latest paper led by @Or__Tal. Many interesting take-aways and practical advice on training generative models for music! 🎶🧠

Or Tal@Or__Tal

Which modeling paradigm should you choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture: autoregressive (AR) vs. flow matching (FM). 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗huggingface.co/spaces/ortal16… 📄arxiv.org/abs/2506.08570 1/6

