Gallil Maimon
@GallilMaimon
108 posts

Research Scientist intern @ Meta (FAIR); PhD student @CseHuji; Speech Language Modelling

Israel · Joined October 2022
695 Following · 331 Followers
Pinned Tweet
Gallil Maimon @GallilMaimon
Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻
[image attached]
4 replies · 19 retweets · 74 likes · 5.5K views
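(For context on the pinned thread: "speech-text interleaving" trains the LM on sequences that alternate text tokens with discrete speech units. Below is a minimal sketch of how such a sequence might be assembled; the boundary tokens, tokenizer API, and unit scheme are illustrative assumptions, not the paper's actual pipeline.)

```python
# Hypothetical sketch of speech-text interleaving; all names are illustrative.

def build_interleaved_sequence(segments, text_tokenizer,
                               speech_start_id, speech_end_id):
    """Assemble one training sequence alternating text and speech spans.

    `segments` is a list of ("text", str) or ("speech", list[int]) pairs,
    where speech payloads are already discrete unit IDs (e.g. HuBERT
    cluster indices, offset past the text vocabulary). Speech spans are
    wrapped in boundary tokens so the LM can tell modalities apart; the
    whole sequence is then trained with plain next-token prediction.
    """
    tokens = []
    for modality, payload in segments:
        if modality == "text":
            tokens.extend(text_tokenizer.encode(payload, add_special_tokens=False))
        else:
            tokens.append(speech_start_id)
            tokens.extend(payload)
            tokens.append(speech_end_id)
    return tokens
```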
Gallil Maimon @GallilMaimon
@arxiv upload "on hold" for over a week, and not for the first time 🥲 Guess the world isn't ready for the paper 😅 No reason provided; reaching out got a generic "wait" in response. It would be useful to get reasons so that next time we can avoid overloading the moderators.
[image attached]
0 replies · 0 retweets · 6 likes · 164 views
Gallil Maimon retweeted
Jonathan Kahana @jonkahana
🚨 New Paper Alert 🚨 We assume the "Wisdom of the Crowd" surfaces the best models, but are we actually finding the right ones? 🤔 Surprisingly, popularity is a poor proxy for performance and better options exist! 📉 Project: jonkahana.github.io/hidden_gems 👇 Here's what we found:
2 replies · 10 retweets · 34 likes · 2.6K views
Gallil Maimon retweeted
Oren Sultan @oren_sultan
Can LLMs reliably predict program termination? We evaluate frontier LLMs in the International Competition on Software Verification (SV-COMP) 2025, directly competing with state-of-the-art verification systems.
@AIatMeta @HebrewU @Bloomberg @imperialcollege @ucl @jordiae @pascalkesseli @jvanegue @HyadataLab @adiyossLC @PeterOHearn12
Paper: arxiv.org/pdf/2601.18987
Website: orensultan.com/llms_halting_p…
🧵👇 1/n
[image attached]
9 replies · 42 retweets · 116 likes · 43.3K views
Gallil Maimon @GallilMaimon
Having a birthday on the ICML deadline is... interesting 😅🎂 Not sure I recommend it, but looking forward to sharing the paper 💪🔜
1 reply · 0 retweets · 8 likes · 234 views
Gallil Maimon @GallilMaimon
Merry Christmas to all who celebrate 🎅 and, to those who don't, happy taking advantage of the cluster being free for ICML 💻💪
0 replies · 0 retweets · 3 likes · 140 views
Gallil Maimon retweeted
AI Engineer @aiDotEngineer
🆕 Code World Models: Building World Models for Computation youtube.com/watch?v=sYgE4p… One of the biggest ideas in codegen this year was the CWM from @AIatMeta. @jacob_d_kahn joins us to discuss the intuition, results, and RL work behind building such a highly capable research model! thanks to @syhw for helping to set this up!
[YouTube video]
0 replies · 4 retweets · 28 likes · 7.9K views
Gallil Maimon @GallilMaimon
@rdesh26 Well written! I think that for the tasks where SLMs are most needed, we want S2T accuracy >> T2T, as text doesn't model speech's full complexity. E.g. stress understanding - arxiv.org/abs/2505.22765 Doing that while still keeping S2T = T2T for the rest feels key to alignment that doesn't collapse to ASR.
1 reply · 0 retweets · 1 like · 81 views
Desh Raj @rdesh26
🧠 Speech–Text Alignment for SpeechLLMs

Let's talk about "speech+text → text" models. (We'll save "omni" models for a future post 👀)

A SpeechLLM has speech-text alignment if its accuracy is similar whether the input is speech (S2T) or text (T2T). How do we actually get this alignment? This requires answering 2 questions.

1️⃣ How do we map speech + text into the same embedding space?

Raw spectral features (e.g., filterbanks) can be projected into the LLM dimension, but the sequence is (i) too long and (ii) full of irrelevant detail. A better approach:
✔️ Use a frozen speech encoder (like Whisper)
✔️ Extract dense semantic embeddings
✔️ Optionally subsample
✔️ Pass through a modality adapter → same dim as text tokens

This is exactly what many SpeechLLMs do (e.g., Voxtral [1]). Depending on your budget, you can freeze almost everything and train only the adapter + LoRA on the LLM.

2️⃣ How do we build training sequences that force alignment?

This depends heavily on your data, but the strategy is: interleave speech + text inside a single sequence. Models such as SpiritLM [2] use segment-level mixed sequences and train with next-token prediction. You can also introduce tasks such as: transcribe the speech, or answer questions about the audio, as in Qwen2-Audio [3]. These tasks force the model to treat speech embeddings as just another kind of "token."

🤔 But why doesn't alignment emerge naturally?

Text-only multilingual LLMs don't need special tasks to align languages like English and Hindi. They discover cross-lingual structure automatically [4]. So why not speech?

Hypothesis: Text sequences, even across unrelated languages, share token-level patterns. Speech embeddings, on the other hand, look nothing like text tokens. No shared alphabet, no shared short-range patterns, very different entropy + structure. So the model needs explicit signals to tie speech and text together ⚡️

I am curious if there are any papers which have studied this problem in more detail. Please share your thoughts!

[1] arxiv.org/abs/2507.13264
[2] arxiv.org/abs/2402.05755
[3] arxiv.org/abs/2407.10759
[4] arxiv.org/abs/2406.13229
1 reply · 2 retweets · 31 likes · 1.7K views
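(The encoder → subsample → adapter recipe described above maps onto a few lines of PyTorch. This is a minimal sketch; the dimensions and the frame-stacking subsampler are assumptions, not any particular model's settings.)

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Project frozen speech-encoder features into the LLM embedding
    space, with 4x temporal subsampling by frame stacking. All sizes
    here are illustrative."""

    def __init__(self, enc_dim=1280, llm_dim=4096, stack=4):
        super().__init__()
        self.stack = stack
        self.proj = nn.Sequential(
            nn.Linear(enc_dim * stack, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, feats):           # feats: (batch, frames, enc_dim)
        b, t, d = feats.shape
        t = t - t % self.stack          # drop ragged tail frames
        feats = feats[:, :t].reshape(b, t // self.stack, d * self.stack)
        return self.proj(feats)         # (batch, frames // stack, llm_dim)

# The resulting speech embeddings can then be spliced between text-token
# embeddings before the LLM forward pass.
adapter = ModalityAdapter()
speech_embeds = adapter(torch.randn(1, 300, 1280))  # Whisper-style features
```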
Gallil Maimon @GallilMaimon
@COLM_conf starting strong before it even officially started! 🍻 Looking forward to chatting about Code World Modeling and also SpeechLMs 🤙🏻
[image attached]
0 replies · 0 retweets · 2 likes · 45 views
Gallil Maimon @GallilMaimon
🧑‍💻🤖 Our new paper on the Code World Model feels like just the start! Super interested to hear diverse perspectives on the future of code generation and reasoning! #COLM_2025
[image attached]
0 replies · 0 retweets · 3 likes · 198 views
Gallil Maimon @GallilMaimon
Our work shows optimistic trends for scaling SpeechLMs, *if* done right! Excited to present it at @COLM_conf
📅 Oct. 8th, 16:30
📍 710
Happy to chat about SpeechLMs, my recent work on code reasoning 👇 or beer at #BreWskey 🍻
[image attached]
1 reply · 0 retweets · 12 likes · 332 views
Itay Itzhak @Itay_itzhak_
🚨Spotlight update🚨 Our paper on bias origins in LLMs is a *spotlight* paper with oral presentation at CoLM 2025!✨ Honored to be among just 24 selected and super excited to present and discuss biases and finetuning limits. Who’s joining in Montreal Tuesday morning? 👀
Itay Itzhak@Itay_itzhak_

🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc

3 replies · 7 retweets · 35 likes · 4K views
Gallil Maimon @GallilMaimon
@bigeagle_xd I think it's still early days for seeing the full potential; hope to see much more :)
0 replies · 0 retweets · 0 likes · 21 views
熊师傅 weight decay 了吗 @bigeagle_xd
anyway, we didn't observe any real benefit from tracing data (except for crux-io or similar tasks), nor does CWM, according to the tech report. need mooooore research and gpus on it.
1 reply · 0 retweets · 5 likes · 830 views
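("Tracing data" here means records of program state captured while code runs. Neither this thread nor the tweets below specify a format, so purely as illustration, here is a toy trace collector built on Python's standard sys.settrace hook.)

```python
import sys

def collect_trace(fn, *args):
    """Run `fn` and record its local variables at each line event
    (just before each line executes): a toy stand-in for the kind of
    execution traces discussed above."""
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)  # always unhook, even if fn raises
    return result, trace

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, steps = collect_trace(gcd, 48, 18)
# steps is e.g. [(lineno, {'a': 48, 'b': 18}), (lineno, {'a': 18, 'b': 12}), ...]
```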
熊师傅 weight decay 了吗 @bigeagle_xd
nice work! i believe "generation" is the real "understanding", and learning to generate like a coding env during pre-training should lead to better exploration and exploitation in RL. p.s. the 24-03 version of the moonshot-v1 model already included "tracing" data :)
Gabriel Synnaeve@syhw

(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. ai.meta.com/research/publi…

2 replies · 0 retweets · 58 likes · 7.5K views
Gallil Maimon retweeted
Gabriel Synnaeve @syhw
I'm immensely proud of the work done by my cracked CodeGen team at Meta, with PhD students and veterans, for which nothing is someone else's problem. The broader Meta AI community all pulled together for this. I'm very thankful for the unwavering support of our whole leadership.
5 replies · 6 retweets · 135 likes · 11.2K views
Gallil Maimon retweeted
AI at Meta @AIatMeta
New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work.
➡️ Read the technical report: ai.meta.com/research/publi…
➡️ Download the open weights: huggingface.co/facebook/cwm
➡️ Download the code: github.com/facebookresear…
90 replies · 221 retweets · 1.4K likes · 311.2K views
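(A hedged sketch of loading the released weights: whether facebook/cwm works with stock transformers Auto classes, and its exact prompt format, are assumptions here; check the model card for actual usage and the research-license gating.)

```python
# Assumes the checkpoint is compatible with standard Hugging Face
# AutoModel classes; verify against the facebook/cwm model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/cwm"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok("def fibonacci(n):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```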
Gallil Maimon retweeted
Alexandr Wang @alexandr_wang
new research from Meta FAIR: Code World Model (CWM), a 32B research model. we encourage the research community to research this open-weight model!
pass@1 evals, for the curious:
65.8% on SWE-bench Verified
68.6% on LiveCodeBench
96.6% on Math-500
76.0% on AIME 2024
🧵
[image attached]
95 replies · 156 retweets · 1.4K likes · 867.6K views
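(For reference, pass@1 here is presumably the standard unbiased pass@k estimator introduced with Codex (arXiv:2107.03374), evaluated at k=1: sample n generations per problem, count the c that pass the tests, and average over problems. A minimal implementation:)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k samples drawn from n generations, of which
    c are correct, passes. For k=1 this reduces to the raw pass rate."""
    if n - c < k:
        return 1.0  # every size-k draw necessarily contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

assert abs(pass_at_k(n=10, c=3, k=1) - 0.3) < 1e-9  # pass@1 == c / n
```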
Gallil Maimon @GallilMaimon
Super cool work that I got to take part in! 🔥 Code World Models, which predict how code impacts the environment (from variables to files), open up new options! More to come on this :) Open weights 👇 Waiting to see what the community builds 💪
Gabriel Synnaeve@syhw

(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. ai.meta.com/research/publi…

0 replies · 0 retweets · 8 likes · 300 views