Gallil Maimon
@GallilMaimon
108 posts

Research Scientist intern @ Meta (FAIR); PhD student @CseHuji; Speech Language Modelling

Israel · Joined October 2022
695 Following · 331 Followers
Pinned Tweet
Gallil Maimon @GallilMaimon
Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻
[image attached]
4 replies · 19 retweets · 74 likes · 5.5K views
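(For context on the pinned thread: "speech-text interleaving" trains the LM on sequences that alternate text tokens with discrete speech units. Below is a minimal sketch of how such a sequence might be assembled; the boundary tokens, tokenizer API, and unit scheme are illustrative assumptions, not the paper's actual pipeline.)

```python
# Hypothetical sketch of speech-text interleaving; all names are illustrative.

def build_interleaved_sequence(segments, text_tokenizer,
                               speech_start_id, speech_end_id):
    """Assemble one training sequence alternating text and speech spans.

    `segments` is a list of ("text", str) or ("speech", list[int]) pairs,
    where speech payloads are already discrete unit IDs (e.g. HuBERT
    cluster indices, offset past the text vocabulary). Speech spans are
    wrapped in boundary tokens so the LM can tell modalities apart; the
    whole sequence is then trained with plain next-token prediction.
    """
    tokens = []
    for modality, payload in segments:
        if modality == "text":
            tokens.extend(text_tokenizer.encode(payload, add_special_tokens=False))
        else:
            tokens.append(speech_start_id)
            tokens.extend(payload)
            tokens.append(speech_end_id)
    return tokens
```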
Gallil Maimon @GallilMaimon
@arxiv upload "on hold" for over a week, and not for the first time 🥲 Guess the world isn't ready for the paper 😅 No reason provided; reaching out got a generic "wait" in response. It would be useful to get reasons so that next time we can avoid overloading the moderators.
[image attached]
0 replies · 0 retweets · 6 likes · 164 views
Gallil Maimon retweeted
Jonathan Kahana @jonkahana
🚨 New Paper Alert 🚨 We assume the "Wisdom of the Crowd" surfaces the best models, but are we actually finding the right ones? 🤔 Surprisingly, popularity is a poor proxy for performance and better options exist! 📉 Project: jonkahana.github.io/hidden_gems 👇 Here's what we found:
2 replies · 10 retweets · 34 likes · 2.6K views
Gallil Maimon retweeted
Oren Sultan @oren_sultan
Can LLMs reliably predict program termination? We evaluate frontier LLMs in the International Competition on Software Verification (SV-COMP) 2025, directly competing with state-of-the-art verification systems.
@AIatMeta @HebrewU @Bloomberg @imperialcollege @ucl @jordiae @pascalkesseli @jvanegue @HyadataLab @adiyossLC @PeterOHearn12
Paper: arxiv.org/pdf/2601.18987
Website: orensultan.com/llms_halting_p…
🧵👇 1/n
[image attached]
9 replies · 42 retweets · 116 likes · 43.3K views
Gallil Maimon @GallilMaimon
Having a birthday on the ICML deadline is... interesting 😅🎂 Not sure I recommend it, but looking forward to sharing the paper 💪🔜
1 reply · 0 retweets · 8 likes · 234 views
Gallil Maimon @GallilMaimon
Merry Christmas to all who celebrate 🎅 and, to those who don't, happy taking advantage of the cluster being free for ICML 💻💪
0 replies · 0 retweets · 3 likes · 140 views
Gallil Maimon retweeted
AI Engineer @aiDotEngineer
🆕 Code World Models: Building World Models for Computation youtube.com/watch?v=sYgE4p… One of the biggest ideas in codegen this year was the CWM from @AIatMeta. @jacob_d_kahn joins us to discuss the intuition, results, and RL work behind building such a highly capable research model! thanks to @syhw for helping to set this up!
[YouTube video]
0 replies · 4 retweets · 28 likes · 7.9K views
Gallil Maimon @GallilMaimon
@rdesh26 Well written! I think that for the tasks where SLMs are most needed, we want S2T accuracy >> T2T, as text doesn't model speech's full complexity. E.g. stress understanding - arxiv.org/abs/2505.22765 Doing that while still keeping S2T = T2T for the rest feels key to alignment that doesn't collapse to ASR.
1 reply · 0 retweets · 1 like · 81 views
Desh Raj @rdesh26
🧠 Speech–Text Alignment for SpeechLLMs

Let's talk about "speech+text → text" models. (We'll save "omni" models for a future post 👀)

A SpeechLLM has speech-text alignment if its accuracy is similar whether the input is speech (S2T) or text (T2T). How do we actually get this alignment? This requires answering 2 questions.

1️⃣ How do we map speech + text into the same embedding space?

Raw spectral features (e.g., filterbanks) can be projected into the LLM dimension, but the sequence is (i) too long and (ii) full of irrelevant detail. A better approach:
✔️ Use a frozen speech encoder (like Whisper)
✔️ Extract dense semantic embeddings
✔️ Optionally subsample
✔️ Pass through a modality adapter → same dim as text tokens

This is exactly what many SpeechLLMs do (e.g., Voxtral [1]). Depending on your budget, you can freeze almost everything and train only the adapter + LoRA on the LLM.

2️⃣ How do we build training sequences that force alignment?

This depends heavily on your data, but the strategy is: interleave speech + text inside a single sequence. Models such as SpiritLM [2] use segment-level mixed sequences and train with next-token prediction. You can also introduce tasks such as: transcribe the speech, or answer questions about the audio, as in Qwen2-Audio [3]. These tasks force the model to treat speech embeddings as just another kind of "token."

🤔 But why doesn't alignment emerge naturally?

Text-only multilingual LLMs don't need special tasks to align languages like English and Hindi. They discover cross-lingual structure automatically [4]. So why not speech?

Hypothesis: Text sequences, even across unrelated languages, share token-level patterns. Speech embeddings, on the other hand, look nothing like text tokens. No shared alphabet, no shared short-range patterns, very different entropy + structure. So the model needs explicit signals to tie speech and text together ⚡️

I am curious if there are any papers which have studied this problem in more detail. Please share your thoughts!

[1] arxiv.org/abs/2507.13264
[2] arxiv.org/abs/2402.05755
[3] arxiv.org/abs/2407.10759
[4] arxiv.org/abs/2406.13229
1 reply · 2 retweets · 31 likes · 1.7K views
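(The encoder → subsample → adapter recipe described above maps onto a few lines of PyTorch. This is a minimal sketch; the dimensions and the frame-stacking subsampler are assumptions, not any particular model's settings.)

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Project frozen speech-encoder features into the LLM embedding
    space, with 4x temporal subsampling by frame stacking. All sizes
    here are illustrative."""

    def __init__(self, enc_dim=1280, llm_dim=4096, stack=4):
        super().__init__()
        self.stack = stack
        self.proj = nn.Sequential(
            nn.Linear(enc_dim * stack, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, feats):           # feats: (batch, frames, enc_dim)
        b, t, d = feats.shape
        t = t - t % self.stack          # drop ragged tail frames
        feats = feats[:, :t].reshape(b, t // self.stack, d * self.stack)
        return self.proj(feats)         # (batch, frames // stack, llm_dim)

# The resulting speech embeddings can then be spliced between text-token
# embeddings before the LLM forward pass.
adapter = ModalityAdapter()
speech_embeds = adapter(torch.randn(1, 300, 1280))  # Whisper-style features
```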
Gallil Maimon @GallilMaimon
@COLM_conf starting strong before it even officially started! 🍻 Looking forward to chatting about Code World Modeling and also SpeechLMs 🤙🏻
[image attached]
0 replies · 0 retweets · 2 likes · 45 views
Gallil Maimon @GallilMaimon
🧑‍💻🤖 Our new paper on the Code World Model feels like just the start! Super interested to hear diverse perspectives on the future of code generation and reasoning! #COLM_2025
[image attached]
0 replies · 0 retweets · 3 likes · 198 views
Gallil Maimon @GallilMaimon
Our work shows optimistic trends for scaling SpeechLMs, *if* done right! Excited to present it at @COLM_conf
📅 Oct. 8th, 16:30
📍 710
Happy to chat about SpeechLMs, my recent work on code reasoning 👇 or beer at #BreWskey 🍻
[image attached]
1 reply · 0 retweets · 12 likes · 332 views
Itay Itzhak @Itay_itzhak_
🚨Spotlight update🚨 Our paper on bias origins in LLMs is a *spotlight* paper with oral presentation at CoLM 2025!✨ Honored to be among just 24 selected and super excited to present and discuss biases and finetuning limits. Who’s joining in Montreal Tuesday morning? 👀
Itay Itzhak@Itay_itzhak_

🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc

3 replies · 7 retweets · 35 likes · 4K views
Gallil Maimon @GallilMaimon
@bigeagle_xd I think it's still early days for seeing the full potential; hope to see much more :)
0 replies · 0 retweets · 0 likes · 21 views
熊师傅 weight decay 了吗 @bigeagle_xd
anyway, we didn't observe any real benefit from tracing data (except for crux-io or similar tasks), nor does CWM, according to the tech report. need mooooore research and gpus on it.
1 reply · 0 retweets · 5 likes · 830 views
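("Tracing data" here means records of program state captured while code runs. Neither this thread nor the tweets below specify a format, so purely as illustration, here is a toy trace collector built on Python's standard sys.settrace hook.)

```python
import sys

def collect_trace(fn, *args):
    """Run `fn` and record its local variables at each line event
    (just before each line executes): a toy stand-in for the kind of
    execution traces discussed above."""
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)  # always unhook, even if fn raises
    return result, trace

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, steps = collect_trace(gcd, 48, 18)
# steps is e.g. [(lineno, {'a': 48, 'b': 18}), (lineno, {'a': 18, 'b': 12}), ...]
```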
熊师傅 weight decay 了吗 @bigeagle_xd
nice work! i believe "generation" is the real "understanding", and learning to generate like a coding env during pre-training should lead to better exploration and exploitation in RL. p.s. the 24-03 version of the moonshot-v1 model already included "tracing" data :)
Gabriel Synnaeve@syhw

(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. ai.meta.com/research/publi…

2 replies · 0 retweets · 58 likes · 7.5K views
Gallil Maimon retweeted
Gabriel Synnaeve @syhw
I'm immensely proud of the work done by my cracked CodeGen team at Meta, with PhD students and veterans, for which nothing is someone else's problem. The broader Meta AI community all pulled together for this. I'm very thankful for the unwavering support of our whole leadership.
5 replies · 6 retweets · 135 likes · 11.2K views
Gallil Maimon retweeted
AI at Meta @AIatMeta
New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work.
➡️ Read the technical report: ai.meta.com/research/publi…
➡️ Download the open weights: huggingface.co/facebook/cwm
➡️ Download the code: github.com/facebookresear…
90 replies · 221 retweets · 1.4K likes · 311.2K views
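(A hedged sketch of loading the released weights: whether facebook/cwm works with stock transformers Auto classes, and its exact prompt format, are assumptions here; check the model card for actual usage and the research-license gating.)

```python
# Assumes the checkpoint is compatible with standard Hugging Face
# AutoModel classes; verify against the facebook/cwm model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/cwm"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok("def fibonacci(n):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```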
Gallil Maimon retweeted
Alexandr Wang @alexandr_wang
new research from Meta FAIR: Code World Model (CWM), a 32B research model. we encourage the research community to research this open-weight model!
pass@1 evals, for the curious:
65.8% on SWE-bench Verified
68.6% on LiveCodeBench
96.6% on Math-500
76.0% on AIME 2024
🧵
[image attached]
95 replies · 156 retweets · 1.4K likes · 867.6K views
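(For reference, pass@1 here is presumably the standard unbiased pass@k estimator introduced with Codex (arXiv:2107.03374), evaluated at k=1: sample n generations per problem, count the c that pass the tests, and average over problems. A minimal implementation:)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k samples drawn from n generations, of which
    c are correct, passes. For k=1 this reduces to the raw pass rate."""
    if n - c < k:
        return 1.0  # every size-k draw necessarily contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

assert abs(pass_at_k(n=10, c=3, k=1) - 0.3) < 1e-9  # pass@1 == c / n
```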
Gallil Maimon @GallilMaimon
Super cool work that I got to take part in! 🔥 Code World Models, which predict how code impacts the environment (from variables to files), open up new options! More to come on this :) Open weights 👇 Waiting to see what the community builds 💪
Gabriel Synnaeve@syhw

(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. ai.meta.com/research/publi…

0 replies · 0 retweets · 8 likes · 300 views