Edouard Grave
@EXGRV
large language models @kyutai_labs
Paris, France · Joined October 2012
129 posts · 166 Following · 2.9K Followers
Edouard Grave retweeted
Omar Sanseviero @osanseviero
Gemma 4 is here!
🧠 31B and 26B A4B models with impressive intelligence per parameter
🤏 E2B and E4B for mobile and IoT
🤗 Apache 2.0
🤖 Base and IT checkpoints available
Available in AI Studio, Hugging Face, Ollama, Android, and your favorite OS tools
🚀 Download it today!
34 replies · 82 retweets · 650 likes · 40K views
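For anyone who wants to try it right away, here is a minimal sketch of loading one of the announced checkpoints with Hugging Face transformers; the model id below is a placeholder for illustration, not a confirmed repository name, so check the actual ids on the Hub first.

    from transformers import pipeline

    # "google/gemma-e4b-it" is a hypothetical id used for illustration only.
    generator = pipeline("text-generation", model="google/gemma-e4b-it")
    out = generator("Explain 'intelligence per parameter' in one sentence.",
                    max_new_tokens=64)
    print(out[0]["generated_text"])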
Lucas Beyer (bl16) @giffmana
Amazing! Truly open review, through which we all gained more insights, I love it! Result: in the multi-epoch setting, making AR learn multiple orderings ~closes the gap to diffusion, explaining much of the difference.

How the truly open review happened (from my vague memory): Mihir posted a paper on diffusion being more sample-efficient than AR and benefiting from more epochs. The hypothesis in the paper and from most commenters was that this was due to augmentations, but there was no experiment in the paper to check it.

After some discussion (on his thread, here on x) about this and what experiment might check it, including a first negative result Mihir mentioned and tried, we converged towards thinking it might be ordering instead. I suggested an experiment to check, but it had some issues. @YouJiacheng chimed in with connections to another paper, based on which we got the idea of how to make the experiment perfect.

Then Mihir ran it, and it now looks like we have reasonably conclusive evidence PLUS potentially more confidence in a method to make AR better at multi-epoch. All in the open, in real time, here on x dot com, the everything app™
Mihir Prabhudesai @mihirp98

We ran more experiments to better understand why diffusion models do better in data-constrained settings than autoregressive ones. Our findings support the hypothesis that diffusion models benefit from learning over multiple token orderings, which contributes to their robustness and reduced overfitting.

To test this, we trained autoregressive (AR) models with varying numbers of token orderings: N=1 corresponds to the standard left-to-right ordering, while N=k includes the left-to-right order plus k−1 additional random permutations. As N increases, we observe that AR models become more data-efficient, exhibiting improved validation loss and reduced overfitting. All models were trained for 100 epochs and evaluated using the standard left-to-right factorization.

We also experimented with related approaches, such as RAR and σ-GPT, and observed consistent trends: introducing more random factorizations led to better generalization and less overfitting.

We have updated our arXiv submission with these new results. We thank @giffmana and @YouJiacheng for suggesting these experiments. Original paper post: x.com/mihirp98/statu…

15 replies · 36 retweets · 589 likes · 77.6K views
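To make the setup concrete, here is a minimal sketch of the N-orderings idea in PyTorch. It is an illustration under assumptions, not the authors' code: it only shows how a batch can be permuted into an alternative factorization, and it glosses over how the model is told which position to predict next (σ-GPT-style approaches condition on target positions explicitly).

    import torch

    def make_orderings(seq_len: int, n_orderings: int) -> list:
        # Ordering 0 is the standard left-to-right factorization (N=1);
        # the remaining n_orderings - 1 are random permutations of positions.
        orders = [torch.arange(seq_len)]
        for _ in range(n_orderings - 1):
            orders.append(torch.randperm(seq_len))
        return orders

    # Toy batch: 8 sequences of 256 token ids, N = 4 orderings.
    tokens = torch.randint(0, 50_000, (8, 256))
    orders = make_orderings(seq_len=256, n_orderings=4)

    # Sample one ordering per step and train next-token prediction in that order.
    order = orders[torch.randint(len(orders), (1,)).item()]
    inputs = tokens[:, order]                        # tokens in the sampled order
    position_ids = order.unsqueeze(0).expand(8, -1)  # original position of each token
    # loss = cross_entropy over model(inputs, position_ids=position_ids),
    # predicting the next token *in the permuted order*.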
Edouard Grave retweeted
kyutai @kyutai_labs
Unmute meets Moshi 🫂💖 Talk to unmute.sh!
7 replies · 15 retweets · 94 likes · 6.2K views
Edouard Grave retweeted
kyutai @kyutai_labs
Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! huggingface.co/kyutai/helium-…
10 replies · 89 retweets · 379 likes · 58.2K views
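A minimal sketch of trying the checkpoint with Hugging Face transformers; the repository id below is a hypothetical stand-in written out for illustration, since the real one sits behind the truncated link in the post.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "kyutai/helium-1-preview"  # hypothetical id; see the link in the post
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # A non-English prompt, to match the model's stated multilingual focus.
    prompt = "La capitale de la France est"
    out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=16)
    print(tok.decode(out[0], skip_special_tokens=True))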
Edouard Grave @EXGRV
I am at ICML in Vienna! Let me know if you want to chat about (or to) Moshi, multimodal LLMs, Kyutai & more.
2 replies · 5 retweets · 31 likes · 11.2K views
Thomas Wolf @Thom_Wolf
The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people missed in the room.

Mostly irrelevant are the facts that:
- they come a few weeks after OpenAI ChatGPT-4o
- the demo was less polished than the 4o one (in terms of voice quality, voice timing…)

Relevant:
- the model training pipeline and model architecture are simple and hugely scalable, with a tiny 8+ person team like Kyutai building it in 4 months. Synthetic data is a huge enabler here
- laser focus on local devices: Moshi will soon be everywhere. Frontier model builders have low incentive to let you run smaller models locally (price per token…) but non-profits like Kyutai have very different incentives. The Moshi demo is already online while the OpenAI 4o one is still in limbo.
- going under 300 ms of latency while keeping Llama 8B or above quality of answers is a key enabler in terms of interactivity; it's game changing. The feeling when the model answers your question before you even finished asking is quite crazy, or when you interrupt the model while it's talking and it reacts… Predictive coding in a model, an instantly updated model of what you're about to say...

Basically they nailed the fundamentals. It's here. This interactive voice tech will be everywhere. It will soon be an obvious commodity.
70 replies · 350 retweets · 1.8K likes · 339.3K views
Edouard Grave @EXGRV
✈️ I will be attending #NeurIPS2023: let me know if you want to chat about the future of LLMs, and how to democratize them. 🌐 We are also hiring members of technical staff and interns @kyutai_labs. Happy to talk about the lab and our mission.
1 reply · 6 retweets · 58 likes · 13.4K views
Soumith Chintala @soumithchintala
A great new open-science lab out of Paris, with a solid amount of initial funding! A very strong talent bench. I had the privilege of working with most of them before, and they're awesome researchers and great human beings!
kyutai @kyutai_labs

Our founding team covers many AI fields: vision with Patrick Pérez and Hervé Jégou (@hjegou), LLMs with Edouard Grave (@EXGRV), audio with Neil Zeghidour (@neilzegh) and Alexandre Défossez (@honualx), and infra with Laurent Mazaré (@lmazare).

4 replies · 9 retweets · 109 likes · 33.5K views
anton @abacaj
Anyone have success fine-tuning models with retrieval? Fine-tuning the model to answer the question based on the context; trying this for code, will see how it goes
14 replies · 3 retweets · 61 likes · 22K views
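One common way to set this up is to put the retrieved context in the prompt and compute the loss only on the answer. A minimal sketch of such a training example, with illustrative field names rather than any specific library's schema:

    def format_example(context: str, question: str, answer: str) -> dict:
        # Retrieved context goes in the prompt; the completion is the target
        # the model is fine-tuned to produce (loss masked to the completion).
        prompt = (
            "Use the context to answer the question.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )
        return {"prompt": prompt, "completion": " " + answer}

    example = format_example(
        context="def add(a, b):\n    return a + b",
        question="What does add() return?",
        answer="The sum of a and b.",
    )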
Edouard Grave @EXGRV
@yoavgo The idea is cute, but I would not take the experimental results too seriously as the baseline numbers seem to be off.
0 replies · 0 retweets · 4 likes · 686 views
(((ل()(ل() 'yoav))))👾
so, two cents on the gzip classification thing: apparently the idea has been around in some form or another for a while, but was treated as a curiosity because of how inefficient it was compared to all other classification methods. enter BERT.
13 replies · 10 retweets · 268 likes · 80.5K views
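For context, the "gzip classification thing" is nearest-neighbor text classification under the normalized compression distance (NCD): two texts that compress well together are treated as similar. A minimal sketch of the idea, as an illustration rather than the exact code from the paper under discussion:

    import gzip

    def ncd(x: str, y: str) -> float:
        # Normalized compression distance: how much better x and y compress
        # together than apart. Smaller means more similar.
        cx = len(gzip.compress(x.encode()))
        cy = len(gzip.compress(y.encode()))
        cxy = len(gzip.compress((x + " " + y).encode()))
        return (cxy - min(cx, cy)) / max(cx, cy)

    def classify(query, train, k=3):
        # kNN over NCD: majority vote among the k closest training texts.
        neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
        labels = [label for _, label in neighbors]
        return max(set(labels), key=labels.count)

    train = [("goals and a red card", "sports"),
             ("stocks fell sharply", "finance"),
             ("the striker scored twice", "sports")]
    print(classify("the match ended 2-1", train, k=3))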