Mathieu

96 posts

Mathieu
@Mathieu_Rita

Research Scientist @AIatMeta | ex: INRIA-MSR | @CoML_ENS | @Polytechnique | Llama3 - RL fine-tuning - Emergent communication

ENS Ulm, Paris · Joined June 2011
277 Following · 229 Followers
Pinned Tweet
Mathieu @Mathieu_Rita ·
I am glad to announce that our latest paper, « Emergent communication: Generalization and Overfitting in Lewis Games », has been accepted at #NeurIPS2022 🎷🎷, with C. Tallec, @pmichelX, J.B. Grill, O. Pietquin, E. Dupoux & F. Strub. 🖱️arxiv.org/pdf/2209.15342… 🧵1/13
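For readers unfamiliar with the setting: a Lewis game pairs a speaker, who observes an object and emits a message, with a listener, who must recover the object from the message alone. Below is a minimal PyTorch sketch of such a game trained with REINFORCE; the architectures, sizes, and hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Minimal Lewis (signaling) game sketch -- illustrative only, not the paper's code.
# A speaker sees an object and sends a one-symbol message; a listener must
# recover the object from the message. Both are trained with REINFORCE.
import torch
import torch.nn as nn

N_OBJECTS, VOCAB = 8, 8  # hypothetical sizes

speaker = nn.Linear(N_OBJECTS, VOCAB)    # object -> message logits
listener = nn.Linear(VOCAB, N_OBJECTS)   # message -> object logits
opt = torch.optim.Adam([*speaker.parameters(), *listener.parameters()], lr=1e-2)

for step in range(2000):
    obj = torch.randint(N_OBJECTS, (32,))              # batch of target objects
    x = nn.functional.one_hot(obj, N_OBJECTS).float()
    msg_dist = torch.distributions.Categorical(logits=speaker(x))
    msg = msg_dist.sample()                            # discrete message
    m = nn.functional.one_hot(msg, VOCAB).float()
    guess_dist = torch.distributions.Categorical(logits=listener(m))
    guess = guess_dist.sample()
    reward = (guess == obj).float()                    # 1 if the listener is right
    # REINFORCE for both agents, with a mean baseline to reduce variance
    baseline = reward.mean().detach()
    loss = -((reward - baseline)
             * (msg_dist.log_prob(msg) + guess_dist.log_prob(guess))).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The paper's generalization/overfitting analysis concerns what happens to the emergent protocol when games like this are scaled up, but this sketch captures the basic training loop.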
Mathieu retweeted
Grégoire Mialon @mialon_gregoire ·
I am hiring an intern on our Llama team for 2025! You should be near the end of your PhD and willing to be based in Paris. You will succeed @MekalaDheeraj, working on frontier LLMs, tool use, agents, and more :) Please apply here: metacareers.com/jobs/109555634…
Mathieu retweeted
Roberta Raileanu @robertarail ·
I’m looking for a PhD intern for next year to work at the intersection of LLM-based agents and open-ended learning, as part of the Llama Research Team in London. If interested, please send me an email with a short paragraph outlining some research ideas, and apply at the link below.
Mathieu retweeted
Rui Hou @magpie_rayhou ·
Our team, Llama Post-training, is looking to hire 2025 PhD Research Interns to join us at Meta GenAI. If you are interested in working on RL for LLMs, code generation, reasoning, and agents with us, drop me a message with your CV. Link: metacareers.com/jobs/106355302…
Mathieu retweeted
Language Gamification Workshop @ NeurIPS 2024
🔔🚨 [ALERT] Call for papers! 🚨🔔 Language Gamification Workshop @ NeurIPS 2024 openreview.net/group?id=NeurI… 🤔 Topics: In-Context Learning, Deep Reinforcement Learning, Modern NLP, Multi-Agent Learning, Language Emergence, Embodiment, Cognitive Science... ⏰ Deadline: August 30
Mathieu retweeted
Jérémie Kalfon @jkobject ·
This gives scPRINT zero-shot abilities (meaning no fine-tuning required) such as artificially increasing the depth of a cell's expression profile (denoising / zero imputation), predicting the cell type, disease, sequencer, and sex of a cell, and creating cell embeddings 💪. But one of the key abilities we dived into is its inference of gene networks. We took inspiration from ESM2 to design a way to extract gene networks from pre-trained transformers, which we call Large Cell Models.

We extensively validate the gene network inference abilities of scPRINT, scGPT, and GENIE3 with our suite of benchmarking tools, BenGRN and GRnnData: github.com/jkobject/benGRN github.com/cantinilab/GRn…

Moreover, we don't just release the code and the model weights for scPRINT, but also its pre-training strategies, thanks to our dataloader and LaminDB's new mapped-dataset methods: github.com/jkobject/scDat… lamin.ai/blog/arrayload…

Taken together, the goal of these open-source tools is to serve as a bedrock for future Large Cell Model development: to improve, and possibly debug issues in, these transformer models by interrogating and benchmarking their abilities in a reproducible manner 🌍 👥. We need to understand how the cell works, but for that we need to know what works and what doesn't. This is my contribution to it.

While still somewhat a work in progress, we have defined an extensive ablation study analysis with scPRINT that users can adapt. Models can be pre-trained on only one GPU for the small and medium sizes, and "only" 4 to 16 GPUs for the larger ones. 🚄 🏔️ The very large model is still undergoing training and testing.

I am very happy to start building in public now and eager to see what the community will do with these tools. Do contact me if you would like to collaborate and have a try at the tool! I will provide more updates to the package and publish it on PyPI in a week or so. But first... a couple of days off! 🌴☀️ 🫡

🙏🙏🙏 I would like to thank additional collaborators from LaminDB, as well as members of the Cantini Lab and Peyré Lab: @JulesSamaran, @TrimbourR, @gjhuizing, Anna Audit and @wariobrega. But most of all, my 2 great P.I.s: @LauCan88 and @gabrielpeyre 🇫🇷 🎓 💯 🙏

Also, I would like to acknowledge the important pioneering work of Geneformer, UCE, scFoundation and scGPT. Thanks to FlashAttention, pytorch, lightning, and scanpy for their toolkits. Thanks to Omnipath, Scenic+, Openproblems, Replogle et al. and Mc Calla et al. for their ground truths and benchmarking tools (all links and citations are in the paper). 🙏🙏🙏 And thanks to Christina Theodoris (@TheodorisLab), @YanayRosen, @wiatrak_maciej, @Mathieu_Rita, @howmanyernest1, @PauBadiaM, @mo_lotfollahi, @m_e_sander and Felix Fischer for the interesting discussions!!
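For intuition on the attention-based network idea mentioned above, here is a generic sketch of turning a transformer's attention maps into a gene-gene network; the function name, array shapes, and thresholding are hypothetical illustrations, not scPRINT's actual API.

```python
# Generic sketch of attention-based gene-network extraction (hypothetical;
# NOT scPRINT's actual API): average attention maps over layers and heads
# to score gene-gene links, then keep the strongest links as edges.
import numpy as np

def attention_to_network(attn, gene_names, top_k=1000):
    """attn: array of shape (layers, heads, genes, genes) of attention weights."""
    n = len(gene_names)
    scores = attn.mean(axis=(0, 1))       # average over layers and heads
    scores = (scores + scores.T) / 2      # symmetrize source/target attention
    np.fill_diagonal(scores, 0.0)         # ignore self-attention
    flat = np.argsort(scores, axis=None)[::-1][:top_k]  # top_k strongest links
    return [(gene_names[i // n], gene_names[i % n], float(scores.flat[i]))
            for i in flat]
```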
Mathieu retweeted
Yann LeCun @ylecun ·
💥BOOM💥 Llama 3.1 is out 💥 405B, 70B, 8B versions. Main takeaways:
1. 405B performance is on par with the best closed models.
2. Open/free weights and code, with a license that enables fine-tuning, distillation into other models, and deployment anywhere.
3. 128k context length, multi-lingual abilities, good code generation performance, complex reasoning abilities, tool use.
4. Llama Stack API for easy integration.
5. Ecosystem with over 25 partners, including AWS, NVIDIA, Databricks, Groq, Dell, Azure, and Google Cloud.
Blog post: ai.meta.com/blog/meta-llam…
Llama home: llama.meta.com
Ahmad Al-Dahle @Ahmad_Al_Dahle

With today’s launch of our Llama 3.1 collection of models we’re making history with the largest and most capable open source AI model ever released. 128K context length, multilingual support, and new safety tools. Download 405B and our improved 8B & 70B here. llama.meta.com
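If you want to try the release yourself, a minimal Hugging Face transformers snippet along these lines should work; the model id and generation settings here are one plausible choice, and access to the gated checkpoint must be granted first.

```python
# Minimal local test of Llama 3.1 via Hugging Face transformers.
# Assumes access to the gated meta-llama checkpoint has been granted.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # 8B: smallest of the three sizes
    device_map="auto",    # spread across available devices
    torch_dtype="auto",   # use the checkpoint's native precision
)
out = pipe("The three main takeaways of the Llama 3.1 release are",
           max_new_tokens=64)
print(out[0]["generated_text"])
```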

Mathieu retweeted
Arena.ai @arena ·
Exciting new blog -- What's up with Llama-3? Since Llama 3's release, it has quickly jumped to the top of the leaderboard. We dive into our data and answer the questions below:
- What are users asking? When do users prefer Llama 3?
- How challenging are the prompts?
- Are certain users or prompts over-represented?
- Does Llama 3 have qualitative differences that make users like it?
Key insights:
1. Llama 3 beats top-tier models on open-ended writing and creative problems but loses a bit on close-ended math and coding problems.
Mathieu retweeted
Thomas Scialom @ThomasScialom ·
We had a small party to celebrate Llama-3 yesterday in Paris! The entire LLM OSS community joined us, with @huggingface, @kyutai_labs, @GoogleDeepMind (Gemma), @cohere. As someone said: better hope the building stays safe, or ciao open-source AI 😆
Mathieu retweeted
Arena.ai @arena ·
Moreover, we observe even stronger performance in the English category, where Llama 3's ranking jumps to ~1st place alongside GPT-4-Turbo! It consistently performs strongly against top models by human preference (see the win-rate matrix). It has been optimized for dialogue scenarios with a large amount of instruction data in post-training. More analysis is still ongoing, with a topic-distribution and agreement study. We also look forward to the details in Llama-3's technical report.
Mathieu retweeted
Arena.ai @arena ·
Exciting update -- Llama-3's full result is out, now reaching top-5 on the Arena leaderboard 🔥 We've got stable enough CIs with over 12K votes. No question now: Llama-3 70B is the new king of open models. Its powerful 8B variant has also surpassed many larger models. What an incredible launch! Huge congrats to the Llama team at @AIatMeta for such a valuable contribution to the open community! Can't wait to see the 400B.
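For context on how Arena-style rankings like this are produced: the leaderboard is fit from pairwise human votes, typically with a Bradley-Terry model, and the confidence intervals come from bootstrapping the votes. A minimal sketch with made-up vote counts, not the actual leaderboard pipeline:

```python
# Minimal Bradley-Terry sketch for Arena-style rankings (illustrative data;
# not the leaderboard's actual pipeline). Each vote is a (winner, loser) pair.
import numpy as np

votes = [("llama-3-70b", "model-x")] * 70 + [("model-x", "llama-3-70b")] * 30
models = sorted({m for v in votes for m in v})
idx = {m: i for i, m in enumerate(models)}
theta = np.zeros(len(models))  # log-strength per model

for _ in range(500):  # simple gradient ascent on the BT log-likelihood
    grad = np.zeros_like(theta)
    for w, l in votes:
        p_w = 1 / (1 + np.exp(theta[idx[l]] - theta[idx[w]]))  # P(winner beats loser)
        grad[idx[w]] += 1 - p_w
        grad[idx[l]] -= 1 - p_w
    theta += 0.01 * grad

for m in sorted(models, key=lambda m: -theta[idx[m]]):
    print(f"{m}: {400 * theta[idx[m]] / np.log(10) + 1000:.0f}")  # Elo-like scale
```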
Mathieu retweeted
AI at Meta @AIatMeta ·
Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3 models — in the coming months we expect to introduce new capabilities, longer context windows, additional model sizes and enhanced performance + the Llama 3 research paper for the community to learn from our work. More details ➡️ go.fb.me/i2y41n Download Llama 3 ➡️ go.fb.me/ct2xko
Mathieu retweeted
fly51fly @fly51fly ·
[CL] Language Evolution with Deep Learning. M Rita, P Michel, R Chaabouni, O Pietquin, E Dupoux, F Strub [INRIA & Google DeepMind] (2024) arxiv.org/abs/2403.11958
- Deep learning is well-suited for simulating communication games and studying language emergence and evolution.
- Communication games can be formalized as a multi-agent machine learning problem where agents are represented by deep neural networks.
- Communicative agents are designed using functional modules: perception, generation, understanding, and action. Neural networks can be used to model these modules (see the sketch after this list).
- Various neural architectures like MLPs, CNNs, RNNs and Transformers can be used to implement the agents' modules depending on the input data type and task.
- Optimization techniques like supervised learning and reinforcement learning are used to train the agents to develop a shared communication protocol and solve the game.
- The Visual Discrimination Game is a common case study in emergent communication research with neural agents.
- Recent work has explored more realistic simulations beyond simple referential games, such as embodied agents in 2D worlds.
- Despite successes, current simulations have limitations in realism, and the languages that emerge are still far from natural languages.
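A minimal PyTorch sketch of that modular decomposition, with illustrative module choices (an MLP for perception, GRUs for generation and understanding, a linear head for action); none of this is the paper's actual code:

```python
# Sketch of the speaker/listener decomposition into functional modules
# (perception, generation, understanding, action). All sizes illustrative.
import torch.nn as nn

class Speaker(nn.Module):
    """Perceives an observation, then generates a message."""
    def __init__(self, obs_dim=64, hidden=128, vocab=10):
        super().__init__()
        self.perception = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.generation = nn.GRUCell(vocab, hidden)  # unrolled symbol by symbol
        self.to_vocab = nn.Linear(hidden, vocab)     # hidden state -> symbol logits

class Listener(nn.Module):
    """Understands a message, then acts (e.g. picks the target object)."""
    def __init__(self, vocab=10, hidden=128, n_candidates=5):
        super().__init__()
        self.understanding = nn.GRU(vocab, hidden, batch_first=True)
        self.action = nn.Linear(hidden, n_candidates)
```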
Mathieu retweeted
AK @_akhaliq ·
Meta presents SpiRit-LM: Interleaved Spoken and Written Language Model. Paper page: huggingface.co/papers/2402.05…
We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically curated speech-text parallel corpus. SPIRIT-LM comes in two versions: a BASE version that uses speech semantic units and an EXPRESSIVE version that models expressivity using pitch and style units in addition to the semantic units. For both versions, the text is encoded with subword BPE tokens. The resulting model displays both the semantic abilities of text models and the expressive abilities of speech models. Additionally, we demonstrate that SPIRIT-LM is able to learn new tasks in a few-shot fashion across modalities (i.e. ASR, TTS, speech classification).
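To make the word-level interleaving idea concrete, here is a toy sketch of mixing text tokens and speech units in one stream; the token names, modality markers, and corpus are hypothetical, not SPIRIT-LM's actual tokenizer or data.

```python
# Toy illustration of word-level text/speech interleaving (hypothetical
# token names and data; not SPIRIT-LM's actual tokenizer or corpus).
import random

def interleave(words, units_per_word, p_switch=0.3):
    """Build one token stream that switches modality at word boundaries,
    marking each switch with a [TEXT] or [SPEECH] modality token."""
    stream, in_speech = ["[TEXT]"], False
    for word, units in zip(words, units_per_word):
        if random.random() < p_switch:       # maybe switch modality here
            in_speech = not in_speech
            stream.append("[SPEECH]" if in_speech else "[TEXT]")
        stream.extend(units if in_speech else [word])
    return stream

# e.g. "the cat sat", with each word aligned to its speech units
print(interleave(["the", "cat", "sat"],
                 [["hu7", "hu12"], ["hu3"], ["hu9", "hu1"]]))
```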
Mathieu retweeted
Baptiste Rozière @b_roziere ·
We released a 70B version of CodeLlama today! Trained on 1T tokens, it is a much stronger base model for coding tasks. I look forward to seeing what the community will do with it! :)
AI at Meta @AIatMeta

Today we’re releasing Code Llama 70B: a new, more performant version of our LLM for code generation — available under the same license as previous Code Llama models. Download the models ➡️ bit.ly/3Oil6bQ • CodeLlama-70B • CodeLlama-70B-Python • CodeLlama-70B-Instruct
