Mathieu

96 posts

Mathieu
@Mathieu_Rita

Research Scientist @AIatMeta | ex: INRIA-MSR | @CoML_ENS | @Polytechnique | Llama3 - RL fine-tuning - Emergent communication

ENS Ulm, Paris · Joined June 2011
277 Following · 229 Followers
Pinned Tweet
Mathieu @Mathieu_Rita ·
I am glad to announce that our latest paper, « Emergent communication: Generalization and Overfitting in Lewis Games », has been accepted at #NeurIPS2022 🎷🎷, with C. Tallec, @pmichelX, J.B. Grill, O. Pietquin, E. Dupoux & F. Strub. 🖱️arxiv.org/pdf/2209.15342… 🧵1/13
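For readers unfamiliar with the setting: a Lewis game pairs a speaker, who observes an object and emits a message, with a listener, who must recover the object from the message alone. Below is a minimal PyTorch sketch of such a game trained with REINFORCE; the architectures, sizes, and hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Minimal Lewis (signaling) game sketch -- illustrative only, not the paper's code.
# A speaker sees an object and sends a one-symbol message; a listener must
# recover the object from the message. Both are trained with REINFORCE.
import torch
import torch.nn as nn

N_OBJECTS, VOCAB = 8, 8  # hypothetical sizes

speaker = nn.Linear(N_OBJECTS, VOCAB)    # object -> message logits
listener = nn.Linear(VOCAB, N_OBJECTS)   # message -> object logits
opt = torch.optim.Adam([*speaker.parameters(), *listener.parameters()], lr=1e-2)

for step in range(2000):
    obj = torch.randint(N_OBJECTS, (32,))              # batch of target objects
    x = nn.functional.one_hot(obj, N_OBJECTS).float()
    msg_dist = torch.distributions.Categorical(logits=speaker(x))
    msg = msg_dist.sample()                            # discrete message
    m = nn.functional.one_hot(msg, VOCAB).float()
    guess_dist = torch.distributions.Categorical(logits=listener(m))
    guess = guess_dist.sample()
    reward = (guess == obj).float()                    # 1 if the listener is right
    # REINFORCE for both agents, with a mean baseline to reduce variance
    baseline = reward.mean().detach()
    loss = -((reward - baseline)
             * (msg_dist.log_prob(msg) + guess_dist.log_prob(guess))).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The paper's generalization/overfitting analysis concerns what happens to the emergent protocol when games like this are scaled up, but this sketch captures the basic training loop.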
Mathieu retweeted
Grégoire Mialon @mialon_gregoire ·
I am hiring an intern on our Llama team for 2025! You should be near the end of your PhD and willing to be based in Paris. You will succeed @MekalaDheeraj, working on frontier LLMs, tool use, agents, and more :) Please apply here: metacareers.com/jobs/109555634…
Mathieu retweeted
Roberta Raileanu @robertarail ·
I’m looking for a PhD intern for next year to work at the intersection of LLM-based agents and open-ended learning, as part of the Llama Research Team in London. If interested, please send me an email with a short paragraph outlining some research ideas, and apply at the link below.
Mathieu retweeted
Rui Hou @magpie_rayhou ·
Our team, Llama Post-training, is looking to hire 2025 PhD Research Interns to join us at Meta GenAI. If you are interested in working on RL for LLMs, code generation, reasoning, and agents with us, drop me a message with your CV. Link: metacareers.com/jobs/106355302…
Mathieu retweeted
Language Gamification Workshop @ NeurIPS 2024
🔔🚨 [ALERT] Call for papers! 🚨🔔 Language Gamification Workshop @ NeurIPS 2024 openreview.net/group?id=NeurI… 🤔 Topics: In-Context Learning, Deep Reinforcement Learning, Modern NLP, Multi-Agent Learning, Language Emergence, Embodiment, Cognitive Science... ⏰ Deadline: August 30
Mathieu retweeted
Jérémie Kalfon @jkobject ·
This gives scPRINT zero-shot abilities (meaning no fine-tuning required) such as artificially increasing the depth of a cell's expression profile (denoising / zero imputation), predicting the cell type, disease, sequencer, and sex of a cell, and creating cell embeddings 💪. But one of the key abilities we dived into is its inference of gene networks. We took inspiration from ESM2 to design a way to extract gene networks from pre-trained transformers, which we call Large Cell Models.

We extensively validate the gene network inference abilities of scPRINT, scGPT, and GENIE3 with our suite of benchmarking tools, BenGRN and GRnnData: github.com/jkobject/benGRN github.com/cantinilab/GRn…

Moreover, we don't just release the code and the model weights for scPRINT, but also its pre-training strategies, thanks to our dataloader and LaminDB's new mapped-dataset methods: github.com/jkobject/scDat… lamin.ai/blog/arrayload…

Taken together, the goal of these open-source tools is to serve as a bedrock for future Large Cell Model development: to improve, and possibly debug issues in, these transformer models by interrogating and benchmarking their abilities in a reproducible manner 🌍 👥. We need to understand how the cell works, but for that we need to know what works and what doesn't. This is my contribution to it.

While still somewhat a work in progress, we have defined an extensive ablation study analysis with scPRINT that users can adapt. Models can be pre-trained on only one GPU for the small and medium sizes, and "only" 4 to 16 GPUs for the larger ones. 🚄 🏔️ The very large model is still undergoing training and testing.

I am very happy to start building in public now and eager to see what the community will do with these tools. Do contact me if you would like to collaborate and have a try at the tool! I will provide more updates to the package and publish it on PyPI in a week or so. But first... a couple of days off! 🌴☀️ 🫡

🙏🙏🙏 I would like to thank additional collaborators from LaminDB, as well as members of the Cantini Lab and Peyré Lab: @JulesSamaran, @TrimbourR, @gjhuizing, Anna Audit and @wariobrega. But most of all, my 2 great P.I.s: @LauCan88 and @gabrielpeyre 🇫🇷 🎓 💯 🙏

Also, I would like to acknowledge the important pioneering work of Geneformer, UCE, scFoundation and scGPT. Thanks to FlashAttention, pytorch, lightning, and scanpy for their toolkits. Thanks to Omnipath, Scenic+, Openproblems, Replogle et al. and Mc Calla et al. for their ground truths and benchmarking tools (all links and citations are in the paper). 🙏🙏🙏 And thanks to Christina Theodoris (@TheodorisLab), @YanayRosen, @wiatrak_maciej, @Mathieu_Rita, @howmanyernest1, @PauBadiaM, @mo_lotfollahi, @m_e_sander and Felix Fischer for the interesting discussions!!
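For intuition on the attention-based network idea mentioned above, here is a generic sketch of turning a transformer's attention maps into a gene-gene network; the function name, array shapes, and thresholding are hypothetical illustrations, not scPRINT's actual API.

```python
# Generic sketch of attention-based gene-network extraction (hypothetical;
# NOT scPRINT's actual API): average attention maps over layers and heads
# to score gene-gene links, then keep the strongest links as edges.
import numpy as np

def attention_to_network(attn, gene_names, top_k=1000):
    """attn: array of shape (layers, heads, genes, genes) of attention weights."""
    n = len(gene_names)
    scores = attn.mean(axis=(0, 1))       # average over layers and heads
    scores = (scores + scores.T) / 2      # symmetrize source/target attention
    np.fill_diagonal(scores, 0.0)         # ignore self-attention
    flat = np.argsort(scores, axis=None)[::-1][:top_k]  # top_k strongest links
    return [(gene_names[i // n], gene_names[i % n], float(scores.flat[i]))
            for i in flat]
```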
Mathieu retweeted
Yann LeCun @ylecun ·
💥BOOM💥 Llama 3.1 is out 💥 405B, 70B, 8B versions. Main takeaways:
1. 405B performance is on par with the best closed models.
2. Open/free weights and code, with a license that enables fine-tuning, distillation into other models, and deployment anywhere.
3. 128k context length, multi-lingual abilities, good code generation performance, complex reasoning abilities, tool use.
4. Llama Stack API for easy integration.
5. Ecosystem with over 25 partners, including AWS, NVIDIA, Databricks, Groq, Dell, Azure, and Google Cloud.
Blog post: ai.meta.com/blog/meta-llam…
Llama home: llama.meta.com
Ahmad Al-Dahle @Ahmad_Al_Dahle

With today’s launch of our Llama 3.1 collection of models we’re making history with the largest and most capable open source AI model ever released. 128K context length, multilingual support, and new safety tools. Download 405B and our improved 8B & 70B here. llama.meta.com
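If you want to try the release yourself, a minimal Hugging Face transformers snippet along these lines should work; the model id and generation settings here are one plausible choice, and access to the gated checkpoint must be granted first.

```python
# Minimal local test of Llama 3.1 via Hugging Face transformers.
# Assumes access to the gated meta-llama checkpoint has been granted.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # 8B: smallest of the three sizes
    device_map="auto",    # spread across available devices
    torch_dtype="auto",   # use the checkpoint's native precision
)
out = pipe("The three main takeaways of the Llama 3.1 release are",
           max_new_tokens=64)
print(out[0]["generated_text"])
```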

Mathieu retweeted
Arena.ai @arena ·
Exciting new blog -- What's up with Llama-3? Since Llama 3's release, it has quickly jumped to the top of the leaderboard. We dive into our data and answer the questions below:
- What are users asking? When do users prefer Llama 3?
- How challenging are the prompts?
- Are certain users or prompts over-represented?
- Does Llama 3 have qualitative differences that make users like it?
Key insights:
1. Llama 3 beats top-tier models on open-ended writing and creative problems but loses a bit on close-ended math and coding problems.
Mathieu retweeted
Thomas Scialom @ThomasScialom ·
We had a small party to celebrate Llama-3 yesterday in Paris! The entire LLM OSS community joined us, with @huggingface, @kyutai_labs, @GoogleDeepMind (Gemma), @cohere. As someone said: better hope the building stays safe, or ciao open-source AI 😆
Mathieu retweeted
Arena.ai @arena ·
Moreover, we observe even stronger performance in the English category, where Llama 3's ranking jumps to ~1st place alongside GPT-4-Turbo! It consistently performs strongly against top models by human preference (see the win-rate matrix). It has been optimized for dialogue scenarios with a large amount of instruction data in post-training. More analysis is still ongoing, with a topic-distribution and agreement study. We also look forward to the details in Llama-3's technical report.
Mathieu retweeted
Arena.ai @arena ·
Exciting update -- Llama-3's full result is out, now reaching top-5 on the Arena leaderboard 🔥 We've got stable enough CIs with over 12K votes. No question now: Llama-3 70B is the new king of open models. Its powerful 8B variant has also surpassed many larger models. What an incredible launch! Huge congrats to the Llama team at @AIatMeta for such a valuable contribution to the open community! Can't wait to see the 400B.
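For context on how Arena-style rankings like this are produced: the leaderboard is fit from pairwise human votes, typically with a Bradley-Terry model, and the confidence intervals come from bootstrapping the votes. A minimal sketch with made-up vote counts, not the actual leaderboard pipeline:

```python
# Minimal Bradley-Terry sketch for Arena-style rankings (illustrative data;
# not the leaderboard's actual pipeline). Each vote is a (winner, loser) pair.
import numpy as np

votes = [("llama-3-70b", "model-x")] * 70 + [("model-x", "llama-3-70b")] * 30
models = sorted({m for v in votes for m in v})
idx = {m: i for i, m in enumerate(models)}
theta = np.zeros(len(models))  # log-strength per model

for _ in range(500):  # simple gradient ascent on the BT log-likelihood
    grad = np.zeros_like(theta)
    for w, l in votes:
        p_w = 1 / (1 + np.exp(theta[idx[l]] - theta[idx[w]]))  # P(winner beats loser)
        grad[idx[w]] += 1 - p_w
        grad[idx[l]] -= 1 - p_w
    theta += 0.01 * grad

for m in sorted(models, key=lambda m: -theta[idx[m]]):
    print(f"{m}: {400 * theta[idx[m]] / np.log(10) + 1000:.0f}")  # Elo-like scale
```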
Mathieu retweeted
AI at Meta @AIatMeta ·
Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3 models — in the coming months we expect to introduce new capabilities, longer context windows, additional model sizes and enhanced performance + the Llama 3 research paper for the community to learn from our work. More details ➡️ go.fb.me/i2y41n Download Llama 3 ➡️ go.fb.me/ct2xko
Mathieu retweeted
fly51fly @fly51fly ·
[CL] Language Evolution with Deep Learning. M Rita, P Michel, R Chaabouni, O Pietquin, E Dupoux, F Strub [INRIA & Google DeepMind] (2024) arxiv.org/abs/2403.11958
- Deep learning is well-suited for simulating communication games and studying language emergence and evolution.
- Communication games can be formalized as a multi-agent machine learning problem where agents are represented by deep neural networks.
- Communicative agents are designed using functional modules: perception, generation, understanding, and action. Neural networks can be used to model these modules (see the sketch after this list).
- Various neural architectures like MLPs, CNNs, RNNs and Transformers can be used to implement the agents' modules depending on the input data type and task.
- Optimization techniques like supervised learning and reinforcement learning are used to train the agents to develop a shared communication protocol and solve the game.
- The Visual Discrimination Game is a common case study in emergent communication research with neural agents.
- Recent work has explored more realistic simulations beyond simple referential games, such as embodied agents in 2D worlds.
- Despite successes, current simulations have limitations in realism, and the languages that emerge are still far from natural languages.
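A minimal PyTorch sketch of that modular decomposition, with illustrative module choices (an MLP for perception, GRUs for generation and understanding, a linear head for action); none of this is the paper's actual code:

```python
# Sketch of the speaker/listener decomposition into functional modules
# (perception, generation, understanding, action). All sizes illustrative.
import torch.nn as nn

class Speaker(nn.Module):
    """Perceives an observation, then generates a message."""
    def __init__(self, obs_dim=64, hidden=128, vocab=10):
        super().__init__()
        self.perception = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.generation = nn.GRUCell(vocab, hidden)  # unrolled symbol by symbol
        self.to_vocab = nn.Linear(hidden, vocab)     # hidden state -> symbol logits

class Listener(nn.Module):
    """Understands a message, then acts (e.g. picks the target object)."""
    def __init__(self, vocab=10, hidden=128, n_candidates=5):
        super().__init__()
        self.understanding = nn.GRU(vocab, hidden, batch_first=True)
        self.action = nn.Linear(hidden, n_candidates)
```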
Mathieu retweeted
AK @_akhaliq ·
Meta presents SpiRit-LM: Interleaved Spoken and Written Language Model. Paper page: huggingface.co/papers/2402.05…
We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically curated speech-text parallel corpus. SPIRIT-LM comes in two versions: a BASE version that uses speech semantic units and an EXPRESSIVE version that models expressivity using pitch and style units in addition to the semantic units. For both versions, the text is encoded with subword BPE tokens. The resulting model displays both the semantic abilities of text models and the expressive abilities of speech models. Additionally, we demonstrate that SPIRIT-LM is able to learn new tasks in a few-shot fashion across modalities (i.e. ASR, TTS, speech classification).
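To make the word-level interleaving idea concrete, here is a toy sketch of mixing text tokens and speech units in one stream; the token names, modality markers, and corpus are hypothetical, not SPIRIT-LM's actual tokenizer or data.

```python
# Toy illustration of word-level text/speech interleaving (hypothetical
# token names and data; not SPIRIT-LM's actual tokenizer or corpus).
import random

def interleave(words, units_per_word, p_switch=0.3):
    """Build one token stream that switches modality at word boundaries,
    marking each switch with a [TEXT] or [SPEECH] modality token."""
    stream, in_speech = ["[TEXT]"], False
    for word, units in zip(words, units_per_word):
        if random.random() < p_switch:       # maybe switch modality here
            in_speech = not in_speech
            stream.append("[SPEECH]" if in_speech else "[TEXT]")
        stream.extend(units if in_speech else [word])
    return stream

# e.g. "the cat sat", with each word aligned to its speech units
print(interleave(["the", "cat", "sat"],
                 [["hu7", "hu12"], ["hu3"], ["hu9", "hu1"]]))
```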
Mathieu retweeted
Baptiste Rozière @b_roziere ·
We released a 70B version of CodeLlama today! Trained on 1T tokens, it is a much stronger base model for coding tasks. I look forward to seeing what the community will do with it! :)
AI at Meta @AIatMeta

Today we’re releasing Code Llama 70B: a new, more performant version of our LLM for code generation — available under the same license as previous Code Llama models. Download the models ➡️ bit.ly/3Oil6bQ • CodeLlama-70B • CodeLlama-70B-Python • CodeLlama-70B-Instruct
