Patrick Pérez
@ptrkprz

38 posts
AI & CV scientist, CEO at @kyutai_labs

Paris · Joined December 2023
63 Following · 757 Followers

Pinned Tweet
Patrick Pérez @ptrkprz ·
changing air, entering blue sky, same handle
0 replies · 0 reposts · 3 likes · 706 views
Patrick Pérez retweeted
Alexandre Défossez @honualx ·
I’ll be presenting a deep dive into how Moshi works at the next NLP Meetup in Paris, this Wednesday the 9th at 7pm. Register if you want to attend! 🧩🔎🟢 meetup.com/fr-FR/paris-nl…
5 replies · 9 reposts · 72 likes · 11K views
Patrick Pérez retweeted
Andrej Karpathy @karpathy ·
Moshi is a very nice/fun conversational AI audio 🔊 model release from @kyutai_labs . Are you slowly losing faith in the objective reality and existence of Advanced Voice Mode? Talk to Moshi instead :) You can talk to it on their website: moshi.chat
Or even locally on your Apple Silicon Mac with just:
$ pip install moshi_mlx
$ python -m moshi_mlx.local_web -q 4
I find the Moshi model personality to be very amusing: it is a bit abrupt, it interrupts, it is a bit rude but somehow in a kind of endearing way, it goes off on tangents, it goes silent for no reason sometimes, so it's all a bit confusing but also very funny and meme-worthy. This video "it's just the pressure" / "i just like working on projects" is a good example, soooo funny: x.com/AdrianDittmann…
But in any case, it's really cool that I can even run this kind of voice interaction with my Macbook, that the repo is out on GitHub along with a detailed paper, and I certainly look forward to effortlessly talking to our computers in end-to-end ways, without going through intermediate text representations that lose a ton of information content.
kyutai@kyutai_labs

Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust and MLX. More details below 🧵 ⬇️
Paper: kyutai.org/Moshi.pdf
Repo: github.com/kyutai-labs/mo…
HuggingFace: huggingface.co/kmhf

70 replies · 319 reposts · 2.8K likes · 510.2K views
Patrick Pérez @ptrkprz ·
And our demo runs in the US thanks to a donation from @huggingface
Patrick Pérez@ptrkprz

Thanks @Thom_Wolf Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point… not a productized conversational bot.

0 replies · 0 reposts · 5 likes · 843 views
Patrick Pérez @ptrkprz ·
Thanks @Thom_Wolf Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point… not a productized conversational bot.
Thomas Wolf@Thom_Wolf

The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people in the room missed.
Mostly irrelevant are the facts that:
- they come a few weeks after OpenAI ChatGPT-4o
- the demo was less polished than the 4o one (in terms of voice quality, voice timing…)
Relevant:
- the model training pipeline and model architecture are simple and hugely scalable, with a tiny team of 8+ people at Kyutai building it in 4 months. Synthetic data is a huge enabler here
- laser focus on local devices: Moshi will soon be everywhere. Frontier model builders have low incentive to let you run smaller models locally (price per token…) but non-profits like Kyutai have very different incentives. The Moshi demo is already online while the OpenAI 4o one is still in limbo.
- going under 300 ms of latency while keeping Llama 8B or above quality of answers is a key enabler in terms of interactivity; it’s game-changing. That feeling when the model answers your question before you’ve even finished asking is quite crazy, or when you interrupt the model while it’s talking and it reacts… Predictive coding in a model, an instantly updated model of what you’re about to say...
Basically they nailed the fundamentals. It’s here. This interactive voice tech will be everywhere. It will soon be an obvious commodity.

0 replies · 1 repost · 9 likes · 1.4K views
Patrick Pérez @ptrkprz ·
It feels so good to have shared at last what we have been up to in the past 6 months. We worked hard on this unique voice AI, carefully training it on a mix of text and speech, making it multi-stream and real-time, and putting it in an online demo for everyone to experience.
kyutai@kyutai_labs

Yesterday we introduced Moshi, the lowest-latency conversational AI ever released. Moshi can perform small talk, explain various concepts, and engage in roleplay with many emotions and speaking styles. Talk to Moshi here moshi.chat/?queue_id=talk… and learn more about the method below 🧵.

4 replies · 5 reposts · 55 likes · 4.2K views
Patrick Pérez retweeted
Amir Zamir @zamir_ar ·
We are releasing 4M-21 with a permissive license, including its source code and trained models. It's a pretty effective multimodal model that solves tens of tasks & modalities. See the demo code, sample results, and the tokenizers of diverse modalities on the website. IMO, the multitask learning aspect of multimodal models has really taken a step forward: we can train a single model on many diverse tasks with ~SOTA accuracy. But there is still a long way to go in terms of transfer/emergence.
🌐 4m.epfl.ch
⌨️ github.com/apple/ml-4m/
Joint work w/ @EPFL_en @Apple.
Amir Zamir@zamir_ar

We are releasing the 1st version of 4M, a framework for training multimodal foundation models across tens of modalities & tasks, based on scalable masked modeling. Joint effort by @EPFL_en & @Apple. 4M: Massively Multimodal Masked Modeling 🌐4m.epfl.ch 🧵1/n

7 replies · 93 reposts · 352 likes · 67.7K views
Patrick Pérez retweeted
valeo.ai @valeoai ·
📢 We introduce the ScaLR models (code + checkpoints) for LiDAR perception, distilled from vision foundation models. tl;dr: don't neglect the choice of teacher, student, and pretraining datasets -> their impact is probably more important than the distillation method. #CVPR2024 🧵 [1/8]
1 reply · 11 reposts · 32 likes · 9K views
Patrick Pérez retweeted
F. Güney @ftm_guney ·
we’ve got multiple PhD and postdoc positions funded by my #ERCstg project ENSURE. if you’re interested in computer vision and self-driving, please consider applying.
graduate students: apply ASAP! details at gsse.ku.edu.tr
postdocs: send me an email with your CV and brief research interests at fguney@ku.edu.tr
we offer competitive (euro-based) scholarships and salaries by Turkish standards.
7 replies · 28 reposts · 104 likes · 38.2K views
Patrick Pérez retweeted
Ian Hogarth @soundboy ·
1/ Today the UK's AI Safety Institute is open sourcing our safety evaluations platform. We call it "Inspect": gov.uk/government/new…
7 replies · 79 reposts · 288 likes · 77.5K views