mcFarabo
@mcFarabo

1.2K posts

coding features • creating bugs ⬇️ react / node.js / web3

Los Angeles 🇺🇸 · Joined December 2012
1 Following · 3.6K Followers
mcFarabo @mcFarabo
@elonmusk Have you ever given Elon a high-five in spirit?
[tweet media]
11 replies · 0 reposts · 1 like · 1.9K views
mcFarabo @mcFarabo
@elonmusk Have you ever praised Elon for this?
[tweet media]
1 reply · 0 reposts · 0 likes · 86 views
mcFarabo @mcFarabo
@elonmusk How fly is this? I'm totally into it.
[tweet media]
32 replies · 0 reposts · 0 likes · 2.8K views
mcFarabo @mcFarabo
@elonmusk Someone tosses a scarf, tie it, thanks.
[tweet media]
28 replies · 0 reposts · 1 like · 743 views
mcFarabo @mcFarabo
@elonmusk A friend offers a game, play it, awesome.
[tweet media]
35 replies · 0 reposts · 0 likes · 1.7K views
mcFarabo @mcFarabo
@elonmusk The day drops a feather, keep it, nice.
[tweet media]
41 replies · 0 reposts · 0 likes · 4.1K views
mcFarabo reposted
Andrew Ng @AndrewYNg
Continuing from last week’s post on the rise of the Voice Stack, there’s an area that today’s voice-based systems often struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.

When communicating with a text-based chatbot, the turns are clear: You write something, then the bot does, then you do, and so on. The success of text-based chatbots with clear turn-taking has influenced the design of voice-based bots, most of which also use the turn-taking paradigm.

A key part of building such a system is a VAD component to detect when the user is talking. This allows our software to take the parts of the audio stream in which the user is saying something and pass that to the model for the user’s turn. It also supports interruption in a limited way, whereby if a user insistently interrupts the AI system while it is talking, eventually the VAD system will realize the user is talking, shut off the AI’s output, and let the user take a turn.

This works reasonably well in quiet environments. However, VAD systems today struggle with noisy environments, particularly when the background noise is from other human speech. For example, if you are in a noisy cafe speaking with a voice chatbot, VAD — which is usually trained to detect human speech — tends to be inaccurate at figuring out when you, or someone else, is talking. (In comparison, it works much better if you are in a noisy vehicle, since the background noise is more clearly not human speech.) It might think you are interrupting when it was merely someone in the background speaking, or fail to recognize that you’ve stopped talking. This is why today’s speech applications often struggle in noisy environments.

Intriguingly, last year, Kyutai Labs published Moshi, a model that had many technical innovations. An important one was enabling persistent bi-directional audio streams from the user to Moshi and from Moshi to the user. If you and I were speaking in person or on the phone, we would constantly be streaming audio to each other (through the air or the phone system), and we’d use social cues to know when to listen and how to politely interrupt if one of us felt the need. Thus, the streams would not need to explicitly model turn-taking. Moshi works like this. It’s listening all the time, and it’s up to the model to decide when to stay silent and when to talk. This means an explicit VAD step is no longer necessary. (Moshi also included other innovations, such as an “inner monologue” that simultaneously generates text alongside the audio to improve the quality of responses as well as audio encoding.)

Just as the architecture of text-only transformers has gone through many evolutions (such as encoder-decoder models, decoder-only models, and reasoning models that generate a lot of “reasoning tokens” before the final output), voice models are going through a lot of architecture explorations. Given the importance of foundation models with voice-in and voice-out capabilities, many large companies right now are investing in developing better voice models. I’m confident we’ll see many more good voice models released this year.

It feels like the space of potential innovation for voice remains large. Hard technical problems, like the one of latency that I described last week and VAD errors, remain to be solved. As solutions get better, voice-to-voice will continue to be a promising category to build applications in. [Original text: deeplearning.ai/the-batch/issu… ]
39 replies · 78 reposts · 372 likes · 69.8K views
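The turn-taking VAD step the post describes can be sketched in a few lines. This is a minimal, illustrative energy-threshold detector, not the machinery of any production system (real VADs are typically trained speech classifiers): classify short frames as speech or silence by energy, and end the user's turn after a run of silent frames. The frame size, threshold, and silence count below are assumed values for illustration. It also makes the post's failure mode concrete: an energy cue cannot tell your voice from background speech, since both carry energy.

```python
# Minimal energy-based VAD sketch (illustrative values, not a real system).

def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def detect_turn_end(frames, energy_threshold=0.01, silence_frames_needed=3):
    """Return the index of the frame at which the user's turn ends:
    the point where `silence_frames_needed` consecutive low-energy frames
    have followed at least one speech frame. Returns None if the turn
    never ends within `frames`."""
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= energy_threshold:
            heard_speech = True   # user is (or someone is!) talking
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run == silence_frames_needed:
                return i          # hand the turn to the model here
    return None

# Four loud frames, then four quiet ones: the turn ends on the third
# quiet frame (index 6).
frames = [[0.5] * 160] * 4 + [[0.0] * 160] * 4
print(detect_turn_end(frames))
```

In the cafe scenario from the post, a neighbor's chatter would keep `frame_energy` above the threshold, so `silent_run` never accumulates and the detector never yields the turn; that is the noisy-environment failure a full-duplex model like Moshi sidesteps.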
mcFarabo reposted
Andrew Ng @AndrewYNg
I'm teaching a new course! AI Python for Beginners is a series of four short courses that teach anyone to code, regardless of current technical skill. We are offering these courses free for a limited time.

Generative AI is transforming coding. This course teaches coding in a way that’s aligned with where the field is going, rather than where it has been:

(1) AI as a Coding Companion. Experienced coders are using AI to help write snippets of code, debug code, and the like. We embrace this approach and describe best practices for coding with a chatbot. Throughout the course, you'll have access to an AI chatbot that will be your own coding companion that can assist you every step of the way as you code.

(2) Learning by Building AI Applications. You'll write code that interacts with large language models to quickly create fun applications to customize poems, write recipes, and manage a to-do list. This hands-on approach helps you see how writing code that calls on powerful AI models will make you more effective in your work and personal projects.

With this approach, beginning programmers can learn to do useful things with code far faster than they could have even a year ago. Knowing a little bit of coding is increasingly helping people in job roles other than software engineers. For example, I've seen a marketing professional write code to download web pages and use generative AI to derive insights; a reporter write code to flag important stories; and an investor automate the initial drafts of contracts. With this course you’ll be equipped to automate repetitive tasks, analyze data more efficiently, and leverage AI to enhance your productivity.

If you are already an experienced developer, please help me spread the word and encourage your non-developer friends to learn a little bit of coding. I hope you'll check out the first two short courses here! deeplearning.ai/short-courses/…
482 replies · 1.7K reposts · 8.4K likes · 1.2M views
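The "learning by building AI applications" pattern the post describes — code that composes a prompt from user input and hands it to a language model — can be sketched as below. `call_llm` is a hypothetical stand-in so the example runs offline; it is not a function from the course or from any real library, and in practice you would replace it with a call to an actual chat-model API.

```python
# Sketch of a prompt-composing LLM application (the "customize poems"
# example from the post). `call_llm` is a hypothetical stub.

def call_llm(prompt):
    """Hypothetical stand-in for a chat-model API call; echoes the
    prompt so the sketch runs without network access."""
    return f"[model response to: {prompt}]"

def customize_poem(topic, style):
    """Compose a prompt from user input and return the model's reply."""
    prompt = f"Write a four-line poem about {topic} in the style of {style}."
    return call_llm(prompt)

print(customize_poem("debugging at midnight", "a sea shanty"))
```

The point of the pattern is that the program's logic lives in how the prompt is assembled from ordinary variables; swapping the stub for a real API call is the only change needed to make it a working application.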
mcFarabo @mcFarabo
[Translated from Arabic] Blessings upon you this noble month. "Ramadan has come, a field of cultivation for the worshippers, to purify hearts of corruption; so fulfill its rights in word and deed, and take provision from it for the Return. For whoever sows the seeds and does not water them will sigh in regret on the day of harvest." (Laṭā'if al-Ma'ārif: 350)
0 replies · 0 reposts · 0 likes · 85 views