mcFarabo
1.2K posts

mcFarabo
@mcFarabo
coding features • creating bugs ⬇️ react / node.js / web3
Los Angeles 🇺🇸 · Joined December 2012
1 Following · 3.6K Followers

The very worst judges – those who repeatedly flout the law – should at least be put to an impeachment vote, whether that vote succeeds or not.
Chuck Grassley@ChuckGrassley
Another day, another judge unilaterally deciding policy for the whole country. This time to benefit foreign gang members. If the Supreme Court or Congress doesn’t fix this, we’re headed toward a constitutional crisis. Senate Judiciary Cmte taking action.
mcFarabo retweeted

A clever showcase of the different ways to use Manus!
Min Choi@minchoi
This is wild. China's Manus AI just dropped and it will completely change the AI agent game. 10 wild examples. 1. Create Interactive Website based on Data Insights
mcFarabo retweeted

Continuing from last week’s post on the rise of the Voice Stack, there’s an area that today’s voice-based systems often struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.
When communicating with a text-based chatbot, the turns are clear: You write something, then the bot does, then you do, and so on. The success of text-based chatbots with clear turn-taking has influenced the design of voice-based bots, most of which also use the turn-taking paradigm.
A key part of building such a system is a VAD component to detect when the user is talking. This allows our software to take the parts of the audio stream in which the user is saying something and pass that to the model for the user’s turn. It also supports interruption in a limited way, whereby if a user insistently interrupts the AI system while it is talking, eventually the VAD system will realize the user is talking, shut off the AI’s output, and let the user take a turn. This works reasonably well in quiet environments.
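To make the mechanics concrete, here is a minimal sketch of a VAD-gated turn, assuming the open-source webrtcvad package, 16 kHz 16-bit mono PCM audio in 30 ms frames, and hypothetical mic_frames / send_turn_to_model helpers: speech frames are buffered until a stretch of silence is detected, and only then is the user's turn handed to the model.

```python
# Minimal sketch of VAD-gated turn-taking.
# Assumptions: the `webrtcvad` package; 16 kHz, 16-bit mono PCM audio in
# 30 ms frames; `mic_frames` and `send_turn_to_model` are hypothetical
# placeholders supplied by the surrounding application.
import webrtcvad

SAMPLE_RATE = 16000
END_OF_TURN_SILENCE = 25          # ~750 ms of silence ends the user's turn

vad = webrtcvad.Vad(3)            # aggressiveness: 0 (least) to 3 (most)

def run_turn_loop(mic_frames, send_turn_to_model):
    """Buffer speech frames until a stretch of silence, then pass the turn to the model."""
    buffered, silent_frames = [], 0
    for frame in mic_frames():    # each frame: 30 ms of 16-bit mono PCM bytes
        if vad.is_speech(frame, SAMPLE_RATE):
            buffered.append(frame)
            silent_frames = 0
        elif buffered:
            silent_frames += 1
            if silent_frames >= END_OF_TURN_SILENCE:
                send_turn_to_model(b"".join(buffered))   # user's turn is over
                buffered, silent_frames = [], 0
```

The same speech/silence counters are what such a system extends to handle interruption: sustained detected speech while the assistant is playing audio is the signal to cut off the AI's output and yield the turn.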
However, VAD systems today struggle with noisy environments, particularly when the background noise is from other human speech. For example, if you are in a noisy cafe speaking with a voice chatbot, VAD — which is usually trained to detect human speech — tends to be inaccurate at figuring out when you, or someone else, is talking. (In comparison, it works much better if you are in a noisy vehicle, since the background noise is more clearly not human speech.) It might think you are interrupting when it was merely someone in the background speaking, or fail to recognize that you’ve stopped talking. This is why today’s speech applications often struggle in noisy environments.
Intriguingly, last year, Kyutai Labs published Moshi, a model that had many technical innovations. An important one was enabling persistent bi-directional audio streams from the user to Moshi and from Moshi to the user.
If you and I were speaking in person or on the phone, we would constantly be streaming audio to each other (through the air or the phone system), and we’d use social cues to know when to listen and how to politely interrupt if one of us felt the need. Thus, the streams would not need to explicitly model turn-taking. Moshi works like this. It’s listening all the time, and it’s up to the model to decide when to stay silent and when to talk. This means an explicit VAD step is no longer necessary. (Moshi also included other innovations, such as an “inner monologue” that simultaneously generates text alongside the audio to improve the quality of responses as well as audio encoding.)
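For contrast with the VAD-gated loop above, a full-duplex system can be pictured as a loop with no VAD gate at all. The following is a conceptual sketch only, not Moshi's actual API, and every name in it is hypothetical: each inbound frame is fed to the model, and the model returns an outbound frame every step, emitting silence whenever it decides to keep listening.

```python
# Conceptual sketch of a full-duplex speech loop (not Moshi's actual API).
# `model`, `mic_frames`, and `play` are hypothetical placeholders.
def full_duplex_loop(model, mic_frames, play):
    for user_frame in mic_frames():          # continuous inbound audio stream
        out_frame = model.step(user_frame)   # model decides: speak or stay silent
        play(out_frame)                      # continuous outbound audio stream
```

Turn-taking here is implicit in the model's choice to emit silence, so no separate VAD step decides whose turn it is.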
Just as the architecture of text-only transformers has gone through many evolutions (such as encoder-decoder models, decoder-only models, and reasoning models that generate a lot of “reasoning tokens” before the final output), voice models are going through a lot of architecture explorations. Given the importance of foundation models with voice-in and voice-out capabilities, many large companies right now are investing in developing better voice models. I’m confident we’ll see many more good voice models released this year.
It feels like the space of potential innovation for voice remains large. Hard technical problems, like the latency issue I described last week and VAD errors, remain to be solved. As solutions get better, voice-to-voice will continue to be a promising category to build applications in.
[Original text: deeplearning.ai/the-batch/issu… ]
mcFarabo retweeted

I'm teaching a new course! AI Python for Beginners is a series of four short courses that teach anyone to code, regardless of current technical skill. We are offering these courses free for a limited time.
Generative AI is transforming coding. This course teaches coding in a way that’s aligned with where the field is going, rather than where it has been:
(1) AI as a Coding Companion. Experienced coders are using AI to help write snippets of code, debug code, and the like. We embrace this approach and describe best practices for coding with a chatbot. Throughout the course, you'll have access to an AI chatbot that serves as your own coding companion, assisting you every step of the way as you code.
(2) Learning by Building AI Applications. You'll write code that interacts with large language models to quickly create fun applications to customize poems, write recipes, and manage a to-do list. This hands-on approach helps you see how writing code that calls on powerful AI models will make you more effective in your work and personal projects.
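As an illustration of the kind of call behind these exercises (the course wraps this in its own helper functions, so the client setup and model name below are assumptions rather than course code), a poem-customizing app can be just a few lines with the openai Python package and an API key in the environment:

```python
# Minimal sketch of an LLM-backed "customize a poem" app.
# Assumptions: the `openai` package (v1+), OPENAI_API_KEY set in the
# environment, and "gpt-4o-mini" used only as an example model name.
from openai import OpenAI

client = OpenAI()

def customize_poem(topic, style):
    """Ask the model for a short poem about `topic` written in `style`."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short poem about {topic} in the style of {style}.",
        }],
    )
    return response.choices[0].message.content

print(customize_poem("a to-do list", "a pirate sea shanty"))
```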
With this approach, beginning programmers can learn to do useful things with code far faster than they could have even a year ago.
Knowing a little bit of coding is increasingly helping people in job roles other than software engineers. For example, I've seen a marketing professional write code to download web pages and use generative AI to derive insights; a reporter write code to flag important stories; and an investor automate the initial drafts of contracts.
With this course you’ll be equipped to automate repetitive tasks, analyze data more efficiently, and leverage AI to enhance your productivity.
If you are already an experienced developer, please help me spread the word and encourage your non-developer friends to learn a little bit of coding.
I hope you'll check out the first two short courses here! deeplearning.ai/short-courses/…












