Bandhav Veluri
@b_veluri
Research @sesame. Prev: @AIatMeta, PhD @uwcse, and IIT Roorkee.

Like humans on a phone call, can spoken LLMs adapt to network latency? Our #EMNLP2024 paper, SyncLLM (joint work between @uwcse & @AIatMeta), shows they can learn this ability... as a result, full-duplex voice AI systems can hide latency with high token throughput! (1/4)
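For intuition on the latency-hiding claim, here is a back-of-the-envelope sketch, not SyncLLM's actual mechanism or code, with throughput numbers assumed purely for illustration: a model that generates speech tokens faster than real time can build up a playback buffer, and that buffer is exactly the amount of network delay it can absorb.

```python
# Back-of-the-envelope sketch of latency hiding via token throughput.
# NOT SyncLLM's implementation; all numbers below are assumed for illustration.

REALTIME_TOKENS_PER_SEC = 50.0  # tokens consumed per second of played audio (assumed)
MODEL_TOKENS_PER_SEC = 80.0     # model generation throughput (assumed)

def lead_up_time(hidden_delay_sec: float) -> float:
    """Seconds of generation needed to build a buffer that absorbs
    `hidden_delay_sec` of network delay, given surplus throughput."""
    surplus = MODEL_TOKENS_PER_SEC - REALTIME_TOKENS_PER_SEC
    assert surplus > 0, "throughput must exceed real time to build a buffer"
    # A buffer of `hidden_delay_sec` seconds of audio costs that many seconds'
    # worth of real-time tokens; the buffer grows at `surplus` tokens/sec.
    return hidden_delay_sec * REALTIME_TOKENS_PER_SEC / surplus

if __name__ == "__main__":
    for delay in (0.1, 0.2, 0.5):  # network delays to hide, in seconds
        print(f"{delay:.1f}s of latency hidden after {lead_up_time(delay):.2f}s lead-up")
```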

Excited to share our new paper: IRIS, a wireless ring for vision-based smart home control. Tired of awkward voice commands? We built a ring that streams camera data to your phone, where a neural net identifies the device you want to control—all with a simple point-and-click.
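A hypothetical sketch of the control loop the tweet describes, with every class and function name invented for illustration (this is not the IRIS codebase): the ring streams a camera frame to the phone, an on-phone model classifies the pointed-at device, and the click dispatches the command.

```python
# Hypothetical sketch of an IRIS-style point-and-click loop.
# All names here are invented for illustration; this is not the paper's code.

from dataclasses import dataclass

@dataclass
class Frame:
    pixels: bytes  # e.g., a JPEG streamed from the ring's camera to the phone

def classify_device(frame: Frame) -> str:
    """Stand-in for the on-phone neural net that identifies the pointed-at
    device; a real system would run a lightweight vision model here."""
    return "lamp"  # fixed label so the sketch runs end to end

class SmartHome:
    def toggle(self, device: str) -> None:
        print(f"toggling {device}")  # stand-in for a real smart-home API call

def on_click(frame: Frame, home: SmartHome) -> None:
    # Point the ring at a device and click: classify the frame, then control.
    home.toggle(classify_device(frame))

if __name__ == "__main__":
    on_click(Frame(pixels=b""), SmartHome())
```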

Open science is how we continue to push technology forward, and today at Meta FAIR we're sharing eight new AI research artifacts, including new models, datasets, and code to inspire innovation in the community. More in the video from @jpineau1. This work is another important step towards our goal of achieving Advanced Machine Intelligence (AMI).

What we're releasing:
• Meta Spirit LM: An open source language model for seamless speech and text integration.
• Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects, and occlusion handling. Plus a new developer suite to make it easier for developers to build with SAM 2.
• Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
• SALSA: New code to enable researchers to benchmark AI-based attacks in support of validating security for post-quantum cryptography.
• Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
• Meta Open Materials: New open source models and the largest dataset of its kind to accelerate AI-driven discovery of new inorganic materials.
• MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder with coverage across 80 languages.
• Self-Taught Evaluator: A new method for generating synthetic preference data to train reward models without relying on human annotations.

Access to state-of-the-art AI creates opportunities for everyone. We're excited to share this work and look forward to seeing the community innovation that results from it. Details and access to everything released by FAIR today ➡️ go.fb.me/hgtkel

@_akhaliq Awesome work! How accurate is the binaural targeting? It seems like it could be very useful for improving transcription in complex environments if a participant assisted by intentionally capturing each speaker's sample.

Look Once to Hear: Target Speech Hearing with Noisy Examples

In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing that ignores all interfering speech and noise except the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This, however, is not well aligned with the hearable application domain, since obtaining a clean example is challenging in real-world scenarios, creating a unique user-interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 ms audio chunks in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. Taking a step back, this paper takes an important step towards enhancing human auditory perception with artificial intelligence.
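The abstract's real-time numbers are worth sanity-checking: processing an 8 ms chunk in 6.24 ms is a real-time factor of about 0.78, so the pipeline keeps up with the incoming stream. Below is a minimal sketch of that chunked loop; the `model` call and `enrollment_embedding` are placeholders, not the paper's network.

```python
# Sanity check of the abstract's real-time claim, plus a minimal streaming
# loop. The `model` callable is a placeholder, not the paper's network.

CHUNK_MS = 8.0      # audio chunk duration, from the abstract
COMPUTE_MS = 6.24   # per-chunk processing time on an embedded CPU, from the abstract

rtf = COMPUTE_MS / CHUNK_MS
print(f"real-time factor: {rtf:.2f} (below 1.0, so the stream never falls behind)")

def extract_target(chunks, enrollment_embedding, model):
    """Yield target-speaker audio chunk by chunk, conditioned on a single
    noisy binaural enrollment example (the 'look once' capture)."""
    for chunk in chunks:
        yield model(chunk, enrollment_embedding)
```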

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…

AI headphones let a wearer listen to a single person in a crowd by looking at them just once. The system, called "Target Speech Hearing," then cancels all other sounds and plays just that person's voice in real time, even as the listener moves around in noisy places and no longer faces the speaker. [read more: washington.edu/news/2024/05/2…]
