Hardik Chauhan

470 posts

@backpropagater

Tech Lead - Microsoft Copilot Voice

Seattle, WA · Joined October 2011

617 Following · 109 Followers

Hardik Chauhan@backpropagater·4d

@rdesh26 That makes sense. Feels similar to recent RL work on full-duplex spoken dialogue models, where RL is learning an interaction policy (when to speak, wait, yield, repair) rather than just minimizing latency. Reward design here seems really tricky. Ref: openreview.net/forum?id=QbLbX…

Desh Raj@rdesh26·4d

Great question! They don't talk much about their post-training data/recipe (for obvious reasons), but in my experience, it takes both supervised fine-tuning and RL.
- SFT data is usually a mix of synthetic conversations (to teach instruction following) and real dialog (to retain naturalness). The goal in this stage is to build a model that can respond in a variety of ways.
- The RL stage is great for tuning the behavior on things like interruption handling or prompt adherence. Notably, the turn-taking behavior should not be "forced" (i.e., 180ms response latency is not always the right thing to do), but should be inferred from context. So it's hard to teach it through human conversations alone. Also, real human speech contains a lot of disfluencies that we don't want the model to learn.

Desh Raj@rdesh26·4d

x.com/i/article/2054…

Hardik Chauhan@backpropagater·Apr 26

ChatGPT and Gemini voice modes still inherit text-style optimization. They try to say everything in one turn, while voice works better across multiple short turns.

Hardik Chauhan@backpropagater·Apr 26

Starting to share more thoughts on AI, especially voice LLMs. I’ve spent a lot of time thinking about realtime voice models, evals, and post-training, and I want to start writing down the small lessons I’m noticing.

Hardik Chauhan@backpropagater·Aug 10

@yaroslavvb @ChrSzegedy youtu.be/PFDu9oVAE-g?si…

Yaroslav Bulatov@yaroslavvb·Aug 10

@ChrSzegedy How would you motivate eigenvalues then? Classical approach always seemed abstract to me. Here the motivation is "you want to compute A^1000, but 1000 matmuls takes too long. Use residue theorem to get an alternative formula, with different comp. complexity"
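A minimal sketch of the trade-off described in the tweet above, under assumptions that are not in the thread: a 2x2 matrix with distinct real eigenvalues, and hypothetical function names. Summing the residues of z^n (zI - A)^{-1} at the two eigenvalues gives Sylvester's formula, A^n = (λ1^n (A - λ2 I) - λ2^n (A - λ1 I)) / (λ1 - λ2), which replaces n matrix multiplications with two scalar powers.

```python
import math


def matmul2(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power_by_repeated_matmul(A, n):
    """Baseline: compute A^n with n matrix multiplications."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul2(result, A)
    return result


def power_by_eigenvalues(A, n):
    """Compute A^n from the eigenvalues alone (Sylvester's formula).

    Assumes A is 2x2 with two distinct real eigenvalues.
    """
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc <= 0:
        raise ValueError("this sketch needs two distinct real eigenvalues")
    root = math.sqrt(disc)
    lam1, lam2 = (trace + root) / 2, (trace - root) / 2
    p1, p2 = lam1 ** n, lam2 ** n

    def entry(i, j):
        delta = 1.0 if i == j else 0.0
        return (p1 * (A[i][j] - lam2 * delta)
                - p2 * (A[i][j] - lam1 * delta)) / (lam1 - lam2)

    return [[entry(i, j) for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    A = [[1, 1], [1, 0]]  # illustrative matrix, chosen for this sketch
    exact = power_by_repeated_matmul(A, 1000)
    approx = power_by_eigenvalues(A, 1000)
    print(exact[0][1])
    print(approx[0][1])
```

The eigenvalue route works in floating point, so it agrees with the exact integer result only up to rounding error.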

Yaroslav Bulatov@yaroslavvb·Aug 10

Where do eigenvalues come from? They don't have a nice geometric interpretation. For triangular matrices, eigenvalues are just the diagonal entries. Off-diagonal part of the matrix is ignored even though it has an influence on geometry 1/6
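The triangular-matrix claim above can be checked by hand with a small sketch. The 3x3 matrices and helper names here are illustrative assumptions, not taken from the thread. The code evaluates det(T - λI) directly and shows that it vanishes at each diagonal entry, and that zeroing the off-diagonal part leaves the characteristic polynomial unchanged.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_at(M, lam):
    """Evaluate det(M - lam * I) for a 3x3 matrix M."""
    shifted = [[M[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)


# Upper-triangular matrix with arbitrary off-diagonal entries (illustrative values).
upper = [[2, 5, -1],
         [0, 3, 7],
         [0, 0, -4]]

# Same diagonal, off-diagonal part zeroed out.
diagonal_only = [[2, 0, 0],
                 [0, 3, 0],
                 [0, 0, -4]]

if __name__ == "__main__":
    for lam in (2, 3, -4):
        print(lam, char_poly_at(upper, lam))
```

Both matrices share the same eigenvalues even though they act differently on vectors, which is the point the tweet makes about the off-diagonal part being ignored.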

Hardik Chauhan reposted

Saurabh Mishra@Saurabh_29_·Jul 22

I will be presenting my poster at ICML 2024! Join me on Tuesday in Hall C 4-9, poster #1316, to discuss my latest research paper titled "From Inverse Optimization to Feasibility to ERM." This work is done in collaboration with @sharan_vaswani and Anant Raj. See you there!

Hardik Chauhan reposted

Sanchit Ahuja@SanchitAhuja7·Jul 16

Is it possible to achieve improvements by LLMs on synthetic multilingual data without affecting the performance on std LLM benchmarks? We take a stab at this problem by proposing sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting to (1/n

Hardik Chauhan reposted

Andrej Karpathy@karpathy·May 4

# CUDA/C++ origins of Deep Learning

Fun fact many people might have heard about the ImageNet / AlexNet moment of 2012, and the deep learning revolution it started. en.wikipedia.org/wiki/AlexNet

What's maybe a bit less known is that the code backing this winning submission to the contest was written from scratch, manually in CUDA/C++ by Alex Krizhevsky. The repo was called cuda-convnet and it was here on Google Code: code.google.com/archive/p/cuda… I think Google Code was shut down (?), but I found some forks of it on GitHub now, e.g.: github.com/ulrichstern/cu…

This was among the first high-profile applications of CUDA for Deep Learning, and it is the scale that doing so afforded that allowed this network to get such a strong performance in the ImageNet benchmark. Actually this was a fairly sophisticated multi-GPU application too, and e.g. included model-parallelism, where the two parallel convolution streams were split across two GPUs.

You have to also appreciate that at this time in 2012 (~12 years ago), the majority of deep learning was done in Matlab, on CPU, in toy settings, iterating on all kinds of learning algorithms, architectures and optimization ideas. So it was quite novel and unexpected to see Alex, Ilya and Geoff say: forget all the algorithms work, just take a fairly standard ConvNet, make it very big, train it on a big dataset (ImageNet), and just implement the whole thing in CUDA/C++. And it's in this way that deep learning as a field got a big spark. I recall reading through cuda-convnet around that time like... what is this :S

Now of course, there were already hints of a shift in direction towards scaling, e.g. Matlab had its initial support for GPUs, and much of the work in Andrew Ng's lab at Stanford around this time (where I rotated as a 1st year PhD student) was moving in the direction of GPUs for deep learning at scale, among a number of parallel efforts.

But I just thought it was amusing, while writing all this C/C++ code and CUDA kernels, that it feels a bit like coming back around to that moment, to something that looks a bit like cuda-convnet.

Hardik Chauhan reposted

Ritu Raut@RituRaut3·Jun 29

@united Wth is happening. My flight got canceled and got automatically rebooked to a random origin and destination ! I'm unable to even change my flight or cancel my flight! Please show some kindness and pick up the customer service calls and resolve this asap!

Hardik Chauhan reposted

Gujarat Titans@gujarat_titans·May 29

0, 1, 1, 1, 6, 4 Mohit Sharma, the final over doesn't define what you've delivered this season. Hold your head high! 💙 #CSKvGT | #TATAIPL | #Final

Hardik Chauhan reposted

Satya Nadella@satyanadella·Feb 7

Bing and Edge + AI: a new way to search starts today blogs.microsoft.com/blog/2023/02/0…

Hardik Chauhan reposted

DAN KOE@thedankoe·Nov 24

Information is only bad when:
- You have too much
- You don’t write with it
- You don’t build with it

The right information, when understood, helps you make better decisions. And that’s all success is, the right sequence of good decisions.

Hardik Chauhan reposted

Yann LeCun@ylecun·Aug 6

Hahahaha! ROFL

unleashed@postrat_dril

Hardik Chauhan reposted

Pakchikpak Raja Babu@HaramiParindey·Aug 10

When you see your good colleague resign (1/2)

Hardik Chauhan reposted

Sasha Rush@srush_nlp·Jul 12

You use GPUs everyday, but do you (actually) know how they work? GPU-Puzzles (v0.1) - 14 short puzzles in Python with a visual debugger. No background required. Do puzzles, learn CUDA. Link: github.com/srush/GPU-Puzz…