Hardik Chauhan

470 posts

@backpropagater

Tech Lead - Microsoft Copilot Voice

Seattle, WA · Joined October 2011
617 Following · 109 Followers
Hardik Chauhan @backpropagater:
@rdesh26 That makes sense. Feels similar to recent RL work on full-duplex spoken dialogue models, where RL is learning an interaction policy (when to speak, wait, yield, repair) rather than just minimizing latency. Reward design here seems really tricky. Ref: openreview.net/forum?id=QbLbX…
Desh Raj @rdesh26:
Great question! They don't talk much about their post-training data/recipe (for obvious reasons), but in my experience, it takes both supervised fine-tuning and RL.
- SFT data is usually a mix of synthetic conversations (to teach instruction following) and real dialog (to retain naturalness). The goal in this stage is to build a model that can respond in a variety of ways.
- The RL stage is great for tuning the behavior on things like interruption handling or prompt adherence.
Notably, the turn-taking behavior should not be "forced" (i.e., a 180 ms response latency is not always the right thing to do) but should be inferred from context, so it's hard to teach through human conversations alone. Also, real human speech contains a lot of disfluencies that we don't want the model to learn.
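Desh's point about context-dependent turn-taking is the part RL is well suited to. Here is a minimal sketch of what a shaped reward for that stage might look like — every name, weight, and signal below is a hypothetical illustration, not the actual recipe of any production system:

```python
# Hypothetical turn-taking reward for the RL stage of a voice model.
# Latency is scored against a context-dependent target rather than
# minimized outright, echoing the point that 180 ms is not always right.

def turn_taking_reward(response_latency_ms, expected_latency_ms,
                       interrupted_user, followed_prompt):
    """Score one model turn; higher is better. All weights illustrative."""
    # Penalize deviation from the contextually appropriate latency:
    # answering a quick factual question slowly is bad, but so is
    # barging in during a user's mid-sentence pause.
    latency_penalty = abs(response_latency_ms - expected_latency_ms) / 1000.0

    # Hard penalty for talking over the user.
    interruption_penalty = 2.0 if interrupted_user else 0.0

    # Bonus for adhering to the system prompt's instructions.
    adherence_bonus = 1.0 if followed_prompt else 0.0

    return adherence_bonus - latency_penalty - interruption_penalty

# A fast reply to a quick question scores well...
good = turn_taking_reward(200, 180, interrupted_user=False, followed_prompt=True)
# ...while the same 200 ms latency is penalized when the context
# (e.g. the user pausing mid-thought) called for waiting ~1.5 s.
bad = turn_taking_reward(200, 1500, interrupted_user=True, followed_prompt=True)
```

The design choice mirrors the tweet: the policy learns *when* a fast response is appropriate from context, instead of being pushed toward minimum latency everywhere.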
Hardik Chauhan @backpropagater:
ChatGPT and Gemini voice modes still inherit text-style optimization. They try to say everything in one turn, while voice works better across multiple short turns.
Hardik Chauhan @backpropagater:
Starting to share more thoughts on AI, especially voice LLMs. I’ve spent a lot of time thinking about realtime voice models, evals, and post-training, and I want to start writing down the small lessons I’m noticing.
Yaroslav Bulatov @yaroslavvb:
@ChrSzegedy How would you motivate eigenvalues then? Classical approach always seemed abstract to me. Here the motivation is "you want to compute A^1000, but 1000 matmuls takes too long. Use residue theorem to get an alternative formula, with different comp. complexity"
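The complexity argument can be made concrete in NumPy. This sketch uses plain diagonalization (A = V·diag(w)·V⁻¹, so Aⁿ = V·diag(wⁿ)·V⁻¹) rather than the residue-theorem route Yaroslav mentions, but it illustrates the same point: one decomposition replaces n matmuls, assuming A is diagonalizable:

```python
import numpy as np

# Small diagonalizable example (eigenvalues 0.6 and 0.3).
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])

def matpow_naive(A, n):
    """n repeated matrix multiplications: O(n) matmuls."""
    out = np.eye(len(A))
    for _ in range(n):
        out = out @ A
    return out

def matpow_eig(A, n):
    """One eigendecomposition, then A^n = V @ diag(w**n) @ V^{-1}."""
    w, V = np.linalg.eig(A)
    return (V * w**n) @ np.linalg.inv(V)  # V * w**n scales the columns of V

# Cost of the eigenvalue route is independent of n; for n = 1000 the
# naive loop does 1000 matmuls while this still does one decomposition.
n = 20
assert np.allclose(matpow_naive(A, n), matpow_eig(A, n))
```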
Yaroslav Bulatov @yaroslavvb:
Where do eigenvalues come from? They don't have a nice geometric interpretation. For triangular matrices, eigenvalues are just the diagonal entries. The off-diagonal part of the matrix is ignored even though it has an influence on geometry. (1/6)
[image attached]
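The triangular-matrix claim is easy to sanity-check numerically — the eigenvalues match the diagonal no matter what the off-diagonal entries are (a quick sketch):

```python
import numpy as np

# Upper-triangular matrix: eigenvalues are exactly the diagonal entries.
T = np.array([[2.0, 5.0, -3.0],
              [0.0, 7.0,  1.0],
              [0.0, 0.0, -4.0]])

# Same multiset as the diagonal, up to ordering.
assert np.allclose(sorted(np.linalg.eigvals(T)), sorted(np.diag(T)))

# Changing an off-diagonal entry moves the eigenvectors (the geometry)
# but leaves the eigenvalues untouched.
T2 = T.copy()
T2[0, 1] = 100.0
assert np.allclose(sorted(np.linalg.eigvals(T2)), sorted(np.diag(T)))
```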
Hardik Chauhan retweeted
Saurabh Mishra @Saurabh_29_:
I will be presenting my poster at ICML 2024! Join me on Tuesday in Hall C 4-9, poster #1316, to discuss my latest research paper titled "From Inverse Optimization to Feasibility to ERM." This work is done in collaboration with @sharan_vaswani and Anant Raj. See you there!
Hardik Chauhan retweeted
Sanchit Ahuja @SanchitAhuja7:
Is it possible to improve LLMs with synthetic multilingual data without affecting their performance on standard LLM benchmarks? We take a stab at this problem by proposing sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting to (1/n
[image attached]
Hardik Chauhan retweeted
Andrej Karpathy @karpathy:
# CUDA/C++ origins of Deep Learning

Fun fact: many people might have heard about the ImageNet / AlexNet moment of 2012, and the deep learning revolution it started. en.wikipedia.org/wiki/AlexNet

What's maybe a bit less known is that the code backing this winning submission to the contest was written from scratch, manually in CUDA/C++, by Alex Krizhevsky. The repo was called cuda-convnet and it was here on Google Code: code.google.com/archive/p/cuda… I think Google Code was shut down (?), but I found some forks of it on GitHub now, e.g.: github.com/ulrichstern/cu…

This was among the first high-profile applications of CUDA for Deep Learning, and it is the scale that doing so afforded that allowed this network to get such a strong performance in the ImageNet benchmark. Actually, this was a fairly sophisticated multi-GPU application too, and e.g. included model parallelism, where the two parallel convolution streams were split across two GPUs.

You have to also appreciate that at this time in 2012 (~12 years ago), the majority of deep learning was done in Matlab, on CPU, in toy settings, iterating on all kinds of learning algorithms, architectures, and optimization ideas. So it was quite novel and unexpected to see Alex, Ilya, and Geoff say: forget all the algorithms work, just take a fairly standard ConvNet, make it very big, train it on a big dataset (ImageNet), and just implement the whole thing in CUDA/C++. And it's in this way that deep learning as a field got a big spark. I recall reading through cuda-convnet around that time like... what is this :S

Now of course, there were already hints of a shift in direction towards scaling: e.g., Matlab had its initial support for GPUs, and much of the work in Andrew Ng's lab at Stanford around this time (where I rotated as a 1st-year PhD student) was moving in the direction of GPUs for deep learning at scale, among a number of parallel efforts.

But I just thought it was amusing, while writing all this C/C++ code and CUDA kernels, that it feels a bit like coming back around to that moment, to something that looks a bit like cuda-convnet.
[image attached]
Hardik Chauhan retweeted
Ritu Raut @RituRaut3:
@united Wth is happening. My flight got canceled and got automatically rebooked to a random origin and destination! I'm unable to even change or cancel my flight! Please show some kindness, pick up the customer service calls, and resolve this ASAP!
Hardik Chauhan retweeted
Gujarat Titans @gujarat_titans:
0, 1, 1, 1, 6, 4 Mohit Sharma, the final over doesn't define what you've delivered this season. Hold your head high! 💙 #CSKvGT | #TATAIPL | #Final
[image attached]
Hardik Chauhan retweeted
DAN KOE @thedankoe:
Information is only bad when:
- You have too much
- You don't write with it
- You don't build with it

The right information, when understood, helps you make better decisions. And that's all success is: the right sequence of good decisions.
Hardik Chauhan retweeted
Pakchikpak Raja Babu @HaramiParindey:
When you see your good colleague resign (1/2)
[4 images attached]
Hardik Chauhan retweeted
Sasha Rush @srush_nlp:
You use GPUs everyday, but do you (actually) know how they work? GPU-Puzzles (v0.1) - 14 short puzzles in Python with a visual debugger. No background required. Do puzzles, learn CUDA. Link: github.com/srush/GPU-Puzz…
[4 images attached]