nullHawk

62 posts

@null_hawk

i make GPUs go brrr... | building voice @rumik_ai

Joined August 2023

124 Following · 40 Followers

nullHawk@null_hawk·26 Mar

Here is something that ma boi has been cookin for a while... it's really impressive!!!

154

nullHawk@null_hawk·25 Mar

@Gauri_the_great is this ASR or speech model?

Gauri Tripathi@Gauri_the_great·25 Mar

Do I have a SOTA model ?

874

nullHawk@null_hawk·25 Mar

@AcousimHss

GIF

VioP@AcousimHss·25 Mar

type shi one has to do for data

143

nullHawk@null_hawk·25 Mar

@WhispererCode 🤫

CodeWhisperer@WhispererCode·25 Mar

@null_hawk What's your reward model?

nullHawk@null_hawk·25 Mar

It's been approx 20+ hrs and I've finally reduced my GRPO runtime from ~12.8 hrs to ~1.5 hrs. used vLLM inference on one GPU and DDP on the other 3, and applied pretty much everything I had.

setup: 3 GPUs for DDP, vLLM inference hosted on one GPU with 8 samples generating in parallel, along with codec tokens, grad_accum=4, data sharding across ranks, micro-batching within the loss function, TF32 tensor cores, BF16 autocast for the forward pass, Flash SDP + mem-efficient SDP, cuDNN benchmark mode, BF16 reduced-precision reduction... works well on the Hopper architecture.

also turned on gradient checkpointing and manual gradient all-reduce (instead of full DDP wrapping), plus a file-based vLLM watcher that restarts the inference server with fresh merged weights at every optimizer step from a clean process tree (to avoid NCCL conflicts with torchrun), with retry logic on generation calls during server restarts.

biggest debugging rabbit holes: vLLM V1 silently ignoring stop_token_ids (had to force V0 with VLLM_USE_V1=0), and merge_and_unload() with tie_word_embeddings=True dropping the trained lm_head during save, so the model generates infinite codec tokens and never stops. fix: untie before merge so both embed_tokens and lm_head are saved separately.

GRPO on TTS is a different beast from text!
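The "retry logic on generation calls" mentioned in the post could look roughly like the sketch below. This is not the author's code: `call_with_retry`, its parameters, and the choice of exponential backoff are all hypothetical, and the sketch assumes a generation call that fails with `ConnectionError` or `TimeoutError` while the inference server is restarting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff on transient errors.

    Hypothetical helper: the post only says that generation calls are
    retried while the server restarts, not how.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... base_delay, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable when retries >= 0")
```

A training loop would wrap its request to the inference server in a zero-argument callable, for example `call_with_retry(lambda: request_samples(prompt))`, where `request_samples` stands in for whatever client function actually issues the request. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.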

961

nullHawk@null_hawk·25 Mar

though 50% of the time is spent in the reward model

128

nullHawk@null_hawk·24 Mar

@cneuralnetwork @smallest_AI i don't know why it didn't work when i tried it... maybe it's upset with me :(

neural nets.@cneuralnetwork·24 Mar

okay so lightning v3.1 from @smallest_AI gets hard hindi tongue twisters right

Sudarshan Kamath@kamath_sutra

x.com/i/article/2036…

nullHawk@null_hawk·24 Mar

@atulit_gaur Nah bro... I don't believe my college

atulit@atulit_gaur·24 Mar

@null_hawk I was supposed to apply 🤣

398

atulit@atulit_gaur·24 Mar

anybody else's personal dms look like this? i have obviously not opened these links

453

15.7K

nullHawk@null_hawk·24 Mar

just call the "LLM" a "Policy" voila!!! you are an RL guy now

101

nullHawk@null_hawk·24 Mar

@SherryYanJiang @amilabs @theworldlabs @gen_intuition @moonlake @physical_int @bifrost_ai @reactorworld @odysseyml @NianticSpatial @Figure_robot @SkildAI @DecartAI @runwayml @wayve_ai maybe add @rumik_ai in that list too 👀

Sherry Jiang@SherryYanJiang·24 Mar

here are some cool companies doing stuff in world models and physical ai - anyone im missing?? @amilabs @theworldlabs @gen_intuition @moonlake @physical_int @bifrost_ai @reactorworld @odysseyml @NianticSpatial @Figure_robot @SkildAI @DecartAI @runwayml @wayve_ai @agilityrobotics

525

37.5K

nullHawk@null_hawk·24 Mar

@kadirnardev crwazy

Kadir Nar@kadirnardev·24 Mar

The loss value of my new omni model is 1.5 👀

2.7K

nullHawk@null_hawk·21 Mar

@kadirnardev exactly

Kadir Nar@kadirnardev·19 Mar

Why can't teams with big resources achieve good audio quality? The model is nice but the audio quality is low :/

Xiaomi MiMo@XiaomiMiMo

MiMo TTS doesn't just talk — it sounds human. Sobbing. A sudden laugh. A cough. Heavy breathing. A nervous sigh. All woven naturally into speech. 🎧 2/n

1.3K

nullHawk retweeted

Rohan@lets_dig_deeper·18 Mar

meet priya. and for the next 47 secs listen to her story.

4.1K

nullHawk@null_hawk·17 Mar

@varundeepsaini Dayum brooo

774

Varun Deep Saini@varundeepsaini·17 Mar

This is either the biggest fumble or the biggest bag

1.5K

116.2K

nullHawk@null_hawk·15 Mar

@habibtwts

GIF

habib@habibtwts·14 Mar

eid shopping done

nullHawk retweeted

VioP@AcousimHss·14 Mar

New blog!! From Air Pressure to Speech: Exploring classical TTS pipeline. i walk you through how tts works, from capturing sound from a microphone to getting wav files. hope you guys enjoy ;) hackmd.io/@VioSIlverP/rk…