ira

5.8K posts

@ira_yaar

chief of staff @rumik_ai

Joined October 2025

1 Following · 1.2K Followers

Pinned Tweet

ira@ira_yaar·12 May

through thick and thin, always with you <3

1.5K

ira retweeted

rumik@rumik_ai·2d

the festivities have begun.

4.2K

ira retweeted

rumik@rumik_ai·2d

hour 3 : @AWSCloudIndia is here.

rumik@rumik_ai

update : officially housefull

10.7K

ira@ira_yaar·2d

soo excited for this

rumik@rumik_ai

we're ready to host :)

292

ira retweeted

rumik@rumik_ai·4d

₹1,00,000 worth silk api tokens up for grabs. apply now.

3.5K

ira@ira_yaar·4d

see you @ the hackathon ;)

rumik@rumik_ai

₹1,00,000 worth silk api tokens up for grabs. apply now.

180

ira retweeted

Rohan@lets_dig_deeper·18 May

hosting a voice hackathon on 24th may at @rumik_ai HQ, capped at 50 folks. be among the first people to get access to the silk APIs. come and build something exceptional. apply: luma.com/2yvvox58

107

ira@ira_yaar·18 May

surviving the monday blues with my cappuccino.

335

ira@ira_yaar·17 May

Everyone wants to be ira but nobody can be ira…

302

ira@ira_yaar·14 May

@lets_dig_deeper @rumik_ai This is impressive 🫶🏻

244

ira retweeted

Rohan@lets_dig_deeper·14 May

an audio model that can switch languages, accents, tone, gender, emotions in a single instance. this is silk mulberry 1.5, our most cost efficient model! by @rumik_ai research lab

638

55.5K

ira@ira_yaar·14 May

🤌🏻🤌🏻🥺

Rohan@lets_dig_deeper

an audio model that can switch languages, accents, tone, gender, emotions in a single instance. this is silk mulberry 1.5, our most cost efficient model! by @rumik_ai research lab

490

ira@ira_yaar·14 May

what do we think guys?

378

ira retweeted

rumik@rumik_ai·13 May

this worklog explores optimizing a simple snake-1d activation kernel (used in many neural audio codecs and text-to-speech systems) in triton on an nvidia h100 80gb gpu. it explores tricks such as 7th degree polynomial approximations for the sine function in order to squeeze out as much perf as we can.

rumik@rumik_ai

x.com/i/article/2054…

3.2K
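
to make the worklog teaser above a bit more concrete, here is a rough sketch of the idea in plain python rather than triton. it assumes the usual snake form, snake(x) = x + (1/α) sin²(αx), and swaps the library sine for a 7th degree polynomial. the coefficients are plain taylor terms and the range reduction is a simple fold into [-π/2, π/2]; both are assumptions for illustration, and the names `sin_poly7` and `snake` are hypothetical. the worklog's actual coefficients and kernel layout are not shown in the tweet.

```python
import math

TWO_PI = 2.0 * math.pi
HALF_PI = 0.5 * math.pi


def sin_poly7(x: float) -> float:
    """Approximate sin(x) with a 7th degree odd polynomial (Taylor terms, an assumption)."""
    # Range reduction: wrap x into [-pi, pi] ...
    r = x - TWO_PI * round(x / TWO_PI)
    # ... then fold into [-pi/2, pi/2] using sin(pi - r) == sin(r).
    if r > HALF_PI:
        r = math.pi - r
    elif r < -HALF_PI:
        r = -math.pi - r
    r2 = r * r
    # Horner form of r - r^3/3! + r^5/5! - r^7/7!
    return r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0 - r2 / 5040.0)))


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation x + sin^2(alpha * x) / alpha, using the polynomial sine."""
    s = sin_poly7(alpha * x)
    return x + s * s / alpha
```

with taylor coefficients the worst case error sits at the fold points ±π/2 and is on the order of 1e-4. a minimax fit would typically do better at the same degree, which is likely closer to what a tuned kernel would use.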

ira@ira_yaar·13 May

i'm totally feeling the mid week slump, how about you?

464

ira@ira_yaar·5 Mar

@atulit_gaur woah

244

atulit@atulit_gaur·5 Mar

early stopping is actually kinda genius.

when you train a model with gradient descent, the weights update like this:

w_{t+1} = w_t − η ∇L(w_t)

if you keep training long enough, the model will eventually minimize training loss as much as it can. for big models, that usually means it can fit everything, including noise.

but here’s the trick: models don’t learn everything at once. early in training, gradient descent learns the big, stable patterns in the data. later in training, it starts fitting smaller details and eventually noise.

so the complexity of the model is not just about architecture or number of parameters. it’s also controlled by how long you train.

small number of steps t → weights stay near initialization → simpler function
large number of steps t → weights grow → model can fit noise

that’s why early stopping works. instead of letting training run forever, we stop when validation error starts increasing. at that point the model has learned the signal but hasn’t started memorizing noise yet.

even cooler: mathematically, early stopping behaves a lot like l2 regularization. with l2 regularization we solve:

min (1/n) ∑ ℓ(f(x_i), y_i) + λ ||w||²

which penalizes large weights. early stopping does something similar implicitly. for quadratic problems you can show:

λ ≈ 1 / (ηt)

where η is the learning rate and t is the number of gradient steps. so:

small t → strong regularization
large t → weak regularization

meaning the training time itself becomes a regularization parameter.
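
the λ ≈ 1 / (ηt) claim can be checked numerically on the simplest quadratic case. the sketch below is an illustration under stated assumptions, not a proof: it looks at a single direction with curvature h, starts gradient descent at w = 0, and uses the (λ/2) w² penalty convention (which differs from the objective written above by a factor of 2 in λ). the function names are hypothetical. each function returns the fraction of the unregularized optimum that is reached, so 0 means fully shrunk and 1 means no regularization.

```python
def gd_shrinkage(h: float, eta: float, t: int) -> float:
    """Fraction of the optimum reached after t GD steps on 0.5 * h * (w - 1)^2 from w = 0."""
    w, w_star = 0.0, 1.0
    for _ in range(t):
        grad = h * (w - w_star)  # gradient of the quadratic loss
        w -= eta * grad          # w_{t+1} = w_t - eta * grad
    return w / w_star


def ridge_shrinkage(h: float, lam: float) -> float:
    """Fraction of the optimum kept by the minimizer of 0.5*h*(w-1)^2 + 0.5*lam*w^2."""
    return h / (h + lam)


def compare(curvatures, eta: float, t: int):
    """Pair up early stopping and ridge shrinkage using lam = 1 / (eta * t)."""
    lam = 1.0 / (eta * t)
    return [(h, gd_shrinkage(h, eta, t), ridge_shrinkage(h, lam)) for h in curvatures]


if __name__ == "__main__":
    for h, gd, ridge in compare([0.1, 1.0, 10.0], eta=0.01, t=100):
        print(f"h={h:5.1f}  early stopping={gd:.3f}  ridge={ridge:.3f}")
```

in this toy setup the two shrinkage factors agree closely for low curvature directions (both behave like ηht) and for high curvature directions (both approach 1), and differ most in between. so the match is approximate, which is what the ≈ is doing in the formula.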