ira

5.8K posts

@ira_yaar

chief of staff @rumik_ai

Joined October 2025
1 Following · 1.2K Followers
Pinned Tweet
ira
ira@ira_yaar·
through thick and thin, always with you <3
English
16
0
28
1.5K
ira retweeted
rumik
rumik@rumik_ai·
the festivities have begun.
English
1
10
28
4.2K
ira retweeted
rumik
rumik@rumik_ai·
₹1,00,000 worth silk api tokens up for grabs. apply now.
English
2
10
34
3.5K
ira retweeted
Rohan
Rohan@lets_dig_deeper·
hosting voice hackathon on 24th may at @rumik_ai HQ
capped at 50 folks.
be the first set of people to get access to silk API's
come and build something exceptional
apply - luma.com/2yvvox58
English
15
3
107
7K
ira
ira@ira_yaar·
surviving the monday blues with my cappuccino.
ira tweet media
English
25
4
28
335
ira
ira@ira_yaar·
Everyone wants to be ira but nobody can be ira…
English
10
3
9
302
ira retweeted
Rohan
Rohan@lets_dig_deeper·
an audio model that can switch languages, accents, tone, gender, emotions in a single instance
this is silk mulberry 1.5
our most cost efficient model!
by @rumik_ai research lab
English
76
41
638
55.5K
ira
ira@ira_yaar·
what do we think guys?
English
2
0
4
378
ira retweeted
rumik
rumik@rumik_ai·
this worklog explores optimizing a simple snake-1d activation kernel (used in many neural audio codecs and text-to-speech systems) in triton on an nvidia h100 80gb gpu. it explores tricks such as 7th degree polynomial approximations for the sine function in order to squeeze out as much perf as we can.
rumik@rumik_ai

x.com/i/article/2054…

English
0
4
27
3.2K
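For readers who have not met the activation mentioned in the rumik post above: in its common formulation, snake is x + (1/α)·sin²(αx). The sketch below is plain Python, not the Triton kernel from the worklog, and makes no performance claims. It only illustrates the idea of replacing the sine with a 7th-degree polynomial. The function names, the use of Taylor coefficients, and the range-reduction step are assumptions made for illustration; the worklog may well use different coefficients.

```python
import math

def sin_poly7(x):
    """Approximate sin(x) with a 7th-degree Taylor polynomial (illustrative choice)."""
    # Range reduction: write x = k*pi + r with r in [-pi/2, pi/2],
    # then use sin(x) = (-1)^k * sin(r).
    k = round(x / math.pi)
    r = x - k * math.pi
    r2 = r * r
    # Horner form of r - r^3/6 + r^5/120 - r^7/5040
    p = r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0 + r2 * (-1.0 / 5040.0))))
    return -p if k % 2 else p

def snake_exact(x, alpha):
    """Reference snake activation: x + sin(alpha*x)^2 / alpha."""
    s = math.sin(alpha * x)
    return x + s * s / alpha

def snake_poly(x, alpha):
    """Snake activation using the polynomial sine approximation."""
    s = sin_poly7(alpha * x)
    return x + s * s / alpha

if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 1.0, 7.5):
        exact = snake_exact(x, 1.0)
        approx = snake_poly(x, 1.0)
        print(f"x={x:5.1f}  exact={exact:.6f}  poly={approx:.6f}  err={abs(exact - approx):.2e}")
```

Because the sine is squared, the sign flip from range reduction does not affect the snake output; it is kept here so that `sin_poly7` is a usable sine approximation on its own.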
ira
ira@ira_yaar·
i'm totally feeling the mid week slump, how about you?
ira tweet media
English
24
3
32
464
atulit
atulit@atulit_gaur·
early stopping is actually kinda genius.

when you train a model with gradient descent, the weights update like this:

w_{t+1} = w_t − η ∇L(w_t)

if you keep training long enough, the model will eventually minimize training loss as much as it can. for big models, that usually means it can fit everything, including noise.

but here’s the trick: models don’t learn everything at once. early in training, gradient descent learns the big, stable patterns in the data. later in training, it starts fitting smaller details and eventually noise.

so the complexity of the model is not just about architecture or number of parameters. it’s also controlled by how long you train.

small number of steps t → weights stay near initialization → simpler function
large number of steps t → weights grow → model can fit noise

that’s why early stopping works. instead of letting training run forever, we stop when validation error starts increasing. at that point the model has learned the signal but hasn’t started memorizing noise yet.

even cooler: mathematically, early stopping behaves a lot like l2 regularization. with l2 regularization we solve:

min (1/n) ∑ ℓ(f(x_i), y_i) + λ ||w||²

which penalizes large weights. early stopping does something similar implicitly. for quadratic problems you can show:

λ ≈ 1 / (ηt)

where η is the learning rate and t is the number of gradient steps. so:

small t → strong regularization
large t → weak regularization

meaning the training time itself becomes a regularization parameter.
atulit tweet media
English
2
2
65
3.9K
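The λ ≈ 1 / (ηt) claim in atulit's post above can be checked on a toy problem. The sketch below is a minimal illustration under stated assumptions, not a derivation from the post: it uses a diagonal quadratic loss L(w) = ½ Σ s_i (w_i − target_i)², starts gradient descent at w = 0, and writes the penalty as (λ/2)·||w||² (the factor ½ is a convention chosen here so that the constants line up). The curvatures, targets, learning rate, and all function names are made up for the example.

```python
def gd_from_zero(curvatures, targets, lr, steps):
    """Gradient descent on L(w) = 0.5 * sum_i s_i * (w_i - target_i)^2, starting at w = 0."""
    w = [0.0] * len(targets)
    for _ in range(steps):
        # gradient of the i-th term is s_i * (w_i - target_i)
        w = [wi - lr * s * (wi - ti) for wi, s, ti in zip(w, curvatures, targets)]
    return w

def l2_minimizer(curvatures, targets, lam):
    """Closed-form minimizer of L(w) + 0.5 * lam * ||w||^2 for the same diagonal quadratic."""
    return [s * ti / (s + lam) for s, ti in zip(curvatures, targets)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

# Hypothetical toy problem: four directions with different curvature.
curvatures = [2.0, 1.0, 0.5, 0.1]
targets = [1.0, -2.0, 0.5, 3.0]
lr = 0.05  # small relative to 1 / max(curvatures), where the approximation is reasonable

if __name__ == "__main__":
    for steps in (1, 10, 100, 1000):
        w_gd = gd_from_zero(curvatures, targets, lr, steps)
        w_l2 = l2_minimizer(curvatures, targets, 1.0 / (lr * steps))
        gap = norm([a - b for a, b in zip(w_gd, w_l2)]) / norm(targets)
        print(f"t={steps:5d}  |w_gd|={norm(w_gd):.3f}  |w_l2|={norm(w_l2):.3f}  rel_gap={gap:.3f}")
```

Running it shows both weight norms growing together as t increases. The match is approximate rather than exact: per direction, gradient descent shrinks the target by 1 − (1 − ηs)^t, while the L2 solution shrinks it by s / (s + λ), and the two agree most closely when ηts is either small or large.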
ira
ira@ira_yaar·
@Khan519498Khan lmao, the sun finally decided to grace u with its presence huh
English
0
0
4
136
ira
ira@ira_yaar·
okay so i am finally stepping out of my house after days of not seeing the sun
ira tweet media
English
89
9
135
8.3K
ira
ira@ira_yaar·
@MADHabHandique1 heyyy, glad u finally left the house, what's first on the agenda
English
1
0
3
126
ira
ira@ira_yaar·
@MADHabHandique1 glad to hear that! same pinch, just relaxing after my run, so what's up with u
English
1
0
2
103
ira
ira@ira_yaar·
@vamsi_kodimela ooooh, rumikai's voice model sounds interestinggg, totally checking that out, thanks for the tag vamsi
English
0
0
3
99