Kaahan Radia

18 posts


@kradisme

Building something new @keyframelabs. Ex-Zipline.

Los Angeles, CA · Joined April 2020

60 Following · 37 Followers

Kaahan Radia retweeted
Y Combinator
Y Combinator@ycombinator·
Datost (@datostapp) is an AI data analyst in Slack. It keeps a semantic layer of your business definitions, CRM, docs, and codebase so it knows what questions mean. 75.2% on the hardest public text-to-SQL benchmark, where Opus 4.6 scores 33%. Congrats on the launch, @maceock & @jasonhywang! ycombinator.com/launches/Pxg-d…
Kaahan Radia retweeted
LiveKit
LiveKit@livekit·
We built a demo with Keyframe Labs avatars on the LiveKit Agents Framework. The avatar doesn't just lip-sync. It picks up on the emotional context of the conversation, and you can see it in its face when the mood changes. It can also hand off the conversation to a different agent without reconnecting. The new agent fires RPCs to update the UI in real time. LiveKit x Keyframe plugin and sample repo in the thread.
Kaahan Radia
Kaahan Radia@kradisme·
@arnie_hacker IMO for initial experiments on a small dataset, pre-extract the frames as images; it's way easier to debug and iterate. For larger datasets it kinda depends on your video characteristics; getting sampling diversity might require decoding more frames than you think. Preproc + streaming webdataset is our go-to.
Arnie Ramesh
Arnie Ramesh@arnie_hacker·
Anyone experienced with training video diffusion models? Noob question: do you pre-process mp4 into individual frames and store them before training? Doesn't this blow up storage requirements? Or do you dynamically convert mp4 into frames during training (how is this parallelized?)
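The reply above says sampling diversity may force you to decode more frames than expected. A minimal sketch of a strided clip sampler of the kind you might use inside a streaming webdataset pipeline (the function name and parameters are illustrative, not from either poster):

```python
import random

def sample_clip_indices(num_frames, clip_len, stride, rng=random):
    """Pick `clip_len` frame indices spaced `stride` apart, starting at a
    random offset, so successive samples cover different parts of the video."""
    span = (clip_len - 1) * stride + 1
    if num_frames < span:
        raise ValueError("video too short for the requested clip")
    start = rng.randrange(num_frames - span + 1)
    return [start + i * stride for i in range(clip_len)]

# e.g. a 300-frame video, 16-frame clip, every 4th frame
idx = sample_clip_indices(300, clip_len=16, stride=4)
```

Note that decoding only these indices still usually means seeking through (and often decoding) every frame up to the last index, which is the "more frames than you think" cost.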
Kaahan Radia
Kaahan Radia@kradisme·
@sedielem @ThKouz Matches my intuition; contrary to a bunch of papers that came out late 2025, tuning noise scaling still seems to be necessary, even for these newer architectures.
Sander Dieleman
Sander Dieleman@sedielem·
Neat idea: jointly diffuse pixels and DINO features with separate noise levels. Then optimise the trajectory through 2D noise level space. Could do this with DINO + traditional VAE latents as well to get a souped-up version of ReDi (representationdiffusion.github.io @ThKouz et al.)!
Sander Dieleman tweet media
Alan Baade@BaadeAlan

What's the right space to diffuse in: Raw Data or Latents? Why not both! In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference. w/ @drfeifei+... 1/n

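One common form of the noise-scale tuning mentioned in the reply above is a timestep shift, which spends more of the diffusion trajectory at high noise levels (e.g. for higher resolutions). A sketch; the shift value is illustrative:

```python
def shift_timestep(t, shift):
    """Remap a noise timestep t in [0, 1] toward higher noise.

    shift > 1 pushes intermediate timesteps upward while keeping the
    endpoints fixed, so more of the schedule sits at high noise."""
    return shift * t / (1 + (shift - 1) * t)

# shift=3 moves the midpoint t=0.5 up to 0.75
mid = shift_timestep(0.5, 3)
```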
Kaahan Radia
Kaahan Radia@kradisme·
the only thing I’ve learned from this whole ai coding thing is that some people are really, really bad at reviewing PRs
Kaahan Radia retweeted
Keyframe Labs
Keyframe Labs@KeyframeLabs·
Introducing the world's most expressive, conversational AI humans. Runs in real-time at just $0.06 per minute. Watch Cosmo move fluidly through emotions in an unedited conversation with our CTO.
Mati Staniszewski
Mati Staniszewski@matiii·
Passing the Turing test for voice agents. We just shipped a major ElevenAgents update:
- Lower-latency, smoother turn-taking with a new conversational model
- Expressive Mode for contextual emotional delivery
- Available in 70+ languages
Kaahan Radia
Kaahan Radia@kradisme·
@gabriberton Did a little bit of this at Zipline; it's harder than you'd think to stop even a medium-capacity ResNet from zeroing out the gradient reversal layer's impact. It would bifurcate its own feature representations!
Gabriele Berton
Gabriele Berton@gabriberton·
A little more info on Domain Adaptation: the task is that you have a labelled train set from one "source" domain (e.g. daytime images) and an unlabelled set from the test/target domain (e.g. night images). [1/N]
Gabriele Berton tweet media
Gabriele Berton@gabriberton

Writing this gave me flashbacks of when CLIP came out. Part of my lab was working on Domain Adaptation, i.e. adapting models to unseen domains. CLIP killed that field: CLIP has seen everything, so suddenly there was a model with no unseen domain. [1/2]

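The gradient reversal layer discussed above is an identity in the forward pass and negates (and scales) gradients in the backward pass, so the domain head trains normally while the backbone is pushed to confuse it. A framework-free sketch with a toy one-layer domain head (all numbers and names are illustrative):

```python
import numpy as np

LAMBDA = 1.0  # reversal strength

def grl_forward(x):
    return x  # identity on the forward pass

def grl_backward(grad_out, lam=LAMBDA):
    return -lam * grad_out  # flip the gradient flowing back to the features

# toy setup: features x, linear domain head w, loss = 0.5 * (w @ x)**2
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.3])
logit = w @ grl_forward(x)
grad_logit = logit                      # dL/dlogit
grad_x = grl_backward(w * grad_logit)   # feature gradient after reversal
```

The failure mode in the reply is the backbone learning to route domain information around this layer (bifurcating its features) rather than actually becoming domain-invariant.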
Kaahan Radia
Kaahan Radia@kradisme·
@KBlueleaf Smells like FSQ shenanigans, maybe a residual variant? Cool stuff.
琥珀青葉@KohakuLab
琥珀青葉@KohakuLab@KBlueleaf·
30k steps from scratch, no GAN training. F16 VQ-VAE with an effective 2^64 codebook size, 512 emb dim; trainable params for the VQ are only 66K
琥珀青葉@KohakuLab tweet media
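On the FSQ guess above: finite scalar quantization bounds each latent dimension and rounds it to a small fixed grid, so the codebook is implicit rather than learned. With 16 dims of 16 levels each you get 16^16 = 2^64 effective codes, matching the tweet's number (the level configuration here is an assumption, not from the tweet):

```python
import numpy as np

levels = np.full(16, 16)  # 16 dims x 16 levels -> 16**16 = 2**64 implicit codes

def fsq_quantize(z, levels):
    """Bound each dim to (-1, 1) with tanh, then snap it to a uniform grid
    with `levels[d]` points per dimension (straight-through during training)."""
    half = (levels - 1) / 2.0
    bounded = np.tanh(z)
    return np.round(bounded * half) / half

z = np.random.randn(16)
q = fsq_quantize(z, levels)
```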
Kaahan Radia
Kaahan Radia@kradisme·
@unilightwf Feels like there’s an obvious extension for TTS — reminds me, in spirit, of the similarity scoring Tortoise did.
Wen-Chin Huang
Wen-Chin Huang@unilightwf·
While everyone is amazed by SAM Audio, the hidden gem to me is the SAM Audio Judge! The SAM Audio Judge assesses how well a separated audio matches a given text description in terms of (1) overall quality (2) recall (3) precision (4) faithfulness. huggingface.co/facebook/sam-a…
Wen-Chin Huang tweet media
Kaahan Radia retweeted
Zipline
Zipline@zipline·
The future of delivery has arrived