Kyle Sargent

221 posts

@KyleSargentAI

Computer vision researcher. CS PhD @Stanford advised by @jiajunwu_cs @drfeifei, Past: AI Resident @Google, A.B. @Harvard

Joined November 2012
1.1K Following, 1.5K Followers
Pinned Tweet
Kyle Sargent
Kyle Sargent@KyleSargentAI·
Vision-language models are getting better every day. Can we use them to improve image compression? Yes! For my internship, working w/ @GoogleDeepMind, @GoogleResearch, we designed VLIC, a diffusion autoencoder post-trained with VLM preferences. Our preprint is out today! A🧵:
English
5
39
314
43.6K
Kyle Sargent
Kyle Sargent@KyleSargentAI·
My personal benchmark for voice AI is: when am I going to be able to improve my Chinese by talking to it in a mixture of bad Chinese and English, interrupting it frequently for clarifications and definitions, switching often between languages, etc.? So far nothing works at all
English
0
0
7
893
Volodymyr Kuleshov 🇺🇦
@chenhao_chao Question: what happens if you apply the subtokenizer to the AR model? Does NLL improve similarly? Is it possible that the experiments are effectively comparing models with different tokenizers?
English
2
0
6
538
R.Сам 🦋🐏
R.Сам 🦋🐏@Logo_Daedalus·
So they didn’t nominate Eddington for anything but we’re supposed to take this seriously as some ceremony of cinematic legitimacy granting
English
22
51
870
23.6K
Kyle Sargent retweeted
Haian Jin
Haian Jin@Haian_Jin·
Spatial reconstruction is a long-context problem: real scenes come with hundreds of images. But O(N²) transformer-based models don’t scale efficiently.
Introducing: 🤐ZipMap (CVPR ’26): Linear-Time, Stateful 3D Reconstruction via Test-Time Training (TTT).
ZipMap “zips” a large image collection into an implicit TTT scene state in a single linear-time operation. The state will then be decoded into spatial outputs, and can be queried efficiently for novel-view geometry and appearance (~100 FPS).
ZipMap is not only much faster (>20× faster than VGGT), but also matches or surpasses the accuracy of all SOTA models.
English
19
99
744
66.8K
Kyle Sargent retweeted
Zizhang Li
Zizhang Li@zizhang_li·
#CVPR2026 🤩 PerpetualWonder: interactive 4D scene generation with long-horizon actions. From a single image, build a world you can interact with through a sequence of ✨3D actions (point force / wind / gravity). Project: johnzhan2023.github.io/PerpetualWonde… More details in 🧵: 1/5
English
5
20
111
13.2K
Kyle Sargent
Kyle Sargent@KyleSargentAI·
"b (gh gw) (ph pw c) -> b c (gh ph) (gw pw)"
0
0
10
569
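The pattern quoted in the tweet above reads like einops-style `rearrange` notation for turning a sequence of flattened patches back into an image. The tweet gives no context, so this interpretation is an assumption: `gh`/`gw` are taken to be the patch-grid height and width, `ph`/`pw` the patch height and width, and `c` the channel count. A minimal sketch of the same index shuffle in plain NumPy (the function names `unpatchify` and `patchify` are hypothetical):

```python
import numpy as np

def unpatchify(tokens, gh, gw, ph, pw):
    """Assumed reading of "b (gh gw) (ph pw c) -> b c (gh ph) (gw pw)"."""
    b, n, d = tokens.shape
    assert n == gh * gw, "sequence length must equal the number of patches"
    assert d % (ph * pw) == 0, "token dim must be divisible by patch area"
    c = d // (ph * pw)
    # Split the two grouped axes: b (gh gw) (ph pw c) -> b gh gw ph pw c
    x = tokens.reshape(b, gh, gw, ph, pw, c)
    # Reorder to b c gh ph gw pw so rows and columns interleave correctly
    x = x.transpose(0, 5, 1, 3, 2, 4)
    # Merge (gh ph) and (gw pw) into image height and width
    return x.reshape(b, c, gh * ph, gw * pw)

def patchify(images, ph, pw):
    """Inverse direction: b c (gh ph) (gw pw) -> b (gh gw) (ph pw c)."""
    b, c, height, width = images.shape
    gh, gw = height // ph, width // pw
    x = images.reshape(b, c, gh, ph, gw, pw)
    x = x.transpose(0, 2, 4, 3, 5, 1)  # b gh gw ph pw c
    return x.reshape(b, gh * gw, ph * pw * c)

images = np.arange(2 * 3 * 4 * 6).reshape(2, 3, 4, 6)
tokens = patchify(images, ph=2, pw=3)
restored = unpatchify(tokens, gh=2, gw=2, ph=2, pw=3)
```

With einops installed, `einops.rearrange(tokens, pattern, gh=2, ph=2, pw=3)` with the quoted pattern should give the same result as `unpatchify` here; the NumPy version just spells out the reshape and transpose steps.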
Greg Yang
Greg Yang@TheGregYang·
I'm trying out @NotionHQ for organizing my learnings on lyme & test results
feedback so far
1. android app is kinda jank and lags everywhere
2. notion ai is not bad so far but very annoying it can read files I upload in the chat window but can't attach those files to pages
English
26
4
139
13.7K
Kyle Sargent
Kyle Sargent@KyleSargentAI·
I’m broadly optimistic about the ability of AI to automate most coding, but there’s something about human care and attention in software that still feels so valuable and important to me.
The other day I flipped open my Game Boy Advance SP, which my parents got me in 2003-ish, and started playing Super Mario Bros 3. After 23 years, it still works perfectly. I’ve played the whole game through maybe a dozen times – hundreds or low thousands of hours of overall play time – and have yet to encounter a single bug. There’s maybe one sharp edge, which is that in the Ice World stage, level 5 or 6 (forget which) has a button combination that’s a little challenging to press on the Game Boy Advance SP compared with other platforms on which I assume the game was released. But that’s all I can remember struggling with.
AI can write thousands of lines of code for you, but what software do we have now, in 2026, that still works this way? Trying to boot up my PS5 to play a triple-A title is an aggravating bore because you have to deal with mandatory console updates, followed by mandatory game updates, etc.
English
2
0
21
1.8K
Kyle Sargent
Kyle Sargent@KyleSargentAI·
I love AI coding tools but often feel they are way too aggressive about suppressing errors or introducing footguns like my_dict.get(key, dumb_default). Silent failures are bad! If there's a bug in my code I want to know - show me the traceback!
English
1
0
9
790
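To make the `my_dict.get(key, dumb_default)` complaint in the tweet above concrete, here is a small sketch contrasting the two lookups on a misspelled key. The `settings` dict, the key names, and both helper functions are hypothetical, chosen only for illustration:

```python
def get_silent(settings, key, default=0.0):
    # dict.get swallows the missing key and hands back the default,
    # so a typo in `key` goes unnoticed.
    return settings.get(key, default)

def get_loud(settings, key):
    # Plain indexing raises KeyError on a missing key, which surfaces
    # the bug immediately with a traceback.
    return settings[key]

settings = {"learning_rate": 3e-4}
typo = "learning_rte"  # misspelled on purpose

silent_value = get_silent(settings, typo)  # quietly becomes 0.0

try:
    get_loud(settings, typo)
    raised = False
except KeyError as err:
    raised = True
    missing_key = err.args[0]
```

In the silent version a training run would proceed with a learning rate of 0.0 and no error; in the loud version the `KeyError` names the offending key at the point of failure.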
Hadi AlZayer
Hadi AlZayer@HadiZayer·
What I appreciate about this work is how we can optimize compression models on metrics beyond L2 reconstruction error. VLMs provide a neat tool to allow us to optimize for arbitrary goals and objectives
Kyle Sargent@KyleSargentAI

Vision-language models are getting better every day. Can we use them to improve image compression? Yes! For my internship, working w/ @GoogleDeepMind, @GoogleResearch, we designed VLIC, a diffusion autoencoder post-trained with VLM preferences. Our preprint is out today! A🧵:

English
2
0
10
794
Kyle Sargent retweeted
Chen Geng
Chen Geng@gengchen01·
✨ Any static 3D assets ➡️ 4D dynamic worlds. Introducing CHORD, a universal framework for generating scene-level 4D dynamic motion from any static 3D inputs. It generalizes surprisingly well across a wide range of objects 🤯 and can even be used to learn robotics manipulation policy 🤖! Project page: yanzhelyu.github.io/chord. Dive deeper in a 🧵: 1/n
English
10
65
408
41.3K
Kyle Sargent
Kyle Sargent@KyleSargentAI·
@__nmca__ Are you hinting zoph has a 360 bench? That’s insanely impressive if so
English
0
0
1
329
Nat McAleese
Nat McAleese@__nmca__·
some consider a 360lb bench press unethical
English
3
0
35
4.9K
Kyle Sargent retweeted
Daniel Litt
Daniel Litt@littmath·
IMO it should be considered quite rude in most contexts to post or send someone a wall of 100% AI-generated text. “Here, read this thing I didn’t care enough about to express myself.”
English
176
691
9.2K
765.1K
Kyle Sargent retweeted
Wenlong Huang
Wenlong Huang@wenlong_huang·
What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 point-world.github.io from @Stanford @nvidia
English
23
225
1.2K
233.9K