Kyle Sargent

221 posts

@KyleSargentAI

Computer vision researcher. CS PhD @Stanford advised by @jiajunwu_cs @drfeifei, Past: AI Resident @Google, A.B. @Harvard

Joined November 2012

1.1K Following 1.5K Followers

Pinned Tweet

Kyle Sargent@KyleSargentAI·Dec 19

Vision-language models are getting better every day. Can we use them to improve image compression? Yes! For my internship, working w/ @GoogleDeepMind, @GoogleResearch, we designed VLIC, a diffusion autoencoder post-trained with VLM preferences. Our preprint is out today! A🧵:

314

43.6K

Kyle Sargent@KyleSargentAI·20h

My personal benchmark for voice AI is: when am I going to be able to improve my Chinese by talking to it in a mixture of bad Chinese and English, interrupting it frequently for clarifications and definitions, switching often between languages, etc.? So far nothing works at all

893

Kyle Sargent@KyleSargentAI·1d

@volokuleshov @chenhao_chao Same question, seems like kind of a free lunch?

Volodymyr Kuleshov 🇺🇦@volokuleshov·2d

@chenhao_chao Question: what happens if you apply the subtokenizer to the AR model? Does NLL improve similarly? Is it possible that the experiments are effectively comparing models with different tokenizers?

538

Chen-Hao (Lance) Chao@chenhao_chao·3d

(1/7) We introduce MDM-Prime-v2 which scales 21.8× better than autoregressive models (ARMs) in compute-optimal comparisons. 📎 Paper: arxiv.org/abs/2603.16077 🌟 Blog: chen-hao-chao.github.io/mdm-prime-v2 ⌨️ Github: github.com/chen-hao-chao/… Here’s how we did it👇:

GIF

309

28.4K

Kyle Sargent@KyleSargentAI·6d

@Logo_Daedalus Eddington was an astonishing film. Insane snub

974

R.Сам 🦋🐏@Logo_Daedalus·6d

So they didn’t nominate Eddington for anything but we’re supposed to take this seriously as some ceremony of cinematic legitimacy granting

870

23.6K

Kyle Sargent retweeted

Haian Jin@Haian_Jin·Mar 6

Spatial reconstruction is a long-context problem: real scenes come with hundreds of images. But O(N²) transformer-based models don’t scale efficiently. Introducing: 🤐ZipMap (CVPR ’26): Linear-Time, Stateful 3D Reconstruction via Test-Time Training (TTT). ZipMap “zips” a large image collection into an implicit TTT scene state in a single linear-time operation. The state will then be decoded into spatial outputs, and can be queried efficiently for novel-view geometry and appearance (~100 FPS) ZipMap is not only much faster (>20× faster than VGGT), but also matches or surpasses the accuracy of all SOTA models.

744

66.8K

Kyle Sargent@KyleSargentAI·Mar 4

Can we not do this please lol? Can we just stick with artificial NNs? They're pretty good now!

chiefofautism@chiefofautism

someone connected LIVING BRAIN CELLS to an LLM Cortical Labs grew 200,000 human neurons in a lab and kept them alive on a silicon chip, they taught the neurons to play Pong, then DOOM now someone wired them into a LLM... real brain cells firing electrical impulses to choose every token the AI generates you can see which channels were stimulated, the feedback from the neurons in choosing that letter or word

2.3K

Kyle Sargent retweeted

Zizhang Li@zizhang_li·Feb 24

#CVPR2026 🤩 PerpetualWonder: interactive 4D scene generation with long-horizon actions. From a single image, build a world you can interact with a sequence of ✨3D actions (point force / wind / gravity). Project: johnzhan2023.github.io/PerpetualWonde… More details in 🧵: 1/5

111

13.2K

Kyle Sargent@KyleSargentAI·Feb 24

@Taesung Whoa

171

Taesung Park@Taesung·Feb 24

Reve's new text-to-image model is here. Really proud of the team to rank at #3 with the big labs. To my knowledge, we are the first lab to use native pixel space diffusion without latent autoencoder at 4k (16MP) resolution for production level image generation.

Reve@reve

We’re releasing an early version of our new text-to-image model, and we’re already a top three model on @arena

179

36.6K

Kyle Sargent@KyleSargentAI·Feb 15

"b (gh gw) (ph pw c) -> b c (gh ph) (gw pw)"

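The post above does not say what the pattern is for. One plausible reading, and it is only an assumption, is that it is einops-style notation for reassembling a grid of flattened patches back into an image: `gh` and `gw` index the patch grid, `ph` and `pw` index pixels inside a patch, and `c` is the channel. A minimal NumPy sketch of that reading follows; the function names `patchify` and `unpatchify` are hypothetical and are not from the post.

```python
import numpy as np


def unpatchify(tokens, gh, gw, ph, pw):
    """NumPy reading of "b (gh gw) (ph pw c) -> b c (gh ph) (gw pw)"."""
    b, n, d = tokens.shape
    assert n == gh * gw, "token count must equal the patch grid size"
    assert d % (ph * pw) == 0, "feature size must be divisible by ph * pw"
    c = d // (ph * pw)
    # Split the two composite axes into their named factors.
    x = tokens.reshape(b, gh, gw, ph, pw, c)
    # Reorder to (b, c, gh, ph, gw, pw) so rows and columns interleave correctly.
    x = x.transpose(0, 5, 1, 3, 2, 4)
    # Merge (gh ph) into height and (gw pw) into width.
    return x.reshape(b, c, gh * ph, gw * pw)


def patchify(images, ph, pw):
    """Inverse of unpatchify, used here only to check the round trip."""
    b, c, h, w = images.shape
    gh, gw = h // ph, w // pw
    x = images.reshape(b, c, gh, ph, gw, pw)
    x = x.transpose(0, 2, 4, 3, 5, 1)
    return x.reshape(b, gh * gw, ph * pw * c)


# Illustrative shapes only: 2 images, 3 channels, 4x6 pixels, 2x3 patches.
images = np.arange(2 * 3 * 4 * 6).reshape(2, 3, 4, 6)
tokens = patchify(images, ph=2, pw=3)
restored = unpatchify(tokens, gh=2, gw=2, ph=2, pw=3)
```

With einops installed, the same reshaping would normally be written as a single `rearrange` call with the pattern string and the axis sizes passed as keyword arguments.
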
569

Kyle Sargent@KyleSargentAI·Feb 9

@TheGregYang @NotionHQ Notion is great but for some reason their desktop version is the only usable version

469

Greg Yang@TheGregYang·Feb 9

I'm trying out @NotionHQ for organizing my learnings on lyme & test results feedback so far 1. android app is kinda jank and lags everywhere 2. notion ai is not bad so far but very annoying it can read files I upload in the chat window but can't attach those files to pages

139

13.7K

Kyle Sargent@KyleSargentAI·Jan 29

@holynski_ Looks amazing

253

Aleksander Holynski@holynski_·Jan 29

hey guys, you wanted to try Genie... now's your chance!!!

Google DeepMind@GoogleDeepMind

Step inside Project Genie: our experimental research prototype that lets you create, edit, and explore virtual worlds. 🌎

103

13.5K

Kyle Sargent@KyleSargentAI·Jan 29

I’m broadly optimistic about the ability of AI to automate most coding, but there’s something about human care and attention in software that still feels so valuable and important to me. The other day I flipped open my Game Boy Advance SP, which my parents got me in 2003-ish, and started playing Super Mario Bros 3. After 23 years, it still works perfectly. I’ve played the whole game through maybe a dozen times – hundreds or low thousands of hours of overall play time – and have yet to encounter a single bug. There’s maybe one sharp edge, which is that in the Ice World stage, level 5 or 6 (forget which) has a button combination that’s a little challenging to press on the Game Boy Advance SP compared with other platforms on which I assume the game was released. But that’s all I can remember struggling with. AI can write thousands of lines of code for you, but what software do we have now, in 2026, that still works this way? Trying to boot up my PS5 to play a triple-A title is an aggravating bore because you have to deal with mandatory console updates, followed by mandatory game updates, etc.

1.8K

Kyle Sargent@KyleSargentAI·Jan 26

Maybe but N-dimensional hyperparameter sweeps can get pricey

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Periodic reminder that training spend of OpenAI equals hundreds-thousands of DeepSeek V3 per year and is growing exponentially, and an average experiment is much smaller than V3. How much of that do you think even *can* be human-designed? We're deep into automating AI research.

1.5K

Kyle Sargent@KyleSargentAI·Jan 26

I love AI coding tools but often feel they are way too aggressive about suppressing errors or introducing footguns like my_dict.get(key, dumb_default). Silent failures are bad! If there's a bug in my code I want to know - show me the traceback!
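A minimal sketch of the footgun described above, assuming a small config dictionary and a misspelled key. The names `lookup_silent`, `lookup_strict`, and `config` are hypothetical and chosen only for illustration. The `.get` version hides the typo behind a placeholder default, while plain indexing raises `KeyError` and surfaces the bug.

```python
def lookup_silent(config, key):
    # Returns a placeholder when the key is missing, so a typo goes unnoticed.
    return config.get(key, 0)


def lookup_strict(config, key):
    # Plain indexing raises KeyError (with a traceback) when the key is missing.
    return config[key]


config = {"learning_rate": 3e-4}

# "learning_rte" is a deliberate typo for "learning_rate".
silent_value = lookup_silent(config, "learning_rte")

try:
    lookup_strict(config, "learning_rte")
    strict_raised = False
except KeyError:
    strict_raised = True
```

Here `silent_value` ends up as the placeholder `0` rather than the intended learning rate, which is the kind of silent failure the post objects to.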

790

Kyle Sargent@KyleSargentAI·Jan 24

@HadiZayer Thanks Hadi :)

249

Hadi AlZayer@HadiZayer·Jan 23

What I appreciate about this work is how we can optimize compression models on metrics beyond L2 reconstruction error. VLMs provide a neat tool to allow us to optimize for arbitrary goals and objectives

Kyle Sargent@KyleSargentAI

794

Kyle Sargent retweeted

Chen Geng@gengchen01·Jan 17

✨ Any static 3D assets ➡️ 4D dynamic worlds. Introducing CHORD, a universal framework for generating scene-level 4D dynamic motion from any static 3D inputs. It generalizes surprisingly well across a wide range of objects 🤯 and can even be used to learn robotics manipulation policy 🤖! Project page: yanzhelyu.github.io/chord. Dive deeper in a 🧵: 1/n

408

41.3K

Kyle Sargent@KyleSargentAI·Jan 15

@__nmca__ Are you hinting zoph has a 360 bench? That’s insanely impressive if so

329

Nat McAleese@__nmca__·Jan 15

some consider a 360lb bench press unethical

4.9K

Kyle Sargent retweeted

Daniel Litt@littmath·Jan 12

IMO it should be considered quite rude in most contexts to post or send someone a wall of 100% AI-generated text. “Here, read this thing I didn’t care enough about to express myself.”

176

691

9.2K

765.1K

Kyle Sargent retweeted

Wenlong Huang@wenlong_huang·Jan 8

What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 point-world.github.io from @Stanford @nvidia

225

1.2K

233.9K

Kyle Sargent@KyleSargentAI·Jan 11

@NotionHQ mobile app really needs some love wow

263
