Jon Barron
3K posts

@jon_barron
Principal research scientist at Google DeepMind. Synthesized views are my own.
SF Bay Area · Joined May 2010
1.4K Following · 33.4K Followers

@AndrewSchmidtFC I think storage and network costs would be the biggest blocker for an OSS system

Instead of letting Google or Apple own it, what if we made great open source software and hardware to capture and render these things, and built a public repository of the world?
But, y’know. I dream.
Bilawal Sidhu @bilawalsidhu
The next generation of street view will be wildly immersive.

@AjdDavison I bet your unlock will come from doing your LLM conversation and ideation inside of an IDE, instead of a chat exchange. Chat is cheap, unit tests are everything.

Related... is anyone out there making progress on their *hardest* research problems using LLMs? The kind you've been wondering about for years, where it's hard to even describe what you're trying to do but just have a feeling there's something to find. Honest question: how? 1/2
kache @yacineMTB
you can outsource your thinking but you cannot outsource your understanding

@AjdDavison Yes! You just talk to it. The last few months have been the most exciting time of my research career, I think
Jon Barron retweeted

Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space.
Now it is 0.75, and can be even lower.
Many wonder how.
I thought it might end as a small FID prank: simple and deliberate.
It started with one question: can FID be optimized directly, and what does it reveal?
Introducing FD-loss.
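The retweeted post asks whether FID can be optimized directly. For reference, FID is the squared Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below evaluates that standard formula with NumPy, assuming the features have already been extracted; `frechet_distance` and `fit_gaussian` are hypothetical names, and nothing here claims to be how FD-loss itself is built.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Hypothetical helper: FID applies this formula to Gaussians fitted to
    Inception features; feature extraction is assumed to happen elsewhere.
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)

    # Squared distance between the two means.
    mean_term = np.sum((mu1 - mu2) ** 2)

    # Tr((sigma1 @ sigma2)^(1/2)) is the sum of square roots of the eigenvalues
    # of sigma1 @ sigma2, which are real and non-negative for PSD covariances;
    # clip tiny negative values caused by floating-point round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals, 0.0, None)))

    return float(mean_term + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def fit_gaussian(features):
    """Hypothetical helper: mean and covariance of an (n_samples, dim) array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


if __name__ == "__main__":
    # Toy stand-ins for real and generated feature sets (random data, not Inception features).
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    fake = rng.normal(loc=0.5, size=(500, 8))
    print(frechet_distance(*fit_gaussian(real), *fit_gaussian(fake)))
```

Because every term is differentiable in the means and covariances, the formula itself is a plausible optimization target; the hard parts in practice are the matrix square root and estimating covariances from mini-batches.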

Jon Barron retweeted

@thomasahle does this have a closed form? Feels like it should.

@jon_barron @theworldlabs I'll send over a live link to you directly once finished! You will be able to explore a bunch of worlds 🙏 we are getting close!

@rms80 @theworldlabs I see a video of a game and the post says it's "ready to explore" so I assume it's the game that is ready, not like the assets underneath the game

@jon_barron @theworldlabs the post copy does actually only say "fly through it" 🫤

@theworldlabs cmon man you said it was "ready to explore", gimme the sword already!

@jon_barron Game coming soon! 🤩
You can fly through the splat here (also linked in our blog via thread):
wlt-ai-cdn.art/spark-2.0/2604…

@DFinsterwalder deep learning people succeeded despite their philosophizing, not because of it

@jon_barron I get the joke. Most “world model” discourse is vapor.
But “no under-the-hood questions” was not the advice that got neural nets out of crackpot territory and got Hinton the Nobel.
Just saying.

@JitendraMalikCV Yeah this definition holds up pretty well, maybe due to the MDP scoping the problem statement narrowly enough that "the world" gains a concrete technical meaning.

@jon_barron "World models" has a technical meaning: the transition model/dynamics model from Bellman/Kalman in the context of MDPs / the state-space approach to control theory, circa 1960. I gave a talk on this history youtube.com/watch?v=9B4kka…
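For readers outside control theory, the two classical forms of the "transition model/dynamics model" in the definition above can be written as follows. The notation (states s or x, actions a or u, matrices A and B, noise covariance Q) is the standard textbook choice, picked here for illustration rather than taken from the linked talk.

```latex
% MDP view (Bellman): the world model is the transition distribution over next states
s_{t+1} \sim P\left(s_{t+1} \mid s_t, a_t\right)

% State-space view (Kalman): a linear dynamics model driven by control input u_t and process noise w_t
x_{t+1} = A\,x_t + B\,u_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)
```

In both forms the model predicts how the state evolves in response to an action, which is the sense in which the thread contrasts "world models" with generators that only produce static 3D scenes.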

Jon Barron retweeted

Earlier this year, we launched 4D Generation with mesh-based schemas like the car-5. Now, we're expanding to 30+ new schemas powered by Procedural Model Generation.
This shift allows for fully functional and editable 3D assets—from submarines that dive to jet planes that fly.
Here's a sneak peek of what's coming soon.

@ryancjulian "velocity models" sounds pretty cool (and fundable)

@jon_barron "forward dynamics model" has been around for decades, but I guess that doesn't pass VC readability

@keenanisalive yeah the absence of dynamics in these models is huge. Vanilla dynamic 4D models also feel insufficient, if they're just "animated" rather than simulated. Really looking forward to physics getting into the mix more, that'll be satisfying.

@keenanisalive But video generation world models also arguably don't predict how the natural world behaves, they predict pixels that show that behavior. Very hard to nail down what a world model should be but I think the 3D models come slightly closer to obviously modeling "the world"

A bunch of folks have been building machine learning models that turn a photograph into a 3D environment made of Gaussian splats (read: blobs of color floating in space).
Cool technology & a very admirable effort. But marketing these as "world models" seems wrong.
More accurate would be to say that they are a riff on the broader class of image-conditioned 3D generators, with a somewhat different flavor of condition image and output representation.
As far as world modeling, they don't make great predictions about how the natural world looks or behaves. (Even for, say, a chair behind a table.)
Again: I love the technology. Super cool creative stuff. I don't love the marketing and hype around it.
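To make "blobs of color floating in space" concrete: in 3D Gaussian splatting each primitive carries a center, a 3×3 covariance (its size, shape, and orientation), a color, and an opacity, and overlapping blobs are blended with front-to-back alpha compositing. Below is a minimal NumPy sketch, assuming plain RGB colors instead of the view-dependent spherical-harmonic colors real systems use, and skipping the projection of each Gaussian to 2D screen space that actual rasterizers perform. `GaussianSplat` and `composite_front_to_back` are hypothetical names.

```python
import numpy as np


class GaussianSplat:
    """Hypothetical container for one 3D Gaussian blob."""

    def __init__(self, mean, covariance, color, opacity):
        self.mean = np.asarray(mean, dtype=float)              # center in world space, shape (3,)
        self.covariance = np.asarray(covariance, dtype=float)  # 3x3 symmetric positive-definite
        self.color = np.asarray(color, dtype=float)            # plain RGB in [0, 1] (simplification)
        self.opacity = float(opacity)                          # peak opacity in [0, 1]
        self._precision = np.linalg.inv(self.covariance)       # inverse covariance, cached

    def weight(self, point):
        """Opacity-scaled Gaussian falloff at a 3D point; equals `opacity` at the center."""
        d = np.asarray(point, dtype=float) - self.mean
        return self.opacity * float(np.exp(-0.5 * d @ self._precision @ d))


def composite_front_to_back(colors, alphas):
    """Blend depth-sorted (nearest first) colors with their per-pixel alphas.

    Each contribution is scaled by the transmittance left after the blobs in
    front of it: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).
    """
    out = np.zeros(3)
    transmittance = 1.0
    for color, alpha in zip(colors, alphas):
        out += transmittance * alpha * np.asarray(color, dtype=float)
        transmittance *= 1.0 - alpha
    return out
```

In a real renderer the alpha for each splat at a given pixel comes from its projected 2D Gaussian, and splats are depth-sorted per tile before compositing; the 3D `weight` method here only illustrates the falloff of a single blob.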

@gentile_captial yeah loop closure seems hard. But making individual small environments seems doable, you just gotta chain them together.

@jon_barron only works for open spaces. Try generating a dungeon crawler that gets you back to your original starting point. I've got some ideas for how to solve this, but alas I lack the compute for the effort.

Gen3D world models seem to work now. I humbly request that someone finally put some guns and swords into one of these systems and host a deathmatch. I will be there and I can bring some gen 3D luminaries, it'll be a blast, just set it up and ping me on discord.
SpAItial AI @SpAItial_AI
Echo-2 is a physically-grounded world model from which we can distill meshes, point clouds, or 3DGS scene representations. Directly usable in a myriad of downstream applications from gaming to training robots. Want to build your own world? Try it here: spaitial.ai