Xavi Giró

6.5K posts


@DocXavi

Applied scientist at @amazonscience Barcelona, Catalonia. Made at @la_upc & @columbia. Promoting @dlbcnai. Opinions my own.

Badalona, Catalonia · Joined July 2012
1.8K Following · 3K Followers
Pinned Tweet
Xavi Giró
Xavi Giró@DocXavi·
X and @elonmusk have failed at promoting the values of democracy and human rights. Time to leave this platform. We learned a lot here, thanks to those who made it possible. Find me on LinkedIn and Bluesky.
Universitat Politècnica de Catalunya (UPC)@la_UPC

The #UPC is no longer posting on X, in order to keep its communication in environments that guarantee the quality and veracity of information. A decision taken by consensus by the #ConsellGovernUPC on 19 February. 🔗upc.edu/ca/sala-de-pre…

0 replies · 0 reposts · 0 likes · 522 views
Xavi Giró reposted
clem 🤗
clem 🤗@ClementDelangue·
After @Pinterest @Airbnb @NotionHQ @cursor_ai, today it’s @eoghan @intercom publicly sharing that they’re finding it better, cheaper, faster to use and train open models themselves rather than use APIs for many tasks. And hundreds of other companies are doing the same without sharing. Ultimately, I believe the majority of AI workflows will be in-house based on open-source (vs API). It took much more time than we anticipated but it’s happening now!
81 replies · 184 reposts · 1.7K likes · 394.2K views
Xavi Giró
Xavi Giró@DocXavi·
@ychngji6 I'm missing a proposed solution after all this resounding motivation. Even if the problem is not fully solved, could you provide pointers to some approaches aligned with your vision?
0 replies · 0 reposts · 0 likes · 40 views
Xavi Giró reposted
International Conference on 3D Vision
The #3DV2026 Keynote and Award Talk recordings are officially live! 🎥🍿 Revisit all the fantastic presentations from our insightful speakers and keep the 3D vision inspiration going! See the links below⬇️
1 reply · 17 reposts · 118 likes · 10.5K views
Xavi Giró reposted
Brian Roemmele
Brian Roemmele@BrianRoemmele·
LeWorldModel: Yann LeCun's Radical Simplification of World Models Just Made Physics-Aware AI Practical

In the race for artificial general intelligence, two paths have emerged. One is the familiar scale-everything route: bigger LLMs trained on ever-larger text corpora. The other, championed for years by Yann LeCun, is building world models: compact systems that learn the underlying physics of reality directly from raw sensory data (pixels) so AI can plan, predict, and act in the physical world the way a robot or self-driving car actually would.

Until now, the second path has been frustratingly difficult. Joint-Embedding Predictive Architectures (JEPAs), LeCun's elegant framework for learning predictive representations without reconstructing every pixel, kept collapsing during training. Researchers had to resort to a laundry list of hacks: multi-term loss functions (up to six hyperparameters), frozen pre-trained encoders, stop-gradients, exponential moving averages, and other duct-tape tricks just to keep the model from mapping every input to the same useless output.

LeCun's team (Mila, NYU, Samsung SAIL, and Brown University) dropped a bombshell: LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms. No more house-of-cards engineering. Just a clean, simple recipe that works on a single GPU in a few hours with only 15 million parameters.

The Core Breakthrough: SIGReg Saves the Day

LeWorldModel's secret weapon is a new regularizer called SIGReg (for spherical isotropic Gaussian regularizer). It enforces a simple Gaussian distribution on the latent embeddings. This single term prevents representation collapse without any of the previous heuristics. The training objective now has just two parts:
1. Next-embedding prediction loss: the model predicts what the next latent state should be.
2. SIGReg: keeps the latent space well-behaved and diverse.
That's it. Hyperparameters drop from six to one. Training becomes stable, reproducible, and dramatically cheaper. The model learns directly from raw video frames (no pre-trained vision encoders needed) and produces a compact latent world model that can be used for fast planning.

Impressive Results on Real Benchmarks

Despite its tiny size, LeWorldModel punches way above its weight:
- Trains on a single GPU in a few hours.
- Plans actions up to 48 times faster than foundation-model-based world models.
- Uses roughly 200 times fewer tokens than alternatives.
- Matches or beats far larger models on diverse 2D and 3D control tasks (e.g., manipulation, navigation).
- Its latent space encodes meaningful physical quantities (position, velocity, etc.), as shown by direct probing.
- It reliably detects physically implausible surprise events, showing genuine causal understanding.

Crucially, adding a decoder and reconstruction loss hurts performance on downstream control tasks. The pure JEPA objective already captures everything needed for planning; extra visual details just get in the way.

Project website: le-wm.github.io
Official code: github.com/lucas-maes/le-…

Why This Matters for the Future of AI

LeCun has been saying since 2022 that world models (not next-token predictors) are the key to real intelligence. Critics always pointed to the training instability. LeWorldModel removes that objection with elegant simplicity. This is a philosophical reset: AI can learn physics the way babies do, by watching the world unfold, without needing supercomputers or endless text. The implications for robotics, autonomous vehicles, and embodied agents are enormous. Suddenly, building a physically grounded planner is something a researcher (or even a hobbyist) can do on consumer hardware. 1 of 2
27 replies · 132 reposts · 668 likes · 68.3K views
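The two-term objective the thread describes (a next-embedding prediction loss plus a regularizer pushing embeddings toward an isotropic Gaussian) can be sketched in a few lines. This is a toy NumPy illustration of the general idea only, not the paper's actual SIGReg formulation; the function name, the mean-squared-error choice, and the moment-matching penalty are simplifying assumptions on my part.

```python
import numpy as np

def two_term_jepa_loss(pred_next, true_next, lam=1.0):
    """Toy two-term JEPA-style objective:
    (1) next-embedding prediction loss (MSE in latent space), and
    (2) a regularizer nudging the batch of embeddings toward a
        zero-mean, identity-covariance (isotropic Gaussian) shape.
    Illustrative approximation of SIGReg, not the published method."""
    # (1) prediction loss: how far the predicted next latent is
    # from the actual next latent, averaged over the batch
    pred_loss = np.mean((pred_next - true_next) ** 2)

    # (2) anti-collapse regularizer: penalize deviation of the
    # batch mean from 0 and the batch covariance from identity
    mu = true_next.mean(axis=0)
    cov = np.cov(true_next, rowvar=False)
    reg = np.sum(mu ** 2) + np.sum((cov - np.eye(cov.shape[0])) ** 2)

    return pred_loss + lam * reg

# Toy usage: random stand-ins for predictor outputs and encoder targets
rng = np.random.default_rng(0)
z_pred = rng.standard_normal((256, 8))  # predicted next embeddings
z_true = rng.standard_normal((256, 8))  # actual next embeddings
loss = two_term_jepa_loss(z_pred, z_true)
```

In a real JEPA, `pred_next` would come from a predictor network applied to encoded frames and `true_next` from the encoder on the following frame; the point of the sketch is just that collapse (every input mapping to the same embedding) is penalized by the second term, since a collapsed batch has near-zero covariance, far from identity.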
Xavi Giró reposted
Amazon Science
Amazon Science@AmazonScience·
📣 Amazon Research Awards spring 2026 call for proposals is now open for submissions. Successful applicants will receive unrestricted funds, AWS promotional credits, and training resources. Deadline for submissions is May 6. amzn.to/4c3GH3Z
0 replies · 2 reposts · 20 likes · 2.7K views
Pedro Domingos
Pedro Domingos@pmddomingos·
Decade in which each subfield of AI went from not being for real to being for real: Search: 1960s Machine learning: 1990s Vision: 2010s NLP: 2020s Reasoning, planning, robotics, etc.: TBD
8 replies · 7 reposts · 91 likes · 6.1K views
Xavi Giró reposted
Angjoo Kanazawa
Angjoo Kanazawa@akanazawa·
In an effort to better understand VLMs, we found that they are fragile in surprising ways. Just changing the color of pointing markers (red circle → blue circle) can completely change the results! :
Lisa Dunlap@lisabdunlap

🌟NEW PAPER🌟 Do you know that changing a visual marker from red to blue can completely reorder VLM leaderboards? In our most recent work, we explore the fragility of visually prompted benchmarks. lisadunlap.github.io/vpbench/

4 replies · 10 reposts · 105 likes · 15.3K views
Xavi Giró reposted
ELLISBarcelona
ELLISBarcelona@ELLISBarcelona·
✨ Kind reminder! The ELLIS Unit Barcelona is hosting its fourth Scientific Seminar. Join us for the Scientific Seminar on January 28th with a talk by Prof. @PascalMettes on "Hyperbolic Deep Learning". Don't miss out ➡️ellisbarcelona.eu/ellis-unit-bar…
0 replies · 3 reposts · 6 likes · 318 views
Xavi Giró reposted
#CVPR2026
#CVPR2026@CVPR·
The #CVPR2026 review deadline has now passed. If you have not yet submitted your review, please contact your Area Chair (AC) immediately to confirm your status and submission plan!
1 reply · 2 reposts · 20 likes · 9.2K views
Xavi Giró reposted
Xin Yu (Andy)
Xin Yu (Andy)@andy_yx27·
Excited to share our new work Self-E: A New Training Paradigm for Text-to-Image! One model, any compute: Unlock any-step text-to-image generation. Fully trained from scratch, no teacher distillation needed. xinyu-andy.github.io/SelfE-project The secret? Let the model evaluate itself. 👇
3 replies · 25 reposts · 148 likes · 15.8K views
Xavi Giró reposted
Demis Hassabis
Demis Hassabis@demishassabis·
We’re making great progress with our Gemini Robotics work in bringing AI to the physical world - a critical aspect of AGI. As part of our next steps, super excited to announce our partnership with @BostonDynamics, combining our SOTA robotics models with their world-class hardware
Google DeepMind@GoogleDeepMind

Google DeepMind 🤝 @BostonDynamics Our new research partnership will bring together our advancements in Gemini Robotics’s foundational capabilities to their new Atlas® humanoids. 🦾 Find out more → goo.gle/49paguA

200 replies · 451 reposts · 4.5K likes · 322.2K views
Xavi Giró reposted
Jitendra MALIK
Jitendra MALIK@JitendraMalikCV·
1/4 For the last several years I worked part-time at the FAIR lab at Meta, in addition to being a professor at UC Berkeley. That phase is now over, and starting Jan. 5, I will be leading a robotics research effort at Amazon FAR in San Francisco, while continuing at Berkeley.
50 replies · 46 reposts · 1.5K likes · 306.8K views
Xavi Giró reposted
Bo Wang
Bo Wang@BoWang87·
Everyone’s hyped about “AI for Science” in 2025! At the end of the year, please allow me to share my unease and optimism, specifically about AI & biology. After spending another year deep in biological foundation models, healthcare AI, and drug discovery, here are 3 lessons I learned in 2025.

1. Biology is not “just another modality.”
The biggest misconception I still see: “Biology is text + images + graphs. Just scale transformers.” No. Biology is causal, hierarchical, stochastic, and incomplete in ways that language and vision are not. Tokens don’t correspond cleanly to reality. Labels are sparse, biased, and often wrong. Ground truth is conditional, context-dependent, and sometimes unknowable. We’ve made real progress (single-cell, imaging, genomics, and EHRs are finally being modeled jointly), but the hard truth is this: most biological signals are not supervised problems waiting for better loss functions. They are intervention-driven problems. They demand perturbations, counterfactuals, and mechanisms, beyond just prediction. Scaling obviously helps. But without causal structure, scaling mostly gives you sharper correlations. 2025 reinforced my belief that biological foundation models must be built around perturbation, uncertainty, and actionability, not just representation learning.

2. Benchmarks are holding biology back more than compute is.
Let’s be honest: benchmarking in AI & biology is still broken. Everyone reports SOTA. Everyone picks a different dataset slice. Everyone tunes for a different metric. Everyone avoids prospective validation. We’ve imported the worst habits of ML benchmarking into a domain where the stakes are much higher. In biology and healthcare, a 1% gain that doesn’t transfer is worse than useless: it’s misleading. What’s missing isn’t more benchmarks. It’s hard benchmarks:
• Prospective, not retrospective
• Perturbation-based, not static
• Multi-site, not single-lab
• Failure-aware, not leaderboard-optimized
If your model only works on the dataset that created it, it’s not a foundation model, it’s a dataset artifact. In 2026, we need fewer flashy plots and more humility, rigor, and negative results.

3. “Reasoning” in biology is not chain-of-thought.
There’s a growing tendency to directly apply the word “reasoning” to biological LLMs. Let’s be careful. Biological reasoning isn’t verbal fluency, longer context windows, or prettier explanations. Those are surface-level improvements. Real reasoning in biology shows up elsewhere: in forming hypotheses, deciding which experiments to run, updating beliefs when perturbations fail, and constantly trading off cost, risk, and uncertainty. A model that explains a pathway beautifully but can’t decide which experiment to run next is not reasoning, it’s narrating. 2025 convinced me that the future lies in agentic biological AI: systems that couple foundation models with experimentation, simulation, and decision-making loops.

Closing thought: AI & biology is not lagging behind AI for code or language. It’s just playing a harder game. The constraints are real. The data is messy. The feedback loops are slow. The consequences matter. If 2025 clarified anything for me, it’s this: we won’t make progress by treating biology like text. We’ll make progress by building AI that behaves more like a scientist: skeptical, iterative, and willing to be wrong. Onward to 2026.
55 replies · 166 reposts · 744 likes · 66.9K views
Xavi Giró reposted
Ivan Skorokhodov
Ivan Skorokhodov@isskoro·
I think that JiT (arxiv.org/abs/2511.13720) might have been my favorite paper of 2025. From the discussions with my friends, it got quite some controversy with many people dismissing it as some trivial reinvention of x-prediction, so I would like to put my perspective on it here
11 replies · 69 reposts · 561 likes · 69K views