Xavi Giró

6.5K posts


@DocXavi

Applied scientist at @amazonscience Barcelona, Catalonia. Made at @la_upc & @columbia. Promoting @dlbcnai. Opinions my own.

Badalona, Catalonia · Joined July 2012
1.8K Following · 3K Followers
Pinned Tweet
Xavi Giró
Xavi Giró@DocXavi·
X and @elonmusk have failed at promoting the values of democracy and human rights. Time to leave this platform. We learned a lot here, thanks to those who made it possible. Find me on LinkedIn and Bluesky.
Universitat Politècnica de Catalunya (UPC)@la_UPC

The #UPC is no longer posting on X, in order to keep its communication in environments that guarantee the quality and veracity of information. A decision taken by consensus by the #ConsellGovernUPC on 19 February. 🔗upc.edu/ca/sala-de-pre…

0 replies · 0 reposts · 0 likes · 522 views
Xavi Giró reposted
clem 🤗
clem 🤗@ClementDelangue·
After @Pinterest @Airbnb @NotionHQ @cursor_ai, today it’s @eoghan @intercom publicly sharing that they’re finding it better, cheaper, faster to use and train open models themselves rather than use APIs for many tasks. And hundreds of other companies are doing the same without sharing. Ultimately, I believe the majority of AI workflows will be in-house based on open-source (vs API). It took much more time than we anticipated but it’s happening now!
81 replies · 184 reposts · 1.7K likes · 394.2K views
Xavi Giró
Xavi Giró@DocXavi·
@ychngji6 I'm missing a proposed solution after all this resounding motivation. Even if the problem is not fully solved, could you provide pointers to some approaches aligned with your vision?
0 replies · 0 reposts · 0 likes · 40 views
Xavi Giró reposted
International Conference on 3D Vision
The #3DV2026 Keynote and Award Talk recordings are officially live! 🎥🍿 Revisit all the fantastic presentations from our insightful speakers and keep the 3D vision inspiration going! See the links below⬇️
1 reply · 17 reposts · 118 likes · 10.5K views
Xavi Giró reposted
Brian Roemmele
Brian Roemmele@BrianRoemmele·
LeWorldModel: Yann LeCun's Radical Simplification of World Models Just Made Physics-Aware AI Practical

In the race for artificial general intelligence, two paths have emerged. One is the familiar scale-everything route: bigger LLMs trained on ever-larger text corpora. The other, championed for years by Yann LeCun, is building world models: compact systems that learn the underlying physics of reality directly from raw sensory data (pixels) so AI can plan, predict, and act in the physical world the way a robot or self-driving car actually would.

Until now, the second path has been frustratingly difficult. Joint-Embedding Predictive Architectures (JEPAs), LeCun's elegant framework for learning predictive representations without reconstructing every pixel, kept collapsing during training. Researchers had to resort to a laundry list of hacks: multi-term loss functions (up to six hyperparameters), frozen pre-trained encoders, stop-gradients, exponential moving averages, and other duct-tape tricks just to keep the model from mapping every input to the same useless output.

LeCun's team (Mila, NYU, Samsung SAIL, and Brown University) dropped a bombshell: LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms. No more house-of-cards engineering. Just a clean, simple recipe that works on a single GPU in a few hours with only 15 million parameters.

The Core Breakthrough: SIGReg Saves the Day

LeWorldModel's secret weapon is a new regularizer called SIGReg (for spherical isotropic Gaussian regularizer). It enforces a simple Gaussian distribution on the latent embeddings. This single term prevents representation collapse without any of the previous heuristics. The training objective now has just two parts:
1. Next-embedding prediction loss: the model predicts what the next latent state should be.
2. SIGReg: keeps the latent space well-behaved and diverse.
That's it. Hyperparameters drop from six to one. Training becomes stable, reproducible, and dramatically cheaper. The model learns directly from raw video frames (no pre-trained vision encoders needed) and produces a compact latent world model that can be used for fast planning.

Impressive Results on Real Benchmarks

Despite its tiny size, LeWorldModel punches way above its weight:
- Trains on a single GPU in a few hours.
- Plans actions up to 48 times faster than foundation-model-based world models.
- Uses roughly 200 times fewer tokens than alternatives.
- Matches or beats far larger models on diverse 2D and 3D control tasks (e.g., manipulation, navigation).
- Its latent space encodes meaningful physical quantities (position, velocity, etc.), as shown by direct probing.
- It reliably detects physically implausible surprise events, showing genuine causal understanding.

Crucially, adding a decoder and reconstruction loss hurts performance on downstream control tasks. The pure JEPA objective already captures everything needed for planning; extra visual details just get in the way.

Project website: le-wm.github.io
Official code: github.com/lucas-maes/le-…

Why This Matters for the Future of AI

LeCun has been saying since 2022 that world models (not next-token predictors) are the key to real intelligence. Critics always pointed to the training instability. LeWorldModel removes that objection with elegant simplicity. This is a philosophical reset: AI can learn physics the way babies do, by watching the world unfold, without needing supercomputers or endless text. The implications for robotics, autonomous vehicles, and embodied agents are enormous. Suddenly, building a physically grounded planner is something a researcher (or even a hobbyist) can do on consumer hardware. 1 of 2
27 replies · 132 reposts · 668 likes · 68.3K views
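The two-term objective the thread describes (a next-embedding prediction loss plus a regularizer pushing embeddings toward an isotropic Gaussian) can be sketched in a few lines. This is a toy NumPy illustration of the general idea only, not the paper's actual SIGReg formulation; the function name, the mean-squared-error choice, and the moment-matching penalty are simplifying assumptions on my part.

```python
import numpy as np

def two_term_jepa_loss(pred_next, true_next, lam=1.0):
    """Toy two-term JEPA-style objective:
    (1) next-embedding prediction loss (MSE in latent space), and
    (2) a regularizer nudging the batch of embeddings toward a
        zero-mean, identity-covariance (isotropic Gaussian) shape.
    Illustrative approximation of SIGReg, not the published method."""
    # (1) prediction loss: how far the predicted next latent is
    # from the actual next latent, averaged over the batch
    pred_loss = np.mean((pred_next - true_next) ** 2)

    # (2) anti-collapse regularizer: penalize deviation of the
    # batch mean from 0 and the batch covariance from identity
    mu = true_next.mean(axis=0)
    cov = np.cov(true_next, rowvar=False)
    reg = np.sum(mu ** 2) + np.sum((cov - np.eye(cov.shape[0])) ** 2)

    return pred_loss + lam * reg

# Toy usage: random stand-ins for predictor outputs and encoder targets
rng = np.random.default_rng(0)
z_pred = rng.standard_normal((256, 8))  # predicted next embeddings
z_true = rng.standard_normal((256, 8))  # actual next embeddings
loss = two_term_jepa_loss(z_pred, z_true)
```

In a real JEPA, `pred_next` would come from a predictor network applied to encoded frames and `true_next` from the encoder on the following frame; the point of the sketch is just that collapse (every input mapping to the same embedding) is penalized by the second term, since a collapsed batch has near-zero covariance, far from identity.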
Xavi Giró reposted
Amazon Science
Amazon Science@AmazonScience·
📣 Amazon Research Awards spring 2026 call for proposals is now open for submissions. Successful applicants will receive unrestricted funds, AWS promotional credits, and training resources. Deadline for submissions is May 6. amzn.to/4c3GH3Z
0 replies · 2 reposts · 20 likes · 2.7K views
Pedro Domingos
Pedro Domingos@pmddomingos·
Decade in which each subfield of AI went from not being for real to being for real: Search: 1960s Machine learning: 1990s Vision: 2010s NLP: 2020s Reasoning, planning, robotics, etc.: TBD
8 replies · 7 reposts · 91 likes · 6.1K views
Xavi Giró reposted
Angjoo Kanazawa
Angjoo Kanazawa@akanazawa·
In an effort to better understand VLMs, we found that they are fragile in surprising ways. Just changing the color of pointing markers (red circle → blue circle) can completely change the results! :
Lisa Dunlap@lisabdunlap

🌟NEW PAPER🌟 Do you know that changing a visual marker from red to blue can completely reorder VLM leaderboards? In our most recent work, we explore the fragility of visually prompted benchmarks. lisadunlap.github.io/vpbench/

4 replies · 10 reposts · 105 likes · 15.3K views
Xavi Giró reposted
ELLISBarcelona
ELLISBarcelona@ELLISBarcelona·
✨ Kind reminder! The ELLIS Unit Barcelona is hosting its fourth Scientific Seminar. Join us for the Scientific Seminar on January 28th with a talk by Prof. @PascalMettes on "Hyperbolic Deep Learning". Don't miss out ➡️ellisbarcelona.eu/ellis-unit-bar…
0 replies · 3 reposts · 6 likes · 318 views
Xavi Giró reposted
#CVPR2026
#CVPR2026@CVPR·
The #CVPR2026 review deadline has now passed. If you have not yet submitted your review, please contact your Area Chair (AC) immediately to confirm your status and submission plan!
1 reply · 2 reposts · 20 likes · 9.2K views
Xavi Giró reposted
Xin Yu (Andy)
Xin Yu (Andy)@andy_yx27·
Excited to share our new work Self-E: A New Training Paradigm for Text-to-Image! One model, any compute: Unlock any-step text-to-image generation. Fully trained from scratch, no teacher distillation needed. xinyu-andy.github.io/SelfE-project The secret? Let the model evaluate itself. 👇
3 replies · 25 reposts · 148 likes · 15.8K views
Xavi Giró reposted
Demis Hassabis
Demis Hassabis@demishassabis·
We’re making great progress with our Gemini Robotics work in bringing AI to the physical world - a critical aspect of AGI. As part of our next steps, super excited to announce our partnership with @BostonDynamics, combining our SOTA robotics models with their world-class hardware
Google DeepMind@GoogleDeepMind

Google DeepMind 🤝 @BostonDynamics Our new research partnership will bring together our advancements in Gemini Robotics’s foundational capabilities to their new Atlas® humanoids. 🦾 Find out more → goo.gle/49paguA

200 replies · 451 reposts · 4.5K likes · 322.2K views
Xavi Giró reposted
Jitendra MALIK
Jitendra MALIK@JitendraMalikCV·
1/4 For the last several years I worked part-time at the FAIR lab at Meta, in addition to being a professor at UC Berkeley. That phase is now over, and starting Jan. 5, I will be leading a robotics research effort at Amazon FAR in San Francisco, while continuing at Berkeley.
50 replies · 46 reposts · 1.5K likes · 306.8K views
Xavi Giró reposted
Bo Wang
Bo Wang@BoWang87·
Everyone’s hyped about “AI for Science” in 2025! At the end of the year, please allow me to share my unease and optimism, specifically about AI & biology. After spending another year deep in biological foundation models, healthcare AI, and drug discovery, here are 3 lessons I learned in 2025.

1. Biology is not “just another modality.”
The biggest misconception I still see: “Biology is text + images + graphs. Just scale transformers.” No. Biology is causal, hierarchical, stochastic, and incomplete in ways that language and vision are not. Tokens don’t correspond cleanly to reality. Labels are sparse, biased, and often wrong. Ground truth is conditional, context-dependent, and sometimes unknowable. We’ve made real progress (single-cell, imaging, genomics, and EHRs are finally being modeled jointly), but the hard truth is this: most biological signals are not supervised problems waiting for better loss functions. They are intervention-driven problems. They demand perturbations, counterfactuals, and mechanisms, beyond just prediction. Scaling obviously helps. But without causal structure, scaling mostly gives you sharper correlations. 2025 reinforced my belief that biological foundation models must be built around perturbation, uncertainty, and actionability, not just representation learning.

2. Benchmarks are holding biology back more than compute is.
Let’s be honest: benchmarking in AI & biology is still broken. Everyone reports SOTA. Everyone picks a different dataset slice. Everyone tunes for a different metric. Everyone avoids prospective validation. We’ve imported the worst habits of ML benchmarking into a domain where the stakes are much higher. In biology and healthcare, a 1% gain that doesn’t transfer is worse than useless: it’s misleading. What’s missing isn’t more benchmarks. It’s hard benchmarks:
• Prospective, not retrospective
• Perturbation-based, not static
• Multi-site, not single-lab
• Failure-aware, not leaderboard-optimized
If your model only works on the dataset that created it, it’s not a foundation model, it’s a dataset artifact. In 2026, we need fewer flashy plots and more humility, rigor, and negative results.

3. “Reasoning” in biology is not chain-of-thought.
There’s a growing tendency to directly apply the word “reasoning” to biological LLMs. Let’s be careful. Biological reasoning isn’t verbal fluency, longer context windows, or prettier explanations. Those are surface-level improvements. Real reasoning in biology shows up elsewhere: in forming hypotheses, deciding which experiments to run, updating beliefs when perturbations fail, and constantly trading off cost, risk, and uncertainty. A model that explains a pathway beautifully but can’t decide which experiment to run next is not reasoning, it’s narrating. 2025 convinced me that the future lies in agentic biological AI: systems that couple foundation models with experimentation, simulation, and decision-making loops.

Closing thought: AI & biology is not lagging behind AI for code or language. It’s just playing a harder game. The constraints are real. The data is messy. The feedback loops are slow. The consequences matter. If 2025 clarified anything for me, it’s this: we won’t make progress by treating biology like text. We’ll make progress by building AI that behaves more like a scientist: skeptical, iterative, and willing to be wrong. Onward to 2026.
55 replies · 166 reposts · 744 likes · 66.9K views
Xavi Giró reposted
Ivan Skorokhodov
Ivan Skorokhodov@isskoro·
I think that JiT (arxiv.org/abs/2511.13720) might have been my favorite paper of 2025. From the discussions with my friends, it got quite some controversy with many people dismissing it as some trivial reinvention of x-prediction, so I would like to put my perspective on it here
11 replies · 69 reposts · 561 likes · 69K views