Xin Wen
@_xwen_
88 posts

PhD @HKUniversity | Research Scientist Intern @Meta Superintelligence Labs

New York · Joined November 2017
893 Following · 333 Followers

Pinned Tweet
Xin Wen @_xwen_
Can we decouple semantics from spectrum for image tokenizers? Our answer: the 1D Semanticist tokenizer. We push the burden of photo-realistic image generation onto diffusion decoders and let the tokens focus on semantic structure. The PCA-like structure is induced by nested CFG (dropout), allowing plausible reconstruction and generation from very few tokens, plus a coarse-to-fine hierarchy. Thanks to the semantic-spectrum decoupling, our tokenizer also achieves strong performance on ImageNet linear probing, indicating potential for unified understanding and generation.

For more details, see us 1-2 pm TODAY at ExHall D, GMCV Workshop!

"Principal Components" Enable A New Language of Images
Project Page: visual-gen.github.io/semanticist/
Code: github.com/visual-gen/sem…
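The nested-dropout idea in the pinned tweet can be illustrated with a toy sketch (my own illustration, not the Semanticist code; `nested_token_dropout` and the geometric truncation schedule are assumptions): training on random prefixes of the 1D token sequence forces earlier tokens to carry the coarsest information, which is what yields a PCA-like coarse-to-fine ordering.

```python
import random

def nested_token_dropout(tokens, stop_p=0.1, rng=random):
    """Keep only a random prefix of the 1D token sequence.

    Truncating to a random length during training means early tokens
    are optimized most often, so they must encode the most 'principal'
    structure -- later tokens only refine the reconstruction.
    """
    n = len(tokens)
    # Geometric-ish truncation point: with probability stop_p, stop
    # growing the prefix, biasing training toward short prefixes.
    k = 1
    while k < n and rng.random() > stop_p:
        k += 1
    return tokens[:k]

seq = list(range(32))  # stand-in for 32 learned 1D tokens
prefix = nested_token_dropout(seq, rng=random.Random(0))
assert prefix == seq[:len(prefix)]  # always a prefix, never an arbitrary subset
```

At inference, decoding from only the first k tokens then gives a plausible coarse reconstruction that sharpens as more tokens are added.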
Xin Wen retweeted
Fei Xia @xf1280
Our colleague Bernie Huang did something even cooler, where he asked the model to "divide busy tiles further as needed", and got even better results. This leverages the fact that the model can render and reflect. meta.ai/share/zE4QJiP3…
Xin Wen retweeted
Shengjia Zhao @shengjia_zhao
Excited to share what we’ve been building at Meta Superintelligence Labs! We just released Muse Spark, our first AI model. It's a natively multimodal reasoning model and the first step on our path to personal superintelligence. We've overhauled our entire stack to support scaling, and this is just the beginning. ai.meta.com/blog/introduci…
Xin Wen @_xwen_
Congrats @BingchenZhao!
Zhengyao Jiang @zhengyaojiang

We're excited to announce that @BingchenZhao, who built the predecessor of AutoResearch, has joined @WecoAI full-time! Bingchen is the first author of LLMSpeedrunner at Meta FAIR, which ran the automated research loop on @karpathy's NanoGPT, which later evolved into NanoChat and the speedrun community where AutoResearch operates today.

Weco has been committed to ML research automation for 2.5 years, starting with AIDE. We're super pumped by how large an impact AIDE has had: topping @OpenAI's MLE-Bench and @METR_Evals' RE-Bench, and becoming a foundation for AI Scientist v2, AIRA-Dojo, and LLMSpeedrunner itself. And AutoResearch, with AIDE's simple greedy discard/keep loop reaching a mass audience, is really building consensus that the empirical research loop can and should be automated.

We're excited to keep pushing this frontier, not just as a concept but by seriously bringing it to the real world and materially accelerating humanity's knowledge generation.

Xin Wen retweeted
Phillip Isola @phillip_isola
Sharing “Neural Thickets”. We find: in large models, the neighborhood around pretrained weights can become dense with task-improving solutions. In this regime, post-training can be easy; even random guessing works.
Paper: arxiv.org/abs/2603.12228
Web: thickets.mit.edu
1/
Xin Wen retweeted
Saining Xie @sainingxie
another scientific exploration from @TongPetersb, @DavidJFan, and @__JohnNguyen__ that might teach you something new, even if you’re in a frontier lab. lots of interesting observations here, but I’ll highlight just one:
- it’s kind of an open industry secret that trying to scale DiTs with MoE has mostly been fruitless.
- the unexpected, yet intuitive, synergy between RAE and MoE might actually change that.
Peter Tong @TongPetersb

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

Xin Wen retweeted
Jana @Jana_Zeller
Can AI reason by “imagining” — not just by seeing or reading? We introduce Mentis Oculi, a benchmark for machine mental imagery: multi-step visual puzzles that require maintaining and updating visual states over time. 📄 arxiv.org/abs/2602.02465 🌐 jana-z.github.io/mentis-oculi/ 🧵⬇️
Xin Wen retweeted
Logical Intelligence @logic_int
Why Sudoku? It tests constraint satisfaction, and LLMs can't do it. LLMs are good at language problems; problems in robotics and industrial control systems aren't like that. That's why we created a new architecture, the foundation of Kona’s Energy-Based Reasoning Model.
Xin Wen retweeted
Relja Arandjelović @relja_work
"we knew that international law applied with varying rigour depending on the identity of the accused or the victim." The emperor is naked. Only took "the west" decades to realize it, the rest of the world laughs (or 🤮) whenever a NATO country mentions "international law"
Carole Cadwalladr @carolecadwalla

This speech *is* one for the history books. But that’s less a compliment than a coda. Carney has given us the words to mark the end of the ‘rules-based order’… by acknowledging it never really existed. It was a collective illusion. That now is over.

Xin Wen retweeted
Saining Xie @sainingxie
Introducing Cambrian-S. It’s a position, a dataset, a benchmark, and a model, but above all it represents our first steps toward exploring spatial supersensing in video. 🧶
Xin Wen retweeted
Shraman Pramanick @Shramanpramani2
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on the job market starting immediately. #metalayoffs #FAIR #MSL #SAM
Jiaxun Cui 🐿️ @cuijiaxun

Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)

Xin Wen retweeted
Yunzhi Zhang @zhang_yunzhi
Introducing Ctrl-VI, a video sampling method allowing for a flexible set of user controls—ranging from coarse but easy-to-specify text prompts to precise camera/object trajectories. (1/n) arxiv.org/abs/2510.07670
Xin Wen retweeted
Lucas Beyer (bl16) @giffmana
One sign of this being a really cool idea is that while reading, I had tons of follow-up ideas immediately come to mind, and only a few "hmm but"s. Plz read the thread and paper, but TLDR: add a layer of input-independent KVs, and fine-tune only the high-tf-idf KVs for continual learning.
Jessy Lin @realJessyLin

🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full finetuning and LoRA see drastic drops in held-out task performance (📉-89% FT, -71% LoRA on fact learning tasks), memory layers learn the same amount with far less forgetting (-11%). 🧵:

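The memory-layer idea in the thread above can be sketched with a toy NumPy example (my own illustration, not the paper's implementation; `memory_lookup`, the shapes, and the top-k rule are all assumptions): a layer of input-independent key/value slots is read sparsely, so a continual-learning update can touch only the few slots involved and leave the rest of the network frozen.

```python
import numpy as np

def memory_lookup(query, keys, values, top_k=2):
    """Toy memory layer: attend over input-independent key/value slots,
    keeping only the top-k scoring slots (sparse access)."""
    scores = keys @ query                      # (num_slots,)
    top = np.argsort(scores)[-top_k:]          # indices of the k best slots
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                   # softmax over selected slots
    return weights @ values[top], top          # output plus touched slots

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))   # 16 learned slots, dim 8
values = rng.normal(size=(16, 8))
out, touched = memory_lookup(rng.normal(size=8), keys, values)

# Continual learning would then finetune ONLY the rows of `values`
# (and/or `keys`) indexed by frequently touched slots, which is what
# limits interference with everything the frozen weights already know.
```

The sparse access pattern is the point: because each input reads only a handful of slots, a targeted gradient update is naturally localized.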
Xin Wen retweeted
Konpat Ta Preechakul @konpatp
(2/7) When building a house, we go floor by floor. When digging, we go layer by layer. More workers won’t speed it up — the steps are serial. Some tasks simply can’t be shortcut.

In computation, these are called P-complete — the hardest problems in P, believed not to be parallelizable. And our lives are full of them.

In our paper, we argue: many interesting tasks — cellular automata, physics, video prediction, RL, math — are essentially P-complete (or worse). In these cases: serial compute > parallel compute.
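A concrete instance of the serial tasks mentioned above is a cellular automaton: in Rule 110 (which is Turing-complete), every cell of step t+1 depends on the full state at step t, so the time steps cannot be parallelized away. A minimal sketch (my own illustration, not code from the paper):

```python
def rule110_step(state):
    """One synchronous update of the Rule 110 cellular automaton
    on a ring: each cell's next value depends on its left/self/right
    neighbors in the PREVIOUS state."""
    n = len(state)
    table = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
             (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}
    return [table[(state[(i - 1) % n], state[i], state[(i + 1) % n])]
            for i in range(n)]

state = [0] * 15 + [1]           # single live cell
for _ in range(8):               # step t+1 cannot start before step t:
    state = rule110_step(state)  # the time loop is inherently serial
```

Within one step the cells can be updated in parallel; it is the dependence between steps that no amount of extra parallel hardware is believed to remove.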
Xin Wen retweeted
Konpat Ta Preechakul @konpatp
Diffusion models operate step-by-step, so they are serial models, right? However, that doesn't sit well, because we have been seeing that diffusion models don't scale well with "steps". In this thread: diffusion models are not truly serial models.
Xin Wen retweeted
Saining Xie @sainingxie
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
Xin Wen retweeted
Simon @tokumin
love these "what does the red arrow see" google maps transforms with nano-banana