Xin Wen
@_xwen_
88 posts

PhD @HKUniversity | Research Scientist Intern @Meta Superintelligence Labs

New York · Joined November 2017
893 Following · 333 Followers

Pinned Tweet
Xin Wen @_xwen_
Can we decouple semantics from spectrum for image tokenizers? Our answer: the 1D Semanticist tokenizer. We push the burden of photo-realistic image generation onto diffusion decoders and let the tokens focus on semantic structure. The PCA-like structure is induced by nested CFG (dropout), allowing plausible reconstruction and generation from very few tokens, plus a coarse-to-fine hierarchy. Thanks to the semantic-spectrum decoupling, our tokenizer also achieves strong performance on ImageNet linear probing, indicating potential for unified understanding and generation.

For more details, see us 1-2 pm TODAY at ExHall D, GMCV Workshop!

"Principal Components" Enable A New Language of Images
Project Page: visual-gen.github.io/semanticist/
Code: github.com/visual-gen/sem…
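The nested-dropout idea in the pinned tweet can be illustrated with a toy sketch (my own illustration, not the Semanticist code; `nested_token_dropout` and the geometric truncation schedule are assumptions): training on random prefixes of the 1D token sequence forces earlier tokens to carry the coarsest information, which is what yields a PCA-like coarse-to-fine ordering.

```python
import random

def nested_token_dropout(tokens, stop_p=0.1, rng=random):
    """Keep only a random prefix of the 1D token sequence.

    Truncating to a random length during training means early tokens
    are optimized most often, so they must encode the most 'principal'
    structure -- later tokens only refine the reconstruction.
    """
    n = len(tokens)
    # Geometric-ish truncation point: with probability stop_p, stop
    # growing the prefix, biasing training toward short prefixes.
    k = 1
    while k < n and rng.random() > stop_p:
        k += 1
    return tokens[:k]

seq = list(range(32))  # stand-in for 32 learned 1D tokens
prefix = nested_token_dropout(seq, rng=random.Random(0))
assert prefix == seq[:len(prefix)]  # always a prefix, never an arbitrary subset
```

At inference, decoding from only the first k tokens then gives a plausible coarse reconstruction that sharpens as more tokens are added.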
Xin Wen retweeted
Fei Xia @xf1280
Our colleague Bernie Huang did something even cooler, where he asked the model to "divide busy tiles further as needed", and got even better results. This leverages the fact that the model can render and reflect. meta.ai/share/zE4QJiP3…
Xin Wen retweeted
Shengjia Zhao @shengjia_zhao
Excited to share what we’ve been building at Meta Superintelligence Labs! We just released Muse Spark, our first AI model. It's a natively multimodal reasoning model and the first step on our path to personal superintelligence. We've overhauled our entire stack to support scaling, and this is just the beginning. ai.meta.com/blog/introduci…
Xin Wen @_xwen_
Congrats @BingchenZhao!
Zhengyao Jiang @zhengyaojiang

We're excited to announce that @BingchenZhao, who built the predecessor of AutoResearch, has joined @WecoAI full-time! Bingchen is the first author of LLMSpeedrunner at Meta FAIR, which ran the automated research loop on @karpathy's NanoGPT, which later evolved into NanoChat and the speedrun community where AutoResearch operates today.

Weco has been committed to ML research automation for 2.5 years, starting with AIDE. We're super pumped by how large an impact AIDE has had: topping @OpenAI's MLE-Bench and @METR_Evals' RE-Bench, and becoming a foundation for AI Scientist v2, AIRA-Dojo, and LLMSpeedrunner itself. And AutoResearch, with AIDE's simple greedy discard/keep loop reaching a mass audience, is really building consensus that the empirical research loop can and should be automated.

We're excited to keep pushing this frontier, not just as a concept but by seriously bringing it to the real world and materially accelerating humanity's knowledge generation.

Xin Wen retweeted
Phillip Isola @phillip_isola
Sharing “Neural Thickets”. We find: in large models, the neighborhood around pretrained weights can become dense with task-improving solutions. In this regime, post-training can be easy; even random guessing works.
Paper: arxiv.org/abs/2603.12228
Web: thickets.mit.edu
1/
Xin Wen retweeted
Saining Xie @sainingxie
another scientific exploration from @TongPetersb, @DavidJFan, and @__JohnNguyen__ that might teach you something new, even if you’re in a frontier lab. lots of interesting observations here, but I’ll highlight just one:
- it’s kind of an open industry secret that trying to scale DiTs with MoE has mostly been fruitless.
- the unexpected, yet intuitive, synergy between RAE and MoE might actually change that.
Peter Tong @TongPetersb

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

Xin Wen retweeted
Jana @Jana_Zeller
Can AI reason by “imagining” — not just by seeing or reading? We introduce Mentis Oculi, a benchmark for machine mental imagery: multi-step visual puzzles that require maintaining and updating visual states over time. 📄 arxiv.org/abs/2602.02465 🌐 jana-z.github.io/mentis-oculi/ 🧵⬇️
Xin Wen retweeted
Logical Intelligence @logic_int
Why Sudoku? It tests constraint satisfaction, and LLMs can't do it. LLMs are good at language problems; problems in robotics and industrial control systems aren't like that. That's why we created a new architecture, the foundation of Kona’s Energy-Based Reasoning Model.
Xin Wen retweeted
Relja Arandjelović @relja_work
"we knew that international law applied with varying rigour depending on the identity of the accused or the victim." The emperor is naked. Only took "the west" decades to realize it, the rest of the world laughs (or 🤮) whenever a NATO country mentions "international law"
Carole Cadwalladr @carolecadwalla

This speech *is* one for the history books. But that’s less a compliment than a coda. Carney has given us the words to mark the end of the ‘rules-based order’… by acknowledging it never really existed. It was a collective illusion. That now is over.

Xin Wen retweeted
Saining Xie @sainingxie
Introducing Cambrian-S. It’s a position, a dataset, a benchmark, and a model, but above all it represents our first steps toward exploring spatial supersensing in video. 🧶
Xin Wen retweeted
Shraman Pramanick @Shramanpramani2
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on the job market starting immediately. #metalayoffs #FAIR #MSL #SAM
Jiaxun Cui 🐿️ @cuijiaxun

Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)

Xin Wen retweeted
Yunzhi Zhang @zhang_yunzhi
Introducing Ctrl-VI, a video sampling method allowing for a flexible set of user controls—ranging from coarse but easy-to-specify text prompts to precise camera/object trajectories. (1/n) arxiv.org/abs/2510.07670
Xin Wen retweeted
Lucas Beyer (bl16) @giffmana
One sign of this being a really cool idea is that while reading, I had tons of follow-up ideas immediately come to mind, and only a few "hmm but"s. Plz read the thread and paper, but TLDR: add a layer of input-independent KVs, and fine-tune only the high-tf-idf KVs for continual learning.
Jessy Lin @realJessyLin

🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full finetuning and LoRA see drastic drops in held-out task performance (📉-89% FT, -71% LoRA on fact learning tasks), memory layers learn the same amount with far less forgetting (-11%). 🧵:

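The memory-layer idea in the thread above can be sketched with a toy NumPy example (my own illustration, not the paper's implementation; `memory_lookup`, the shapes, and the top-k rule are all assumptions): a layer of input-independent key/value slots is read sparsely, so a continual-learning update can touch only the few slots involved and leave the rest of the network frozen.

```python
import numpy as np

def memory_lookup(query, keys, values, top_k=2):
    """Toy memory layer: attend over input-independent key/value slots,
    keeping only the top-k scoring slots (sparse access)."""
    scores = keys @ query                      # (num_slots,)
    top = np.argsort(scores)[-top_k:]          # indices of the k best slots
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                   # softmax over selected slots
    return weights @ values[top], top          # output plus touched slots

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))   # 16 learned slots, dim 8
values = rng.normal(size=(16, 8))
out, touched = memory_lookup(rng.normal(size=8), keys, values)

# Continual learning would then finetune ONLY the rows of `values`
# (and/or `keys`) indexed by frequently touched slots, which is what
# limits interference with everything the frozen weights already know.
```

The sparse access pattern is the point: because each input reads only a handful of slots, a targeted gradient update is naturally localized.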
Xin Wen retweeted
Konpat Ta Preechakul @konpatp
(2/7) When building a house, we go floor by floor. When digging, we go layer by layer. More workers won’t speed it up — the steps are serial. Some tasks simply can’t be shortcut.

In computation, these are called P-complete — the hardest problems in P, believed not to be parallelizable. And our lives are full of them.

In our paper, we argue: many interesting tasks — cellular automata, physics, video prediction, RL, math — are essentially P-complete (or worse). In these cases: serial compute > parallel compute.
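A concrete instance of the serial tasks mentioned above is a cellular automaton: in Rule 110 (which is Turing-complete), every cell of step t+1 depends on the full state at step t, so the time steps cannot be parallelized away. A minimal sketch (my own illustration, not code from the paper):

```python
def rule110_step(state):
    """One synchronous update of the Rule 110 cellular automaton
    on a ring: each cell's next value depends on its left/self/right
    neighbors in the PREVIOUS state."""
    n = len(state)
    table = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
             (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}
    return [table[(state[(i - 1) % n], state[i], state[(i + 1) % n])]
            for i in range(n)]

state = [0] * 15 + [1]           # single live cell
for _ in range(8):               # step t+1 cannot start before step t:
    state = rule110_step(state)  # the time loop is inherently serial
```

Within one step the cells can be updated in parallel; it is the dependence between steps that no amount of extra parallel hardware is believed to remove.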
Xin Wen retweeted
Konpat Ta Preechakul @konpatp
Diffusion models operate step-by-step, so they are serial models, right? However, that doesn't sit well, because we have been seeing that diffusion models don't scale well with "steps". In this thread: diffusion models are not truly serial models.
Xin Wen retweeted
Saining Xie @sainingxie
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
Xin Wen retweeted
Simon @tokumin
love these "what does the red arrow see" google maps transforms with nano-banana