Ethan Xu

302 posts

@LinjieXu

Researcher at Shanghai X Lab @HKUniversity. Prev. intern at Apple and Microsoft Research.

Joined January 2019

642 Following, 166 Followers

Pinned Tweet

Ethan Xu@LinjieXu·Feb 26

(1/3) Enterprise RDBs rarely change their structure but meet new ML tasks every day. An RDB foundation model (FM) fits this setting well because it needs no task-specific training. Our latest work uses intra-column encoding and tabular FMs, achieving SOTA performance.

Ethan Xu retweeted

Zonglin Yang@Yang_zy223·4d

🔬 We post-train LLMs for math, for code, for instruction-following. Why not for scientific discovery? No model has been post-trained specifically for hypothesis generation. MOOSE-Star is a first step, with scaling laws suggesting there's much more to unlock.

MiroMindAI@miromind_ai

🚨 LLM-based scientific hypothesis discovery now has a scalable training recipe. MOOSE-Star, accepted at ICML 2026, enables scalable training for hypothesis generation, with more scalable test-time scaling. By our researchers— x.com/Yang_zy223/sta…

Ethan Xu@LinjieXu·May 7

Our latest work has been accepted as a regular paper by ICML 2026. Can't wait to see many old/new friends in Seoul. arxiv.org/abs/2602.13697

Ethan Xu@LinjieXu·Feb 26

(3/3) We open-sourced the RDBLearn toolkit arxiv.org/abs/2602.18495. It's agent-friendly. Try it out with only **two** prompts on your own RDBs.

Ethan Xu@LinjieXu·Feb 26

(2/3) In arxiv.org/pdf/2602.13697, we provide theoretical and empirical analysis to discuss what data embedding RDB FMs might require.

Ethan Xu@LinjieXu·Jun 17

@seohong_park Finally, thanks to the amazing collaborators @zhengyaojiang, Jinyu, Lei and Jiang

Ethan Xu@LinjieXu·Jun 17

@seohong_park Related papers mentioned: D-QL: arxiv.org/pdf/2208.06193, Q-regularized DT: arxiv.org/pdf/2405.17098, value learning and policy generalization: arxiv.org/pdf/2406.09329

Ethan Xu@LinjieXu·Jun 17

(1/3) In offline RL, the learned Q function is better than you might think. Many methods use policy constraints mainly to stabilize Q learning. But the ultimate goal of offline RL is to get a good policy (not a good Q). Check out our work accepted by @TmlrPub! openreview.net/forum?id=imARO…

Ethan Xu@LinjieXu·Jun 16

@or_rivlin @seohong_park Good point. Policies like DDPG can select only one ground-truth action and can suffer in multiple-action cases. Diffusion policies seem to have no such drawback, and neither does DT (under the vanilla behavior-cloning constraint).

Or Rivlin@or_rivlin·Jun 16

@seohong_park Regarding the constraint in DDPG, it seems like a "distribution" constraint that might inhibit performance (the data has both left and right turns from a state, and we constrain toward both). Can we get "support" constraints instead? (Maybe AWR as the constraint?)

Seohong Park@seohong_park·Jun 14

Most works in offline RL focus on learning better value functions. So value learning is the main bottleneck in offline RL... right? In our new paper, we show that this is *not* the case in general! Paper: arxiv.org/abs/2406.09329 Blog post: seohong.me/projects/offrl… A thread ↓

Ethan Xu@LinjieXu·Jun 15

@aviral_kumar2 openreview.net/forum?id=imARO… we also compared our method to test-time sampling methods.

Ethan Xu@LinjieXu·Jun 14

@aviral_kumar2 Finally, this observation is well formulated and analyzed. Please consider citing our work, which also argues that a good Q function is learned offline. We show that AWAC, TD3-BC, and D-QL benefit from a more mildly constrained evaluation policy. arxiv.org/abs/2306.03680, recently accepted by TMLR, url->

Aviral Kumar@aviral_kumar2·Jun 14

Conventional wisdom: the BIG blocker holding offline RL behind imitation / SFT, preventing good scaling, etc is the value function. But can we still do well with current value functions? We find: often *policy* learning bottlenecks offline RL scaling: arxiv.org/abs/2406.09329 🧵

Ethan Xu@LinjieXu·Jun 12

@AlbertQJiang lol for the red note

Albert Jiang@AlbertQJiang·Jun 11

Join us to build with the best colleagues! Offices in France, UK, and US west coast.

Arthur Mensch@arthurmensch

We are announcing €600M in Series B funding for our first anniversary. We are grateful to our new and existing investors for their continued confidence and support for our global expansion. This will accelerate our roadmap as we continue to bring frontier AI into everyone’s hands.

Ethan Xu@LinjieXu·Apr 23

By masking a small part of the prompt, our LLM protector defends against harmful prompts without losing much of their content. Check out this cool work led by Zichuan Liu @c93l6IhoSgV2Iqi !

Zichuan Liu@c93l6IhoSgV2Iqi

Protecting Your LLMs with Information Bottleneck arxiv.org/abs/2404.13968 The authors use Information Bottleneck to defend against potential alignment breaking attacks in LLMs, which has strong alignment checking and does not require any fine-tuning of target LLMs. #LLMs #AI

Ethan Xu@LinjieXu·Apr 16

@BertramTimo Congrats!

Timo Bertram@BertramTimo·Apr 15

New paper (which is a much improved version of our first paper from 2021) just got accepted into CoG! See you all in Milan :)

Ethan Xu@LinjieXu·Apr 16

(3/3) Accepted by IEEE CoG. Thanks to coauthors Zichuan Liu, Alexander Dockhorn, @diego_pliebana, Jinyu Wang, Lei Song, and Jiang Bian. @GameAI_QMUL

Ethan Xu@LinjieXu·Apr 16

The motivation for this work is that 1) MARL usually uses 2~10 million TRAJECTORIES, which is time-consuming, and 2) most attention goes to algorithmic design while the old MARL training codebase is overlooked.

Ethan Xu@LinjieXu·Apr 16

Using a higher Replay Ratio (RR) in MARL remarkably improves sample efficiency and converged performance. We also find that the RNN agent maintains network plasticity well, so techniques such as resetting are not required. arxiv.org/abs/2404.09715
