Ethan Xu

302 posts

@LinjieXu

Researcher at Shanghai X Lab @HKUniversity. Prev. intern at Apple and Microsoft Research.

Joined January 2019
642 Following 166 Followers
Pinned Tweet
Ethan Xu@LinjieXu·
(1/3) Enterprise RDBs rarely change their structure, but meet new ML tasks every day. The RDB foundation model (FM) fits this position well because no task-specific training is needed. Our latest work uses intra-column encoding and tabular FMs, achieving SOTA performance.
Ethan Xu retweeted
Zonglin Yang@Yang_zy223·
🔬 We post-train LLMs for math, for code, for instruction-following. Why not for scientific discovery? No model has been post-trained specifically for hypothesis generation. MOOSE-Star is a first step, with scaling laws suggesting there's much more to unlock.
MiroMindAI@miromind_ai

🚨 LLM-based scientific hypothesis discovery now has a scalable training recipe. MOOSE-Star, accepted at ICML 2026, enables scalable training for hypothesis generation, with more scalable test-time scaling. By our researchers— x.com/Yang_zy223/sta…

Ethan Xu@LinjieXu·
(3/3) We open-sourced the RDBLearn toolkit arxiv.org/abs/2602.18495. It's agent-friendly. Try it out with only **two** prompts on your own RDBs.
Ethan Xu@LinjieXu·
(2/3) In arxiv.org/pdf/2602.13697, we provide theoretical and empirical analysis of what data embeddings RDB FMs might require.
Ethan Xu@LinjieXu·
(1/3) In offline RL, the learned Q function is better than you thought. Many methods use policy constraints mainly to stabilize Q learning. But the ultimate goal of offline RL is to get a good policy (not a good Q). Check out our work accepted by @TmlrPub! openreview.net/forum?id=imARO…
Ethan Xu@LinjieXu·
@or_rivlin @seohong_park Good point. Policies like DDPG can only select one ground-truth action and can suffer in multiple-action cases. Diffusion policy seems to have no such drawback, and neither does DT. (Under the vanilla behavior-cloning constraint)
Or Rivlin@or_rivlin·
@seohong_park Regarding the constraint in DDPG, it seems like a "distribution" constraint that might inhibit performance (the data has both left and right turns from a state, and we constrain toward both). Can we get "support" constraints instead? (Maybe AWR as the constraint?)
Seohong Park@seohong_park·
Most works in offline RL focus on learning better value functions. So value learning is the main bottleneck in offline RL... right? In our new paper, we show that this is *not* the case in general! Paper: arxiv.org/abs/2406.09329 Blog post: seohong.me/projects/offrl… A thread ↓
Ethan Xu@LinjieXu·
@aviral_kumar2 Finally, this observation is well formulated and analyzed. Please consider citing our work, which also argues that a good Q is learned offline. We show that AWAC, TD3-BC, and D-QL benefit from a more mildly constrained evaluation policy. arxiv.org/abs/2306.03680 (recently accepted by TMLR) url->
Aviral Kumar@aviral_kumar2·
Conventional wisdom: the BIG blocker holding offline RL behind imitation / SFT, preventing good scaling, etc is the value function. But can we still do well with current value functions? We find: often *policy* learning bottlenecks offline RL scaling: arxiv.org/abs/2406.09329 🧵
Ethan Xu@LinjieXu·
By masking a small part of the prompt, our LLM protector defends against harmful prompts without losing much of their content. Check out this cool work led by Zichuan Liu @c93l6IhoSgV2Iqi!
Zichuan Liu@c93l6IhoSgV2Iqi

Protecting Your LLMs with Information Bottleneck arxiv.org/abs/2404.13968 The authors use Information Bottleneck to defend against potential alignment breaking attacks in LLMs, which has strong alignment checking and does not require any fine-tuning of target LLMs. #LLMs #AI

Timo Bertram@BertramTimo·
New paper (which is a much improved version of our first paper from 2021) just got accepted into CoG! See you all in Milan :)
Ethan Xu@LinjieXu·
3/3 Accepted by IEEE CoG. Thanks to coauthors Zichuan Liu, Alexander Dockhorn, @diego_pliebana, Jinyu Wang, Lei Song, and Jiang Bian. @GameAI_QMUL
Ethan Xu@LinjieXu·
The motivation for this work is twofold: 1) MARL usually uses 2~10 million TRAJECTORIES, which is time-consuming; 2) most attention goes to algorithmic design, while the old MARL training codebase is overlooked.
Ethan Xu@LinjieXu·
Using a higher Replay Ratio (RR) in MARL remarkably improves sample efficiency and converged performance. We also find that the RNN agent maintains network plasticity well, so techniques such as resetting are not required. arxiv.org/abs/2404.09715