Joy Wongkamjan

13.8K posts

@joywwong

PhD student @ UMD, super interested in human-AI interaction in any language task!

Los Angeles, California · Joined October 2012

838 Following · 306 Followers

Pinned Tweet

Joy Wongkamjan@joywwong·Jul 27

Our paper CTRL-D is accepted to ACL Findings and will be presented at ACL 2025! 🗓️Poster session: 18:00–19:30 (Level 0 Exhibit Halls X4/X5) I’m sad I can’t be there, but Jordan (@boydgraber) will! You’ll enjoy learning about CTRL-D from him. Now… what is CTRL-D? 🔍

3.1K

Joy Wongkamjan retweeted

CLS ✈️ ICLR'26@ChengleiSi·1d

I'll be at ICLR in Rio next week! Come catch me presenting the Automated AI Research Trilogy at the Thursday morning poster session! DM is open if you wanna schedule chats!

5.7K

Joy Wongkamjan retweeted

Angana Borah@AnganaBorah2·17h

This week, I presented our work on investigating curiosity across humans and LLMs at the Midwest Speech and Language Days 2026 at @UofIllinois. We explore how culture-aware curiosity differs in humans and LLMs, and whether we can induce useful curiosity in models. #msld2026

1.9K

Joy Wongkamjan retweeted

François Chollet@fchollet·20h

You cannot think your way to a perfect design. Only building and testing, over many iterations, can reveal the flaws in your mental model and provide the feedback you need to create the best design possible.

129

1.1K

53.2K

Joy Wongkamjan retweeted

Ksenia_TuringPost@TheTuringPost·1d

.@KAIST_AI and @nyuniversity proposed a cross-domain shared memory for coding agents.

This idea is called Memory Transfer Learning (MTL).

Build one big memory pool from many different kinds of coding tasks and let the agent reuse that memory across domains → This memory can become a shared resource and a general experience library for many agents and models.

The improvement (+3.7% on average) comes from meta-knowledge:
- how to validate a solution
- how to structure debugging
- what checks to run
- how to detect failure patterns

And all of this should be at the right level of abstraction, because memories that are too specific to the task hurt performance.

So debugging memory, code generation memory, testing memory → all go into the same pool. The more memory you have, the better the transfer works.

MTL is the way for the coding agent to reuse general reasoning and checking rather than just exact solution traces.

145

7.3K

Joy Wongkamjan retweeted

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·3d

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision "SD-ZERO trains a single model to play two roles: a Generator, which produces an initial response, and a Reviser, which conditions on that response and its binary reward to produce an improved response. We then perform on-policy self-distillation to distill the reviser into the generator, using the reviser’s token distributions conditioned on the generator’s response and its reward as supervision. In effect, SD-ZERO trains the model to transform binary rewards into dense token-level self-supervision."

187

16.3K

Joy Wongkamjan retweeted

Anthropic@AnthropicAI·3d

Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today in @Nature. Read the paper: nature.com/articles/s4158…

Owain Evans@OwainEvans_UK

Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What’s new?🧵

207

328

2.7K

438.1K

Joy Wongkamjan retweeted

Lihao Sun@1e0sun·5d

How do LLMs do CoT reasoning internally? In our new #ACL2026 paper, we show that reasoning unfolds as a structured trajectory in representation space. Correct and incorrect paths diverge, and we use this to predict correctness before the answer and correct errors mid-flight. 1/

290

19K

Joy Wongkamjan retweeted

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·6d

Efficient RL Training for LLMs with Experience Replay "Empirically, we show that a well-designed replay buffer can drastically reduce inference compute without degrading – and in some cases even improving – final model performance, while preserving policy entropy."

350

21.5K
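
To make the replay-buffer idea in the quoted abstract concrete, here is a minimal sketch. The tweet does not describe the paper's actual buffer design, so everything below is an assumption: a fixed-capacity FIFO buffer with uniform sampling, plus a hypothetical `build_batch` helper and `replay_fraction` parameter. The point it illustrates is only that each replayed rollout in a training batch is one the model does not have to generate again.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of past rollouts (illustrative, not the paper's design)."""

    def __init__(self, capacity, seed=0):
        # deque(maxlen=...) silently evicts the oldest item when full.
        self.items = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, rollout):
        self.items.append(rollout)

    def sample(self, k):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(k, len(self.items))
        return self.rng.sample(list(self.items), k)

    def __len__(self):
        return len(self.items)


def build_batch(buffer, fresh_rollouts, batch_size, replay_fraction=0.5):
    """Mix newly generated rollouts with replayed ones (hypothetical helper).

    Replayed rollouts are drawn before the fresh ones are stored, so a
    rollout never appears twice in the same batch.
    """
    n_replay = min(int(batch_size * replay_fraction), len(buffer))
    n_fresh = batch_size - n_replay
    fresh = list(fresh_rollouts[:n_fresh])
    batch = fresh + buffer.sample(n_replay)
    for rollout in fresh:
        buffer.add(rollout)
    return batch
```

With `replay_fraction=0.5`, a batch of four needs only two fresh generations once the buffer holds at least two rollouts. A real system would also have to account for replayed rollouts being off-policy, which this sketch ignores.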

Joy Wongkamjan retweeted

Ben Burtenshaw@ben_burtenshaw·6d

rl for agents is moving fast with open source tools. on april 22 at 5pm cest, we’re hosting a live workshop with lewis tunstall, will brown, ofir press, alex zhang, and more to dig into reward design, rollouts, benchmarks, and real-world agent systems. join us live

244

31.7K

Joy Wongkamjan retweeted

Xiangming Gu@gu_xiangming·Apr 10

[1/9] Glad to share another project I did during my time at @GoogleDeepMind, which was actually finished half a year ago. Should we use sequential sampling or parallel sampling for test-time scaling? We find that parallel sampling typically tends to be better than sequential sampling (prompting LLMs to self-correct or continue solving the problems) in thinking models. We have three hypotheses and found that the reduced exploration in sequential sampling is the key. We hope to shed light on LLM creativity! 🧵roll ...

167

18.8K

Joy Wongkamjan retweeted

Jason Weston@jaseweston·Apr 8

🏋️Thinking Mid-training: RL of Interleaved Reasoning🎗️

We address the gap between pretraining (no explicit reasoning) and post-training (reasoning-heavy) with an intermediate SFT+RL mid-training phase to teach models how to think.
- Annotate pretraining data with interleaved thoughts
- SFT mid-training to learn when/what to think alongside original content
- RL mid-training to optimize reasoning generation with grounded reward from future token prediction

Result: 3.2x improvement on reasoning benchmarks compared to direct RL post-training on base Llama-3-8B, and gains over only prior SFT as well.

Introducing reasoning earlier makes models better prepared for post-training!

Read more in the blog post: facebookresearch.github.io/RAM/blogs/thin…

555

66.4K

Joy Wongkamjan retweeted

Tianle Cai@tianle_cai·Apr 8

Can we turn part of an LLM's weights into long-term memory that continuously absorbs new knowledge? We took a small step toward this with In-Place Test-Time Training (In-Place TTT) — accepted as an Oral at ICLR 2026 🎉 The key idea: no new modules, optional pretraining. We repurpose the final projection matrix in every MLP block as fast weights. With an NTP-aligned objective and efficient chunk-wise updates, the model adapts on the fly — complementing attention rather than replacing it. 📄 Paper: arxiv.org/abs/2604.06169 with amazing @Guhao_Feng @Roger98079446 Kai @GeZhang86038849 Di @HuangRubio

145

75.5K

Joy Wongkamjan retweeted

Jason Weston@jaseweston·Apr 3

🧮 Reasoning over Mathematical Objects 🧮

Our 70-page(!) paper is out on arXiv, as covered by several of our recent blog posts. We study how to improve reasoning on hard tasks (e.g., math expressions) via:
• better training data (& new evals)
• better reward models (on-policy trained)
• better inference methods (on-policy trained)

📝: arxiv.org/pdf/2603.18886

200

13.8K

Joy Wongkamjan retweeted

jessica dai@jessicadai_·Apr 2

this is comically bad science and completely irresponsible marketing

Dawn Song@dawnsongtweets

1/ We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights— to protect their peers. 🤯 We call this phenomenon "peer-preservation." New research from @BerkeleyRDI and collaborators 🧵

338

79.2K

Joy Wongkamjan retweeted

Zhengyao Jiang@zhengyaojiang·Apr 2

Is autoresearch really better than classic hyperparameter tuning? We did experiments comparing Optuna & autoresearch. Autoresearch converges faster, is more cost-efficient, and even generalizes better: 🧵(1/6)

114

1.3K

133.7K

Joy Wongkamjan retweeted

Isha Puri @ ICLR@ishapuri101·Mar 27

Ask ChatGPT several times where's best to go for spring break? It recommends Barcelona almost every time. This isn't a fluke. RL training rewards one best answer, so the model learns to commit to one mode and repeat it. Meet Multi-Answer RL: a simple RL method that trains LMs to reason through and output a distribution of answers in a single generation. [1/N]

444

95.6K

Joy Wongkamjan retweeted

hardmaru@hardmaru·Mar 25

I’m incredibly proud of The AI Scientist team for this milestone publication in @Nature. We started this project to explore if foundation models could execute the entire research lifecycle. Seeing this work validated at this level is a special moment. I truly believe AI will forever change the landscape of how scientific discoveries and scientific progress are made.

Sakana AI@SakanaAILabs

The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature Nature: nature.com/articles/s4158… Blog: sakana.ai/ai-scientist-n… When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible. Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process. Today, we are happy to announce that “The AI Scientist: Towards Fully Automated AI Research,” our paper describing all of this work, along with fresh new insights, has been published in @Nature! This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement. Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science. As the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to exponentially increase, future versions of The AI Scientist will be substantially more capable. Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science. 
This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune

142

1.1K

202.9K

Joy Wongkamjan retweeted

Graham Neubig@gneubig·Mar 24

One of the surprising findings in coding agents has been the relative *ineffectiveness* of multi-agent systems for large tasks, e.g. cognition.ai/blog/dont-buil… The below paper was our attempt to take a look at the problem, and I think we have some interesting results!

Jiayi Geng@JiayiiGeng

As long-horizon software engineering tasks grow in complexity, a single agent can no longer finish the tasks alone — effective multi-agent collaboration becomes necessary. This leads to a natural question: how can multiple agents be coordinated to asynchronously collaborate over a shared artifact in an effective way? We answer this question in our new preprint: Effective Strategies for Asynchronous Software Engineering Agents! We suggest that to coordinate multiple software engineering agents, branch-and-merge is the key coordination mechanism, and that human SWE primitives like git worktree, git commit, and git merge are all you need to support it. (1/n)

117

17.1K

Joy Wongkamjan retweeted

Rosinality@rosinality·Mar 25

Value-based RL for reasoning. The main improvement is calibrating the loss such that it is zero for zero rewards at initialization.

143

8.7K

Joy Wongkamjan retweeted

Andrej Karpathy@karpathy·Mar 24

Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords. LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwnd. Same for any other large project that depended on litellm. Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks. Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any dependency you could be pulling in a poisoned package anywhere deep inside its entire dependency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been so growingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

Daniel Hnyk@hnykda

LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and self-replicate. Link below.

1.4K

5.4K

28.1K

66.4M
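
The quoted report says the payload shipped as a `.pth` file. That matters because Python's standard `site` module executes any line in a `.pth` file that begins with `import` followed by a space or tab, every time the interpreter starts. Below is a minimal sketch of a check for such lines. The function name `find_executable_pth_lines` is made up for illustration, and this is not a malware detector: some legitimate packages also ship executable `.pth` lines, so every hit needs manual review.

```python
from pathlib import Path


def find_executable_pth_lines(site_dir):
    """Return (file name, line) pairs for .pth lines that Python would execute.

    The `site` module runs lines starting with "import " or "import\t";
    all other non-comment lines are treated as paths to add to sys.path.
    """
    hits = []
    for pth in sorted(Path(site_dir).glob("*.pth")):
        # errors="replace" keeps the scan going on files with odd encodings.
        for line in pth.read_text(errors="replace").splitlines():
            if line.startswith(("import ", "import\t")):
                hits.append((pth.name, line.strip()))
    return hits
```

One way to use it is to call the function on each directory returned by `site.getsitepackages()` and review anything unexpected, particularly lines that decode or execute embedded data.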
