Joy Wongkamjan

13.8K posts

@joywwong

PhD student @ UMD, super interested in human-AI interaction across all kinds of language tasks!

Los Angeles, California · Joined October 2012
838 Following · 306 Followers
Pinned Tweet
Joy Wongkamjan @joywwong
Our paper CTRL-D is accepted to ACL Findings and will be presented at ACL 2025! 🗓️Poster session: 18:00–19:30 (Level 0 Exhibit Halls X4/X5) I’m sad I can’t be there, but Jordan (@boydgraber) will! You’ll enjoy learning about CTRL-D from him. Now… what is CTRL-D? 🔍
3 replies · 7 reposts · 14 likes · 3.1K views
Joy Wongkamjan retweeted
CLS ✈️ ICLR'26 @ChengleiSi
I'll be at ICLR in Rio next week! Come catch me presenting the Automated AI Research Trilogy at the Thursday morning poster session! DM is open if you wanna schedule chats!
0 replies · 14 reposts · 90 likes · 5.7K views
Joy Wongkamjan retweeted
Angana Borah @AnganaBorah2
This week, I presented our work on investigating curiosity across humans and LLMs at the Midwest Speech and Language Days 2026 at @UofIllinois. We explore how culture-aware curiosity differs in humans and LLMs, and whether we can induce useful curiosity in models. #msld2026
1 reply · 6 reposts · 48 likes · 1.9K views
Joy Wongkamjan retweeted
François Chollet @fchollet
You cannot think your way to a perfect design. Only building and testing, over many iterations, can reveal the flaws in your mental model and provide the feedback you need to create the best design possible.
50 replies · 129 reposts · 1.1K likes · 53.2K views
Joy Wongkamjan retweeted
Ksenia_TuringPost @TheTuringPost
.@KAIST_AI and @nyuniversity proposed a cross-domain shared memory for coding agents, an idea called Memory Transfer Learning (MTL): build one big memory pool from many different kinds of coding tasks and let the agent reuse that memory across domains. The memory becomes a shared resource and a general experience library for many agents and models.

The improvement (+3.7% on average) comes from meta-knowledge:
- how to validate a solution
- how to structure debugging
- what checks to run
- how to detect failure patterns

All of this has to be at the right level of abstraction, because memories that are too specific to a task hurt performance. So debugging memory, code-generation memory, and testing memory all go into the same pool, and the more memory you have, the better the transfer works. MTL lets the coding agent reuse general reasoning and checking rather than just exact solution traces.
3 replies · 37 reposts · 145 likes · 7.3K views
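The pool-plus-retrieval mechanism described above can be sketched in a few lines of Python. This is a toy illustration, not the paper's code: class and method names are mine, and the string "lessons" stand in for stored meta-knowledge entries.

```python
from collections import defaultdict

class SharedMemoryPool:
    """Toy cross-domain memory pool: store abstract lessons from many
    coding domains and retrieve them for any new task."""

    def __init__(self):
        # entries are domain -> list of lessons; lessons are kept at a
        # general level of abstraction, not task-specific traces
        self.entries = defaultdict(list)

    def add(self, domain, lesson):
        self.entries[domain].append(lesson)

    def retrieve(self, keywords):
        """Return lessons from *all* domains matching the query;
        transfer happens because retrieval ignores domain boundaries."""
        hits = []
        for domain, lessons in self.entries.items():
            for lesson in lessons:
                if any(k in lesson for k in keywords):
                    hits.append((domain, lesson))
        return hits

pool = SharedMemoryPool()
pool.add("debugging", "reproduce the failure with a minimal input before editing code")
pool.add("testing", "run the existing test suite to validate a candidate fix")
pool.add("codegen", "validate generated code by executing it on sample inputs")

# a code-generation agent still benefits from testing/codegen memory:
hits = pool.retrieve(["validate"])
```

The point of the sketch is the last call: the query crosses domain boundaries, which is where the claimed transfer gain comes from.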
Joy Wongkamjan retweeted
Tanishq Mathew Abraham, Ph.D.
Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision

"SD-ZERO trains a single model to play two roles: a Generator, which produces an initial response, and a Reviser, which conditions on that response and its binary reward to produce an improved response. We then perform on-policy self-distillation to distill the reviser into the generator, using the reviser’s token distributions conditioned on the generator’s response and its reward as supervision. In effect, SD-ZERO trains the model to transform binary rewards into dense token-level self-supervision."
6 replies · 32 reposts · 187 likes · 16.3K views
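The direction of the generator-to-reviser distillation can be illustrated with a toy example. The dict-based distributions and the fixed mixing coefficient below are my simplification; the actual method performs gradient-based on-policy distillation on a shared model.

```python
import math

def kl(p, q):
    """KL(p || q) over token -> probability dicts sharing a vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

def distill_step(generator_probs, reviser_probs, lr=0.5):
    """One toy self-distillation step: nudge the generator's token
    distribution toward the reviser's (which saw the generator's draft
    plus its binary reward). Only the *direction* of the update matters
    here; real training uses gradients, not interpolation."""
    mixed = {t: (1 - lr) * generator_probs[t] + lr * reviser_probs[t]
             for t in generator_probs}
    z = sum(mixed.values())
    return {t: v / z for t, v in mixed.items()}

gen = {"a": 0.7, "b": 0.2, "c": 0.1}   # generator's token distribution
rev = {"a": 0.2, "b": 0.7, "c": 0.1}   # reviser prefers 'b' after seeing the reward
before = kl(rev, gen)
after = kl(rev, distill_step(gen, rev))
```

After one step the generator's distribution is measurably closer to the reviser's, which is the sense in which a single binary reward becomes dense token-level supervision.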
Joy Wongkamjan retweeted
Anthropic @AnthropicAI
Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today in @Nature. Read the paper: nature.com/articles/s4158…
Owain Evans @OwainEvans_UK

Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What’s new?🧵

207 replies · 328 reposts · 2.7K likes · 438.1K views
Joy Wongkamjan retweeted
Lihao Sun @1e0sun
How do LLMs do CoT reasoning internally? In our new #ACL2026 paper, we show that reasoning unfolds as a structured trajectory in representation space. Correct and incorrect paths diverge, and we use this to predict correctness before the answer and correct errors mid-flight. 1/
11 replies · 34 reposts · 290 likes · 19K views
Joy Wongkamjan retweeted
Tanishq Mathew Abraham, Ph.D.
Efficient RL Training for LLMs with Experience Replay

"Empirically, we show that a well-designed replay buffer can drastically reduce inference compute without degrading – and in some cases even improving – final model performance, while preserving policy entropy."
5 replies · 53 reposts · 350 likes · 21.5K views
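The core object here is a standard experience-replay buffer: store past rollouts and mix them into later training batches instead of regenerating everything with fresh (expensive) inference. A minimal sketch, with hypothetical names; the paper's contribution is in the buffer design (what to store, evict, and replay), which this does not capture.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of past RL rollouts."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest rollouts evicted first

    def add(self, rollout):
        self.buffer.append(rollout)

    def sample(self, batch_size):
        # mix stored rollouts into the training batch; a key design choice
        # is what fraction of each batch is replayed vs. freshly sampled
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add({"prompt": f"p{i}", "response": f"r{i}", "reward": i % 2})

batch = buf.sample(2)  # reused experience, no new inference needed
```

Every replayed rollout in `batch` is a generation the trainer did not have to pay for again, which is where the claimed inference-compute savings come from.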
Joy Wongkamjan retweeted
Ben Burtenshaw @ben_burtenshaw
rl for agents is moving fast with open source tools. on april 22 at 5pm cest, we’re hosting a live workshop with lewis tunstall, will brown, ofir press, alex zhang, and more to dig into reward design, rollouts, benchmarks, and real-world agent systems. join us live
7 replies · 32 reposts · 244 likes · 31.7K views
Joy Wongkamjan retweeted
Xiangming Gu @gu_xiangming
[1/9] Glad to share another project from my time at @GoogleDeepMind, actually finished half a year ago. Should we use sequential or parallel sampling for test-time scaling? We find that parallel sampling typically beats sequential sampling (prompting LLMs to self-correct or continue solving the problem) in thinking models. We tested three hypotheses and found that the lower exploration of sequential sampling is the key. We hope to shed light on LLM creativity! 🧵
3 replies · 24 reposts · 167 likes · 18.8K views
Joy Wongkamjan retweeted
Jason Weston @jaseweston
🏋️Thinking Mid-training: RL of Interleaved Reasoning🎗️
We address the gap between pretraining (no explicit reasoning) and post-training (reasoning-heavy) with an intermediate SFT+RL mid-training phase that teaches models how to think:
- Annotate pretraining data with interleaved thoughts
- SFT mid-training to learn when/what to think alongside original content
- RL mid-training to optimize reasoning generation with a grounded reward from future-token prediction
Result: 3.2x improvement on reasoning benchmarks compared to direct RL post-training on base Llama-3-8B, plus gains over prior SFT alone. Introducing reasoning earlier makes models better prepared for post-training! Read more in the blog post: facebookresearch.github.io/RAM/blogs/thin…
9 replies · 71 reposts · 555 likes · 66.4K views
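The annotation step of the recipe above, weaving model-generated thoughts between sentences of ordinary pretraining text, could look roughly like this. The tag format and the `annotate` callable are placeholders, not the paper's implementation.

```python
def interleave_thoughts(document_sentences, annotate):
    """Toy annotation pass: insert a generated 'thought' before any
    sentence the annotator flags, producing interleaved training text.
    `annotate` is any callable returning a thought string or None."""
    out = []
    for sent in document_sentences:
        thought = annotate(sent)
        if thought:
            out.append(f"<think>{thought}</think>")  # placeholder tag format
        out.append(sent)
    return " ".join(out)

doc = ["2 + 2 = 4.", "Therefore the total is 4."]
annotated = interleave_thoughts(
    doc, lambda s: "check the arithmetic" if "=" in s else None
)
```

SFT on text like `annotated` is what teaches the model *when* and *what* to think alongside the original content, before any RL is applied.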
Joy Wongkamjan retweeted
Tianle Cai @tianle_cai
Can we turn part of an LLM's weights into long-term memory that continuously absorbs new knowledge? We took a small step toward this with In-Place Test-Time Training (In-Place TTT), accepted as an Oral at ICLR 2026 🎉
The key idea: no new modules, optional pretraining. We repurpose the final projection matrix in every MLP block as fast weights. With an NTP-aligned objective and efficient chunk-wise updates, the model adapts on the fly, complementing attention rather than replacing it.
📄 Paper: arxiv.org/abs/2604.06169
with amazing @Guhao_Feng @Roger98079446 Kai @GeZhang86038849 Di @HuangRubio
24 replies · 145 reposts · 1K likes · 75.5K views
Joy Wongkamjan retweeted
Jason Weston @jaseweston
🧮 Reasoning over Mathematical Objects 🧮
Our 70-page(!) paper is out on arXiv, as covered by several of our recent blog posts. We study how to improve reasoning on hard tasks (e.g., math expressions) via:
• better training data (& new evals)
• better reward models (on-policy trained)
• better inference methods (on-policy trained)
📝: arxiv.org/pdf/2603.18886
3 replies · 36 reposts · 200 likes · 13.8K views
Joy Wongkamjan retweeted
Zhengyao Jiang @zhengyaojiang
Is autoresearch really better than classic hyperparameter tuning? We did experiments comparing Optuna & autoresearch. Autoresearch converges faster, is more cost-efficient, and even generalizes better: 🧵(1/6)
24 replies · 114 reposts · 1.3K likes · 133.7K views
Joy Wongkamjan retweeted
Isha Puri @ ICLR (@ishapuri101)
Ask ChatGPT several times where's best to go for spring break, and it recommends Barcelona almost every time. This isn't a fluke: RL training rewards one best answer, so the model learns to commit to one mode and repeat it. Meet Multi-Answer RL: a simple RL method that trains LMs to reason through and output a distribution of answers in a single generation. [1/N]
22 replies · 73 reposts · 444 likes · 95.6K views
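One way to reward a *distribution* of answers rather than a single mode is to score the model's stated answer probabilities against a reference distribution. The scoring rule below (1 minus total-variation distance) is a hypothetical stand-in; the paper's actual reward may differ.

```python
def multi_answer_reward(predicted, reference):
    """Score a predicted answer distribution against a reference one:
    reward = 1 - total-variation distance. A mode-collapsed model that
    puts all mass on one answer is penalized; matching the reference
    spread earns full reward. (Hypothetical scoring rule.)"""
    answers = set(predicted) | set(reference)
    tv = 0.5 * sum(abs(predicted.get(a, 0.0) - reference.get(a, 0.0))
                   for a in answers)
    return 1.0 - tv

# the "always Barcelona" failure mode vs. a calibrated spread of answers
collapsed = {"Barcelona": 1.0}
diverse = {"Barcelona": 0.4, "Lisbon": 0.3, "Kyoto": 0.3}
reference = {"Barcelona": 0.4, "Lisbon": 0.3, "Kyoto": 0.3}
```

Under this kind of objective, committing to one mode is no longer the optimal policy, which is the behavioral change the thread describes.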
Joy Wongkamjan retweeted
hardmaru @hardmaru
I’m incredibly proud of The AI Scientist team for this milestone publication in @Nature. We started this project to explore if foundation models could execute the entire research lifecycle. Seeing this work validated at this level is a special moment. I truly believe AI will forever change the landscape of how scientific discoveries and scientific progress are made.
Sakana AI @SakanaAILabs

The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature
Nature: nature.com/articles/s4158…
Blog: sakana.ai/ai-scientist-n…

When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible. Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process.

Today, we are happy to announce that "The AI Scientist: Towards Fully Automated AI Research," our paper describing all of this work, along with fresh new insights, has been published in @Nature! This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement.

Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science. As the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to exponentially increase, future versions of The AI Scientist will be substantially more capable.

Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science.
This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune

68 replies · 142 reposts · 1.1K likes · 202.9K views
Joy Wongkamjan retweeted
Rosinality @rosinality
Value-based RL for reasoning. The main improvement is calibrating the loss such that it is zero for zero rewards at initialization.
2 replies · 16 reposts · 143 likes · 8.7K views
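The calibration trick reads as: make the value head predict exactly zero at initialization, so that the loss (and hence the gradient) vanishes on zero-reward samples before any training has happened. A toy sketch, assuming a squared-error value loss, which may differ from the method's actual objective:

```python
def value_loss(value_pred, reward):
    """Plain squared-error loss for a scalar value head."""
    return (value_pred - reward) ** 2

# Calibration: arrange for the value prediction to be exactly 0 at
# initialization (e.g. zero-init the value head, or subtract the head's
# initial output), so unrewarded samples contribute no loss or gradient
# at the start of training, while rewarded samples still drive learning.
init_pred = 0.0
loss_unrewarded = value_loss(init_pred, reward=0.0)  # zero at init
loss_rewarded = value_loss(init_pred, reward=1.0)    # nonzero, still learns
```

This matters in sparse-reward reasoning settings, where most samples earn reward 0 and an uncalibrated value loss would flood early training with noisy gradients.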
Joy Wongkamjan retweeted
Andrej Karpathy @karpathy
Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords.

LiteLLM itself has 97 million downloads per month, which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwned. Same for any other large project that depended on litellm. Afaict the poisoned version was up for less than ~1 hour.

The attack had a bug which led to its discovery: Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack, it could have gone undetected for many days or weeks.

Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any dependency you could be pulling in a poisoned package anywhere deep inside its entire dependency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages.

Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've become so averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.
Daniel Hnyk @hnykda

LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and self-replicate. Link below.

1.4K replies · 5.4K reposts · 28.1K likes · 66.4M views