Shuaichen Chang

566 posts

@ShuaichenChang

Researcher at AWS AI (@AmazonScience) Ex: PhD @OhioState Opinions are my own #NLProc #LLMs #AI

NYC · Joined August 2016
1.2K Following · 1.8K Followers
Nathan Lambert@natolambert·
I’ve been saying it for a while, cursor’s research team is insanely high on talent density. So many people I respected from my PhD / early career ended up there. Seems like that’s bearing fruit.
16 replies · 12 reposts · 702 likes · 37.9K views
Shuaichen Chang@ShuaichenChang·
Gemini sometimes triggers random things lol
[image]
0 replies · 0 reposts · 2 likes · 299 views

Shuaichen Chang reposted
Nathan Lambert@natolambert·
GPT 5.4 didn't get enough praise for how big of a step it was in OpenAI's agent arc. At the same time, with better context management, speed, rate limits, instruction following, code -- it's revealing that I still turn to the "warmth" of Claude. interconnects.ai/p/gpt-54-is-a-…
23 replies · 22 reposts · 302 likes · 27.1K views

Shuaichen Chang reposted
Sasha Rush@srush_nlp·
@eliebakouch It’s 100% learned in RL. We thought we might have to start with a complex prompt to kickstart it, but even the initial summaries are good enough for it to get some signal.
2 replies · 4 reposts · 62 likes · 14.3K views
Sara Hooker@sarahookr·
I’m in the middle of a high-stakes negotiation with whoever hacked our @adaption_ai account. I would prefer @XBusiness handled it, but it’s the Wild West: no response from support at X. Support doesn’t exist. Ignore @adaption_ai for the next 24h while we sort this out.
[image]
31 replies · 8 reposts · 185 likes · 43K views
Shuaichen Chang@ShuaichenChang·
This is fantastic! Is this a state-to-action model, or is there an intermediate intent? The big question is whether the robot learns to have a goal/plan in mind or is just executing learned reflexes.
Zhikai Zhang@Zhikai273

🎾Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills. Project: zzk273.github.io/LATENT/ Code: github.com/GalaxyGeneralR…

0 replies · 0 reposts · 1 like · 586 views

Shuaichen Chang reposted
Lorenzo Xiao@lrzneedresearch·
I don’t know who needs this, but here’s an overview of the notes I wrote to prepare for my "Agentic system design". Let me know if anyone finds this helpful, and I can write up each particular section in more detail: algoroxyolo.github.io/blog/2026/llm-…
9 replies · 11 reposts · 106 likes · 7K views
Shuaichen Chang@ShuaichenChang·
@eigenron It’s because the experiments were conducted only with Qwen2.5 models, which are known to improve under RL with almost any reward. The finding may well generalize to other models (I find it intuitively convincing), but people put less weight on RL results based on Qwen2.5 alone.
1 reply · 0 reposts · 17 likes · 1.2K views
eigenron@eigenron·
i don't understand why this paper didn't get much traction. they GRPO'd a small base model on its own confidence scores (internal rewards) instead of external rewards, and it shows results on math and coding benchmarks comparable to models trained with GRPO on external rewards.
[image]
20 replies · 68 reposts · 819 likes · 47.6K views
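The idea in the tweet above can be sketched roughly: score each sampled rollout with the model's own confidence instead of an external verifier, then feed those scores into GRPO's group-normalized advantage. A minimal sketch, assuming mean token log-probability as the internal reward (the paper's exact confidence measure may differ):

```python
import math

def internal_reward(token_logprobs):
    # Model's own confidence in its sampled answer: mean log-probability
    # over generated tokens (an assumed proxy for "confidence").
    return sum(token_logprobs) / len(token_logprobs)

def grpo_advantages(rewards):
    # GRPO-style advantage: normalize each rollout's reward against the
    # group of rollouts for the same prompt; no external reward needed.
    mu = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards))
    return [(r - mu) / (std if std > 0 else 1.0) for r in rewards]

# Four rollouts for one prompt, each with its per-token log-probs.
groups = [
    [-0.1, -0.2],   # confident rollout
    [-1.5, -2.0],   # unsure rollout
    [-0.3, -0.4],
    [-0.9, -1.1],
]
rewards = [internal_reward(g) for g in groups]
advantages = grpo_advantages(rewards)
# The most confident rollout receives the largest advantage, so the
# policy update pushes probability mass toward high-confidence answers.
```

The interesting property is that the training loop is identical to standard GRPO; only the reward source changes from a verifier to the model's own token probabilities.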
Shuaichen Chang reposted
Shuaichen Chang@ShuaichenChang·
@shi_weiyan We need a human purpose coach even before AGI! Post-AGI philosopher 👀
0 replies · 0 reposts · 1 like · 84 views
Weiyan Shi@shi_weiyan·
What could be some post-AGI jobs that don’t exist today? Asked Claude, Gemini, and GPT, and they all seem to think we’ll need a “human purpose coach” 😂 what do you think?
[3 images]
1 reply · 0 reposts · 9 likes · 1.5K views

Shuaichen Chang reposted
Anthropic@AnthropicAI·
New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in web-enabled environments. Read more: anthropic.com/engineering/ev…
255 replies · 367 reposts · 3.2K likes · 1.1M views
Shuaichen Chang@ShuaichenChang·
After reading the tech report, I feel like I should repost the tweet again to highlight how transparent the team has been. It’s one of the best tech reports I’ve read. I especially loved the synthetic state-based recall experiments and the scaling law analysis.
Ai2@allen_ai

Introducing Olmo Hybrid, a 7B fully open model combining transformer and linear RNN layers. It decisively outperforms Olmo 3 7B across evals, w/ new theory & scaling experiments explaining why. 🧵

1 reply · 5 reposts · 44 likes · 4.7K views

Shuaichen Chang reposted
Grigory Sapunov@che_shr_cat·
1/ RNNs compress history into fixed states. Perfect for O(L) scaling, fatal for recall. What if we stop overwriting history and checkpoint the states instead? You get Transformer-level Needle-in-a-Haystack recall with RNN efficiency. 🧵
[image]
3 replies · 38 reposts · 255 likes · 17.2K views
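The mechanism in the thread above can be illustrated with a toy: a fixed-state RNN lossily overwrites history, but periodically snapshotting the state lets a recall query look back over checkpoints instead of relying on a single compressed state. A minimal sketch with scalar states and hypothetical names, not the paper's actual architecture:

```python
class CheckpointedRNN:
    """Toy linear RNN that snapshots its state every `every` steps.

    Hypothetical illustration of state checkpointing; the real model
    uses high-dimensional states and learned recall over checkpoints.
    """

    def __init__(self, decay=0.5, every=4):
        self.decay = decay
        self.every = every
        self.state = 0.0          # single fixed-size (here: scalar) state
        self.checkpoints = []     # snapshots of past states

    def step(self, x, t):
        # Lossy compression: old history decays as new input arrives.
        self.state = self.decay * self.state + x
        if (t + 1) % self.every == 0:
            self.checkpoints.append(self.state)

    def recall(self, query):
        # A needle-in-a-haystack query can scan the checkpoints
        # (attention-style) instead of using only the final state,
        # which later inputs have already overwritten.
        return min(self.checkpoints, key=lambda s: abs(s - query))

rnn = CheckpointedRNN()
for t, x in enumerate([1.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]):
    rnn.step(x, t)
# Two checkpoints are stored; the early "needle" (the decayed 1.0)
# survives in the first checkpoint even after the later 5.0 arrives.
```

Processing stays O(L) like any RNN; recall over checkpoints adds a term proportional to the number of snapshots, which is the trade the thread describes.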
Shuaichen Chang@ShuaichenChang·
I often think about two possible paths toward AGI.

The first scenario is a single extremely powerful model. In this world, one super-intelligent system can perform almost any task as long as we provide clear instructions and sufficient context. It can continuously improve itself, becoming smarter over time. The model becomes a universal problem solver, capable of operating across nearly every domain.

The second scenario is very different. Instead of one universal model, we have a pipeline that continually adapts models for new tasks and environments. We start with reasonably strong base models that have good alignment properties (e.g., safe, cooperative, and generally benign). When a new task appears, an existing model is adapted specifically for that task. It may not improve on other tasks, but it becomes very good at the one it was trained for. To achieve this, models and their specific environments need to continuously co-evolve. Over time, we end up with many specialized models that can communicate and collaborate.

In other words, the first AGI is a "winner-takes-all" monolithic model with simple maintenance and tremendous commercial value, while the second is an ecosystem that lowers the barrier to entry but comes with higher ongoing maintenance costs.

I don’t know which future will actually happen. But either way, we will need models that can continually evolve. Personally, I think the second scenario is technically more plausible. And it’s closer to the world I want to live in.

P.S. The image was generated by Nano Banana 2.
[image]
0 replies · 0 reposts · 0 likes · 231 views
Yann Dubois@yanndubs·
🔥 Two things I’m especially excited about in 5.4:
1. Unification: we merged our codex & mainline models.
2. Efficiency: we brought the efficiency of 5.3-codex to CUA & knowledge work. We only showed 3 such plots in the blog, but many of our evals required less time (tokens/tools) than 5.2.
What should we fix for the next model?
[3 images]
51 replies · 29 reposts · 562 likes · 44.4K views