doug chang
@dougc333
4.9K posts
nobody
Sunnyvale, CA · Joined September 2009
1.6K Following · 365 Followers
doug chang @dougc333
@RIPS Hard to improve by watching; you have to practice tape reading, then do the trade with paper money. It's a big help to morale: at least you know someone can do it, so you can too.
1 reply · 0 retweets · 1 like · 59 views
RIPS @RIPS
In your honest opinion, are you getting true value from watching my daily streams? Are you seeing your trading noticeably improve? Be honest 👇
39 replies · 2 retweets · 64 likes · 4.6K views
doug chang retweeted
Sovey @SoveyX
Did you know Korea sells “one-a-day” banana packs? Instead of every banana ripening at once, each one is at a different stage. One is ready today. The next one is ready tomorrow. The last one is still spiritually in college, “experimenting.” Simple. Genius. Solves the entire banana problem. What do you think? Would you prefer your bananas this way?
[image]
758 replies · 1.7K retweets · 17.7K likes · 901.4K views
PaxTrader777🇺🇸 @paxtrader777
@jackgleason NONSENSE!!!!! I came VERY close to passing away May 17th 2024. I was not thinking about my money or my houses or cars. I was thinking about the love I received and the love I gave. I thought about God, I asked for my wife, I thought about kids. PERIOD.
13 replies · 6 retweets · 254 likes · 5K views
doug chang retweeted
Underfox @Underfox3
In this paper, researchers have recovered the hardware command streams generated from CUDA API calls by the closed-source userspace driver to enable CUDA-bypassing hardware control for direct measurement of raw data-transfer performance. arxiv.org/pdf/2604.26889
[4 images]
2 replies · 46 retweets · 238 likes · 12.7K views
doug chang retweeted
Dwarkesh Patel @dwarkesh_sp
Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It's a bit technical, but I encourage you to hang in there – it's really worth it. There are fewer than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, "As we now know, pipelining is not wise."
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
146 replies · 595 retweets · 6.5K likes · 1.2M views
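A rough way to see the batch-size point in the first chapter above: at small batch sizes, decode is memory-bandwidth-bound, because every step streams all model weights from HBM and the whole batch shares that single read. A back-of-the-envelope sketch with made-up but plausible numbers (hypothetical 70B BF16 model, H100-class bandwidth, a placeholder GPU rental price; none of these figures come from the lecture):

```python
# Back-of-the-envelope decode economics vs. batch size. All numbers are
# illustrative assumptions, not from the lecture.
weight_bytes = 70e9 * 2       # hypothetical 70B-param model in BF16
hbm_bandwidth = 3.35e12       # ~H100 SXM HBM3, bytes/s
gpu_dollars_per_hour = 3.0    # placeholder rental price

# Bandwidth-bound decode: one step streams all weights once, shared by
# the whole batch. (Ignores KV-cache traffic and the compute-bound regime
# that eventually caps the benefit at large batch sizes.)
step_seconds = weight_bytes / hbm_bandwidth

for batch in (1, 8, 64, 256):
    tokens_per_second = batch / step_seconds
    dollars_per_mtok = gpu_dollars_per_hour / 3600 / tokens_per_second * 1e6
    print(f"batch={batch:4d}  tok/s={tokens_per_second:8.0f}  $/Mtok={dollars_per_mtok:7.3f}")
```

Per-token cost falls roughly linearly with batch size until compute or KV-cache bandwidth takes over, which is the serving-economics tradeoff the lecture opens with.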
doug chang retweeted
Yuxuan Mu @YuxuanMu16173
Can we build a standalone, modular, and reusable naturalness reward for training motor controllers? #SMP is a step toward that vision. Once SMP has been trained on a motion dataset, its priors can be reused to train new controllers to perform diverse tasks while adhering to the behaviors in the dataset, without the original dataset or retraining. 🔥 Excited to share our latest work, SMP: Score-Matching Motion Priors, accepted to @siggraph
Webpage: yxmu.foo/smp-page
Code: github.com/xbpeng/MimicKit
Paper: yxmu.foo/smp-page/asset…
Video: youtu.be/jBA2tWk6vzU
[YouTube video]
7 replies · 74 retweets · 320 likes · 52.9K views
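The reuse pattern in one picture: once the prior is trained, any new task just adds the frozen naturalness score to its own task reward. A toy sketch of that interface (entirely my illustration of the usage pattern the tweet describes, not SMP's actual API; the real code is in the linked MimicKit repo):

```python
# How a frozen, pretrained motion prior can serve as a plug-in naturalness
# reward -- a hypothetical interface, not SMP's real one.
import numpy as np

class FrozenMotionPrior:
    """Stand-in for a score-matching prior trained once on a motion dataset."""
    def __init__(self, rng=np.random.default_rng(0)):
        self.w = rng.normal(size=8)          # pretend these are learned weights

    def naturalness(self, state, action):
        # Higher when (state, action) resembles the training motions.
        x = np.concatenate([state, action])
        return float(-np.square(x - self.w).mean())  # toy proxy score

prior = FrozenMotionPrior()                   # trained once, reused everywhere

def reward(state, action, task_reward, w_nat=0.5):
    # New controllers optimize task reward + naturalness, with no access
    # to the original motion dataset and no prior retraining.
    return task_reward + w_nat * prior.naturalness(state, action)

s, a = np.zeros(4), np.zeros(4)
print(reward(s, a, task_reward=1.0))
```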
doug chang retweeted
Jürgen Schmidhuber @SchmidhuberAI
Using only box-forwarding speed as the reward, our Stackelberg PPO automatically evolves robots with arms for pushing and legs for moving. The key idea is a novel game-theoretic view of structure–control co-design, yielding more effective optimization and dramatically better designs. Come see our poster at ICLR 2026 on Apr 25, 10:30 AM, at P4-#4810. With @YuhuiWangAI, @YanningD_AI, @oneDylanAshley.
Paper: arxiv.org/abs/2603.15388
Project Page: yanningdai.github.io/stackelberg-pp…
14 replies · 63 retweets · 536 likes · 49K views
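For intuition, framing structure–control co-design as a Stackelberg game means the morphology "leader" is scored only after the controller "follower" has best-responded to it. A toy sketch of that bilevel loop (entirely my illustration, with a made-up reward function and hill-climbing standing in for PPO and physics simulation):

```python
# Toy sketch of a Stackelberg (leader-follower) structure-control loop --
# an illustration of the general idea, not the paper's algorithm.
# Leader: morphology parameters. Follower: controller trained to best-respond.
import random

def speed(morphology, gait):
    """Hypothetical reward: box-forwarding speed as a function of
    arm length (pushing) and a leg 'gait' parameter (moving)."""
    arm, leg = morphology
    return arm * (1 - arm) + leg * gait - 0.5 * gait**2

def train_controller(morphology, steps=200):
    """Follower: hill-climb a 1-D 'gait' parameter for this morphology.
    Stands in for the inner PPO loop."""
    gait = 0.0
    for _ in range(steps):
        candidate = gait + random.gauss(0, 0.1)
        if speed(morphology, candidate) > speed(morphology, gait):
            gait = candidate
    return gait

# Leader: evolve morphology, scoring each design by its *trained* follower,
# so the leader anticipates the follower's best response.
best, best_score = None, float("-inf")
morph = (0.3, 0.5)
for _ in range(50):
    candidate = tuple(max(0.0, m + random.gauss(0, 0.05)) for m in morph)
    score = speed(candidate, train_controller(candidate))
    if score > best_score:
        best, best_score, morph = candidate, score, candidate
print("best morphology:", best, "speed:", round(best_score, 3))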
doug chang retweeted
The Inner Circle Trader @I_Am_The_ICT
They talk smack, flex their open position, show their stop... turn a knob, push a button and Poof... they get stopped out. Next...
25 replies · 17 retweets · 669 likes · 33.2K views
doug chang @dougc333
@I_Am_The_ICT Learned how long I have to go, and the difference between you and normal people.
0 replies · 0 retweets · 2 likes · 81 views
doug chang @dougc333
@I_Am_The_ICT It helps that I can compare yours vs. Tanja's and see two different strategies. It helps to see the analysis and how to balance the put vs. long analysis, like today's strategy vs. Tanja's. I can believe I can do it myself at some point.
0 replies · 0 retweets · 0 likes · 27 views
The Inner Circle Trader @I_Am_The_ICT
Not looking for adoration, organic feedback only. Are the livestreams helping you, and how?
392 replies · 21 retweets · 1.4K likes · 51.6K views
vik @vikhyatk
4am. only people awake are me and this robot
[image]
7 replies · 1 retweet · 67 likes · 2.7K views
doug chang retweeted
Zhijian Liu @zhijianliu_
Reasoning VLAs can think. They just can't think fast. Until now. Introducing FlashDrive⚡
🚀 716 ms → 159 ms on RTX PRO 6000 (up to 5.7×)
✅ Zero accuracy loss
FlashDrive = streaming inference + DFlash speculative reasoning + ParoQuant W4A8
Real-time reasoning for autonomous driving is here! z-lab.ai/projects/flash…
32 replies · 162 retweets · 1.3K likes · 162.3K views
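Background on the "speculative reasoning" ingredient above, in case it's unfamiliar: generic speculative decoding lets a cheap draft model propose several tokens which the big model verifies in one batched pass, so every agreed prefix costs a single target step. A greedy-verification toy (my sketch of the generic technique; DFlash's actual scheme is its own method and surely differs):

```python
# Greedy speculative decoding in miniature. A cheap draft proposes k
# tokens; the big model verifies them in what would be ONE batched
# forward pass, so agreed prefixes come almost for free.
TEXT = "real-time reasoning!"

def target_next(ctx):   # stand-in for the big model (greedy decoding)
    return TEXT[len(ctx)] if len(ctx) < len(TEXT) else ""

def draft_next(ctx):    # stand-in for the cheap draft; wrong every 3rd token
    if len(ctx) >= len(TEXT):
        return ""
    return TEXT[len(ctx)] if len(ctx) % 3 else "?"

def speculative_step(ctx, k=4):
    proposal = ""
    for _ in range(k):                       # draft runs k cheap steps
        t = draft_next(ctx + proposal)
        if not t:
            break
        proposal += t
    accepted = ""
    for t in proposal:                       # target checks all k at once
        want = target_next(ctx + accepted)
        if want != t:
            return accepted + want           # first mismatch: keep target's token
        accepted += t
    return accepted

out, target_passes = "", 0
while len(out) < len(TEXT):
    out += speculative_step(out)
    target_passes += 1
print(out, "| target passes:", target_passes, "vs", len(TEXT), "sequential")
```

Even with a draft that errs every third token, the string completes in far fewer target passes than sequential decoding would need.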
doug chang retweeted
Thomas Wolf @Thom_Wolf
**Deep content post alert**

A technical deep dive for your Sunday morning, somewhere between a short detective story 🕵️ and a tutorial on RLHF 🧑‍🏫

We recently added AsyncGRPO to the TRL library to decouple inference and training and scale much faster and harder. As a sanity check, we ran it on a trivial setup (reward = −len, optimal policy = emit EOS immediately). To our surprise, it did not converge!

This led us to a known but poorly understood issue: when the training forward pass runs in FP32 while the inference engine (vLLM) runs in BF16, RLHF often breaks. People have noticed this before and called it "numerical instability" or "noisy gradients." Nobody had pinpointed the actual mechanism. We did in this deep dive by @DirhousssiAmine

We instrumented the training loop and decomposed the importance sampling ratio as log r = α + β, where α is the true policy change (in BF16 space) and β is the precision gap between the training forward pass and a BF16 forward on the same weights. See it like this:
α = how much the policy actually changed since the rollout (same precision, different time).
β = how much the trainer and inference engine disagree about the same policy (same weights, different precision).
The ratio sees α + β, and PPO can't tell them apart.

Empirically, β is small at the token level (O(1e−2–1e−1)), but it is not innocent random noise that would wash out over time. We found it to be structured, persistent, and worse for certain tokens: it has a consistent negative bias, correlates with the advantage, and is up to 50x larger on low-probability tokens. However, despite all these concerning properties, none of them explains the mechanism. We saw that just disabling clipping leads to stable convergence, meaning that β noise alone does not explain the failure.

We tested every plausible explanation and ruled them out one by one:
⭐️ Treating β as pure noise: keeping β but disabling clipping leads to stable convergence.
⭐️ FP32 backward: you're optimizing a function (FP32) that's slightly different from the one you deploy (BF16), so you might be climbing the wrong hill. Turns out the hills are close enough: using FP32 gradients with a clean ratio (β removed) converges and is actually more effective at improving the deployed BF16 policy.
⭐️ Multiplicative distortion of the advantage: since β correlates with the advantage, you might think it systematically over-reinforces good tokens and under-suppresses bad ones, warping what the optimizer thinks is good vs. bad. We measured this directly, and the per-token gradient weights are identical whether β is there or not.
⭐️ BF16 quantization / boundary crossings: at low learning rates, most FP32 weight updates are too small to change the BF16 representation at all, so you might think vLLM just never sees the updates and that's why it stalls. However, if boundary crossings were the problem, you'd expect the failing run to have fewer of them than the converging run. But both runs start with nearly identical boundary-crossing rates.

What we discovered is that the failure mode only appears when β enters the PPO clipped objective. And this was our hint to the real mechanism. Because PPO clips the ratio, small perturbations from β push r outside the trust region even when the underlying policy has not meaningfully changed. The clipped branch is selected, and the gradient is exactly zero. We call this *phantom clipping*: tokens are treated as if they exceeded the trust region when the change is purely numerical! And this is not a marginal effect.

At early training, the policy has barely moved (α ≈ 0), so the clipping decision reduces to whether |β| > 0.2. Yet roughly 18% of tokens get phantom-clipped! And because RL is closed-loop, the damage compounds: the deployed policy barely improves, future rollouts carry the same information, and the system locks into a permanent stall.

To make it a testable hypothesis, we confirmed causality with targeted interventions: removing β from the ratio, forcing r = 1, or keeping β but disabling clipping all restore convergence. Runs only fail when β is present in the clipped ratio. No exceptions.

The issue is not general numerical noise. It is a specific interaction between precision mismatch and PPO's clipping mechanism: the precision gap perturbs the ratio in a way that induces zero gradients where there should be signal.

We concluded with a set of recommended fixes (strongest first): match precisions (FP16 everywhere, or BF16 autocast with FP32 master weights), compute the ratio from a BF16 shadow forward pass, or widen ε to disable clipping.

Full write-up with experiments, interactive explanation and analysis at: huggingface.co/spaces/aminedi…

(Amine also wrote an X article which is very cool, but you'll lose the interactive graphics and animations 😭)
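To make phantom clipping concrete, here is a minimal PyTorch sketch of the mechanism (my toy reconstruction from the description above, not TRL's code). With a positive advantage and zero true policy change (α = 0), a purely numerical β with |β| > ε flips the objective onto the clipped branch and the gradient vanishes:

```python
# Toy demonstration of "phantom clipping": a precision-gap perturbation
# beta, not a real policy change, lands a token on PPO's clipped branch
# and zeroes its gradient. Assumes the standard clipped surrogate, eps=0.2.
import torch

EPS = 0.2

def ppo_token_loss(log_ratio, advantage):
    r = torch.exp(log_ratio)
    unclipped = r * advantage
    clipped = torch.clamp(r, 1 - EPS, 1 + EPS) * advantage
    return -torch.min(unclipped, clipped)

advantage = torch.tensor(1.0)  # a good token we should reinforce

# Clean ratio: alpha = 0 (policy unchanged since rollout), full gradient.
alpha = torch.tensor(0.0, requires_grad=True)
ppo_token_loss(alpha, advantage).backward()
print("grad without beta:", alpha.grad)        # nonzero -> learning signal

# Contaminated ratio: same alpha = 0, but |beta| > eps pushes
# r = exp(alpha + beta) past 1 + eps. The clipped branch wins, the
# gradient is exactly zero, and the optimizer sees no signal at all.
beta = 0.25
alpha = torch.tensor(0.0, requires_grad=True)
ppo_token_loss(alpha + beta, advantage).backward()
print("grad with beta:   ", alpha.grad)        # zero -> phantom-clipped
```

The thread's recommended fixes all attack the same term: make β zero (matched precisions, or a BF16 shadow forward pass for the ratio), or keep β out of the clipping decision.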
Dirhousssi Amine @DirhousssiAmine · x.com/i/article/2045…
12 replies · 31 retweets · 307 likes · 50.4K views