doug chang
@dougc333
4.9K posts
nobody
Sunnyvale, CA · Joined September 2009
1.6K Following · 365 Followers
doug chang @dougc333
@RIPS Hard to improve by watching; you have to practice tape reading, then do the trade with paper money. It's a big help to morale: at least you know someone can do it, so you can too.
1 reply · 0 retweets · 1 like · 59 views
RIPS @RIPS
In your honest opinion, are you getting true value from watching my daily streams? Are you seeing your trading noticeably improve? Be honest 👇
39 replies · 2 retweets · 64 likes · 4.6K views
doug chang retweeted
Sovey @SoveyX
Did you know Korea sells “one-a-day” banana packs? Instead of every banana ripening at once, each one is at a different stage. One is ready today. The next one is ready tomorrow. The last one is still spiritually in college, “experimenting.” Simple. Genius. Solves the entire banana problem. What do you think? Would you prefer your bananas this way?
[image]
758 replies · 1.7K retweets · 17.7K likes · 901.4K views
PaxTrader777🇺🇸 @paxtrader777
@jackgleason NONSENSE!!!!! I came VERY close to passing away May 17th 2024. I was not thinking about my money or my houses or cars. I was thinking about the love I received and the love I gave. I thought about God, I asked for my wife, I thought about kids. PERIOD.
13 replies · 6 retweets · 254 likes · 5K views
doug chang retweeted
Underfox @Underfox3
In this paper, researchers have recovered the hardware command streams generated from CUDA API calls by the closed-source userspace driver to enable CUDA-bypassing hardware control for direct measurement of raw data-transfer performance. arxiv.org/pdf/2604.26889
[4 images]
2 replies · 46 retweets · 238 likes · 12.7K views
doug chang retweeted
Dwarkesh Patel @dwarkesh_sp
Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It's a bit technical, but I encourage you to hang in there – it's really worth it. There are fewer than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, "As we now know, pipelining is not wise."
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
146 replies · 595 retweets · 6.5K likes · 1.2M views
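A rough way to see the batch-size point in the first chapter above: at small batch sizes, decode is memory-bandwidth-bound, because every step streams all model weights from HBM and the whole batch shares that single read. A back-of-the-envelope sketch with made-up but plausible numbers (hypothetical 70B BF16 model, H100-class bandwidth, a placeholder GPU rental price; none of these figures come from the lecture):

```python
# Back-of-the-envelope decode economics vs. batch size. All numbers are
# illustrative assumptions, not from the lecture.
weight_bytes = 70e9 * 2       # hypothetical 70B-param model in BF16
hbm_bandwidth = 3.35e12       # ~H100 SXM HBM3, bytes/s
gpu_dollars_per_hour = 3.0    # placeholder rental price

# Bandwidth-bound decode: one step streams all weights once, shared by
# the whole batch. (Ignores KV-cache traffic and the compute-bound regime
# that eventually caps the benefit at large batch sizes.)
step_seconds = weight_bytes / hbm_bandwidth

for batch in (1, 8, 64, 256):
    tokens_per_second = batch / step_seconds
    dollars_per_mtok = gpu_dollars_per_hour / 3600 / tokens_per_second * 1e6
    print(f"batch={batch:4d}  tok/s={tokens_per_second:8.0f}  $/Mtok={dollars_per_mtok:7.3f}")
```

Per-token cost falls roughly linearly with batch size until compute or KV-cache bandwidth takes over, which is the serving-economics tradeoff the lecture opens with.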
doug chang retweeted
Yuxuan Mu @YuxuanMu16173
Can we build a standalone, modular, and reusable naturalness reward for training motor controllers? #SMP is a step toward that vision. Once SMP has been trained on a motion dataset, its priors can be reused to train new controllers to perform diverse tasks while adhering to the behaviors in the dataset, without the original dataset or retraining. 🔥 Excited to share our latest work, SMP: Score-Matching Motion Priors, accepted to @siggraph
Webpage: yxmu.foo/smp-page
Code: github.com/xbpeng/MimicKit
Paper: yxmu.foo/smp-page/asset…
Video: youtu.be/jBA2tWk6vzU
[YouTube video]
7 replies · 74 retweets · 320 likes · 52.9K views
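The reuse pattern in one picture: once the prior is trained, any new task just adds the frozen naturalness score to its own task reward. A toy sketch of that interface (entirely my illustration of the usage pattern the tweet describes, not SMP's actual API; the real code is in the linked MimicKit repo):

```python
# How a frozen, pretrained motion prior can serve as a plug-in naturalness
# reward -- a hypothetical interface, not SMP's real one.
import numpy as np

class FrozenMotionPrior:
    """Stand-in for a score-matching prior trained once on a motion dataset."""
    def __init__(self, rng=np.random.default_rng(0)):
        self.w = rng.normal(size=8)          # pretend these are learned weights

    def naturalness(self, state, action):
        # Higher when (state, action) resembles the training motions.
        x = np.concatenate([state, action])
        return float(-np.square(x - self.w).mean())  # toy proxy score

prior = FrozenMotionPrior()                   # trained once, reused everywhere

def reward(state, action, task_reward, w_nat=0.5):
    # New controllers optimize task reward + naturalness, with no access
    # to the original motion dataset and no prior retraining.
    return task_reward + w_nat * prior.naturalness(state, action)

s, a = np.zeros(4), np.zeros(4)
print(reward(s, a, task_reward=1.0))
```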
doug chang retweeted
Jürgen Schmidhuber @SchmidhuberAI
Using only box-forwarding speed as the reward, our Stackelberg PPO automatically evolves robots with arms for pushing and legs for moving. The key idea is a novel game-theoretic view of structure–control co-design, yielding more effective optimization and dramatically better designs. Come see our poster at ICLR 2026 on Apr 25, 10:30 AM, at P4-#4810. With @YuhuiWangAI, @YanningD_AI, @oneDylanAshley.
Paper: arxiv.org/abs/2603.15388
Project Page: yanningdai.github.io/stackelberg-pp…
14 replies · 63 retweets · 536 likes · 49K views
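For intuition, framing structure–control co-design as a Stackelberg game means the morphology "leader" is scored only after the controller "follower" has best-responded to it. A toy sketch of that bilevel loop (entirely my illustration, with a made-up reward function and hill-climbing standing in for PPO and physics simulation):

```python
# Toy sketch of a Stackelberg (leader-follower) structure-control loop --
# an illustration of the general idea, not the paper's algorithm.
# Leader: morphology parameters. Follower: controller trained to best-respond.
import random

def speed(morphology, gait):
    """Hypothetical reward: box-forwarding speed as a function of
    arm length (pushing) and a leg 'gait' parameter (moving)."""
    arm, leg = morphology
    return arm * (1 - arm) + leg * gait - 0.5 * gait**2

def train_controller(morphology, steps=200):
    """Follower: hill-climb a 1-D 'gait' parameter for this morphology.
    Stands in for the inner PPO loop."""
    gait = 0.0
    for _ in range(steps):
        candidate = gait + random.gauss(0, 0.1)
        if speed(morphology, candidate) > speed(morphology, gait):
            gait = candidate
    return gait

# Leader: evolve morphology, scoring each design by its *trained* follower,
# so the leader anticipates the follower's best response.
best, best_score = None, float("-inf")
morph = (0.3, 0.5)
for _ in range(50):
    candidate = tuple(max(0.0, m + random.gauss(0, 0.05)) for m in morph)
    score = speed(candidate, train_controller(candidate))
    if score > best_score:
        best, best_score, morph = candidate, score, candidate
print("best morphology:", best, "speed:", round(best_score, 3))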
doug chang retweeted
The Inner Circle Trader @I_Am_The_ICT
They talk smack, flex their open position, show their stop... turn a knob, push a button and Poof... they get stopped out. Next...
25 replies · 17 retweets · 669 likes · 33.2K views
doug chang @dougc333
@I_Am_The_ICT Learned how long I have to go, and the difference between you and normal people.
0 replies · 0 retweets · 2 likes · 81 views
doug chang @dougc333
@I_Am_The_ICT It helps that I can compare yours vs. Tanja's and see two different strategies. It helps to see the analysis and how to balance the put vs. long analysis, like today's strategy vs. Tanja's. I can believe I can do it myself at some point.
0 replies · 0 retweets · 0 likes · 27 views
The Inner Circle Trader @I_Am_The_ICT
Not looking for adoration, organic feedback only. Are the livestreams helping you, and how?
392 replies · 21 retweets · 1.4K likes · 51.6K views
vik @vikhyatk
4am. only people awake are me and this robot
[image]
7 replies · 1 retweet · 67 likes · 2.7K views
doug chang retweeted
Zhijian Liu @zhijianliu_
Reasoning VLAs can think. They just can't think fast. Until now. Introducing FlashDrive⚡
🚀 716 ms → 159 ms on RTX PRO 6000 (up to 5.7×)
✅ Zero accuracy loss
FlashDrive = streaming inference + DFlash speculative reasoning + ParoQuant W4A8
Real-time reasoning for autonomous driving is here! z-lab.ai/projects/flash…
32 replies · 162 retweets · 1.3K likes · 162.3K views
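Background on the "speculative reasoning" ingredient above, in case it's unfamiliar: generic speculative decoding lets a cheap draft model propose several tokens which the big model verifies in one batched pass, so every agreed prefix costs a single target step. A greedy-verification toy (my sketch of the generic technique; DFlash's actual scheme is its own method and surely differs):

```python
# Greedy speculative decoding in miniature. A cheap draft proposes k
# tokens; the big model verifies them in what would be ONE batched
# forward pass, so agreed prefixes come almost for free.
TEXT = "real-time reasoning!"

def target_next(ctx):   # stand-in for the big model (greedy decoding)
    return TEXT[len(ctx)] if len(ctx) < len(TEXT) else ""

def draft_next(ctx):    # stand-in for the cheap draft; wrong every 3rd token
    if len(ctx) >= len(TEXT):
        return ""
    return TEXT[len(ctx)] if len(ctx) % 3 else "?"

def speculative_step(ctx, k=4):
    proposal = ""
    for _ in range(k):                       # draft runs k cheap steps
        t = draft_next(ctx + proposal)
        if not t:
            break
        proposal += t
    accepted = ""
    for t in proposal:                       # target checks all k at once
        want = target_next(ctx + accepted)
        if want != t:
            return accepted + want           # first mismatch: keep target's token
        accepted += t
    return accepted

out, target_passes = "", 0
while len(out) < len(TEXT):
    out += speculative_step(out)
    target_passes += 1
print(out, "| target passes:", target_passes, "vs", len(TEXT), "sequential")
```

Even with a draft that errs every third token, the string completes in far fewer target passes than sequential decoding would need.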
doug chang retweeted
Thomas Wolf @Thom_Wolf
**Deep content post alert**

A technical deep dive for your Sunday morning, somewhere between a short detective story 🕵️ and a tutorial on RLHF 🧑‍🏫

We recently added AsyncGRPO to the TRL library to decouple inference and training and scale much faster and harder. As a sanity check, we ran it on a trivial setup (reward = −len, optimal policy = emit EOS immediately). To our surprise, it did not converge!

This led us to a known but poorly understood issue: when the training forward pass runs in FP32 while the inference engine (vLLM) runs in BF16, RLHF often breaks. People have noticed this before and called it "numerical instability" or "noisy gradients." Nobody had pinpointed the actual mechanism. We did in this deep dive by @DirhousssiAmine

We instrumented the training loop and decomposed the importance sampling ratio as log r = α + β, where α is the true policy change (in BF16 space) and β is the precision gap between the training forward pass and a BF16 forward on the same weights. See it like this:
α = how much the policy actually changed since the rollout (same precision, different time).
β = how much the trainer and inference engine disagree about the same policy (same weights, different precision).
The ratio sees α + β, and PPO can't tell them apart.

Empirically, β is small at the token level (O(1e−2–1e−1)), but it is not innocent random noise that would wash out over time. We found it to be structured, persistent, and worse for certain tokens: it has a consistent negative bias, correlates with the advantage, and is up to 50x larger on low-probability tokens. However, despite all these concerning properties, none of them explains the mechanism. We saw that just disabling clipping leads to stable convergence, meaning that β noise alone does not explain the failure.

We tested every plausible explanation and ruled them out one by one:
⭐️ Treating β as pure noise: keeping β but disabling clipping leads to stable convergence.
⭐️ FP32 backward: you're optimizing a function (FP32) that's slightly different from the one you deploy (BF16), so you might be climbing the wrong hill. Turns out the hills are close enough: using FP32 gradients with a clean ratio (β removed) converges and is actually more effective at improving the deployed BF16 policy.
⭐️ Multiplicative distortion of the advantage: since β correlates with the advantage, you might think it systematically over-reinforces good tokens and under-suppresses bad ones, warping what the optimizer thinks is good vs. bad. We measured this directly, and the per-token gradient weights are identical whether β is there or not.
⭐️ BF16 quantization / boundary crossings: at low learning rates, most FP32 weight updates are too small to change the BF16 representation at all, so you might think vLLM just never sees the updates and that's why it stalls. However, if boundary crossings were the problem, you'd expect the failing run to have fewer of them than the converging run. But both runs start with nearly identical boundary-crossing rates.

What we discovered is that the failure mode only appears when β enters the PPO clipped objective. And this was our hint to the real mechanism. Because PPO clips the ratio, small perturbations from β push r outside the trust region even when the underlying policy has not meaningfully changed. The clipped branch is selected, and the gradient is exactly zero. We call this *phantom clipping*: tokens are treated as if they exceeded the trust region when the change is purely numerical! And this is not a marginal effect.

At early training, the policy has barely moved (α ≈ 0), so the clipping decision reduces to whether |β| > 0.2. Yet roughly 18% of tokens get phantom-clipped! And because RL is closed-loop, the damage compounds: the deployed policy barely improves, future rollouts carry the same information, and the system locks into a permanent stall.

To make it a testable hypothesis, we confirmed causality with targeted interventions: removing β from the ratio, forcing r = 1, or keeping β but disabling clipping all restore convergence. Runs only fail when β is present in the clipped ratio. No exceptions.

The issue is not general numerical noise. It is a specific interaction between precision mismatch and PPO's clipping mechanism: the precision gap perturbs the ratio in a way that induces zero gradients where there should be signal.

We concluded with a set of recommended fixes (strongest first): match precisions (FP16 everywhere, or BF16 autocast with FP32 master weights), compute the ratio from a BF16 shadow forward pass, or widen ε to disable clipping.

Full write-up with experiments, interactive explanation and analysis at: huggingface.co/spaces/aminedi…

(Amine also wrote an X article which is very cool, but you'll lose the interactive graphics and animations 😭)
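To make phantom clipping concrete, here is a minimal PyTorch sketch of the mechanism (my toy reconstruction from the description above, not TRL's code). With a positive advantage and zero true policy change (α = 0), a purely numerical β with |β| > ε flips the objective onto the clipped branch and the gradient vanishes:

```python
# Toy demonstration of "phantom clipping": a precision-gap perturbation
# beta, not a real policy change, lands a token on PPO's clipped branch
# and zeroes its gradient. Assumes the standard clipped surrogate, eps=0.2.
import torch

EPS = 0.2

def ppo_token_loss(log_ratio, advantage):
    r = torch.exp(log_ratio)
    unclipped = r * advantage
    clipped = torch.clamp(r, 1 - EPS, 1 + EPS) * advantage
    return -torch.min(unclipped, clipped)

advantage = torch.tensor(1.0)  # a good token we should reinforce

# Clean ratio: alpha = 0 (policy unchanged since rollout), full gradient.
alpha = torch.tensor(0.0, requires_grad=True)
ppo_token_loss(alpha, advantage).backward()
print("grad without beta:", alpha.grad)        # nonzero -> learning signal

# Contaminated ratio: same alpha = 0, but |beta| > eps pushes
# r = exp(alpha + beta) past 1 + eps. The clipped branch wins, the
# gradient is exactly zero, and the optimizer sees no signal at all.
beta = 0.25
alpha = torch.tensor(0.0, requires_grad=True)
ppo_token_loss(alpha + beta, advantage).backward()
print("grad with beta:   ", alpha.grad)        # zero -> phantom-clipped
```

The thread's recommended fixes all attack the same term: make β zero (matched precisions, or a BF16 shadow forward pass for the ratio), or keep β out of the clipping decision.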
Dirhousssi Amine @DirhousssiAmine · x.com/i/article/2045…
12 replies · 31 retweets · 307 likes · 50.4K views