

Shuhua Jiang

@JiangSH24
Founder @ICIM_UK | NCIM Fellow | CNHC Reg. | Longitudinal child health, real-world trajectories, AI as memory infrastructure



I asked ChatGPT whether "meta-analysis" is still an active field in statistics. The answer: Yes, when aggregating effect sizes under statistical assumptions. Then I asked: Can you talk about "effect sizes" under strictly statistical assumptions? Answer: No. Go figure. I once described "meta-analysis" as an attempt to average apples and oranges to learn properties of bananas. But statisticians, no matter how grotesquely, will continue to practice whatever their textbooks celebrate. See ucla.in/2N7S0K9 @eliasbareinboim @aclong111 @PatientStormDoc @DrJBhattacharya




One missing piece in medical AI is how to act under causal uncertainty. Inspired by @yudapearl, we don’t collapse uncertainty into heuristics. We maintain multiple causal hypotheses and update their weights online through real-world feedback. Prediction asks what may happen. Causal bounds + stability ask when automation must stop.
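The idea of keeping several causal hypotheses alive and reweighting them from feedback can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' system: the function names, the Bayes-rule update, the 0.05 plausibility cutoff, and the "defer when plausible hypotheses disagree" rule are all hypothetical stand-ins for whatever causal bounds and stability checks are actually used.

```python
def update_weights(weights, likelihoods):
    """Bayes-rule reweighting (an assumed update rule):
    new weight is proportional to old weight times the likelihood
    each hypothesis assigned to the newly observed outcome."""
    unnormalized = {h: weights[h] * likelihoods[h] for h in weights}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("every hypothesis assigned zero likelihood to the observation")
    return {h: w / total for h, w in unnormalized.items()}


def should_defer(weights, predicted_effect, max_spread, min_weight=0.05):
    """Hypothetical stopping rule: hand control back to a human when the
    still-plausible hypotheses disagree about the effect of an action
    by more than max_spread."""
    plausible = [predicted_effect[h] for h, w in weights.items() if w >= min_weight]
    if not plausible:
        return True  # no hypothesis is credible, so do not automate
    return max(plausible) - min(plausible) > max_spread


# Two made-up hypotheses that predict very different treatment effects.
weights = {"H_direct": 0.5, "H_confounded": 0.5}
effects = {"H_direct": 0.1, "H_confounded": 0.9}
observation_likelihood = {"H_direct": 0.8, "H_confounded": 0.2}

weights = update_weights(weights, observation_likelihood)
defer_now = should_defer(weights, effects, max_spread=0.5)
```

In this sketch the system keeps deferring while both hypotheses remain plausible, and only automates once repeated feedback has pushed the weight of one of them below the cutoff.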






@yudapearl By the way, you need to stop describing yourself as a half-Bayesian. That causes a lot of misunderstanding.











I think you missed the main ideas.
- The basic premise of JEPA is that training by reconstruction/prediction in input space is evil (or counterproductive). The details are almost always unpredictable. Hence prediction must take place in representation space, where unpredictable details are eliminated.
- The main issue with JEPA is how to prevent collapse (in the absence of a reconstruction loss). There are two classes of methods:
(1) EMA: Using weights in the target encoder that are an exponential moving average (EMA) of the weights in the other encoder (I-JEPA, V-JEPA, DINO, BYOL).
(2) Infomax: Using a regularizer that attempts to maximize the information content of the representation (e.g. over a batch). There are two sets of methods for that:
(2a) sample-contrastive methods: that want to make each representation vector different from the others (Siamese nets, DrLIM, SimCLR, etc). They tend to not work well in high dimension, and to require large batches and hard negative mining.
(2b) dimension-contrastive methods: that want to make each variable independent from the others (Barlow Twins, VICReg, SIGReg/LeJEPA, MMCR, MCR2....)
Bottom line:
A. SSL by reconstruction/prediction doesn't work for high-dim, continuous, noisy data
B. EMA sucks: no loss function being minimized, requirement for weight sharing....
C. Sample-contrastive infomax doesn't scale to high dimension
D. My money is on dimension-contrastive methods like SIGReg/LeJEPA
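To make classes (1) and (2b) above concrete, here is a minimal pure-Python sketch. It is an illustration only, not code from any of the cited methods: `ema_update` shows the generic moving-average rule for target weights, and `variance_covariance_penalty` follows the general shape of VICReg's variance and covariance terms (a hinge on each dimension's standard deviation plus squared off-diagonal covariances). The function names, the toy batches, and the defaults `decay=0.99`, `gamma=1.0`, `eps=1e-4` are assumptions chosen for the example.

```python
import math


def ema_update(target, online, decay=0.99):
    """Class (1): each target weight drifts toward the matching online weight.
    Note there is no loss here, only an update rule."""
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


def variance_covariance_penalty(batch, gamma=1.0, eps=1e-4):
    """Class (2b), VICReg-style sketch.
    batch: list of n representation vectors, each of dimension d.
    Returns (variance_term, covariance_term); both are 0 when every
    dimension has std >= gamma and the dimensions are uncorrelated."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Hinge: penalize any dimension whose std falls below gamma (collapse).
    var_term = sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)) / d
    # Penalize correlation between different dimensions.
    cov_term = sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d
    return var_term, cov_term


# A collapsed batch (all vectors identical) versus a spread-out, decorrelated one.
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
spread = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

collapsed_penalty = variance_covariance_penalty(collapsed)
spread_penalty = variance_covariance_penalty(spread)
```

The collapsed batch incurs a large variance penalty while the spread batch incurs none, which is the sense in which a dimension-contrastive regularizer gives an explicit loss against collapse, in contrast to the EMA rule.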






Why World Models? Medicine is a Causal Loop, not a sequence of tokens. Our model simulates the feedback between S1 (Metabolism) & S7 (Interoception). Integrating @yudapearl’s Causal Inference, we detect physiological "Tipping Points" before a meltdown. Simulation > Prediction 🧬

Just back from vacation, glad to see new revolutionary ideas for causation. I'll be studying it soon but, offhand, I am skeptical of a global solution that does not start by solving local problems. I have the quote from Democritus in "Causality" (2000), but I leveraged it to solve concrete local problems: testable implications, identification, generalizability, explanation, etc., one by one, principle by principle. Will try to find the principle behind this one.