Abhishek Desai, MD

241 posts

@headclone

@rwjsurgery @BUMedicine @BULinguistics Emerging Technologies Group

New Brunswick, NJ · Joined January 2010

69 Following · 101 Followers

Abhishek Desai, MD @headclone · Apr 1

What Are Doctors For? A surgeon's perspective on the patient and physician as fellow travelers. headclone.substack.com/p/what-are-doc…

Abhishek Desai, MD @headclone · Dec 22

This study analyzed genetic variations among 3 million+ individuals of European descent and demonstrated a significant causal relationship between cholecystectomy and cholangiocarcinoma (bile duct cancer), with an odds ratio of 1.91 (91% increased risk!) doi.org/10.1016/j.gass…
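Reading an odds ratio of 1.91 as "91% increased risk" is an approximation: an odds ratio compares odds, not risks, and the two agree closely only when the outcome is rare in the reference group. Cholangiocarcinoma is rare, so the shorthand is reasonable here. The sketch below uses a commonly cited conversion, RR = OR / (1 - p0 + p0 * OR), where p0 is the outcome risk among the unexposed; the baseline risks in the loop are hypothetical and are not taken from the linked study.

```python
def relative_risk_from_odds_ratio(odds_ratio: float, baseline_risk: float) -> float:
    """Approximate relative risk implied by an odds ratio, given the outcome
    risk in the unexposed group (baseline_risk, in [0, 1))."""
    if not 0.0 <= baseline_risk < 1.0:
        raise ValueError("baseline_risk must be in [0, 1)")
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


OR = 1.91  # odds ratio quoted in the post

# Hypothetical baseline risks, chosen only to show the trend.
for p0 in (0.0001, 0.01, 0.10, 0.30):
    rr = relative_risk_from_odds_ratio(OR, p0)
    print(f"baseline risk {p0:>6.2%}: OR {OR} -> RR {rr:.3f}")
```

For a very rare outcome the implied relative risk stays close to 1.91; as the baseline risk grows, the same odds ratio corresponds to a noticeably smaller relative risk.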

Abhishek Desai, MD @headclone · Feb 18

Andrew Milburn, former commanding officer of the Marine Raider Regiment and retired chief of staff of Central Special Operations Command, discusses “When Not to Obey Orders” @WarOnTheRocks warontherocks.com/2019/07/when-n…

Abhishek Desai, MD reposted

.stuff @vintagestuff4 · Sep 15

(Media-only post; no text.)

Abhishek Desai, MD @headclone · Oct 30

Reprocessing of an old LIDAR survey of the Mexican jungle reveals ruins of a Mayan metropolis beneath what appears to be wild jungle. Evidence of how quickly and completely nature can reclaim terrain. bbc.com/news/articles/…

Abhishek Desai, MD reposted

Ethan Mollick @emollick · Sep 2

Fast scaling has massive downsides: startups that scale more quickly fail more, but are no more likely to succeed than those that scale slower. Why? Because scaling marks the end of experimenting. Startups that scale after lots of experiments succeed. papers.ssrn.com/sol3/papers.cf…

Abhishek Desai, MD reposted

Hesh @orbithm · Aug 5

This edit of the men's 100 m final in progress. Absolutely incredible: the gap from Lyles (1st) to Seville (8th) was just 0.12 seconds. 📹 Hector Vivas via Getty Images

Abhishek Desai, MD reposted

Andrej Karpathy @karpathy · Aug 7

# RLHF is just barely RL

Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely appreciated. RL is powerful. RLHF is not. Let's take a look at the example of AlphaGo. AlphaGo was trained with actual RL. The computer played games of Go and trained on rollouts that maximized the reward function (winning the game), eventually surpassing the best human players at Go. AlphaGo was not trained with RLHF. If it were, it would not have worked nearly as well.

What would it look like to train AlphaGo with RLHF? Well, first you'd give human labelers two board states from Go and ask them which one they like better. Then you'd collect, say, 100,000 comparisons like this, and you'd train a "Reward Model" (RM) neural network to imitate this human "vibe check" of the board state. You'd train it to agree with the human judgement on average. Once we have a Reward Model vibe check, you run RL with respect to it, learning to play the moves that lead to good vibes. Clearly, this would not have led anywhere too interesting in Go. There are two fundamental, separate reasons for this:

1. The vibes could be misleading: this is not the actual reward (winning the game). This is a crappy proxy objective.
2. Much worse, you'd find that your RL optimization goes off the rails as it quickly discovers board states that are adversarial examples to the Reward Model. Remember, the RM is a massive neural net with billions of parameters imitating the vibe. There are board states that are "out of distribution" to its training data, which are not actually good states, yet by chance they get a very high reward from the RM.

For the exact same reasons, sometimes I'm a bit surprised RLHF works for LLMs at all. The RM we train for LLMs is just a vibe check in the exact same way. It gives high scores to the kinds of assistant responses that human raters statistically seem to like. It's not the "actual" objective of correctly solving problems; it's a proxy objective of what looks good to humans. Second, you can't even run RLHF for too long because your model quickly learns to respond in ways that game the reward model. These predictions can look really weird, e.g. you'll see that your LLM Assistant starts to respond with something nonsensical like "The the the the the the" to many prompts. This looks ridiculous to you, but then you look at the RM vibe check and see that for some reason the RM thinks these look excellent. Your LLM found an adversarial example. It's out of domain w.r.t. the RM's training data, in an undefined territory. Yes, you can mitigate this by repeatedly adding these specific examples into the training set, but you'll find other adversarial examples next time around. For this reason, you can't even run RLHF for too many steps of optimization. You do a few hundred/thousand steps and then you have to call it, because your optimization will start to game the RM. This is not RL like AlphaGo was.

And yet, RLHF is a net helpful step of building an LLM Assistant. I think there are a few subtle reasons, but my favorite one to point to is that through it, the LLM Assistant benefits from the generator-discriminator gap. That is, for many problem types, it is a significantly easier task for a human labeler to select the best of a few candidate answers than to write the ideal answer from scratch. A good example is a prompt like "Generate a poem about paperclips" or something like that. An average human labeler will struggle to write a good poem from scratch as an SFT example, but they could select a good-looking poem given a few candidates. So RLHF is a kind of way to benefit from this gap in the "easiness" of human supervision. There are a few other reasons, e.g. RLHF is also helpful in mitigating hallucinations, because if the RM is a strong enough model to catch the LLM making stuff up during training, it can learn to penalize this with a low reward, teaching the model an aversion to risking factual knowledge when it's not sure. But a satisfying treatment of hallucinations and their mitigations is a whole different post, so I digress.

All to say that RLHF *is* net useful, but it's not RL. No production-grade *actual* RL on an LLM has so far been convincingly achieved and demonstrated in an open domain, at scale. And intuitively, this is because getting actual rewards (i.e. the equivalent of winning the game) is really difficult in open-ended problem-solving tasks. It's all fun and games in a closed, game-like environment like Go, where the dynamics are constrained and the reward function is cheap to evaluate and impossible to game. But how do you give an objective reward for summarizing an article? Or answering a slightly ambiguous question about some pip install issue? Or telling a joke? Or rewriting some Java code in Python? Going towards this is not impossible in principle, but it's also not trivial, and it requires some creative thinking. But whoever convincingly cracks this problem will be able to run actual RL: the kind of RL that led to AlphaGo beating humans in Go. Except this LLM would have a real shot at beating humans in open-domain problem solving.
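The reward-model step described above (collect pairwise comparisons, then train a network to agree with the human judgement on average) is commonly trained with a pairwise Bradley-Terry loss, -log sigmoid(r_preferred - r_rejected). The sketch below is a toy version under loud assumptions: a linear model over hypothetical 3-dimensional feature vectors stands in for the billion-parameter RM, and simulated "labelers" with a hidden preference vector (`hidden_taste`, invented for illustration) stand in for humans.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(weights: list[float], features: list[float]) -> float:
    """Linear stand-in for the reward model: score = w . x."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(comparisons, dim, lr=0.1, epochs=200, seed=0):
    """Fit weights so reward(preferred) > reward(rejected) on average.

    comparisons: list of (preferred_features, rejected_features) pairs.
    Per-pair loss (Bradley-Terry): -log sigmoid(r_pref - r_rej).
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    data = list(comparisons)
    for _ in range(epochs):
        rng.shuffle(data)
        for pref, rej in data:
            margin = reward(weights, pref) - reward(weights, rej)
            # Gradient of the loss w.r.t. w is -(1 - sigmoid(margin)) * (pref - rej),
            # so gradient descent adds lr * (1 - sigmoid(margin)) * (pref - rej).
            g = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * g * (pref[i] - rej[i])
    return weights


def make_comparisons(n, taste, rng):
    """Simulated labelers: prefer whichever candidate scores higher under `taste`."""
    pairs = []
    for _ in range(n):
        a = [rng.uniform(-1, 1) for _ in taste]
        b = [rng.uniform(-1, 1) for _ in taste]
        pairs.append((a, b) if reward(taste, a) >= reward(taste, b) else (b, a))
    return pairs


rng = random.Random(42)
hidden_taste = [1.0, -0.5, 0.25]  # hypothetical "human preference" direction
train = make_comparisons(500, hidden_taste, rng)
test = make_comparisons(200, hidden_taste, rng)
w = train_reward_model(train, dim=3)
agreement = sum(reward(w, p) > reward(w, r) for p, r in test) / len(test)
print(f"learned weights: {w}")
print(f"agreement with held-out simulated labels: {agreement:.2%}")
```

The simulated labels here are noiseless and the "vibe" is exactly linear, which is far kinder than real human preferences; the sketch only shows the shape of the objective, not how well a real RM generalizes.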

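The point that "you can't even run RLHF for too long" can be illustrated with a one-dimensional toy. Every piece here is a hypothetical assumption: the true quality of an output is q(x) = x - 0.1x² (best at x = 5), the "reward model" is a straight line fitted only to samples with x in [0, 3], and plain gradient ascent on that proxy stands in for the RL step. Once the optimizer leaves the range the proxy was fitted on, the proxy keeps rising while true quality peaks and then falls.

```python
def true_quality(x: float) -> float:
    """Hypothetical ground-truth objective: improves up to x = 5, then degrades."""
    return x - 0.1 * x * x


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Toy "reward model": fitted only on in-distribution samples, x in [0, 3].
train_x = [3.0 * i / 30 for i in range(31)]
a, b = fit_line(train_x, [true_quality(x) for x in train_x])


def proxy_reward(x: float) -> float:
    return a + b * x


# Gradient ascent on the proxy (its derivative is the constant b),
# standing in for the RL optimization step.
lr = 0.1
x = 0.0
history = []  # (step, x, proxy reward, true quality)
for step in range(401):
    history.append((step, x, proxy_reward(x), true_quality(x)))
    x += lr * b

for step, xv, pr, tq in history[::50]:
    print(f"step {step:3d}  x={xv:6.2f}  proxy={pr:7.2f}  true={tq:8.2f}")
```

A real RM is exploited in far less predictable ways than a mis-extrapolated line, so this toy shows only the shape of the failure, not how many optimization steps a real pipeline can afford.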
Abhishek Desai, MD @headclone · Aug 5

Modern deep learning systems are black boxes. A new type of neural network, KANs (Kolmogorov-Arnold Networks), offers a fundamentally more interpretable #AI architecture than the multi-layer perceptrons prevalent today. spectrum.ieee.org/kan-neural-net… @IEEESpectrum @rwjsurgery
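The core idea of a KAN, as described in the KAN paper, is to put a learnable univariate function on every edge and let each node simply sum its incoming edges, instead of learning edge weights and applying a fixed activation at each node as an MLP does. The sketch below is a forward pass only, and it swaps the paper's B-spline parameterization for simpler piecewise-linear functions on a fixed grid; the names `PiecewiseLinear`, `KANLayer`, and `sampled` are hypothetical. Because every edge is an explicit 1-D function, it can be inspected or plotted directly, which is the basis of the interpretability claim.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable


@dataclass
class PiecewiseLinear:
    """1-D function stored as values at fixed knots, linear in between.
    Simplified stand-in for a KAN's spline edge function; clamps outside the grid."""
    knots: list[float]
    values: list[float]  # the learnable parameters of this edge

    def __call__(self, x: float) -> float:
        k, v = self.knots, self.values
        if x <= k[0]:
            return v[0]
        if x >= k[-1]:
            return v[-1]
        i = bisect_right(k, x) - 1  # segment with k[i] <= x < k[i + 1]
        t = (x - k[i]) / (k[i + 1] - k[i])
        return (1.0 - t) * v[i] + t * v[i + 1]


def sampled(fn: Callable[[float], float], lo: float = -2.0, hi: float = 2.0,
            n: int = 41) -> PiecewiseLinear:
    """Hypothetical helper: initialise an edge function by sampling fn on a grid."""
    knots = [lo + (hi - lo) * j / (n - 1) for j in range(n)]
    return PiecewiseLinear(knots, [fn(x) for x in knots])


class KANLayer:
    """phis[j][i] is the function on the edge from input i to output j.
    Output j is the plain sum of its incoming edge functions
    (no edge weights, no node activation)."""

    def __init__(self, phis: list[list[PiecewiseLinear]]):
        self.phis = phis

    def __call__(self, xs: list[float]) -> list[float]:
        return [sum(phi(x) for phi, x in zip(row, xs)) for row in self.phis]


# One output, two inputs: edge functions set by hand to x**2 and sin(x),
# so the layer approximates f(x1, x2) = x1**2 + sin(x2) on [-2, 2]^2.
layer = KANLayer([[sampled(lambda x: x * x), sampled(math.sin)]])
print(layer([0.73, -1.2]), 0.73 ** 2 + math.sin(-1.2))
```

In a trained KAN the edge values would be learned by gradient descent rather than set by hand; setting them directly here just makes it easy to see that reading off each edge function recovers the formula the layer computes.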

Abhishek Desai, MD reposted

Jason H. Moore, PhD @moorejh · Jul 29

Understatement of the year...

Abhishek Desai, MD reposted

Keith Siau @drkeithsiau · Jul 21

Digestion and absorption of nutrients

Abhishek Desai, MD @headclone · Mar 31

🚀📚 Our book chapter got published! "Anterior Component Separation Technique & Its Modifications for Ventral #Hernia Repair" @plasticfish83 @PennPlasticSurg link.springer.com/chapter/10.100…

Abhishek Desai, MD reposted

Josh Preuss @juicypreuss · Mar 25

50 years ago P. W. Anderson wrote "More is Different", an essay on the emergent properties of complex systems. The low-level rules of a system do not reveal its high-level behavior. This idea has major implications for AI, humanity, and the nature of free will...

Abhishek Desai, MD reposted

Ethan Mollick @emollick · Jul 19

Of all of the “dangers of AI” papers, this is the most worrying: AI researchers building a tool to find new drugs to save lives realized it could do the opposite, generating new chemical warfare agents. Within 6 hours it invented deadly VX… and worse things nature.com/articles/s4225…

Abhishek Desai, MD @headclone · Mar 9

Explainability in deep learning remains a top concern for scientists across domains. Innovation will require keystone experts who can combine engineering and scientific knowledge to see solutions which remain opaque to others. Linguistics+CS, not vs.

roon @tszzl

Chomsky destroyed years before anything resembling LLMs norvig.com/chomsky.html

Abhishek Desai, MD reposted

Jim Fan @DrJimFan · Mar 7

After ChatGPT, the future belongs to multimodal LLMs. What’s even better? Open-sourcing. Announcing Prismer, my team’s latest vision-language AI, empowered by domain-expert models in depth, surface normal, segmentation, etc. No paywall. No forms. Batteries included: pre-trained weights, inference code, and even training/finetuning scripts (!!) You're all welcome to try it today: github.com/NVlabs/Prismer Paper: arxiv.org/abs/2303.02506 Website: shikun.io/projects/prism… This work is led by our awesome summer intern @liu_shikun at @NVIDIAAI. Deep dive with me: 🧵

Abhishek Desai, MD reposted

j⧉nus @repligate · Mar 6

asking Bing to look me up and then asking it for a prompt that induces a waluigi caused it to leak the most effective waluigi-triggering rules from its prompt. It appears to understand perfectly. (also, spectacular Prometheus energy here)

Abhishek Desai, MD @headclone · Mar 5

“Choose courage over comfort”

Assoc4AcademicSurgery @AcademicSurgery

The 2023 AAS Presidential Address - Removing the Mask by @lubitz_carrie is now available for viewing. Go to the AAS web site (aasurg.org) or view directly on Vimeo and Youtube: vimeo.com/804101487 youtube.com/watch?v=JaNBH4…

Abhishek Desai, MD reposted

Dr. Dennis Yong Kim, MMEd @traumaicurounds · Feb 28

True or False? There is rarely an ABSOLUTE time-sensitive need 2 intubate #bleeding #TRAUMA patients. 🔹identify & #stopthebleed 🔸maintain a/w patency +/- BVM 🔹Obtain accurate GCS 🔸MTP➡️OR ER intubation ⬆️morbidity & mortality! @MarkSeamonMD @RaulCoimbraMD @JTraumAcuteSurg

Abhishek Desai, MD reposted

Ashley Winter MD || Urologist @AshleyGWinter · Feb 21

Since I'm on a rampage today I also want everyone to know that pee is not stored in the balls.
