
A fun fact is that the baseline here is fully asynchronous RL with a tuned train/inference layout!
Applied Compute (@appliedcompute)
Not all RL rollouts are equally informative. On a problem with a 10% success rate, each success is 81x more valuable than each failure. We built an algorithm to exploit this, training only on the most informative samples. The result was an improvement in compute efficiency, with held-out evaluation metrics increasing faster over time.
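The post does not spell out how informativeness is scored, but one reading consistent with the 81x figure is to score each rollout by its squared advantage: with success rate p = 0.1, a success has advantage (1 - p) = 0.9 and a failure has advantage -p = -0.1, so the squared-advantage ratio is (0.9/0.1)^2 = 81. A minimal sketch under that assumption (the function names `informativeness` and `select_top` and the `frac` parameter are hypothetical, not from the post):

```python
import numpy as np

def informativeness(rewards, p_success):
    # Advantage of each binary-reward rollout relative to the
    # success rate: +(1 - p) for a success, -p for a failure.
    adv = np.asarray(rewards, dtype=float) - p_success
    # Assumed score: squared advantage. At p = 0.1 this makes a
    # success (0.9 / 0.1)**2 = 81x more informative than a failure.
    return adv ** 2

def select_top(rollouts, rewards, p_success, frac=0.25):
    # Keep only the top `frac` of rollouts by informativeness,
    # so training compute is spent on the most informative samples.
    scores = informativeness(rewards, p_success)
    k = max(1, int(len(rollouts) * frac))
    top = np.argsort(scores)[::-1][:k]
    return [rollouts[i] for i in top]

# One success among ten rollouts at p = 0.1:
scores = informativeness([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0.1)
print(scores[0] / scores[1])  # ratio of success score to failure score, ~81
```

This is only an illustration of advantage-based sample selection; the actual algorithm, scoring rule, and selection fraction used by Applied Compute are not described in the post.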