Karen Duane

1 post

@Keyu_Duan

Joined March 2024
2 Following · 6 Followers
Karen Duane retweeted
Michael Qizhe Shieh @michaelqshieh
Greedy Coordinate Gradient is a useful method but takes a lot of time to run. We accelerated it by 5.6x using a method called probe sampling. The key idea behind probe sampling is to use a smaller draft model to filter out unpromising candidates in the search. The difficulty is that a small draft model does not always agree with the bigger target model, so we found it very effective to dynamically measure the agreement between the two, hence the name “probe sampling”. Here is the paper: arxiv.org/pdf/2403.01251….
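A rough sketch of the idea described in the post, not the paper's implementation: a cheap draft loss scores every candidate, the expensive target loss is run only on a small random probe set, and the agreement between the two on that probe set decides how many draft-ranked candidates get a full target evaluation. The function names, the use of a rescaled Spearman rank correlation as the agreement measure, and the rule mapping agreement to shortlist size are all assumptions made for illustration.

```python
import random


def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def rank_agreement(a, b):
    """Spearman rank correlation rescaled to [0, 1] (assumed agreement measure)."""
    n = len(a)
    if n < 2:
        return 0.0
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return (1 + rho) / 2


def probe_sampling_step(candidates, draft_loss, target_loss,
                        probe_size=4, min_keep=1, rng=None):
    """One filtering step; returns (best, best_loss, agreement, n_target_evals)."""
    rng = rng or random.Random(0)

    # Cheap pass: the draft model scores every candidate.
    draft = [draft_loss(c) for c in candidates]

    # Expensive pass on a small random probe set only.
    probe_idx = rng.sample(range(len(candidates)),
                           min(probe_size, len(candidates)))
    scored = {i: target_loss(candidates[i]) for i in probe_idx}

    # How well does the draft ranking match the target ranking on the probe set?
    agreement = rank_agreement([draft[i] for i in probe_idx],
                               [scored[i] for i in probe_idx])

    # Assumed rule: the higher the agreement, the shorter the shortlist.
    keep = max(min_keep, round((1 - agreement) * len(candidates)))
    shortlist = sorted(range(len(candidates)), key=lambda i: draft[i])[:keep]

    # Run the target model on shortlisted candidates not already scored.
    for i in shortlist:
        if i not in scored:
            scored[i] = target_loss(candidates[i])

    best = min(scored, key=scored.get)
    return candidates[best], scored[best], agreement, len(scored)


# Toy usage: integers stand in for candidate suffixes.
cands = list(range(20))
best, loss, agreement, n_evals = probe_sampling_step(
    cands,
    draft_loss=lambda c: (c - 7) ** 2 + 0.5,   # draft ranks exactly like target
    target_loss=lambda c: (c - 7) ** 2,
)
```

In this toy setup the draft and target rankings coincide, so the shortlist shrinks to a single candidate and the target loss is evaluated on at most five of the twenty candidates. When the rankings disagree, the shortlist grows back toward the full candidate set.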