Karen Duane retweeted

Greedy Coordinate Gradient is a useful method but takes a lot of time to run. We accelerated it by 5.6x using a method called probe sampling.
The key idea behind probe sampling is to use a smaller draft model to filter out unpromising candidates in the search. The difficulty is that small draft models often disagree with the target model. We found it very effective to dynamically measure how well the smaller draft model agrees with the bigger target model, hence the name “probe sampling”.
Here is the paper: arxiv.org/pdf/2403.01251….
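
To make the filtering idea concrete, here is a minimal Python sketch of one search step. It is an illustration under stated assumptions, not the paper's implementation: the post does not say how agreement is measured, so this sketch assumes a Spearman rank correlation on a small random probe subset, and it assumes that the number of candidates kept shrinks as agreement grows. The names `probe_sampling_step`, `draft_loss`, `target_loss`, `probe_size` and `min_keep` are hypothetical, and the loss functions stand in for real model calls.

```python
import random


def ranks(values):
    # Rank of each value (0 = smallest); ties are broken by index.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(a, b):
    # Spearman rank correlation, without tie correction (assumed measure).
    n = len(a)
    if n < 2:
        return 0.0
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def probe_sampling_step(candidates, draft_loss, target_loss,
                        probe_size, rng, min_keep=1):
    # 1. Score every candidate with the cheap draft model.
    draft = [draft_loss(c) for c in candidates]

    # 2. Score a small random probe subset with the expensive target model.
    probe_idx = rng.sample(range(len(candidates)), probe_size)
    scores = {i: target_loss(candidates[i]) for i in probe_idx}

    # 3. Measure draft/target agreement on the probe subset.
    alpha = spearman([draft[i] for i in probe_idx],
                     [scores[i] for i in probe_idx])
    agreement = max(0.0, alpha)  # treat negative correlation as no agreement

    # 4. Keep fewer draft-ranked candidates when agreement is high (assumed rule).
    keep = max(min_keep, round((1.0 - agreement) * len(candidates)))
    keep = min(keep, len(candidates))
    filtered = sorted(range(len(candidates)), key=lambda i: draft[i])[:keep]

    # 5. Run the target model only on the survivors not already probed.
    for i in filtered:
        if i not in scores:
            scores[i] = target_loss(candidates[i])

    best = min(scores, key=scores.get)
    # Returns: best candidate, its target loss, agreement, target-model calls.
    return candidates[best], scores[best], agreement, len(scores)
```

When the draft model ranks the probe candidates the same way as the target, the step trusts the draft and calls the target model on only a handful of candidates. When the two disagree, it falls back to evaluating most or all candidates with the target model.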

