
@Piotr761303Ueh @jchudnov @RylanSchaeffer @sanmikoyejo @stai_research We start to see models identifying semantically equivalent pairs at around 3-4B parameters. We are not sure at what point this begins to have a negative impact on training.
Joshua Kazdan





















