Daniil Tiapkin
@dtiapkin

23 posts

Research Scientist @ Google DeepMind | PhD in RL 🇫🇷

Paris, France · Joined June 2022
157 Following · 182 Followers
Daniil Tiapkin @dtiapkin
@josephdviviano Yes, sure, for the next release we'll compare with this implementation, thanks a lot for the reference!
Daniil Tiapkin @dtiapkin
While frontier labs are announcing their new models, we also want to be part of this parade. So, we’re happy to announce gfnx – a JAX-first library with environments and a single-file baseline implementation for GFlowNet research.
Daniil Tiapkin @dtiapkin
Environments, reward functions, metrics, and single-file implementations – everything you need to achieve up to 80× single-seed speedups for combinatorial object generation, from bit sequences and Ising models to phylogenetic trees.
Daniil Tiapkin retweeted
Timofei Gritsaev @gritsaev
1/ Can we efficiently learn the destruction process of diffusion samplers? Can we learn not just the drift, but also the variance for all transition kernels? – We answer YES in our recent paper “Adaptive Destruction Processes for Diffusion Samplers” (Oral at NeurIPS 2025 FPI Workshop).
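A minimal sketch of the idea, assuming a Gaussian parameterization (the function names, the scalar state, and the constants are illustrative stand-ins, not the paper's code): each destruction step adds noise whose drift and variance are both treated as learnable, and the kernel's log-density is the quantity a training objective would consume.

```python
import math
import random

def destruction_step(x, drift, log_sigma, rng):
    """One learned destruction (noising) step:
    x' ~ N(x + drift, sigma^2). In practice both the drift and the
    log-variance would be network outputs; here they are plain numbers,
    and the scalar state is illustrative only."""
    sigma = math.exp(log_sigma)
    return x + drift + sigma * rng.gauss(0.0, 1.0)

def destruction_log_prob(x_next, x, drift, log_sigma):
    """Gaussian log-density of the same kernel; making log_sigma
    learnable is what lets the variance adapt per transition."""
    sigma = math.exp(log_sigma)
    z = (x_next - x - drift) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

rng = random.Random(0)
x_noised = destruction_step(1.0, drift=-0.1, log_sigma=-1.0, rng=rng)
lp = destruction_log_prob(x_noised, 1.0, drift=-0.1, log_sigma=-1.0)
```

Since the density is available in closed form, both the drift and the variance receive gradients through `destruction_log_prob`.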
Daniil Tiapkin @dtiapkin
The speedrun is over: I defended my PhD this week and became a doctor in applied mathematics (unofficially: in reinforcement learning)! Huge thanks to my supervisors (Eric & Gilles), collaborators, and friends for all the support.
Daniil Tiapkin retweeted
Nikita Morozov @nvimorozov
(1/n) The usual assumption in GFlowNet environments is acyclicity. Have you ever wondered if it can be relaxed? Does the existing GFlowNet theory translate to the non-acyclic case? Is efficient training possible? We shed new light on these questions in our latest work! @icmlconf
Daniil Tiapkin @dtiapkin
I'll be at #ICLR2025 this week - let's chat about RL, sampling, and more! Excited for @gritsaev to present our work on backward policy optimization for GFlowNets (arxiv.org/abs/2410.15474, my first work as an advisor!) on Saturday morning, April 26, poster 454. Come say hi to us!
Daniil Tiapkin retweeted
Timofei Gritsaev @gritsaev
1/ GFlowNets are known for training a forward policy to generate complex objects step by step. However, an equally important piece specific to the GFlowNet paradigm is a backward policy, which undoes these steps and plays a crucial role in training.
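The interplay between the two policies can be sketched with the trajectory balance objective, which contains both of them (a plain-Python sketch with illustrative names, not any particular library's API): the forward policy builds the object step by step, the backward policy undoes those steps, and training drives their log-probabilities into balance with the reward.

```python
import math

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Trajectory balance loss for one trajectory s_0 -> ... -> x.

    log_Z:      log of the (learned) partition function
    log_pf:     per-step log-probs of the forward policy P_F(s'|s)
    log_pb:     per-step log-probs of the backward policy P_B(s|s')
    log_reward: log R(x) of the terminal object
    """
    delta = log_Z + sum(log_pf) - log_reward - sum(log_pb)
    return delta ** 2

# Toy check: when log Z + sum log P_F equals log R + sum log P_B,
# the trajectory is perfectly balanced and the loss vanishes.
loss = trajectory_balance_loss(
    log_Z=math.log(2.0),
    log_pf=[math.log(0.5), math.log(0.5)],
    log_pb=[0.0, 0.0],  # a deterministic backward policy (log 1 = 0)
    log_reward=math.log(0.5),
)
```

Note how the backward log-probabilities enter the loss symmetrically with the forward ones, which is why the choice (or training) of the backward policy matters.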
Daniil Tiapkin @dtiapkin
@jramapuram In the case of language modeling, the KL is computed only over the next token's distribution, but the completions' prefixes differ. So, dataset expansion is the simplest way to increase the diversity of possible contexts (prompt + prefix) for the next-token KL computations.
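A toy sketch of that computation (stdlib only, illustrative names): at every position, i.e. for every context (prompt + prefix), the KL is taken between the two next-token categoricals, then summed over the completion. More completions per prompt means more distinct prefixes, hence more of these per-position terms.

```python
import math

def next_token_kl(p_teacher, p_student):
    """KL(teacher || student) between two next-token categoricals."""
    return sum(p * math.log(p / q)
               for p, q in zip(p_teacher, p_student) if p > 0)

def sequence_kl(teacher_dists, student_dists):
    """Distillation loss over one completion: one next-token KL per
    position (context = prompt + prefix), summed along the sequence."""
    return sum(next_token_kl(t, s)
               for t, s in zip(teacher_dists, student_dists))

# Identical next-token distributions at every position -> zero loss.
dists = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
assert sequence_kl(dists, dists) == 0.0
```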
Jason Ramapuram @jramapuram
Awesome work! Quick question: what does "expand the dataset with multiple completions from a prompt" entail? Doesn't distillation use a KL between two distributions (here categoricals), so that all you do is match the natural parameters of the distributions? The probs of the distribution would then parameterize all possible completions.
Daniil Tiapkin @dtiapkin
1/ If you’re familiar with RLHF, you’ve likely heard of reward hacking, where over-optimizing an imperfect reward model leads to unintended behaviors. But what about teacher hacking in knowledge distillation: can the teacher be hacked, like rewards in RLHF?
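The failure mode can be illustrated with a toy example (all distributions below are invented for illustration, not data from the paper): the proxy metric, student vs. teacher, keeps improving, while the golden metric, student vs. the ground-truth distribution the teacher only approximates, gets worse.

```python
import math

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

oracle  = [0.5, 0.3, 0.2]   # ground truth (unobserved in practice)
teacher = [0.6, 0.3, 0.1]   # imperfect proxy for the oracle

student_early = [0.55, 0.30, 0.15]
student_late  = [0.62, 0.31, 0.07]  # tracks the teacher ever more closely

# Proxy metric (distance to the teacher) keeps improving...
assert kl(teacher, student_late) < kl(teacher, student_early)
# ...while the golden metric (distance to the oracle) degrades:
# the student has started exploiting the teacher's imperfections.
assert kl(oracle, student_late) > kl(oracle, student_early)
```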
Daniil Tiapkin @dtiapkin
5/ Our suggestions are the following:
- Use online generations during distillation;
- Train on more diverse prompt datasets;
- Expand the dataset with multiple completions per prompt.
Daniil Tiapkin @dtiapkin
Moreover, it turns out that some existing GFlowNet algorithms are well-known RL algorithms under this choice of rewards.
Daniil Tiapkin @dtiapkin
🌟 News from the GFlowNet world: our paper “Generative Flow Networks as Entropy-Regularized RL” was honored with an oral presentation at #AISTATS2024! Long story short, our result can be described by this picture.
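From memory, the headline correspondence can be sketched as follows (a paraphrase under my own notation; see the paper for the precise statement and conditions):

```latex
% View the GFlowNet DAG as an MDP and set its per-step rewards to
%   r(s \to s') = \log P_B(s \mid s'),
% adding \log R(x) upon termination at object x. Then the
% entropy-regularized (soft) RL objective
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T-1}
    r(s_t \to s_{t+1}) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big]
% is solved by the GFlowNet forward policy, whose distribution over
% terminal objects satisfies \pi(x) \propto R(x).
```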
[image]