Sergey Kolesnikov
@Scitator
589 posts

Head of AI Research in a fintech company. Decision Making in the Wild. Opinions are my own.

Joined May 2011
235 Following · 1.2K Followers

Pinned Tweet
Sergey Kolesnikov @Scitator
This year we managed to do the impossible. We launched the AI research department, and we launched it noticeably – with recognition from the world's leading AI conferences: ICML and NeurIPS (both spotlights). But I want to believe that this is just the beginning...
1 reply · 1 repost · 5 likes · 1.2K views
Sergey Kolesnikov reposted
Nikita Balagansky @nlp_ceo
1/7 In-context learning (ICL) is poised to revolutionise NLP, but its success hinges on our ability to process long sequences. Recently, @simran_s_arora et al. showcased advancements in Linear Transformers and proposed Based. But what if we could push its boundaries further?
2 replies · 26 reposts · 84 likes · 14.2K views
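The thread above concerns linear-attention Transformers (Based) for long-context ICL. As an illustrative sketch only — using a generic positive ReLU feature map, not Based's Taylor-expansion kernel — here is causal linear attention in numpy, showing why it costs O(N) with a constant-size recurrent state instead of O(N²):

```python
import numpy as np

def causal_linear_attention(q, k, v, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Causal linear attention via a running (recurrent) state.

    q, k: (N, d), v: (N, d_v). phi is a positive feature map (an
    illustrative ReLU-based one here, not Based's Taylor map).
    Cost is O(N * d * d_v) with O(d * d_v) state, vs O(N^2) softmax.
    """
    N, d = q.shape
    d_v = v.shape[1]
    S = np.zeros((d, d_v))   # running sum of outer(phi(k_j), v_j)
    z = np.zeros(d)          # running sum of phi(k_j), for normalization
    out = np.zeros((N, d_v))
    for i in range(N):
        S += np.outer(phi(k[i]), v[i])
        z += phi(k[i])
        out[i] = phi(q[i]) @ S / (phi(q[i]) @ z)
    return out

def causal_linear_attention_quadratic(q, k, v,
                                      phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Mathematically equivalent O(N^2) form, useful for checking."""
    scores = np.tril(phi(q) @ phi(k).T)   # kernel scores, causal mask
    return scores @ v / scores.sum(axis=1, keepdims=True)
```

The recurrent form makes the fixed-size state explicit: sequence length only affects the loop count, which is what makes long-sequence in-context learning tractable.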
Sergey Kolesnikov reposted
Alexander Nikulin @how_uhh
Our first stable release and full paper preprint for XLand-MiniGrid are done, check them out! Compared to the workshop version, we have significantly redesigned the library and added multi-GPU baselines and standardized benchmarks with millions of unique tasks. github.com/corl-team/xlan…
1 reply · 7 reposts · 22 likes · 2.6K views
Sergey Kolesnikov reposted
Vladislav Kurenkov @vladkurenkov
In-Context RL for Variable Action Spaces. ICRL is a promising direction for building foundational decision-making models, but adaptation to new action spaces is a problem. We propose Headless Algorithm Distillation (@MishaLaskin) to address it. arxiv.org/abs/2312.13327
1 reply · 19 reposts · 104 likes · 9.2K views
Sergey Kolesnikov reposted
Vladislav Kurenkov @vladkurenkov
Which data-collection strategies enable In-Context Reinforcement Learning? You need either RL training trajectories or supervision with optimal actions. But what if we had a demonstrator policy, could we use it to enable ICRL? We show the answer is yes arxiv.org/abs/2312.12275
1 reply · 10 reposts · 40 likes · 5.3K views
Sergey Kolesnikov reposted
Vladislav Kurenkov @vladkurenkov
🔥 Imagine if you could train Meta-RL agents for 1 TRILLION transitions in under 40 hours. We present XLand-MiniGrid — JAX-accelerated meta-reinforcement learning environments inspired by XLand (@FeryalMP) and MiniGrid (@Love2Code). code: github.com/corl-team/xlan…
2 replies · 32 reposts · 181 likes · 30.9K views
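The trillion-transition throughput claimed above comes from running thousands of environments as one big array computation, which JAX libraries then jit-compile and vmap onto a GPU. A minimal numpy sketch of the same "array of environments" layout, with toy gridworld dynamics invented here for illustration (not XLand-MiniGrid's actual rules or API):

```python
import numpy as np

def batched_step(pos, actions, grid_size=9):
    """Advance a whole batch of toy gridworld environments in one call.

    pos: (B, 2) integer agent positions; actions: (B,) ints in
    {0, 1, 2, 3} for up/down/left/right.  All B environments step
    together as array operations -- no per-environment Python loop.
    The dynamics (clipped moves, reward at the far corner) are a toy
    stand-in, not XLand-MiniGrid's real transition function.
    """
    moves = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
    new_pos = np.clip(pos + moves[actions], 0, grid_size - 1)
    reached_goal = np.all(new_pos == grid_size - 1, axis=1)
    return new_pos, reached_goal.astype(np.float32)
```

Rolling out 4096 such environments for 100 steps yields 409,600 transitions from just 100 vectorized calls; scaling the batch dimension and compiling the step function on an accelerator is how JAX-based environment suites reach billions of steps per hour.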
Kyunghyun Cho @kchonyc
how do people tune hyperparameters in offline reinforcement learning???
23 replies · 13 reposts · 112 likes · 105.4K views
Sergey Kolesnikov reposted
Vladislav Kurenkov @vladkurenkov
NetHack is arguably one of the most challenging games for humans, and even more so for RL algorithms. Maybe offline RL could help? Time will tell. To bootstrap practitioners, we release Katakomba — Tools and Benchmarks for Data-Driven NetHack. github.com/tinkoff-ai/kat…
2 replies · 18 reposts · 84 likes · 30.8K views
Sergey Kolesnikov reposted
Vladislav Kurenkov @vladkurenkov
Interested in offline and offline-to-online RL 🫶? Check out the new major release of the Clean Offline Reinforcement Learning library:
🤖 Offline: 10 algorithms, 30 datasets benchmarked
🦾 Offline-to-Online: 5 algorithms, 10 datasets benchmarked
github.com/tinkoff-ai/CORL
2 replies · 9 reposts · 53 likes · 9.3K views
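For context on what a library like this benchmarks: the simplest offline RL baseline is behavior cloning, which fits a policy to the logged actions by supervised regression on a fixed dataset. Below is a minimal sketch with a linear policy and ridge regression; it is illustrative only, not CORL's actual code (CORL ships single-file implementations of full algorithms such as CQL and IQL), and the function names are invented here:

```python
import numpy as np

def behavior_cloning_linear(states, actions, reg=1e-3):
    """Fit a linear policy a = W s to a fixed logged dataset by ridge
    regression -- behavior cloning, the baseline that stronger offline
    RL algorithms are compared against.

    states: (N, ds), actions: (N, da).  Returns W of shape (da, ds).
    No environment interaction happens: the defining constraint of the
    offline setting is learning from the logged data alone.
    """
    ds = states.shape[1]
    A = states.T @ states + reg * np.eye(ds)     # regularized Gram matrix
    W = np.linalg.solve(A, states.T @ actions)   # (ds, da) ridge solution
    return W.T

def act(W, state):
    """Greedy action of the cloned policy for a single state."""
    return W @ state
```

Stronger offline methods add a conservatism term on top of exactly this supervised signal, which is why BC is the standard reference point in offline benchmarks.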
Sergey Kolesnikov @Scitator
📢 Exciting research from our team! We explored the power of seemingly minor design choices in offline RL by applying them to an established minimalistic baseline developed by @shaneguML. The outcome? Just follow this 🧵
Quoting Vladislav Kurenkov @vladkurenkov:
There were a lot of algorithmic innovations in offline RL recently, along with a silent evolution of minor design choices. What if we applied these seemingly minor modifications to an established minimalistic baseline by @shaneguML? Turns out, gains are enormous.
0 replies · 0 reposts · 2 likes · 360 views
Sergey Kolesnikov @Scitator
Exciting news: Our paper has been accepted at ICML! Our work focuses on improving the reliability of offline RL algorithms and tackling overfitting through an anti-exploration bonus. And the best part? SAC-RND challenges SOTA results with a single network, no ensembles required!
1 reply · 4 reposts · 12 likes · 1.2K views
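The anti-exploration bonus mentioned above is Random Network Distillation: a predictor network is trained to match a fixed random target network on the dataset's state-action pairs, so its prediction error is low in-distribution and high out-of-distribution, and that error is subtracted as a penalty to keep the policy near the data. Below is a toy closed-form sketch of the idea — tanh target, linear ridge predictor — whereas the paper trains actual networks; all names here are illustrative:

```python
import numpy as np

def fit_rnd(dataset, seed=0, reg=1e-3):
    """Random Network Distillation as an anti-exploration bonus (sketch).

    The 'target' network is random and never trained; the 'predictor'
    is fit to match it only on the logged dataset.  Prediction error
    then acts as an out-of-distribution detector: SAC-RND subtracts it
    in the critic/actor objectives to penalize actions far from the
    data.  Here both networks are tiny closed-form stand-ins (a fixed
    tanh map and a ridge-fit linear predictor), not the paper's MLPs.

    dataset: (N, d) array of concatenated (state, action) features.
    Returns a bonus(x) function mapping (..., d) inputs to penalties.
    """
    rng = np.random.default_rng(seed)
    d = dataset.shape[1]
    M = rng.normal(size=(d, d))              # fixed random target weights
    target = np.tanh(dataset @ M.T)          # target features on the data
    A = dataset.T @ dataset + reg * np.eye(d)
    W = np.linalg.solve(A, dataset.T @ target).T   # ridge-fit predictor

    def bonus(x):
        """Anti-exploration penalty: squared prediction error at x."""
        return np.sum((x @ W.T - np.tanh(x @ M.T)) ** 2, axis=-1)

    return bonus
```

In a critic update this would enter as roughly r - λ·bonus(s, a) in the target (λ a hypothetical penalty weight): in-distribution actions are barely penalized, while OOD actions incur a large penalty, which is the conservatism mechanism without any ensemble.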
Sergey Kolesnikov @Scitator
If you're at ICLR now, check out the Generalizable Policy Learning in the Physical World workshop tomorrow. We will present Prompts and Pre-Trained Language Models for Offline Reinforcement Learning and will be happy to share its current improvements. We have some 😉
1 reply · 2 reposts · 8 likes