Aliaksandr Hubin 🤍❤️🤍 💙💛

88 posts

@AliaksandrHubin

Associate Professor in Statistics. Personal webpage: https://t.co/BPIc4JJuQy

Oslo. Joined February 2020
269 Following, 124 Followers
Aliaksandr Hubin 🤍❤️🤍 💙💛
IWSM 2026 call for papers is open! Join us in Oslo, June 28 - July 3, 2026. Invited speakers: Nils Lid Hjort, Susanne Ditlevsen, Mats Stensrud, Malgorzata Bogdan, Göran Kauermann. Deadlines: Talks - Feb 15, Posters - March 28. Further details: mn.uio.no/math/english/r…
Aliaksandr Hubin 🤍❤️🤍 💙💛
Feeling proud of my PhD student Lars having his first paper published. BTW, Lars is presenting his midterm PhD progress today at NMBU, in the library of the biotechnology building, at 12. Everyone around is welcome. Some good stuff will be presented.
Accepted papers at TMLR @TmlrPub

Sparsifying Bayesian neural networks with latent binary variables and normalizing flows. Lars Skaaret-Lund, Geir Storvik, Aliaksandr Hubin. Action editor: Pierre Alquier. openreview.net/forum?id=d6kqU… #variational #bayesian #weights

Stefano Nichele @stenichele
The neuro-inspiration of the AI mechanism of "attention" #NeuroAI
Andrej Karpathy @karpathy

The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All you Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following some fake news about how it was developed that circulated here over the last few days. Attention is a brilliant (data-dependent) weighted average operation. It is a form of global pooling, a reduction, communication. It is a way to aggregate relevant information from multiple nodes (tokens, image patches, etc.). It is expressive, powerful, has plenty of parallelism, and is efficiently optimizable. Even the Multilayer Perceptron (MLP) can actually be almost re-written as Attention over data-independent weights (1st layer weights are the queries, 2nd layer weights are the values, the keys are just input, and softmax becomes elementwise, deleting the normalization). TLDR Attention is awesome and a *major* unlock in neural network architecture design. It's always been a little surprising to me that the paper "Attention is All You Need" gets ~100X more err ... attention... than the paper that actually introduced Attention ~3 years earlier, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: "Neural Machine Translation by Jointly Learning to Align and Translate". As the name suggests, the core contribution of the Attention is All You Need paper that introduced the Transformer neural net is deleting everything *except* Attention, and basically just stacking it in a ResNet with MLPs (which can also be seen as ~attention per the above). But I do think the Transformer paper stands on its own because it adds many additional amazing ideas bundled up all together at once - positional encodings, scaled attention, multi-headed attention, the isotropic simple design, etc.
And the Transformer has imo stuck around basically in its 2017 form to this day ~7 years later, with relatively few and minor modifications, maybe with the exception of better positional encoding schemes (RoPE and friends). Anyway, pasting the full email below, which also hints at why this operation is called "attention" in the first place - it comes from attending to words of a source sentence while emitting the words of the translation in a sequential manner, and was introduced as a term late in the process by Yoshua Bengio in place of RNNSearch (thank god? :D). It's also interesting that the design was inspired by a human cognitive process/strategy, of attending back and forth over some data sequentially. Lastly the story is quite interesting from the perspective of the nature of progress, with similar ideas and formulations "in the air", with particular mention of the work of Alex Graves (NMT) and Jason Weston (Memory Networks) around that time. Thank you for the story @DBahdanau !
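
To make the "data-dependent weighted average" description in the quoted post concrete, here is a minimal pure-Python sketch of scaled dot-product attention for a single query. The function names `softmax` and `attention` and the toy inputs are illustrative assumptions, not taken from the post, the email, or any library.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query scaled dot-product attention (illustrative sketch).

    query:  list of d floats
    keys:   list of n lists, each of d floats
    values: list of n lists, each of the same length
    Returns (output, weights), where output is the weighted average of values.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # The weights depend on the data (query and keys), not on fixed parameters.
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim)
    ]
    return output, weights


# Toy example: the query matches the first key more closely,
# so the first value gets the larger weight.
out, w = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

When all keys are identical the weights become uniform and the output reduces to a plain mean of the values, which is the sense in which attention acts as a form of pooling.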

Aliaksandr Hubin 🤍❤️🤍 💙💛 retweeted
ISBA @ISBA_events
The next O'Bayes conference will take place in Athens, 8-12 June 2025. Registration is open, as well as a call for posters. obayes25.aueb.gr
Hamid Naderi Yeganeh @naderi_yeganeh
I drew this jellyfish with mathematical equations.
Aliaksandr Hubin 🤍❤️🤍 💙💛
By no means do I want to be critical of the organising committee, and I realise how hard it is to make such a huge event happen. And it WAS a success! Yet I wonder whether increasing the number of tracks in the future could be a reasonably small overhead that would make ICML even better!
Aliaksandr Hubin 🤍❤️🤍 💙💛
This is something that works well for JSM in statistics or IFORS in operations research. Even at the very specialised ISBA 24, which by the way was a big success yet had fewer than 1000 participants, there were up to 7 parallel sessions and still lots of posters.
Aliaksandr Hubin 🤍❤️🤍 💙💛
Reduced fees for dedicated reviewers? Bounding submissions per author? Requiring senior authors to review more if they submit, while giving fewer papers to juniors? Organizing workshops on reviewing for juniors? Or at least giving feedback to them? Action must be taken.
Aliaksandr Hubin 🤍❤️🤍 💙💛
While the huge number of submissions to ML conferences creates certain challenges, we must still think about how to further improve the quality of reviews, given the high impact of these venues. OpenReview is a great step, but could we think of: