Toviah Moldwin

3.5K posts

@TMoldwin

Computational neuroscientist @ELSCbrain @Segev_Lab. Singer and guitarist for the rock band @SynfireChain. Dualist. Founder, https://t.co/YMBvt487lT.

Joined April 2019
901 Following · 845 Followers
Pinned Tweet
Toviah Moldwin
Toviah Moldwin@TMoldwin·
Lots of people (myself included) have trouble understanding why transformer architectures directly add token embeddings to position embeddings. It's weird to just add the representation of the content (i.e., the token embeddings) to the representation of the content's position within its context. (Note: there are techniques like RoPE that try to get around this; here I'll deal with the more basic strategy of addition.)
Together with Raneem Mahajne, we trained a small transformer on a simple next-token prediction task. We gave the transformer a sequence of random digits, occasionally interspersed with a + sign. Whenever the + sign appeared, the next number would have to be the same as the *most recent even number*. For example, if we have the sequence 3 1 2 7 5 +, the next number would have to be a 2.
Because we used an embedding dimension of 2 for both the positions and the tokens, we can directly visualize the token embeddings, the position embeddings, and their sum, for every possible token and position combination.
In the token embedding space, the transformer learned to separate the even numbers from the odd numbers, and it also learned to separate the + sign from everything else. Interestingly, the model also decided to smush all the odd numbers together while maintaining some space between each even digit, presumably because once a + sign appears, it becomes necessary to predict a specific even number.
In the position embedding space, the transformer learned to correctly order the positions along a curve. Position ordering is important for this task, because in order to know what the "most recent" even number is, you need to have a sense of ordering.
When the token and position information are summed, the structure of *both* the token embedding space and the position embedding space is preserved. The even numbers remain separated from each other and from the odd numbers, the odd numbers remain smushed together, and the "+" token occupies its own area of space. But now, for each token, we see a local structure based on the curve of ordered positions.
Part of the reason this occurs is that the magnitudes of the token embeddings are ~10x larger than the magnitudes of the position embeddings, allowing the 'macro' structure to be dominated by the tokens, while the 'micro' structure is determined by the positions. In other words, summing the position information and token information doesn't just mix things haphazardly; it actually retains the geometric structure of both by using a different scale to encode 'what' and 'where'.
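A minimal NumPy sketch of the scale-separation idea, using hand-picked 2D embeddings rather than the trained model's weights. The layout (tokens on a circle of radius 10, positions on an arc of radius 1), the context length of 16, and all variable names are assumptions made for illustration. Unlike the trained model, this toy keeps every token distinct instead of clustering the odd digits.

```python
import numpy as np

# Assumed sizes: 10 digits plus "+", a context of 16 positions, 2D embeddings.
n_tokens, n_positions, dim = 11, 16, 2

# Hypothetical token embeddings: well-separated points on a circle of radius 10.
angles = 2 * np.pi * np.arange(n_tokens) / n_tokens
token_emb = 10.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Hypothetical position embeddings: ordered points along an arc of radius 1,
# i.e. roughly 10x smaller in magnitude than the token embeddings.
t = np.linspace(0.0, np.pi / 2, n_positions)
pos_emb = np.stack([np.cos(t), np.sin(t)], axis=1)

# Sum for every (token, position) pair: shape (n_tokens, n_positions, dim).
summed = token_emb[:, None, :] + pos_emb[None, :, :]

# Macro structure: each summed point is nearest to its own token's cluster center.
centroids = summed.mean(axis=1)                      # (n_tokens, dim)
flat = summed.reshape(-1, dim)                       # (n_tokens * n_positions, dim)
dists = np.linalg.norm(flat[:, None, :] - centroids[None, :, :], axis=2)
recovered_token = dists.argmin(axis=1).reshape(n_tokens, n_positions)

# Micro structure: inside each cluster, the points trace the same position curve
# (the position embeddings, shifted by their mean).
local = summed - centroids[:, None, :]

print((recovered_token == np.arange(n_tokens)[:, None]).all())
print(np.allclose(local, pos_emb - pos_emb.mean(axis=0)))
```

With this scale gap, token identity can be read off from which cluster a point falls in, and position can be read off from where the point sits within that cluster. Shrinking the token radius toward the position radius makes the clusters overlap, and the nearest-cluster check starts to fail.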
Toviah Moldwin tweet media
English
0
1
6
591
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@zagrebbi You can engineer systems to reduce this effect by simply not showing people's faces initially. At NotAZombie.net for example we show what people write first; you only see pictures after you've seen what they wrote.
English
0
0
0
39
Werner Zagrebbi🇦🇿
When you actually test women's preferences experimentally, IQ doesn't measurably increase attractiveness at all — physical attractiveness totally dominates across the speed-dates literature. Stated preferences ≠ revealed preferences.
Werner Zagrebbi🇦🇿 tweet media
Lyman Stone 石來民 🦬🦬🦬@lymanstoneky

“Girls like smart guys” is indeed true. This is very bad news for a lot of guys since altering your intelligence is extremely difficult/on some accounts impossible.

English
156
260
4.2K
258.6K
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@OMinazzoli Like, it's sufficiently easy to host publications on your own GitHub or personal site. arXiv wants to maintain some quality control and that's fine; there's no reason they have to guarantee anything.
English
1
0
4
238
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@OMinazzoli They have their own screening process, and that's fine. It's not a free-for-all.
English
1
0
5
244
Olivier Minazzoli
Olivier Minazzoli@OMinazzoli·
Good. Would you also clarify why arXiv does not accept papers published in reputable journals while accepting unpublished April Fools’ papers? Seems like a double standard to me.
Thomas G. Dietterich@tdietterich

Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/

English
3
2
44
6.7K
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@OMinazzoli Probably field-dependent license norms, but either way you need to clarify what you meant in your OP. What publications does arXiv reject that it is legally allowed to accept?
English
1
0
3
283
Olivier Minazzoli
Olivier Minazzoli@OMinazzoli·
@TMoldwin Well, arXiv now often requires papers to be published in a peer-reviewed journal before being considered for its platform. Crazy for a preprint platform, right? But it is no longer really one; it is now more of a repository.
Olivier Minazzoli tweet media
English
2
0
3
559
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@OMinazzoli Like, it would be better for everyone if journals didn't make you sign those agreements, but that's not arXiv's fault.
English
0
0
0
43
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@OMinazzoli You mean post-publication? Because of copyright licensing agreements with the journals when the papers are published.
English
2
0
11
713
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@roni_eitan I was a student in the economics course taught by Aaron Levine, of blessed memory, at Yeshiva University. He taught us Levine's First Law of Economics: "Food is always consumed in direct proportion to its availability."
Hebrew
1
0
7
1.2K
Roni Eitan
Roni Eitan@roni_eitan·
When we bought a freezer in addition to our regular refrigerator, we were sure we would finally have proper storage space for everything we needed, and as a family we got an important lesson in induced demand.
Hebrew
30
0
559
31.9K
Toviah Moldwin
Toviah Moldwin@TMoldwin·
For the past few years, I've been trying to create a healthier ecosystem for online dating. The idea is that instead of browsing people's pictures, you are first shown their Tile -- a rectangle where you write 7 things about yourself. Join us at NotAZombie.net! And help spread the word!
English
0
0
1
35
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@alfairhall Has this hiring growth led to a better understanding of the brain? It's certainly led to more theoretical papers. But I think we're possibly more confused than we were before.
English
0
0
0
22
Adrienne Fairhall
Adrienne Fairhall@alfairhall·
@TMoldwin Luckily there has been huge growth in hiring in comp and systems neuroscience over the past 15 years.
English
1
0
0
30
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@Anthony_Bonato I actually told a journal (which said we should remove our names from an article for review) that this was pointless, as it was already up on bioRxiv with our names; they acquiesced.
English
0
0
1
212
Anthony Bonato
Anthony Bonato@Anthony_Bonato·
Why bother with double-anonymous refereeing (i.e., no author names) when everyone puts their paper on arXiv?
English
9
1
29
3.8K
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@alfairhall That work is basically two generations old, and kind of predates the mathematical revolution in theoretical neuroscience, especially since ANNs/connectionism. The question is more this generation (2026) vs. the previous generation (2000).
English
1
0
0
33
Adrienne Fairhall
Adrienne Fairhall@alfairhall·
@TMoldwin You’re articulating a paradigm shift. Kandel & Schwartz managed to fill a large book with previous paradigms. We will indeed see how many survive.
English
1
0
0
49
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@alfairhall That being said, there is generally a problem in academia of unwillingness to hand over the torch to the next generation. And the tenure system, which needs to be abolished, is a big part of that.
English
0
0
0
13
Toviah Moldwin
Toviah Moldwin@TMoldwin·
There have definitely been important experimental breakthroughs in the past few decades. But I don't think our paradigms have changed that much, because systems neuroscience has never had great paradigms to begin with. The only systems that are 'solved' are basically early sensory systems that have fairly simple input-output relationships. Once you get past that stage, it's basically some mishmash of neural manifolds/dynamics/predictive coding. The problem is that the brain is so interconnected that you don't understand anything until you understand everything, and you don't have experimental access to everything. I don't think the issue is institutional in this case; it's fundamental to the complexity of the brain.
English
2
0
0
59
Toviah Moldwin retweeted
François Fleuret
François Fleuret@francoisfleuret·
Hot take: machine learning and AI did more to understand the nature of knowledge, and our relation to reality than 20 centuries of philosophy. I am ready to kind of defend this hill.
English
362
109
1.5K
125.1K
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@ziv_ravid In fact, you can actually reduce the workforce by more than X% when you increase the productivity per worker by X%, because if one person can do more things, you can reduce communication/management costs that emerge when you need to coordinate different people.
English
0
0
0
44
Toviah Moldwin
Toviah Moldwin@TMoldwin·
@ziv_ravid But also, at the margin, of course AI should cause layoffs. AI definitely increases efficiency by X% for lots of things, and that often directly translates into being able to do those things with an X% smaller workforce.
English
1
0
0
43
Ravid Shwartz Ziv
Ravid Shwartz Ziv@ziv_ravid·
To CEOs firing people right now: you're doing it for many reasons. AI isn't one of them.
English
4
1
22
2.4K