Toviah Moldwin

3.3K posts


@TMoldwin

Computational neuroscientist @ELSCbrain @Segev_Lab. Singer and guitarist for the rock band @SynfireChain. Dualist. Founder, https://t.co/YMBvt487lT.

Joined April 2019
894 Following · 851 Followers
Pinned Tweet
Toviah Moldwin@TMoldwin·
Lots of people (myself included) have trouble understanding why transformer architectures directly add token embeddings to position embeddings. It's weird to just directly add the representation of the content (the token embeddings) to the representation of the content's position within its context. (Note: there are techniques like RoPE that try to get around this; here I'll deal with the more basic strategy of addition.)

Together with Raneem Mahajne, I trained a small transformer on a simple next-token prediction task. We gave the transformer a sequence of random digits, occasionally interspersed with a + sign. Whenever the + sign appeared, the next number had to be the same as the *most recent even number*. For example, given the sequence 3 1 2 7 5 +, the next number would have to be 2. Because we used an embedding dimension of 2 for both the positions and the tokens, we can directly visualize the token embeddings, the position embeddings, and their sum for every possible token and position combination.

In the token embedding space, the transformer learned to separate the even numbers from the odd numbers, and it also learned to separate the + sign from everything else. Interestingly, the model also decided to smush all the odd numbers together while maintaining some space between each even digit, presumably because once a + sign appears, it becomes necessary to predict a specific even number. In the position embedding space, the transformer learned to correctly order the positions along a curve. Position ordering is important for this task: to know what the "most recent" even number is, you need a sense of ordering.

When the token and position information are summed, the structure of *both* the token embedding space and the position embedding space is preserved. The even numbers remain separated from each other and from the odd numbers, the odd numbers remain smushed together, and the "+" token occupies its own area of space. But now, for each token, we see a local structure based on the curve of ordered positions. Part of the reason this occurs is that the magnitudes of the token embeddings are ~10x larger than the magnitudes of the position embeddings, allowing the 'macro' structure to be dominated by the tokens while the 'micro' structure is determined by the positions. In other words, summing the position information and the token information doesn't just mix things haphazardly; it retains the geometric structure of both by using a different scale to encode 'what' and 'where'.
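A minimal numpy sketch of the scale-separation point. These are synthetic embeddings, not the learned ones: the 11 tokens (digits plus '+'), 8 positions, embedding dimension 2, and the ~10x norm ratio come from the setup described above, while the specific geometry (tokens on a circle, positions on a curve) is an illustrative assumption.

```python
import numpy as np

# Hypothetical embeddings: 11 tokens on a circle of radius 10 ("macro" scale),
# 8 positions on a unit-norm curve ("micro" scale, ~10x smaller).
n_tok, n_pos = 11, 8
angles_t = 2 * np.pi * np.arange(n_tok) / n_tok
tok = 10.0 * np.stack([np.cos(angles_t), np.sin(angles_t)], axis=1)

angles_p = np.linspace(0, np.pi / 2, n_pos)
pos = np.stack([np.cos(angles_p), np.sin(angles_p)], axis=1)

# Summed representation for every (token, position) pair, as in the model.
summed = tok[:, None, :] + pos[None, :, :]

# Macro structure preserved: each summed vector is still nearest its own
# token embedding, because position offsets are small relative to the
# spacing between tokens.
for t in range(n_tok):
    for p in range(n_pos):
        assert np.linalg.norm(tok - summed[t, p], axis=1).argmin() == t

# Micro structure preserved: within each token's cluster, the offsets are
# exactly the ordered position curve.
for t in range(n_tok):
    assert np.allclose(summed[t] - tok[t], pos)
```

The asserts only go through because the token scale dominates the position scale; shrink the factor of 10 and the clusters start to overlap, which is the geometric picture the trained model appears to have found.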
[image: the token embedding space, the position embedding space, and their sum]
Toviah Moldwin@TMoldwin·
@bryan_caplan Which 3 years? The current war with Iran is only tangentially related to Hamas. Iran is the bigger player here, and has been a major geopolitical concern since it started trying to acquire nukes decades ago. Hamas was just one component of the Iranian 'ring of fire'.
Bryan Caplan@bryan_caplan·
Standard estimates say there are 20-30k Hamas members. Yet this tiny, impoverished, militarily feeble group has indirectly dominated not just headlines but world history for almost 3 years. What a weird world.
Toviah Moldwin@TMoldwin·
@LocasaleLab Number of panels per figure is also not usually a major barrier in reviewing.
Toviah Moldwin@TMoldwin·
@LocasaleLab Nah. This is an atypically dense figure. It often does take a lot of data to tell a complete story; it's rare that a panel is completely unnecessary. Overly dense figures are usually the result of max figure count constraints.
Jason Locasale@LocasaleLab·
This reflects how science has been packaged and evaluated over the past two decades. In the 2000s figure preparation software such as PowerPoint and Illustrator became straightforward to use. By the early 2010s, high-impact journals came to associate dense, elaborate figures (i.e. the exhibits of a scientific study) with rigor and depth. The implicit assumption was that more panels and more data reflected more thorough and careful work.

At the same time, the editorial decision on whether to proceed to peer review was made by individuals not deeply embedded in the specific science, relying more on visual presentation and the perceived completeness of the data. The aesthetics of the figure panels became a proxy for scientific thoroughness. In response, scientists adapted. Figures became more complex, more densely populated, and more expansive in scope. This gave the appearance of rigor independent of whether the additional data materially clarified the central questions of the study.

Reviewers were tasked with evaluating these large and complex datasets under significant time constraints, typically within a few days and without compensation, while managing substantial professional responsibilities. Under these conditions, it is impossible to systematically interrogate every component of a multi-panel figure. There is also a reluctance to question whether key elements of a study are missing if there is a possibility they are included somewhere within the large amount of presented data.

The result is a publication system in which the presentation of large volumes of data in complex figure formats can facilitate publication in high-profile journals, often with limited connection to the underlying clarity, coherence, or quality of the science itself.
Banana Oncology@Banana_Oncology

Ok this figure is pretty intimidating...

Toviah Moldwin@TMoldwin·
@AustinA_Way Hasn't everyone known that this has been debunked for decades? I thought this was common knowledge.
Toviah Moldwin retweeted
nature@Nature·
Nature research paper: Climbing fibres recruit disinhibition to enhance Purkinje cell calcium signals go.nature.com/4bsCMw9
Toviah Moldwin@TMoldwin·
@wendlerch Of course? If your intermediate layers are linear you're basically just doing linear regression, no need for backprop. The whole point of backpropagation is to handle the nonlinearities.
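A minimal numpy sketch of the point above: composing linear layers collapses to a single linear map, so a "deep" network with linear intermediate layers has no more expressive power than linear regression (the layer shapes here are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 3))   # "hidden layer" weights
W2 = rng.normal(size=(2, 5))   # "output layer" weights
X = rng.normal(size=(100, 3))  # a batch of inputs

# Two linear layers applied in sequence...
deep = X @ W1.T @ W2.T
# ...equal one precomposed linear map: the hidden layer adds nothing,
# so the whole model could be fit directly by linear regression.
shallow = X @ (W2 @ W1).T
assert np.allclose(deep, shallow)
```

Insert a nonlinearity between the two layers and the collapse no longer holds, which is where backpropagation earns its keep.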
Chris Wendler@wendlerch·
w8 a second. the backprop paper already had a great example for nonlinear features 🤯 easy to forget in modern days where you can get very very far with linearity assumptions... lecture link below 👇
Toviah Moldwin retweeted
Rafi DeMogge רפי דמוג
1./ Common fallacies in IR thought, or: how to avoid geopolitical gibberish. Some thought patterns became so prevalent in geopolitics commentary that they bled into ordinary discourse. Yet these ideas are manifestly and obviously absurd, and they should seem absurd to anyone who wasn’t indoctrinated with geopolitics gibberish. In this thread, I’ll list a few common fallacies.
Toviah Moldwin@TMoldwin·
@Qivshi1 Probably way more than that is required for a push over the edge. The brain couldn't really function very well if it took so little to push it over the edge into a different state.
Qivshi@Qivshi1·
how many levels of brain criticality are you on? One neuron can change the global state? (baby level) One synapse? One AMPA receptor? One glutamate molecule? One H+? One microvolt?
Toviah Moldwin@TMoldwin·
@DBashIdeas Someone translated חשוב in their head to 'significant' and that's still not the right word to use here.
David/Dovid Bashevkin@DBashIdeas·
“Do you really think a dining room without sefarim on the table is nicer than a dining room with seforim on the table?”
Brandon Luu, MD@BrandonLuuMD·
Students who took notes by hand scored ~28% higher on conceptual questions than laptop note-takers. Writing forces your brain to process and compress ideas instead of copying them.
Toviah Moldwin retweeted
Oliver Sieberling@osieberling·
A very interesting observation on backpropagation is that no matter how nonlinear the forward pass, once the forward pass is fixed, backpropagation itself is purely linear. This allows for all kinds of gradient analysis. For example, one can decompose the backward pass by the depth of the backpropagated signal.

Each forward and backward pass can be viewed as involving 2^{2L} different paths, where L is the number of blocks (2L to account for Attn/MLP subblocks). Because the forward pass is nonlinear, we can't just compute each of the paths separately to decompose the forward pass. However, for the backward pass we can.

Of course, 2^{2L} is computationally intractable as we would need to do 2^{2L} separate backward passes (>1e19 for a 32-block transformer). But if we are only interested in the depth of gradients, we can use simple dynamic programming to decompose the backward signal by depth. Let x_l be the residual stream at depth l. Instead of just maintaining dL/dx_l as in normal backpropagation, we maintain (dL/dx_l)^k for each gradient depth k, i.e., a table consisting of 2L+1 gradients. Because backpropagation is linear, we have dL/dx_l = \sum_{k=0}^{2L} (dL/dx_l)^k. Then, after backpropagating each subblock, we can update our table of gradients with (dL/dx_l)^k = (dL/dx_{l+1})^k + (Jacobian_{subblock_l}(x_l))^T (dL/dx_{l+1})^{k-1}. This way, we can efficiently compute for each weight update how much comes from each gradient depth.

Interestingly, because there are C(2L, k) (2L choose k) paths of depth k (left plot), by sheer number of paths we would expect the decomposition to concentrate around depth L (ignoring cancellations/correlations), but this is not what we observe in practice: the actual decomposition of gradients by depth is much more shifted towards shallower depths (right plot), which suggests that after normalizing by the number of paths, shallower paths contribute gradients of much larger magnitude than deeper paths do.
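A toy numpy sketch of the dynamic program described above. As an illustrative simplification, the 2L Attn/MLP subblocks of a transformer are replaced by L generic residual subblocks x_{l+1} = x_l + tanh(W_l x_l) and a synthetic quadratic loss; the depth-table recurrence is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 4, 3  # 4 residual subblocks, residual stream of width 3
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(L)]

# Forward pass, keeping every residual-stream activation x_l.
x = rng.normal(size=d)
xs = [x]
for W in Ws:
    x = x + np.tanh(W @ x)
    xs.append(x)

# Loss 0.5*||x_L||^2, so the gradient at the output is x_L itself.
gL = xs[-1]

# Ordinary backprop: dL/dx_l = (I + J_l)^T dL/dx_{l+1},
# where J_l is the Jacobian of the nonlinear subblock tanh(W_l x_l).
g = gL
for l in reversed(range(L)):
    J = (1 - np.tanh(Ws[l] @ xs[l]) ** 2)[:, None] * Ws[l]
    g = g + J.T @ g  # identity (skip) path + nonlinear path

# Depth-decomposed backprop: table[k] holds the part of dL/dx_l that has
# passed through exactly k subblock Jacobians so far.
table = [gL] + [np.zeros(d) for _ in range(L)]
for l in reversed(range(L)):
    J = (1 - np.tanh(Ws[l] @ xs[l]) ** 2)[:, None] * Ws[l]
    # (dL/dx_l)^k = (dL/dx_{l+1})^k + J^T (dL/dx_{l+1})^{k-1}
    table = [table[k] + (J.T @ table[k - 1] if k > 0 else 0)
             for k in range(L + 1)]

# Because backprop is linear, the per-depth pieces sum to the full gradient,
# and the depth-0 entry is the pure skip-connection path.
assert np.allclose(sum(table), g)
assert np.allclose(table[0], gL)
```

The table costs L+1 gradients per layer instead of 2^L separate backward passes, which is the point of the dynamic program.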
Toviah Moldwin@TMoldwin·
@andrewgwils Alternatively, no it's not? If you want a model to be able to communicate its results, it of course has to be good at producing word sequences. What is happening internally is much more than next-word prediction; that's just the final layer.
Andrew Gordon Wilson@andrewgwils·
Being good at next word prediction is the opposite of what we want for creativity, for scientific breakthroughs.
Toviah Moldwin retweeted
Aakash Gupta@aakashgupta·
The average laptop screen ships at 400 to 500 nits of brightness. Direct sunlight requires 1,000 nits minimum to be readable. Most MacBook Airs top out around 480. So every person “working from the beach” is doing one of three things: squinting at a washed-out screen while cupping their hand over it like a visor, cranking brightness to max and watching the battery drain in 90 minutes, or posing for a photo with a screen that’s actually off. The photo in this tweet is perfect. Two guys in white button-downs at a beach table, hunched forward, clearly unable to see anything. One has sunglasses on, which makes the screen even darker. The other is eating a bagel and appears to have accepted his fate. The entire “laptop at the beach” aesthetic is a lighting trick. Every influencer photo is shot at golden hour or in shade. The second the sun is directly overhead, your $2,000 laptop becomes a $2,000 mirror. Apple, Dell, and Lenovo could fix this tomorrow with 1,500-nit panels. They don’t because the battery tradeoff would cut runtime in half, and “4 hours of battery life” doesn’t sell in a commercial. The remote work fantasy was always an indoor product sold with outdoor photography.
dr. z, esq.@zeynepmyenisey

Being on your laptop outside is a miserable experience and im tired of people pretending it's not

Toviah Moldwin@TMoldwin·
@rohanpaul_ai He did not say it was coming, he said 'I bet' it is coming. That does not give any new information, this was always a possibility that one could be optimistic about.
Rohan Paul@rohanpaul_ai·
Sam Altman just said in his new interview, that a new AI architecture is coming that will be a massive upgrade, just like Transformers were over Long Short-Term Memory. And also now the current class of frontier models are powerful enough to have the brainpower needed to help us research these ideas. His advice is to use the current AI to help you find that next giant step forward. --- From 'TreeHacks' YT Channel (link in comment)
Rohan Paul@rohanpaul_ai

Morgan Stanley predicts a massive AI breakthrough driven by a huge spike in computing power across major U.S. laboratories. Increasing the amount of hardware used for training by 10x can effectively double the intelligence of these models. The recently released GPT-5.4 Thinking model already matches human experts on professional tasks with a score of 83% on the GDPVal benchmark. The biggest hurdle for this growth is an energy crisis, with the U.S. power grid facing a shortfall of 18 gigawatts by December-28. To keep running, developers are bypassing the grid by taking over Bitcoin mining sites and using natural gas turbines for their AI factories. This shift is creating a solid investment cycle where 15-year leases on data centers generate high financial yields for every watt consumed. Large companies are already reducing their staff numbers because these new AI tools can perform professional work for a tiny fraction of the cost. Researchers expect AI to begin recursive self-improvement by June-27, meaning the software will autonomously upgrade its own code without human help. The future economy will likely treat raw intelligence as a commodity that is manufactured by these massive computing and energy clusters.

Toviah Moldwin@TMoldwin·
@robinhanson Yes, many Israelites were polytheistic at many points - see...the Bible. No, they were not supposed to be, see the second commandment.
James Lucas@JamesLucasIT·
What's the most profoundly beautiful piece of music you have ever listened to?
Toviah Moldwin@TMoldwin·
@KordingLab @TaliaRinger And while learning CS is valuable, realistically a lot of SWE-type jobs in the future will mainly be about 'how good are you at vibecoding/systems thinking'. You need less book knowledge for that, more raw experience/experimentation.
Toviah Moldwin@TMoldwin·
@KordingLab @TaliaRinger I don't understand the incentive of a student going into any sort of CS at all now. Not a good career path anymore, and also at the graduate level academia is trailing industry especially in AI/ML.
Toviah Moldwin@TMoldwin·
@robinhanson Standardized to what? The best hotel room I've ever seen? The worst? The median?
Robin Hanson@robinhanson·
Would you prefer that hotel rooms were more or less standardized than they are now?
Toviah Moldwin retweeted
Jesse Singal@jessesingal·
NEW RULE*: You can't make a sweeping claim about how AI doesn't *really* 'think' or 'understand' or 'reason' unless you have a specific definition of the term in question in mind! Otherwise the conversation will just spiral into vapidity. *that I am politely suggesting