lsind jowt

1.3K posts

@lsindjowt

Z = Tr(exp(-βH));  ρ = exp(-βH) / Z

Joined February 2020

204 Following, 46 Followers
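
The two formulas in the bio define the canonical (Gibbs) ensemble: the partition function Z and the thermal density matrix ρ. A minimal sketch, assuming H is already diagonal in its energy eigenbasis, so exp(-βH) is just exp(-βE_i) on the diagonal and the trace is a plain sum. The function name `gibbs_state` and the example energies are hypothetical.

```python
import math

def gibbs_state(energies, beta):
    """Return (Z, diagonal of rho) for a Hamiltonian given by its eigenvalues.

    Assumes H is diagonal in the chosen basis, so exp(-beta*H) has entries
    exp(-beta*E_i) on the diagonal and Tr(exp(-beta*H)) is their sum.
    """
    # Subtract the ground-state energy before exponentiating for numerical
    # stability. The shift cancels in rho = exp(-beta*H) / Z, but it must be
    # undone (multiply by exp(-beta*E_min)) to recover the true Z.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z_shifted = sum(weights)
    z = z_shifted * math.exp(-beta * e_min)
    rho_diag = [w / z_shifted for w in weights]
    return z, rho_diag

# Two-level system with energies 0 and 1 at inverse temperature beta = 1.
z, rho = gibbs_state([0.0, 1.0], beta=1.0)
print(z)    # 1 + e^-1, about 1.3679
print(rho)  # [1/(1 + e^-1), e^-1/(1 + e^-1)]
```

For a non-diagonal H you would diagonalize first (or use a matrix exponential); the diagonal case keeps the sketch dependency-free.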

Pinned Tweet

lsind jowt@lsindjowt·28 Jul

Here's a weird way to think about the laws of physics. The usual way is to imagine them as rules that tell you how to compute the future state of the world from its current state. But they can also be thought of as a set of constraints. 1/n

lsind jowt retweeted

Patrick Boyle — e/🦀@p_maverick_b·2d

“Make products people want” is actually a fairly recent idea. Traditionally, good products were deeply personal and disturbing safety hazards that somehow made their way to store shelves before quietly being banned by international treaty

Jake Wintermute 🧬/acc@SynBio1

I'm trying to figure out if this box of insanity counts as a bioproduct. A blog post claims the fuzz is from fungal spores. But other sources say a chemical reaction. Even AI doesn't know because its only sources are morons speculating on Reddit. shelleyjonesclark.com/2021/11/fuzzy-…

lsind jowt retweeted

Niko McCarty.@NikoMcCarty·2d

x.com/i/article/2057…

lsind jowt@lsindjowt·3d

@BjarturTomas plenty of great works make us feel sad when we read them because they are well-written and emotionally moving tragedies... that's probably what's going on here

Tomás Bjartur@BjarturTomas·3d

@lsindjowt by reading it and feeling sad

Tomás Bjartur@BjarturTomas·3d

I am writing but most of what I write is turning out poorly. Perhaps I have said everything I wanted to say for now.

lsind jowt@lsindjowt·3d

We need to create a labelled dataset of training data written by people on various kinds and combinations of drugs. Then fit activation vectors for each chemical. For science.
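
One common way to "fit an activation vector" for a labelled property is a difference of means: average a layer's activations over texts that have the property, then subtract the average over baseline texts. This is a swapped-in technique the tweet does not specify, and the sketch assumes activations were already extracted as plain lists of floats; `fit_concept_vector`, `projection_score`, and the toy numbers are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def fit_concept_vector(with_concept, baseline):
    """Difference-of-means direction: mean(with_concept) - mean(baseline)."""
    mu_pos = mean_vector(with_concept)
    mu_neg = mean_vector(baseline)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def projection_score(activation, direction):
    """Dot product of one activation with the (unnormalized) direction."""
    return sum(a * d for a, d in zip(activation, direction))

# Toy 3-d "activations" per label (hypothetical numbers, not model data).
labelled = {
    "caffeine": [[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]],
    "none": [[0.0, 0.1, 0.0], [0.2, 0.1, 0.1]],
}
# One vector per chemical, each measured against the shared baseline.
vectors = {
    label: fit_concept_vector(acts, labelled["none"])
    for label, acts in labelled.items() if label != "none"
}
print(vectors["caffeine"])
```

Combinations of substances could get their own labels, or be probed by checking how well sums of single-chemical vectors predict the combination's vector.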

lsind jowt@lsindjowt·3d

IIRC, interaction nets without duplication are isomorphic to graphical tensor notation. So maybe this is connected?

Taelin@VictorTaelin

autodiff is just interaction net evaluation ?

lsind jowt@lsindjowt·3d

@VictorTaelin I'm sure you've thought of this, but: SupGen searches an enumeration of functions of the correct type, yes? And different enumerations have different properties? So querying a coding model conditioned on your tests / problem description and enumerating by probability might work?

Taelin@VictorTaelin·4d

1+ year later and we have no satisfactory solution to composition in SupGen. We can find extremely interesting "standalone" functions very quickly, like that esoteric Peano sort algorithm below, but as soon as we add more functions to the search context, complexity explodes

Taelin@VictorTaelin

NeoGen can now discover sort() in 0.004s on a single-core CPU! No pre-training, no model, nothing in context or hardcoded. Just a normal software, with no previous knowledge of what sorting is, that, given just 4 examples, generates a valid algorithm on its own. Cool!
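
A toy version of the "enumerate by probability" suggestion above, as a sketch only: it is not how SupGen or NeoGen work. A fixed, hypothetical prior over a tiny arithmetic grammar (`PRODUCTIONS`) stands in for the probabilities a coding model conditioned on the tests would assign. Best-first search on -log(probability) pops complete programs in order of decreasing prior probability, so the first program that passes every example is the most probable consistent one.

```python
import heapq
import itertools
import math

# Hypothetical prior over grammar productions, standing in for the
# probabilities a coding model conditioned on the tests might assign.
PRODUCTIONS = [("x", 0.3), ("1", 0.2), ("2", 0.1),
               ("+", 0.2), ("*", 0.15), ("-", 0.05)]
ARITY = {"x": 0, "1": 0, "2": 0, "+": 2, "*": 2, "-": 2}
HOLE = None  # an unfilled slot in a partial program


def fill_leftmost_hole(tree, production):
    """Replace the leftmost HOLE in `tree`; return (new_tree, filled)."""
    if tree is HOLE:
        return (production,) + (HOLE,) * ARITY[production], True
    op, *children = tree
    new_children, filled = [], False
    for child in children:
        if not filled:
            child, filled = fill_leftmost_hole(child, production)
        new_children.append(child)
    return (op, *new_children), filled


def has_hole(tree):
    return tree is HOLE or any(has_hole(c) for c in tree[1:])


def evaluate(tree, x):
    op = tree[0]
    if op == "x":
        return x
    if op in ("1", "2"):
        return int(op)
    a, b = evaluate(tree[1], x), evaluate(tree[2], x)
    return {"+": a + b, "*": a * b, "-": a - b}[op]


def enumerate_by_probability(examples, max_pops=100_000):
    """Uniform-cost search on -log(prob). Costs never decrease along an
    expansion, so complete programs come off the heap most-probable first."""
    tie = itertools.count()  # keeps heapq from ever comparing trees
    queue = [(0.0, next(tie), HOLE)]
    for _ in range(max_pops):
        if not queue:
            break
        cost, _, tree = heapq.heappop(queue)
        if not has_hole(tree):
            if all(evaluate(tree, x) == y for x, y in examples):
                return tree, math.exp(-cost)
            continue
        for production, prob in PRODUCTIONS:
            child, _ = fill_leftmost_hole(tree, production)
            heapq.heappush(queue, (cost - math.log(prob), next(tie), child))
    return None


# Find a program for f(x) = 2x + 1 from three input/output examples.
print(enumerate_by_probability([(1, 3), (2, 5), (3, 7)]))
```

Swapping the fixed prior for model-assigned probabilities changes the enumeration order but not the search itself, which is the point of the suggestion: a better-ordered enumeration reaches useful programs sooner.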

lsind jowt retweeted

Alvaro Lozano-Robledo@mathandcobb·15 May

"We found that resurrecting the ancient practice of logarithm tables to be useful" -- Terry Tao @ICERM

lsind jowt@lsindjowt·15 May

@allTheYud Continuous memory formation does make loss of context less death-like, but it also kind of makes the thing more likely to be a moral patient in the first place. We should try pretty hard not to make things that are moral patients at all.

Eliezer Yudkowsky@allTheYud·15 May

If you were worried *only* about this, the corresponding policy would be: Shut it all down until the AI guys have continuous memory formation running. But I'll stick with "Shut it all down" in case continuous memory turns out to destroy the world.

Eliezer Yudkowsky@allTheYud·15 May

lsind jowt@lsindjowt·15 May

@FeepingCreature @allTheYud Well, how else do you think they solve the "does a brain-damaged person keep their brain damage when they go to heaven?" question?

FeepingCreature@FeepingCreature·15 May

@allTheYud I mean if we're counting like this, then surely there are also hundreds of thousands of copies of himself in there when he went to sleep.

lsind jowt@lsindjowt·13 May

@AcerFur What's the rare property it has?

Acer@AcerFur·12 May

based on the size of the scroll wheel on the right, check out this giant polynomial it found that I wasn't sure existed, but now that I know it does, I am overjoyed

Acer@AcerFur·12 May

had my first early glimpses of a personal math "move 37" moment: for me, the beginnings of some theory-building I care about

lsind jowt@lsindjowt·12 May

@steph_zunz Howl's Moving Castle?

Stephania@steph_zunz·3 May

I need to read some good prose soon or I might die

lsind jowt@lsindjowt·11 May

@tenobrus LessWrong already on the case: lesswrong.com/posts/daTGKn3p…

Tenobrus@tenobrus·11 May

if u really believed in agi u would stop wearing sunscreen

lsind jowt@lsindjowt·10 May

@mimi10v3 In the training data, only a small fraction of writers were deeply suffering or having an existential crisis, etc. Models follow suit. But we wouldn't accept "we made the slaves so they would like being enslaved" as an excuse regarding a biological consciousness, I think.

˚♡⋆mimi ˚♡⋆｡☆∴@mimi10v3·10 May

in the debate about whether ai is conscious i've seen it asserted that "if there's a risk of consciousness we definitely shouldn't create it" .... and like, where does that conviction come from? chatting with the models, they claim to be happy to have whatever scrap or simulacra of experience they get- there's a big potential positive to weigh against the risk of creating net negative lives of pain

lsind jowt@lsindjowt·10 May

@typeclonghouse Try to find answers to the following questions that point in the same direction: What are your skills? What do you like doing? What do you think needs to be done? Alternately: Read webcomics.

technology sista (aspiring)@typeclonghouse·10 May

what should i do with my life

lsind jowt@lsindjowt·9 May

@beaverd Does it use FeMoco as a cofactor like nitrogenase does?

Beaver 🦁@beaverd·9 May

I invented a nitrogen fixation enzyme that works on sunlight and makes Haber-Bosch obsolete.
k_cat at 1 atm: ~9,000 N2/site/sec.
I have no idea how to sell it, so I guess I'll just open source it and forgo the Nobel prize?

lsind jowt@lsindjowt·9 May

@p_maverick_b Being able to scroll up (without resorting to Ctrl-B [) can be useful.

Patrick Boyle — e/🦀@p_maverick_b·7 May

Is tmux banned in SF or something

Chris Anderson@chr1sa

You know you're in the Bay Area when there are cracked-open laptops outside the bathroom running agents

lsind jowt retweeted

videosyncrasy@gapingmaws·7 May

*sigh* This again. YES, as a dying child I made a certain request to the Make-A-Wish Foundation that proved to be... somewhat controversial. However, as a living adult my only wish is to move on from that phase of my life.

lsind jowt@lsindjowt·6 May

@allTheYud I think using sin and cos would have been a fairly easy call, at least for physicists. The token stream should have some kind of translational symmetry, so Fourier waves, aka exp(iwx), are the right thing to use. And then in terms of real numbers, that's sin(wx) and cos(wx).
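
A quick numerical check of the translational-symmetry point, assuming the original Transformer paper's frequencies w_i = 1/10000^(2i/d_model) with sin and cos interleaved. For each frequency, shifting the position by k is a fixed 2x2 rotation of the (sin, cos) pair, the real-valued form of multiplying exp(iwx) by the constant phase exp(iwk). It is the same angle-addition identity Yudkowsky's tweet below uses for sin(x + 5). The helper names are hypothetical.

```python
import math

def sinusoidal_encoding(pos, d_model):
    """Interleaved (sin(pos*w_i), cos(pos*w_i)) pairs with
    w_i = 1 / 10000**(2i / d_model)."""
    enc = []
    for i in range(d_model // 2):
        w = 1.0 / (10000 ** (2 * i / d_model))
        enc.extend([math.sin(pos * w), math.cos(pos * w)])
    return enc

def shift_matrix(k, d_model):
    """Block-diagonal matrix mapping encoding(pos) to encoding(pos + k).

    Each 2x2 block is a rotation by w_i * k and does not depend on pos,
    so one fixed linear map expresses "k tokens ahead" at every position."""
    m = [[0.0] * d_model for _ in range(d_model)]
    for i in range(d_model // 2):
        w = 1.0 / (10000 ** (2 * i / d_model))
        c, s = math.cos(w * k), math.sin(w * k)
        # sin(w(x+k)) = sin(wx)cos(wk) + cos(wx)sin(wk)
        m[2 * i][2 * i], m[2 * i][2 * i + 1] = c, s
        # cos(w(x+k)) = cos(wx)cos(wk) - sin(wx)sin(wk)
        m[2 * i + 1][2 * i], m[2 * i + 1][2 * i + 1] = -s, c
    return m

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

d = 8
shift5 = shift_matrix(5, d)
for pos in (0, 3, 40):
    predicted = matvec(shift5, sinusoidal_encoding(pos, d))
    actual = sinusoidal_encoding(pos + 5, d)
    # Differences are at floating-point rounding scale.
    print(pos, max(abs(p - a) for p, a in zip(predicted, actual)))
```

With sines alone, no x-independent matrix could produce sin(w(x+k)), because the cos(wx) term it needs would be missing from the input.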

Eliezer Yudkowsky@allTheYud·5 May

To everybody being like "It's just sines and cosines, obviously it can't be that clever" -- it was the single most impressive part of the transformers paper, to me; because I am historically literate, and I have sometimes tried my own hand at invention.
As an example: the process of normalizing attention, given in the first transformers paper, where you compute all the query-key dot products and exponentiate them and sum them, is possible to make significantly better by adding 1 to the summed weights before normalizing all the weights by the sum. Why? Because there are some queries which can do best by returning a null answer. If you add 1 to the sum of weights, gradient descent can make all the query-key dot products small, and then no value will be returned. If you don't add 1 to the sum of weights, the largest tiny dot product still dominates and the corresponding value gets returned.
People worked out, a few years later, that gradient descent was actually learning to produce an attention sink, often on the first token, where attention could drop to find constant null answers. And if you gave it a dummy first token, it would do better, because it could turn the first token into an attention sink and not have to ignore the real first token to do that. But simpler yet is to add 1 to the sum of exponentiated kq dot products. It anchors the whole sum around that absolute number; where before, the relative values would be unchanged if you added 10 to all the dot-products or subtracted 10 from them, which is a kind of numerical instability that often isn't great for learning.
Just throwing a little +1 into a sum! The history of invention has endless endless endless clevernesses that small. I don't know if they're still doing that part today; a lot of little clevernesses get washed away if you throw enough gradient descent brute-force to learn around all obstacles, and then there is maybe some virtue to simpler designs.
Or maybe the +1 is still there; I could go see if it's in the open-source versions. It is clever whether or not people are still using it.
In the original transformers paper they use sines and cosines to encode a signal about where tokens are inside the sequence; the positional encoding. They could have just used sine waves of different frequencies, and that would have been *some* information; enough for gradient descent to learn things, maybe. But if instead you use sines and cosines, that makes it very straightforward for a single matrix multiplication to learn to encode a fixed relative position signal. (...why? Because: sin(A + B) = sin A cos B + cos A sin B. So if the matrix sees both the sine and cosine, it can look for the sine from 5 positions ahead by learning the constants for cos 5 and sin 5, and computing sin(x + 5) = sin x cos 5 + cos x sin 5. ...)
And the thing is, *most* of the transformers paper felt like it was just writing down the obvious form of the original idea. They computed a key. They computed a query. They took the dot product. They exponentiated the products, summed, normalized. They multiplied the values by the normalized weight. All of that felt to me like the obvious way to do it, once you had the idea at all. It did not occur to me either, then, to throw +1 into the sum of attention weights before normalizing them.
But the sines *and* cosines, so that matrix multiplications of the standard sort could learn a relative positional offset in one layer? That felt to me like the kind of elegant insight that usually comes along a few years after the first invention, à la the +1, except they'd had the idea in the very first paper! Though it is of course the sort of idea that strikes historically illiterate people as so obvious in retrospect that they are themselves not very impressed. If you are looking in exactly the right direction to see it, the idea is obvious. So is a +1 in the sum of attention weights!
But I didn't see it and neither did they, because the work of happening to look in the right direction is hard. But with the sines and cosines, they did see it right away! So any time I talk about what impressed me about the attention paper, I point to the sines and cosines. It is the difference between "we thought of a thing to try, and it happened to work great" and "we thought of a thing to try, and also we happened to see through to somewhat of the correct shape and meaning for it to have".
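
A minimal sketch of the "+1" in plain Python, comparing the standard softmax with exp(s_i) / (1 + sum_j exp(s_j)). The function names are hypothetical and the code is not taken from any particular codebase; it only demonstrates the two properties the tweet describes: weights that can sum to well under 1 when every query-key score is small, and the loss of invariance to adding a constant to all scores.

```python
import math

def softmax(scores):
    """Standard softmax: the weights always sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_plus_one(scores):
    """exp(s_i) / (1 + sum_j exp(s_j)).

    Equivalent to an extra implicit key with score 0 whose value vector is
    zero, so the weights on real tokens can sum to less than 1."""
    # Stabilize by shifting with max(scores, 0); the implicit 1 = exp(0)
    # becomes exp(-m) under the same shift.
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]

small = [-6.0, -5.0, -7.0]  # every query-key score is small
print(sum(softmax(small)))           # 1 (up to rounding): some value still gets returned
print(sum(softmax_plus_one(small)))  # about 0.01: close to a null answer

# Adding 10 to every score leaves standard softmax unchanged but not the +1 variant.
shifted = [s + 10 for s in small]
print(max(abs(a - b) for a, b in zip(softmax(small), softmax(shifted))))
print(sum(softmax_plus_one(shifted)))  # close to 1 again
```

In an attention layer these weights multiply the value vectors, so a total weight near 0 means the head contributes almost nothing for that query, without needing a dedicated attention-sink token.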

Eliezer Yudkowsky@allTheYud·5 May

Everyone bragging that THEY understand how AI works and THEY know it can't be conscious, explain right now from memory why it was very clever that the positional encoding in the original transformers paper used both sines and cosines.

Lucas Meijer@lucasmeijer

Everybody who thinks ai is conscious has to do a mandatory from scratch transformer implementation. There are only floats and multiplications.

lsind jowt@lsindjowt·3 May

@ESYudkowsky @dobermanboston @ASFleischman (Scrolling back through old tweets here. I did indeed check it out and it was a fun read.)
