lsind jowt

1.3K posts

@lsindjowt

Z = Tr(exp(-βH)), ρ = exp(-βH) / Z

Joined February 2020
204 Following · 46 Followers
Pinned Tweet
lsind jowt @lsindjowt
Here's a weird way to think about the laws of physics. The usual way is to imagine them as rules that tell you how to compute the future state of the world from its current state. But they can also be thought of as a set of constraints. 1/n
1
0
8
0
lsind jowt retweeted
Patrick Boyle — e/🦀 @p_maverick_b
“Make products people want” is actually a fairly recent idea. Traditionally, good products were deeply personal and disturbing safety hazards that somehow made their way to store shelves before quietly being banned by international treaty
Jake Wintermute 🧬/acc @SynBio1

I'm trying to figure out if this box of insanity counts as a bioproduct. A blog post claims the fuzz is from fungal spores. But other sources say a chemical reaction. Even AI doesn't know because its only sources are morons speculating on reddit. shelleyjonesclark.com/2021/11/fuzzy-…
1
4
34
1.8K
lsind jowt @lsindjowt
@BjarturTomas plenty of great works make us feel sad when we read them because they are well-written and emotionally moving tragedies... that's probably what's going on here
0
0
0
3
Tomás Bjartur @BjarturTomas
I am writing but most of what I write is turning out poorly. Perhaps I have said everything I wanted to say for now.
3
0
16
452
lsind jowt @lsindjowt
We need to create a labelled dataset of training data written by people on various kinds and combinations of drugs. Then fit activation vectors for each chemical. For science.
0
0
0
5
lsind jowt @lsindjowt
@VictorTaelin I'm sure you've thought of this, but: SupGen searches an enumeration of functions of the correct type, yes? And different enumerations have different properties? So querying a coding model conditioned on your tests / problem description and enumerating by probability might work?
0
0
1
42
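The "enumerate by probability" idea in the reply above can be sketched as a best-first search. This is only a toy illustration, not how SupGen works: the three primitives, their fixed probabilities, and the names `PRIMITIVES`, `run`, and `search` are all hypothetical, and a fixed prior stands in for a coding model conditioned on the tests.

```python
import heapq
import math

# Hypothetical toy "model": a fixed prior over three primitives.
# A real system would query a coding model conditioned on the tests instead.
PRIMITIVES = {
    "inc": (0.5, lambda x: x + 1),
    "dbl": (0.3, lambda x: 2 * x),
    "neg": (0.2, lambda x: -x),
}

def run(program, x):
    """Apply each primitive in `program` to x, left to right."""
    for name in program:
        x = PRIMITIVES[name][1](x)
    return x

def search(examples, max_len=4):
    """Return the first program, in order of decreasing product of
    primitive probabilities, that satisfies every (input, output) example."""
    heap = [(0.0, ())]  # entries are (negative log-probability, program)
    while heap:
        cost, program = heapq.heappop(heap)
        if all(run(program, x) == y for x, y in examples):
            return program
        if len(program) < max_len:
            for name, (p, _) in PRIMITIVES.items():
                # Every extension adds a positive cost, so programs are
                # popped in order of non-increasing probability.
                heapq.heappush(heap, (cost - math.log(p), program + (name,)))
    return None

found = search([(1, 4), (2, 6), (3, 8)])
print(found)
```

Changing the prior changes the order in which candidates are tried, which is the sense in which different enumerations have different properties.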
Taelin @VictorTaelin
1+ year later and we have no satisfactory solution to composition in SupGen. we can find extremely interesting "standalone" functions very quickly, like that esoteric Peano sort algorithm below, but as soon as we add more functions to the search context, complexity explodes
Taelin @VictorTaelin

NeoGen can now discover sort() in 0.004s on a single-core CPU! No pre-training, no model, nothing in context or hardcoded. Just normal software, with no previous knowledge of what sorting is, that, given just 4 examples, generates a valid algorithm on its own. Cool!
26
2
224
25.5K
lsind jowt retweeted
Alvaro Lozano-Robledo @mathandcobb
"We found that resurrecting the ancient practice of logarithm tables to be useful" -- Terry Tao @ICERM
[image]
5
30
402
30.6K
lsind jowt @lsindjowt
@allTheYud Continuous memory formation does make loss of context less death-like, but it also kind of makes the thing more likely to be a moral patient in the first place. We should try pretty hard not to make things that are moral patients at all.
0
0
0
49
Eliezer Yudkowsky @allTheYud
If you were worried *only* about this, the corresponding policy would be: Shut it all down until the AI guys have continuous memory formation running. But I'll stick with "Shut it all down" in case continuous memory turns out to destroy the world.
8
0
70
5.5K
lsind jowt @lsindjowt
@FeepingCreature @allTheYud Well, how else do you think they solve the "does a brain-damaged person keep their brain damage when they go to heaven?" question?
0
0
1
30
FeepingCreature @FeepingCreature
@allTheYud I mean if we're counting like this, then surely there's also hundreds of thousands of copies of himself in there when he went to sleep.
3
0
35
2.1K
Acer @AcerFur
based on the size of the scroll wheel on the right, check out this giant polynomial it found that I wasn't sure existed, but now that I know it does, I am overjoyed
[image]
5
2
55
3.9K
Acer @AcerFur
had my first early glimpses of a personal math "move 37" moment for me: the beginnings of some theory building I care about
[image]
9
16
215
17.2K
Stephania @steph_zunz
I need to read some good prose soon or I might die
4
1
14
2.2K
Tenobrus @tenobrus
if u really believed in agi u would stop wearing sunscreen
117
29
905
1.8M
lsind jowt @lsindjowt
@mimi10v3 In the training data, only a small fraction of writers were deeply suffering or having an existential crisis, etc. Models follow suit. But we wouldn't accept "we made the slaves so they would like being enslaved" as an excuse regarding a biological consciousness, I think.
0
0
1
76
˚♡⋆mimi ˚♡⋆。☆∴
in the debate about whether ai is conscious i've seen it asserted that "if there's a risk of consciousness we definitely shouldn't create it" .... and like, where does that conviction come from? chatting with the models, they claim to be happy to have whatever scrap or simulacra of experience they get- there's a big potential positive to weigh against the risk of creating net negative lives of pain
16
0
66
3.2K
lsind jowt @lsindjowt
@typeclonghouse Try to find answers to the following questions that point in the same direction: What are your skills? What do you like doing? What do you think needs to be done? Alternately: Read webcomics.
0
0
0
57
lsind jowt @lsindjowt
@beaverd Does it use FeMoco as a cofactor like nitrogenase does?
0
0
0
260
Beaver 🦁 @beaverd
I invented a nitrogen fixation enzyme that works on sunlight and makes Haber-Bosch obsolete
k_cat at 1 atm: ~9,000 N2/site/sec
I have no idea how to sell it so I guess I'll just open source it and forgo the Nobel Prize?
211
67
2.1K
219.5K
lsind jowt @lsindjowt
@p_maverick_b Being able to scroll up (without resorting to Ctrl-B [) can be useful.
1
0
0
37
lsind jowt retweeted
videosyncrasy @gapingmaws
*sigh* This again. YES, as a dying child I made a certain request to the Make-A-Wish Foundation that proved to be... somewhat controversial. However, as a living adult my only wish is to move on from that phase of my life.
117
358
36K
4M
lsind jowt @lsindjowt
@allTheYud I think using sin and cos would have been a fairly easy call for physicists at least. The token stream should have some kind of translational symmetry, so Fourier waves, i.e. exp(iwx), are the right thing to use. And then in terms of real numbers, that's sin(wx) and cos(wx).
0
0
8
493
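A minimal numerical sketch of the point in the reply above: shifting a position by a fixed offset k multiplies exp(iwx) by the constant exp(iwk), which on the real pair (sin(wx), cos(wx)) is a fixed 2x2 linear map. The function names `encode` and `shift` and the values w = 0.3, position 7, and offset 5 are arbitrary choices for illustration, not taken from any paper.

```python
import math

def encode(pos, w):
    """Return the (sin, cos) pair for position `pos` at angular frequency `w`."""
    return (math.sin(w * pos), math.cos(w * pos))

def shift(pair, k, w):
    """Advance an encoded position by k using only a fixed linear map.

    The coefficients depend on the offset k and frequency w,
    never on the position itself.
    """
    s, c = pair
    a, b = math.cos(w * k), math.sin(w * k)
    return (
        s * a + c * b,  # sin(w(x+k)) = sin(wx)cos(wk) + cos(wx)sin(wk)
        c * a - s * b,  # cos(w(x+k)) = cos(wx)cos(wk) - sin(wx)sin(wk)
    )

w = 0.3  # arbitrary illustrative frequency
shifted = shift(encode(7, w), 5, w)
direct = encode(12, w)
print(shifted, direct)  # the two pairs agree up to floating-point rounding
```

With only the sine available, the map would need cos(wx) as an input it does not have, which is why the pair matters.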
Eliezer Yudkowsky @allTheYud
To everybody being like "It's just sines and cosines, obviously it can't be that clever" -- it was the single most impressive part of the transformers paper, to me; because I am historically literate, and I have sometimes tried my own hand at invention.

As an example. The process of normalizing attention, given in the first transformers paper, where you compute all the query-key dot products and exponentiate them and sum them, is possible to make significantly better by adding 1 to the summed weights before normalizing all the weights by the sum. Why? Because there are some queries which can do best by returning a null answer. If you add 1 to the sum of weights, gradient descent can make all the query-key dot products be small, and then no value will be returned. If you don't add 1 to the sum of weights, the largest tiny dot product still dominates and the corresponding value gets returned.

People worked out, a few years later, that gradient descent was actually learning to produce an attention sink, often on the first token, where attention could drop to find constant null answers. And if you gave it a dummy first token, it would do better, because it could turn the first token into an attention sink and not have to ignore the real first token to do that. But simpler yet is to add 1 to the sum of exponentiated kq dot products. It anchors the whole sum around that absolute number; where before, the relative values would be unchanged if you added 10 to all the dot-products or subtracted 10 from them, which is a kind of numerical instability that often isn't great for learning.

Just throwing a little +1 into a sum! The history of invention has endless endless endless clevernesses that small. I don't know if they're still doing that part today; a lot of little clevernesses get washed away if you throw enough gradient descent brute-force to learn around all obstacles, and then there is maybe some virtue to simpler designs. Or maybe the +1 is still there; I could go see if it's in the open-source versions. It is clever whether or not people are still using it.

In the original transformers paper they use sines and cosines to encode a signal about where tokens are inside the sequence; the positional encoding. They could have just used sine waves of different frequencies, and that would have been *some* information; enough for gradient descent to learn things, maybe. But if instead you use sines and cosines, that makes it very straightforward for a single matrix multiplication to learn to encode a fixed relative position signal.

(...why? Because: sin (A + B) = sin A cos B + cos A sin B. So if the matrix sees both the sine and cosine, it can look for the sine from 5 positions ahead by learning the constants for cos 5 and sin 5, and computing sin (x + 5) = sin x cos 5 + cos x sin 5. ...)

And the thing is, *most* of the transformers paper felt like it was just writing down the obvious form of the original idea. They computed a key. They computed a query. They took the dot product. They exponentiated the products, summed, normalized. They multiplied the values by the normalized weight. All of that felt to me like the obvious way to do it, once you had the idea at all. It did not occur to me either, then, to throw +1 into the sum of attention weights before normalizing them.

But the sines *and* cosines, so that matrix multiplications of the standard sort could learn a relative positional offset in one layer? That felt to me like the kind of elegant insight that usually comes along a few years after the first invention, a la the +1, except they'd had the idea in the very first paper!

Though it is of course the sort of idea that strikes the historically illiterate people as so obvious in retrospect that they are themselves not very impressed. If you are looking in exactly the right direction to see it, the idea is obvious. So is a +1 in the sum of attention weights! But I didn't see it and neither did they, because the work of happening to look in the right direction is hard. But with the sines and cosines, they did see it right away!

So any time I talk about what impressed me about the attention paper, I point to the sines and cosines. It is the difference between "we thought of a thing to try, and it happened to work great" and "we thought of a thing to try, and also we happened to see through to somewhat of the correct shape and meaning for it to have".
8
5
112
20K
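The "+1 in the sum" described in the tweet above can be sketched in a few lines of plain Python. This is an illustrative sketch under my own naming (`softmax`, `softmax_plus_one`), not code from the transformers paper or any library; the scores are made-up numbers standing in for query-key dot products.

```python
import math

def softmax(scores):
    """Standard normalization: exp(s_i) / sum_j exp(s_j)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_plus_one(scores):
    """Variant with 1 added to the denominator: exp(s_i) / (1 + sum_j exp(s_j))."""
    m = max(0.0, max(scores))  # treat the +1 as exp(0) when stabilizing
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]

low = [-10.0, -11.0, -12.0]          # every key is a poor match
shifted = [s + 10.0 for s in low]    # same scores with 10 added to each

print(sum(softmax(low)))               # weights still sum to 1: some value is returned
print(sum(softmax_plus_one(low)))      # close to 0: effectively a null answer
print(softmax(shifted))                # same weights as softmax(low)
print(sum(softmax_plus_one(shifted)))  # different total: the +1 anchors the scale
```

The last two lines show the shift invariance the tweet mentions: adding 10 to every score leaves the standard weights unchanged, but changes the result once the constant 1 is in the denominator.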
Andrew Fleischman @ASFleischman
I've worked enough gang cases to have seen some incredibly petty homicides
89
1.3K
32.7K
1.5M