Frans Zdyb

745 posts

@FZdyb

Shaper of loss landscapes. No agent samples the same distribution twice, for it's not the same distribution, and she is not the same agent

Copenhagen · Joined April 2014
293 Following · 141 Followers
Frans Zdyb @FZdyb
@burny_tech They don't explain how hierarchical features induce low-dimensional manifolds
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb It's like saying diffusion models "transcend" the human ability to draw faces, because the faces they generate look more average. Sure, they do - is that "transcendence"?
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb The transcendence paper explains what's going on perfectly: "If these errors (rookie mistakes) are idiosyncratic, averaging across many experts would have a denoising effect, leaving the best moves with higher probability."
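A minimal sketch of the quoted denoising effect, with made-up numbers: each simulated expert plays the best move most of the time but puts extra mass on one idiosyncratic blunder, and averaging the distributions leaves the best move dominant.

```python
import numpy as np

rng = np.random.default_rng(0)
moves = ["best", "blunder_a", "blunder_b", "blunder_c"]

# Each expert mostly plays the best move but makes one
# idiosyncratic mistake: extra mass on a random blunder.
expert_dists = []
for _ in range(100):
    p = np.array([0.6, 0.0, 0.0, 0.0])
    p[rng.integers(1, 4)] = 0.4
    expert_dists.append(p)

avg = np.mean(expert_dists, axis=0)
print(dict(zip(moves, avg.round(3))))
# "best" keeps ~0.6 while each blunder averages to ~0.13:
# the idiosyncratic errors cancel, the shared signal survives.
```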
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb Indirect evidence would be very robust OOD generalization, e.g. if you always get the right product even for factors much longer than those in the training examples.
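A sketch of that indirect test, assuming a hypothetical `model_multiply(a, b)` wrapper that extracts the model's claimed product as an int; nothing here is a real evaluation harness.

```python
import random

def length_generalization_accuracy(model_multiply, test_digits=12, n=100):
    """Exact-match accuracy on factors far longer than training examples."""
    correct = 0
    for _ in range(n):
        a = random.randint(10 ** (test_digits - 1), 10 ** test_digits - 1)
        b = random.randint(10 ** (test_digits - 1), 10 ** test_digits - 1)
        correct += model_multiply(a, b) == a * b
    return correct / n
```

Robust generalization would mean accuracy staying near 1.0 as `test_digits` grows well past the training range.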
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb Structural Causal Model. Direct evidence would be identifying the encoded graph and showing that different prompts correspond to evaluating queries against it. E.g. for arithmetic, finding a circuit that is a multiplication algorithm and showing that different input strings result in running it.
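For concreteness, a toy structural causal model; the variables and mechanisms are invented for illustration. The point is that many different queries (observations, interventions) are all answered by running the one encoded graph.

```python
import random

def scm_sample(do_x=None):
    # Exogenous noise terms.
    u_x, u_y = random.gauss(0, 1), random.gauss(0, 1)
    # x's mechanism, unless an intervention do(x) overrides it.
    x = u_x if do_x is None else do_x
    # y's mechanism is fixed and shared by every query.
    y = 2 * x + u_y
    return x, y

print(scm_sample())          # observational query
print(scm_sample(do_x=5.0))  # interventional query, same graph
```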
Frans Zdyb @FZdyb
@LucaAmb I agree that learning from our own actions is a small part of learning, and more comes from observing other people's actions, and even more from language. But my guess is you need the causal representations in place before these other two sources can work.
Frans Zdyb @FZdyb
@LucaAmb Fair, but I'm skeptical that LLM weights learn to represent SCMs - and I think you need all three: actions, representations, and a causal inference algorithm. LLMs may have actions, but not the other two.
Frans Zdyb @FZdyb
@AVMiceliBarone @francoisfleuret No, I can't even do 5 digits I think, but it doesn't matter. Understanding doesn't mean zero errors, it means deploying the correct algorithm regardless of deviation from training data distribution.
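To make "deploying the correct algorithm" concrete, here is schoolbook long multiplication as a sketch: once the algorithm is right, digit count is irrelevant, so there is no training distribution to drift away from.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication on decimal digit strings."""
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

# Works identically for 5 digits or 500 -- same algorithm throughout.
assert long_multiply("12345", "67890") == str(12345 * 67890)
```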
Frans Zdyb @FZdyb
@PersonaIData @francoisfleuret Not for arithmetic no, but it matters that it can't discover causal models in general. That's why they can't do anything humans or tools can't do, why they hallucinate, and why their capabilities are jagged - all symptoms of narrow generalization based on acausal patterns.
Kingston Jr @PersonaIData
@FZdyb @francoisfleuret Does it matter if they have the ability to write their own calculator tool and use it, in machine language if it came to that?
Frans Zdyb @FZdyb
@AVMiceliBarone @francoisfleuret Whereas once a human understands the concept of multiplication, algorithms, execution traces and answers all have to agree. Whatever LLMs do, it's not that. We need a different word than understanding.
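A toy version of that agreement test, using repeated addition as the stated algorithm: the trace must follow from the algorithm and the answer from the trace, so even a correct answer fails if its trace doesn't. Purely illustrative.

```python
def check_agreement(a: int, b: int, trace: list[int], answer: int) -> bool:
    """Algorithm, execution trace, and answer must all be consistent."""
    expected_trace = [a * k for k in range(1, b + 1)]  # repeated addition
    expected_answer = expected_trace[-1] if expected_trace else 0
    return trace == expected_trace and answer == expected_answer

assert check_agreement(3, 4, [3, 6, 9, 12], 12)      # all three agree
assert not check_agreement(3, 4, [3, 7, 9, 12], 12)  # right answer, wrong trace
```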
Frans Zdyb @FZdyb
@AVMiceliBarone @francoisfleuret You can get LLMs closer to understanding by asking them to review the algorithm in the prompt, and having them do chain of thought. But if you train them on the wrong algorithm, they might ignore it and produce correct answers, or follow it and produce wrong answers, without noticing either way.
Frans Zdyb @FZdyb
@paraschopra I don't see how realism makes predictions about correlates; it doesn't (and can't) posit any actual mechanism. On the other hand, the attention schema theory not only posits a mechanism, but specifically predicts that people will deny any mechanism. Seems like a slam dunk!
Paras Chopra @paraschopra
Two famous camps on the nature of consciousness are illusionism (qualia are an illusion) and realism (qualia are real). Note that both, in principle, make equivalent predictions about observables such as what the subject will report ("I see red") or correlates (gamma oscillations encoding wakefulness). So, if it isn't possible to discriminate between the two, is there a point in separating them? Pragmatism requires empirical discrimination to tell theories apart, and so far I haven't been able to discover how to tell illusionism and realism apart.
Frans Zdyb @FZdyb
@aryehazan Computational functionalism + the attention schema theory are very satisfying, both independently and together. They have that "snap into place" and "nothing makes sense except in light of the theory" feel to them.