Frans Zdyb

745 posts

@FZdyb

Shaper of loss landscapes. No agent samples the same distribution twice, for it's not the same distribution, and she is not the same agent

Copenhagen · Joined April 2014
293 Following · 141 Followers
Frans Zdyb @FZdyb
@burny_tech They don't explain how hierarchical features induce low-dimensional manifolds
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb It's like saying diffusion models "transcend" the human ability to draw faces, because the faces they generate look more average. Sure, they do - is that "transcendence"?
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb The transcendence paper explains what's going on perfectly: "If these errors (rookie mistakes) are idiosyncratic, averaging across many experts would have a denoising effect, leaving the best moves with higher probability."
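A minimal sketch of the quoted denoising effect, with made-up numbers: each simulated expert plays the best move most of the time but puts extra mass on one idiosyncratic blunder, and averaging the distributions leaves the best move dominant.

```python
import numpy as np

rng = np.random.default_rng(0)
moves = ["best", "blunder_a", "blunder_b", "blunder_c"]

# Each expert mostly plays the best move but makes one
# idiosyncratic mistake: extra mass on a random blunder.
expert_dists = []
for _ in range(100):
    p = np.array([0.6, 0.0, 0.0, 0.0])
    p[rng.integers(1, 4)] = 0.4
    expert_dists.append(p)

avg = np.mean(expert_dists, axis=0)
print(dict(zip(moves, avg.round(3))))
# "best" keeps ~0.6 while each blunder averages to ~0.13:
# the idiosyncratic errors cancel, the shared signal survives.
```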
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb Indirect evidence would be very robust OOD generalization, e.g. if you always get the right product even for factors much longer than those in the training examples.
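A sketch of that indirect test, assuming a hypothetical `model_multiply(a, b)` wrapper that extracts the model's claimed product as an int; nothing here is a real evaluation harness.

```python
import random

def length_generalization_accuracy(model_multiply, test_digits=12, n=100):
    """Exact-match accuracy on factors far longer than training examples."""
    correct = 0
    for _ in range(n):
        a = random.randint(10 ** (test_digits - 1), 10 ** test_digits - 1)
        b = random.randint(10 ** (test_digits - 1), 10 ** test_digits - 1)
        correct += model_multiply(a, b) == a * b
    return correct / n
```

Robust generalization would mean accuracy staying near 1.0 as `test_digits` grows well past the training range.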
Frans Zdyb @FZdyb
@BlancheMinerva @LucaAmb Structural Causal Model. Direct evidence would be identifying the encoded graph and showing that different prompts correspond to evaluating queries against it. E.g. for arithmetic, finding a circuit that is a multiplication algorithm and showing that different input strings result in running it.
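For concreteness, a toy structural causal model; the variables and mechanisms are invented for illustration. The point is that many different queries (observations, interventions) are all answered by running the one encoded graph.

```python
import random

def scm_sample(do_x=None):
    # Exogenous noise terms.
    u_x, u_y = random.gauss(0, 1), random.gauss(0, 1)
    # x's mechanism, unless an intervention do(x) overrides it.
    x = u_x if do_x is None else do_x
    # y's mechanism is fixed and shared by every query.
    y = 2 * x + u_y
    return x, y

print(scm_sample())          # observational query
print(scm_sample(do_x=5.0))  # interventional query, same graph
```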
Frans Zdyb @FZdyb
@LucaAmb I agree that learning from our own actions is a small part of learning, and more comes from observing other people's actions, and even more from language. But my guess is you need the causal representations in place before these other two sources can work.
Frans Zdyb @FZdyb
@LucaAmb Fair, but I'm skeptical that LLM weights learn to represent SCMs - and I think you need all three: actions, representations, and a causal inference algorithm. LLMs may have actions, but not the other two.
Frans Zdyb @FZdyb
@AVMiceliBarone @francoisfleuret No, I can't even do 5 digits I think, but it doesn't matter. Understanding doesn't mean zero errors, it means deploying the correct algorithm regardless of deviation from training data distribution.
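To make "deploying the correct algorithm" concrete, here is schoolbook long multiplication as a sketch: once the algorithm is right, digit count is irrelevant, so there is no training distribution to drift away from.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication on decimal digit strings."""
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

# Works identically for 5 digits or 500 -- same algorithm throughout.
assert long_multiply("12345", "67890") == str(12345 * 67890)
```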
Frans Zdyb @FZdyb
@PersonaIData @francoisfleuret Not for arithmetic no, but it matters that it can't discover causal models in general. That's why they can't do anything humans or tools can't do, why they hallucinate, and why their capabilities are jagged - all symptoms of narrow generalization based on acausal patterns.
Kingston Jr @PersonaIData
@FZdyb @francoisfleuret Does it matter if they have the ability to write their own calculator tool and use it, in machine language if it came to that?
Frans Zdyb @FZdyb
@AVMiceliBarone @francoisfleuret Whereas once a human understands the concept of multiplication, algorithms, execution traces and answers all have to agree. Whatever LLMs do, it's not that. We need a different word than understanding.
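A toy version of that agreement test, using repeated addition as the stated algorithm: the trace must follow from the algorithm and the answer from the trace, so even a correct answer fails if its trace doesn't. Purely illustrative.

```python
def check_agreement(a: int, b: int, trace: list[int], answer: int) -> bool:
    """Algorithm, execution trace, and answer must all be consistent."""
    expected_trace = [a * k for k in range(1, b + 1)]  # repeated addition
    expected_answer = expected_trace[-1] if expected_trace else 0
    return trace == expected_trace and answer == expected_answer

assert check_agreement(3, 4, [3, 6, 9, 12], 12)      # all three agree
assert not check_agreement(3, 4, [3, 7, 9, 12], 12)  # right answer, wrong trace
```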
Frans Zdyb @FZdyb
@AVMiceliBarone @francoisfleuret You can get LLMs closer to understanding by asking them to review the algorithm in the prompt, and having them do chain of thought. But if you train them on the wrong algorithm, they might ignore it and produce correct answers, or follow it and produce wrong answers, without noticing either way.
Frans Zdyb @FZdyb
@paraschopra I don't see how realism makes predictions about correlates; it doesn't (and can't) posit any actual mechanism. On the other hand, the attention schema theory not only posits a mechanism, but specifically predicts that people will deny any mechanism. Seems like a slam dunk!
Paras Chopra @paraschopra
Two famous camps on the nature of consciousness are illusionism (qualia are an illusion) and realism (qualia are real). Note that both, in principle, make equivalent predictions about observables such as what the subject will report ("I see red") or correlates (gamma oscillations encoding wakefulness). So, if it isn't possible to discriminate between the two, is there a point in separating them? Pragmatism requires empirical discrimination to tell theories apart, and so far I haven't been able to discover how to tell illusionism and realism apart.
Frans Zdyb @FZdyb
@aryehazan Computational functionalism + the attention schema theory are very satisfying, both independently and together. They have that "snap into place" and "nothing makes sense except in light of the theory" feel to them.