Kyle
@darsnack

2.7K posts

NeuroAI Scholar @CSHL | EE PhD @UWMadison | Comp. Eng. / math @RoseHulman | CS + neuro + AI

Joined September 2009
459 Following · 187 Followers
Kyle
Kyle@darsnack·
@DimitrisPapail Also Makie.jl and Pluto.jl for interactive demos
Kyle
Kyle@darsnack·
@DimitrisPapail If it’s for linear algebra, then I would say Julia hits the sweet spot of intuitive syntax. Even better than Matlab. But if this course goes all the way up to DL pipelines…hard to beat PyTorch.
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
what language should an intro to linear algebra for ML course use?
Kyle
Kyle@darsnack·
@DimitrisPapail @agi_watcher At best you have the exact gradient and at worst a really noisy one. Reducing how animals learn to computing an approximate forward gradient is unlikely to result in a “better” model.
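To make the "approximate forward gradient" concrete, here is a minimal sketch of the usual forward-gradient estimator (sample a direction, take the directional derivative along it, scale the direction by that scalar), assuming a toy quadratic loss so the directional derivative can be written in closed form; the loss, dimensions, and step size are illustrative, not from the thread. The estimate is unbiased but noisy, which is the "at best exact, at worst really noisy" trade-off:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss f(theta) = 0.5 * ||A @ theta - b||^2
A = rng.normal(size=(20, 10))
b = rng.normal(size=20)

def grad(theta):
    # Exact gradient, used here only to form the directional derivative.
    # In practice forward-mode AD (a JVP) gives grad(theta) @ v in a single
    # forward pass without ever materializing grad(theta).
    return A.T @ (A @ theta - b)

def forward_gradient(theta):
    # Random direction v, directional derivative along v, then scale v by it.
    # The estimator is unbiased (its mean is the true gradient) but any
    # single sample is noisy.
    v = rng.normal(size=theta.shape)
    directional = grad(theta) @ v
    return directional * v

theta = rng.normal(size=10)
for step in range(2000):
    theta -= 1e-3 * forward_gradient(theta)

print("residual:", np.linalg.norm(A @ theta - b))
```

The same loop with the exact gradient converges faster and more smoothly; the forward-gradient version trades that away in exchange for never needing a backward pass.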
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
@agi_watcher I don't know. "because neurological plausibility" doesn't cut it for me as an answer & there is no significant evidence that the algorithm is really comparable to backprop. So I'm not convinced that it's worth considering more carefully.
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
Perhaps the only interesting practical implication of only using forward passes during training is significantly less memory use, i.e., no need to store activations. Is there another one? Eg, no need to build a computation graph? Neural plausibility doesn't cut it btw
Santiago@svpino

I just read The Forward-Forward Algorithm paper by Geoffrey Hinton. In summary: It's an algorithm to train neural networks with two forward passes instead of using backpropagation. Very early, but it definitely shows that this approach deserves more research.

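For readers who haven't seen the paper, here is a minimal sketch of the kind of purely local, forward-only update Forward-Forward describes, assuming the goodness-as-sum-of-squared-activities formulation with a logistic threshold from Hinton's write-up; the layer size, synthetic positive/negative data, threshold, and learning rate are made up for illustration. No gradient crosses layer boundaries and nothing has to be stashed for a later backward pass, which is the memory point above:

```python
import numpy as np

rng = np.random.default_rng(0)

# One layer trained purely locally: no global backward pass, no activations
# kept beyond the current example's own forward pass.
W = rng.normal(scale=0.1, size=(64, 32))   # hidden x input
theta = 2.0                                 # goodness threshold
lr = 0.03

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(x, positive):
    """One Forward-Forward style step on a single example x."""
    h = np.maximum(W @ x, 0.0)              # forward pass (ReLU)
    goodness = np.sum(h ** 2)               # "goodness" of this layer
    p = sigmoid(goodness - theta)           # P(example is "positive")
    # Logistic loss: push goodness up for positive data, down for negative.
    dloss_dgoodness = (p - 1.0) if positive else p
    # d(goodness)/dW = 2 * outer(h, x) on the active units, and h is already
    # zero on inactive units, so this is the full local gradient.
    grad_W = dloss_dgoodness * 2.0 * np.outer(h, x)
    return grad_W, goodness

for step in range(500):
    x_pos = rng.normal(loc=+0.5, size=32)   # stand-in "real" example
    x_neg = rng.normal(loc=-0.5, size=32)   # stand-in "negative" example
    g_pos, good_pos = local_update(x_pos, positive=True)
    g_neg, good_neg = local_update(x_neg, positive=False)
    W -= lr * (g_pos + g_neg)

print("goodness on positive vs negative:", good_pos, good_neg)
```

Hinton's paper stacks several such layers (with normalization between them); this sketch keeps a single layer just to show the local update rule.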
Kyle
Kyle@darsnack·
@DimitrisPapail Data movement is the energy hog in large, distributed training right? Less memory = less to move. Plus any forward algo is presumably converging in the presence of gradient noise. Those tricks might also mean resistance to noise from asynchronous, distributed updates.
Kyle
Kyle@darsnack·
@DimitrisPapail Those seem like important implications as you scale up the model size. Or if you have recurrence in play (I’m thinking scaling up arxiv.org/abs/2102.11011). The best answer for gradient propagation through time has been “make time into space” (i.e. transformers).
Kyle retweeted
François Chollet
François Chollet@fchollet·
The capabilities of LLMs are now causing a resurgence in folks saying that humans are "statistical parrots" mindlessly predicting the next word based on what they've heard before. 5 minutes of observing a young child still learning to speak should cure you of this notion.
François Chollet@fchollet

In general I've been sensing a new current among deep learning maximalists recently, going from "our models can definitely reason" to "well our models can't reason, but neither can humans!"

Kyle retweeted
317070
317070@317070·
Did you know that you can build a virtual machine inside ChatGPT? And that you can use this machine to create files, program, and even browse the internet? engraved.blog/building-a-vir…
Kyle
Kyle@darsnack·
@DimitrisPapail Aren’t they both evidence for the prediction? The second shows that shortcuts are limited to the # of steps in the training data unless you change the problem to force it to learn the recurrence relation. Adding recurrence to the model is a more flexible solution.
Dimitris Papailiopoulos
Dimitris Papailiopoulos@DimitrisPapail·
Prediction for the future of LLMs: GPT-X will be a *recurrent* Transformer, for X=4 or 5. I can't see how TFs can become general purpose without simulating complex for-loops/GOTOs, and that takes recurrence, unless depth is linear (or width exponential) in the depth of recursion.
Kyle retweeted
Brad Aimone
Brad Aimone@jbimaknee·
A dramatic example of this is how we engineer randomness into computers. Our amazing pseudo-random number generators are reliable, predictable, and rely on determinism all the way down to the transistors, whereas biology is random at all levels and becomes deterministic when needed.
François Chollet@fchollet

Humans as a collective can develop some incredibly sophisticated systems (your computer and the software running on it is proof of that), but those still remain orders of magnitude short of the superhuman sophistication of biological systems. For now, at least.

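A tiny illustration of the engineered-determinism point above, using NumPy's seeded generator as a stand-in PRNG:

```python
import numpy as np

# Two generators seeded identically produce exactly the same "random" stream:
# the randomness is engineered and deterministic all the way down.
a = np.random.default_rng(1234).normal(size=5)
b = np.random.default_rng(1234).normal(size=5)
print(np.array_equal(a, b))  # True on every run: same seed, same stream
```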
Kyle
Kyle@darsnack·
@neurograce For stuff that’s mostly complete but you decided isn’t worth pursuing further. Somehow it seems wrong that these ideas are never disseminated. Anything that could be a preprint but still worth pursuing is something that I assume you’ll hang onto for next year.
Kyle
Kyle@darsnack·
This is a joke, but it’s also a neat idea. On a per-lab basis, if you can only publish one paper per year (unlimited pre-prints), what would you work on? This constraint kinda forces all the good behaviors that we aspire towards within a lab.
Kyle retweeted
Tony Zador
Tony Zador@TonyZador·
White paper: a rallying cry for NeuroAI to work toward the Embodied Turing Test! Let’s overcome Moravec’s paradox: tasks that are “uniquely” human, like chess and even language/reasoning, are much easier for machines than the “easy” interaction with the world which all animals perform.
Kording Lab 🦖@KordingLab

Massive whitepaper just dropped on why neuroscience progress should continue to drive AI progress: arxiv.org/pdf/2210.08340… Argues for an embodied turing test. Needed: real interdisciplinary people, shared platform(s), fundamental research

Kyle
Kyle@darsnack·
Sometimes you come across numbers that are so mind-boggling that you have to share them. Some people in America pay $100 / Mbps for what others pay $0.25 / Mbps (in the same city)! We desperately need a scientific corps in Congress. themarkup.org/still-loading/…