lineardiff

1.2K posts

lineardiff

@lineardiff

isolation, perception, and communication

Joined June 2023
409 Following · 164 Followers
lineardiff @lineardiff
@she_llac probably not “curable” if it underlies continual/effective learning
shellac @she_llac
i hope that we'll cure sleep in the next 20 years
lineardiff @lineardiff
@kalomaze the only thing stopping you getting pwned these days is being unimportant enough or not worth the time
kalomaze @kalomaze
[image]
Feross @feross

🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages. The latest axios@1.14.1 now pulls in plain-crypto-js@4.2.1, a package that did not exist before today. This is a live compromise. This is textbook supply chain installer malware.

axios has 100M+ weekly downloads. Every npm install pulling the latest version is potentially compromised right now. Socket AI analysis confirms this is malware.

plain-crypto-js is an obfuscated dropper/loader that:
• Deobfuscates embedded payloads and operational strings at runtime
• Dynamically loads fs, os, and execSync to evade static analysis
• Executes decoded shell commands
• Stages and copies payload files into OS temp and Windows ProgramData directories
• Deletes and renames artifacts post-execution to destroy forensic evidence

If you use axios, pin your version immediately and audit your lockfiles. Do not upgrade.

lineardiff @lineardiff
@leothecurious imo it’s likely to be one of the most critical aspects of human learning, although i think the augmentation probably happens internally in a semi-abstract way. i think this mechanism might be one of evolution’s largest contributions.
davinci @leothecurious
how much data augmentation does a human really need when learning?
lineardiff @lineardiff
@beffjezos obv they’ll just fork it and old vulnerable coins will become worthless
François Fleuret @francoisfleuret
It's very simple tbh, I want the model to:
[image]
Lazarz @Laz4rz
[image]
Samip @industriaalist

here's @JeffDean talking about how labs will do multi-epoch pretraining with heavy regularization to keep scaling even with limited data. no wonder slowrun gets so much attention from pretraining teams at big labs. pretraining is about to look very very different.

lineardiff @lineardiff

@apples_jimmy sometimes i wonder if this is an intentional marketing strategy, it just seems too easy to whip up everybody into mania repeatedly with any hint of the next big thing. on the other hand, capabilities continue to improve

lineardiff @lineardiff
@RyanPGreenblatt been a fan of ARC for many years now, since Icecuber. think the guys behind it are great, but worried it’s starting to push a bit into bad faith territory now.
Ryan Greenblatt @RyanPGreenblatt
I wish they published the performance for each human baseliner rather than just the performance of the second-best human run on each task. My current guess is that the median human baseliner would score around 15% on the metric, but we can't check because the data isn't public!
ARC Prize @arcprize

Announcing ARC-AGI-3: the only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn.

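To make the gap Greenblatt is pointing at concrete, here is a toy calculation with invented numbers (not ARC data): a per-task summary based on the second-best human run can sit well above what the median individual baseliner scores across tasks.

```python
import statistics

# rows = individual human baseliners, columns = tasks; 1 = solved, 0 = not (invented data)
runs = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
]

# pooled per-task summary: score of the second-best run on each task
second_best = [sorted(task_col, reverse=True)[1] for task_col in zip(*runs)]
pooled_score = sum(second_best) / len(second_best)

# per-person summary: fraction of tasks solved by the median individual baseliner
individual_scores = [sum(r) / len(r) for r in runs]
median_score = statistics.median(individual_scores)

print(pooled_score)   # 0.8 -- "humans solve 80% of tasks" under the pooled summary
print(median_score)   # 0.4 -- the median individual solves half as many
```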
lineardiff @lineardiff
@AcerFur they plausibly have the compute, but probably not. they do not have the data, but they could buy it.
Jimmy Apples 🍎/acc @apples_jimmy
“A draft blog post that was available in an unsecured and publicly-searchable data store prior to Thursday evening said the new model is called “Claude Mythos” and that the company believes it poses unprecedented cybersecurity risks.”
[image]
Samip @industriaalist
here's @JeffDean talking about how labs will do multi-epoch pretraining with heavy regularization to keep scaling even with limited data. no wonder slowrun gets so much attention from pretraining teams at big labs. pretraining is about to look very very different.
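A toy sketch of what multi-epoch pretraining with heavy regularization could look like. This is my own illustration of the claim, not Jeff Dean's recipe or any lab's setup; the tiny model, the synthetic "corpus", and the hyperparameters are all placeholders.

```python
import torch
import torch.nn as nn

# toy stand-ins: a small MLP instead of a language model, random tensors instead of text
model = nn.Sequential(
    nn.Linear(64, 256), nn.GELU(), nn.Dropout(p=0.2),  # dropout as one regularizer
    nn.Linear(256, 64),
)
# heavier-than-usual weight decay is the other regularization knob here
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
corpus = [torch.randn(16, 64) for _ in range(100)]  # fixed, limited dataset

for epoch in range(4):  # several passes over the same data instead of one
    for batch in corpus:
        loss = (model(batch) - batch).pow(2).mean()  # placeholder reconstruction objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(epoch, loss.item())
```

The knob being turned is data reuse plus regularization strength, rather than an ever-larger unique corpus.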
lineardiff @lineardiff
noticing an obvious LLM inflection while reading a passage is like accidentally biting your tongue while eating
kalomaze @kalomaze
@lineardiff i am saying you could
- pretend a sequentially deep linear network is parallel during fwd (get true loss fast)
- do backward over the sequential structure (do sequential chain rule matrix products)
- get better inductive bias from backprop sequentiality, maintain fwd parallelism
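A minimal sketch of my reading of this idea, not kalomaze's code: parameterize one linear map as a product of square factors, let the batch multiply only the collapsed product (the "parallel" forward, same loss), and let autograd backprop through the chain of factor products (the sequential chain rule). The FactoredLinear class, dimensions, and init scale below are made up for illustration.

```python
from functools import reduce
import torch
import torch.nn as nn

class FactoredLinear(nn.Module):
    """One linear map parameterized as a product of `depth` square factors."""

    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(dim, dim) / dim**0.5) for _ in range(depth)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Collapse the factors once per step; this cost is independent of batch size,
        # so the per-example forward is a single matmul ("fwd parallelism").
        w_eff = reduce(torch.matmul, self.factors)
        # Gradients still flow back through each factor via sequential
        # chain-rule matrix products ("backprop sequentiality").
        return x @ w_eff.T

# usage: every factor receives a gradient even though the batch only saw w_eff
layer = FactoredLinear(dim=8, depth=4)
x = torch.randn(32, 8)
layer(x).pow(2).mean().backward()
print([f.grad.norm().item() for f in layer.factors])
```

The function computed is identical to a single linear layer; only the parameterization, and therefore the gradient structure, is deeper.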
kalomaze @kalomaze
most practitioners in ML know that you can't express deeper functions with pure linear projections that lack nonlinearities, but don't let this distract you from the fact that chain rule optimization over deeper linear networks imposes a factorized prior over optimization itself
lineardiff @lineardiff
@kalomaze i don’t understand this, can you go further?
kalomaze @kalomaze
@lineardiff in principle you could... optimize for a chain that the forward pass structure never computes, and get cleaner/better gradients, no inference cost, just bwd cost... (ofc you would need to do activation checkpointing but that's fairly typical anyways)
kalomaze @kalomaze
@lineardiff by my intuition, nonlinearities are doing selection/implicit branching, while the decomposition of linear projections in sequence is the actual primary thing causing backprop to build representations hierarchically (in a typical fixed depth fwd pass)