Ari Karchmer

30 posts

Ari Karchmer

@non0knowledge

ML Researcher @MorganStanley. Previously, Postdoc @Harvard and PhD in complexity + learning theory @BU_Tweets. Otherwise: Copa America '24 survivor

New York, NY 加入时间 Ağustos 2021

262 关注28 粉丝

Ari Karchmer@non0knowledge·20 Mar

OpenAI is really cooking lately. The difference in writing quality of 5.4 and the competitors is massive

English

Ari Karchmer@non0knowledge·10 Mar

@ylecun congrats

English

Yann LeCun@ylecun·10 Mar

Unveiling our new startup Advanced Machine Intelligence (AMI Labs). We just completed our seed round: $1.03B / 890M€, one the largest seeds ever, probably the largest for a European company. We're hiring! [the background image is the Veil Nebula - a picture I took from my backyard, most appropriate for an unveiling] More details here: techcrunch.com/2026/03/09/yan…

AMI Labs@amilabs

Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe. We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world. We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one. Read more: amilabs.xyz AMI - Real world. Real intelligence.

English

870

1.9K

19.3K

2.6M

Ari Karchmer@non0knowledge·6 Mar

@ziv_ravid @JudahGoldfeder @ylecun Ravid, did human intelligence arise from the constraints of earth or are humans on earth-like planet because we are the ideal intelligence and earth-like planets are the only ones able to support it? Why should we expect machine intelligence to be useful on earth?

English

Ravid Shwartz Ziv@ziv_ravid·6 Mar

New paper out: AI Must Embrace Specialization via Superhuman Adaptable Intelligence With @JudahGoldfeder, Philippe Wyder, and @ylecun . There is quite a lot of buzz on our paper, so here is my take. Everyone's talking about AGI, but nobody agrees on what it means, and that confusion is actively hurting the field. We surveyed the most prominent definitions and mapped them along two axes: the kind of capability they refer to (learning vs. doing) and the scope (anything, anything important, anything humans can do). The result is a landscape of definitions that don't just disagree, but they're often internally inconsistent. Our starting point is simple: human intelligence is not general. We are specialized creatures, shaped by evolution to excel at a narrow set of tasks critical for survival. We feel general because we can't see our own blind spots. Magnus Carlsen is the greatest human chess player ever, but compared to what's computationally achievable, he's not actually good at chess. That's not a knock on Magnus. It's a statement about the limits of human adaptation, and why anchoring AI's North Star to human-level performance is the wrong move. We propose the term Superhuman Adaptable Intelligence (SAI), or, in other words, intelligence that can learn to exceed humans at anything important we can do and can also tackle tasks entirely outside the human domain. The metric isn't a growing checklist of benchmarks. It's adaptation speed: how fast can a system acquire a new skill? This has concrete implications for how we build. SAI points toward self-supervised learning for acquiring generic knowledge from unlabeled data, and world models for planning and zero-shot transfer. It also pushes back against the current monoculture of autoregressive architectures, because specialization demands architectural diversity, not one paradigm to rule them all. Or as we put it: the AI that folds our proteins should not be the AI that folds our laundry. This paper grew out of a conversation with Yann on our The Information Bottleneck podcast, which led to a public exchange with @elonmusk and @demishassabis on X (not every paper can cite a Twitter feud as source material).

English

22.9K

Ari Karchmer@non0knowledge·5 Mar

@WillManidis

QME

294

Will Manidis@WillManidis·4 Mar

Ive realized things about the state of markets and the corresponding human condition that would kill an average man

English

176

23.5K

Ari Karchmer@non0knowledge·3 Mar

test time compute indeed

English

Ari Karchmer@non0knowledge·3 Mar

My baby spends much more effort on producing output than he does taking input Lessons in that

English

Ari Karchmer@non0knowledge·2 Mar

ratio of cost of A100/hour to cost of 1 pack of premium baby wipes 📉 it's so over

English

Ari Karchmer@non0knowledge·28 Şub

@deanwball "These guys are just that good"

English

Dean W. Ball@deanwball·28 Şub

The U.S. government just essentially announced its intention to impose Iran-level sanctions, or China-level entity listing, on an American company. This is by a profoundly wide margin the most damaging policy move I have ever seen USG try to take (it probably will not succeed).

English

112

851

5.3K

319.8K

Ari Karchmer@non0knowledge·27 Şub

Has anyone tried asking the babies how they implemented their RL? @ylecun

English

Ari Karchmer@non0knowledge·26 Şub

I'm suspicious of resnets. can't say why

English

Ari Karchmer@non0knowledge·28 May

@roydanroy @hayou_soufiane I'm not sure this is necessarily a circular definition, though it may be refined. It reminds me of cryptography style definitions which model "knowledge" inside a "machine" as that which is extractable in polynomial time. See for example "proofs of knowledge"

English

Dan Roy@roydanroy·28 May

@hayou_soufiane We should try to sharpen this logic. It's fabulously loose since the support has not changed. What's the right way to think about capabilities that can be extracted without it being a circular definition ("those things that RL can elicit").

English

889

Soufiane Hayou@hayou_soufiane·28 May

It seems that most gains from RL comes from the pretrained model itself. The format reward stuff (GRPO etc) just extracts those capabilities. It helps to have good reward signal, but it's not the main ingredient.

Stella Li@StellaLisy

🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: + 28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…

English

1.9K

Ari Karchmer@non0knowledge·11 Ara

@gavinrbrown1 Hi Gavin! Hope you're doing well. Actually, I just emailed this same man, and he also told me that he now spends more time responding to email from alumni than reviewing dissertations. Message received :)

English

Gavin Brown@gavinrbrown1·31 Eki

I apologize to everyone who preordered. To thank you for your patience, you'll all receive co-first authorship on my next paper.

English

Gavin Brown@gavinrbrown1·31 Eki

Boston University de facto embargoes all dissertations: a solitary hero reviews everything. He's still working through the Fall 23 grads.

English

132

Ari Karchmer@non0knowledge·8 Ara

@Turn_Trout This looks really cool. Is there any obvious relation to Component Modeling (arxiv.org/abs/2404.11534)?

English

Alex Turner@Turn_Trout·7 Ara

9) Gradient Routing: Masking Gradients to Localize Computation in Neural Networks ➡️ Blogpost: turntrout.com/gradient-routi… 📖 Full paper: arxiv.org/pdf/2410.04332

English

2.7K

Alex Turner@Turn_Trout·7 Ara

6) Application 3: In a challenging toy model of “scalable oversight”, we use gradient routing with reinforcement learning to obtain a performant, steerable policy. Surprisingly, this works when merely 1% of the data is labeled, while baselines completely fail at this setting.

English

3.9K

Ari Karchmer@non0knowledge·8 Ağu

@liamgallagher @ManCity

QAM

578

Ari Karchmer@non0knowledge·8 Ağu

@PhilFoden i just finished my phd in computer science at boston university, shoutout to you (and pep) for the inspiration

English

676

Ari Karchmer@non0knowledge·18 May

@Dan_Jeffries1 @HermanNarula Isn't their argument centered around the fact that when it comes to x-risk, evidence is no longer a prerequisite to funding and taking preventative measures? In other words, non-zero chance of extinction has infinitely negative expected payoff, so little evidence is needed.

English

Daniel Jeffries@Dan_Jeffries1·18 May

This is exactly right. Ten years from now they will still be telling us it's all coming to an end when billions are using the tech and we've had the kind of problems we always have with tech, like privacy, overreach, surveillance, centralization, scams and the like, but not super intelligent machines deciding we're bugs. Still they'll be telling us it's surely just around the corner, you just have to keep the faith. It's basically a religion. No proof needed whatsoever. Just faith. The more proof they get that they're wrong the more it strengthens their faith that they were right all along.

English

254

Herman Narula@HermanNarula·18 May

Some people’s identity and careers are built on a Sky-net scenario *rooted in speculation on the behaviour of a thing that doesn’t exist* they have lost objectivity.

Daniel Jeffries@Dan_Jeffries1

Lot of absurd takes like this on the superalignment team leaving OpenAI. The more likely reason they left is not because Ilya and Jan saw some super advanced AI emerging that they couldn't handle but that they didn't and as the cognitive dissonance hit, OpenAI and other practical teams building real world AI are realizng this fantasy of super intelligent machines rising up and getting out of control is a waste of time, money and resources. So they slowly and correctly starved that team of compute that could be used for more useful things like building capabilities into their products, which is what AI are, products.

English

1.6K

Ari Karchmer@non0knowledge·3 May

Want to reach out? Collaborate? My DMs are open.

English

136

Ari Karchmer@non0knowledge·3 May

What other computational separations exist? Can they be more “natural” if the computational advantage is smaller (we study super-polynomial advantages)? How else can we use formal relationships between Crypto and ML to heuristically argue about “real-world” ML phenomena?

English

159

Ari Karchmer@non0knowledge·3 May

Accepted to #ICML2024: “On Stronger Computational Separations Between #Multimodal and Unimodal #MachineLearning” (arxiv.org/abs/2404.02254)

English

434

发现

@ylecun @ziv_ravid @JudahGoldfeder @elonmusk @demishassabis @WillManidis @deanwball @roydanroy