Akos Kadar

6.1K posts

@kadarakos

Machine learning researcher and developer.

Berlin, Germany · Joined October 2012
480 Following · 470 Followers
Akos Kadar retweeted
Oliver Prompts @oliviscusAI
🚨 BREAKING: NVIDIA proved backpropagation isn't the only way to build an AI. They trained billion-parameter models without a single gradient.

Every AI you use today relies on backpropagation. It requires complex calculus, exploding memory, and massive GPU clusters. Meanwhile, an ancient, gradient-free method called Evolution Strategies (ES) was written off as impossible to scale. Until now.

NVIDIA and Oxford just dropped EGGROLL. Instead of generating massive, full-rank matrices for every mutation, they split them into two tiny ones. The AI mutates. It tests. It keeps what works. Like biological evolution. But now, it does it with hundreds of thousands of parallel mutations at once. Throughput is now as fast as batched inference.

They are pretraining models entirely from scratch using only simple integers. No backprop. No decimals. No gradients.

We thought the future of AI required endless clusters of precision hardware. It turns out, we just needed to evolve.
[image attached]
101 replies · 422 reposts · 2.4K likes · 153.7K views
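The core trick the tweet describes (replacing a full-rank Gaussian mutation with a product of two small matrices) can be sketched as below. This is an illustrative hill-climbing variant under assumed details, not EGGROLL's actual algorithm; sizes, the `fitness` objective, and the scaling are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 64, 32, 2            # illustrative sizes
W = rng.standard_normal((d_out, d_in))   # current weight matrix

def low_rank_perturbation(rng, d_out, d_in, rank, sigma=0.1):
    """Rank-`rank` perturbation A @ B.T instead of a full random matrix.

    Storing/communicating A and B costs (d_out + d_in) * rank floats
    versus d_out * d_in for a full-rank Gaussian perturbation.
    """
    A = rng.standard_normal((d_out, rank))
    B = rng.standard_normal((d_in, rank))
    return sigma * (A @ B.T) / np.sqrt(rank)

def fitness(W):
    # Stand-in objective: prefer weights close to zero.
    return -np.sum(W ** 2)

# One generation: sample many candidates in parallel, keep what works.
candidates = [W + low_rank_perturbation(rng, d_out, d_in, rank) for _ in range(16)]
best = max(candidates, key=fitness)
if fitness(best) > fitness(W):
    W = best
```

The memory argument is visible in the cost comparison: for these sizes, each mutation needs (64 + 32) * 2 = 192 numbers instead of 64 * 32 = 2048.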
Akos Kadar retweeted
(((ل()(ل() 'yoav))))👾
another remark on this: this can be seen as a technical complaint about academic credit ("we also use the JL transform when creating the codebook and they didn't acknowledge that"). but it is more than that. reading the TurboQuant paper, one gets the impression that JL / random projection is the major component. but since RaBitQ also uses JL, then if TurboQuant is indeed better, the thing that actually works for TurboQuant (the contribution) is not JL but something else. and currently, we the readers cannot know this without implementing and checking. so it is not only credit assignment. if the TurboQuant authors said "RaBitQ also did JL, but we differ from them by doing XYZ, which improves things from this to that", we as readers would get a much more informative paper, and the TurboQuant authors would have written about an actual contribution to the state of the art.
Jianyang Gao @gaoj0017

The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views. We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (openreview.net/forum?id=tO3AS…). We would greatly appreciate your attention and help in sharing it.

5 replies · 14 reposts · 161 likes · 27.8K views
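For readers unfamiliar with the JL step the thread keeps referring to, a generic random-projection sketch looks like the following. This is the textbook Johnson-Lindenstrauss construction, not RaBitQ's or TurboQuant's actual codebook procedure; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def jl_project(X, k, rng):
    """Project rows of X from d to k dimensions with a random Gaussian matrix.

    The 1/sqrt(k) scaling makes E||Px||^2 = ||x||^2, so pairwise distances
    are approximately preserved (the JL lemma makes this precise).
    """
    d = X.shape[1]
    P = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ P

X = rng.standard_normal((50, 256))
Y = jl_project(X, 64, rng)
```

The thread's point is that since both quantizers can use this same preprocessing, any quality difference must come from what happens after the projection.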
Akos Kadar retweeted
Olga Zaghen @olgazaghen
🔮 Working on ML on curved manifolds? Don't miss out on Jacobi Fields! 🔮 I wrote a quick, highly visual and hopefully accessible introduction to the topic: "Jacobi Fields in Machine Learning" 🤠 Check it out here: olgatticus.github.io/blog/jacobi-fi…!
[image attached]
12 replies · 68 reposts · 447 likes · 24.5K views
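As background for the post above: the Jacobi field equation is the standard Riemannian-geometry fact the blog's title refers to (stated here from general knowledge, not taken from the blog). Along a geodesic $\gamma(t)$ with velocity $\dot\gamma$, a Jacobi field $J$ satisfies

```latex
\frac{D^2 J}{dt^2} + R\bigl(J, \dot\gamma\bigr)\dot\gamma = 0,
```

where $D/dt$ is the covariant derivative along $\gamma$ and $R$ is the Riemann curvature tensor; Jacobi fields describe how nearby geodesics spread apart or converge, which is what makes them relevant to ML on curved manifolds.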
Akos Kadar retweeted
Nando de Freitas @NandoDF
This is another outstanding theoretical step towards understanding intelligence by Pedro, aka @AdaptiveAgents. Some people still like to try to explain it all through RL, but I feel other explanations that emphasise the role of the environment, entropy minimisation, (multi) agency and complexity, or the laws of physics can be far more compelling. One view I’m loving is the one of Michael Levin: youtu.be/XheAMrS8Q1c?si…
[YouTube video]
Pedro A. Ortega @AdaptiveAgents

Agency is usually formalized as utility maximization. But must it be? LLMs suggest a different foundation: intelligence as acquiring behavioral schemas from interaction structure. My new paper: "Universal AI as Imitation" investigates the limit-case of LLM-style models.

4 replies · 5 reposts · 64 likes · 15.6K views
Akos Kadar retweeted
Sepp Hochreiter @HochreiterSepp
xLSTM more expressive than Transformer, Mamba: arxiv.org/abs/2603.03612
* nonlinear RNNs: sLSTM, LSTM
* DPLR linear RNNs: mLSTM, RWKV, DeltaNet
* non-PNC1: Mamba, Transformer
“fundamental expressivity gaps between linear and nonlinear RNNs”
World models require nonlinear RNNs.
[2 images attached]
10 replies · 54 reposts · 353 likes · 24.9K views
Akos Kadar retweeted
Priyanka Vergadia @pvergadia
JUST DROPPED: Anthropic's research proves AI coding tools are secretly making developers worse. "AI use impairs conceptual understanding, code reading, and debugging without delivering significant efficiency gains." -- That's the paper's actual conclusion.

17% score drop learning new libraries with AI. Sub-40% scores when AI wrote everything. 0 measurable speed improvement.

→ Prompting replaces thinking, not just typing
→ Comprehension gaps compound — you ship code you can't debug
→ The productivity illusion hides until something breaks in prod

Here's why this changes everything: Speed metrics look fine on a dashboard. Understanding gaps don't show up until a critical failure, and when they do, the whole team is lost. Forcing AI adoption for "10x output" is slow-burning technical debt nobody is measuring.

Full paper: arxiv.org/abs/2601.20245
[image attached]
159 replies · 629 reposts · 2.3K likes · 281.3K views
Akos Kadar retweeted
Peter Holderrieth @peholderrieth
🚀 MIT Flow Matching and Diffusion Lecture 2026 Released (diffusion.csail.mit.edu)! We just released our new MIT 2026 course on flow matching and diffusion models! We teach the full stack of modern AI image, video, and protein generators - theory and practice.

We include:
📺 Videos: Step-by-step derivations
📝 Notes: Mathematically self-contained lecture notes
💻 Coding: Hands-on exercises for every component

We fully improved last year's iteration and added new topics: latent spaces, diffusion transformers, and building language models with discrete diffusion models. Everything is available here: diffusion.csail.mit.edu

A huge thanks to Tommi Jaakkola for his support in making this class possible and Ashay Athalye (MIT SOUL) for the incredible production! Was fun to do this with @RShprints! #MachineLearning #GenerativeAI #MIT #DiffusionModels #AI
[image attached]
14 replies · 394 reposts · 2.2K likes · 517.6K views
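The regression target at the heart of flow matching can be sketched in a few lines. This uses the common linear-interpolant form as an assumed simplification; the course's exact parameterization may differ, and the "model" here is a trivial stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x0, x1, t):
    """Linear interpolant x_t = (1 - t) x0 + t x1 with target velocity x1 - x0."""
    x_t = (1 - t)[:, None] * x0 + t[:, None] * x1
    v_target = x1 - x0
    return x_t, v_target

batch, dim = 8, 2
x0 = rng.standard_normal((batch, dim))   # noise samples
x1 = rng.standard_normal((batch, dim))   # data samples
t = rng.uniform(size=batch)              # random times in [0, 1]

x_t, v_target = flow_matching_targets(x0, x1, t)

# A network v_theta(x_t, t) would be trained to minimize ||v_theta - v_target||^2;
# here a dummy zero predictor just shows the loss shape.
loss = np.mean((0.0 - v_target) ** 2)
```

At t = 0 the interpolant sits at the noise sample and at t = 1 at the data sample, so integrating the learned velocity field transports noise to data.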
Akos Kadar retweeted
Jitendra MALIK @JitendraMalikCV
With Emmanuel Dupoux scp.net/persons/dupoux/ and Yann LeCun @ylecun, we consider a cognitive-science-inspired AI. We analyse how autonomous learning works in living organisms, and propose a roadmap for reproducing it in artificial systems. lnkd.in/eNWDmuqT
9 replies · 78 reposts · 447 likes · 62.8K views
Akos Kadar retweeted
Sepp Hochreiter @HochreiterSepp
xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.
[2 images attached]
5 replies · 59 reposts · 314 likes · 22.5K views
Akos Kadar retweeted
Kimi.ai @Kimi_Moonshot
Introducing Attention Residuals: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗 Full report: github.com/MoonshotAI/Att…
[image attached]
334 replies · 2.1K reposts · 13.6K likes · 4.9M views
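A minimal sketch of the idea in the announcement: instead of summing all previous layer outputs with fixed, uniform weights, each layer attends over the stack of preceding outputs with input-dependent softmax weights. Everything here (shapes, the tanh layer body, using the last output as the query) is illustrative, not Kimi's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_residual(history, query_vec):
    """Weight past layer outputs by input-dependent attention, then sum."""
    H = np.stack(history)                 # (num_prev_layers, dim)
    scores = H @ query_vec / np.sqrt(dim)
    weights = softmax(scores)             # one weight per preceding layer
    return weights @ H                    # convex combination, resists dilution

# Toy forward pass over 3 "layers" (each layer body is a fixed random map).
x = rng.standard_normal(dim)
history = [x]
for _ in range(3):
    W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    residual = attention_residual(history, history[-1])
    out = residual + np.tanh(W @ history[-1])
    history.append(out)
```

Because the weights form a convex combination, the aggregated residual stays on the scale of individual layer outputs rather than growing with depth, which is the dilution/hidden-state-growth point in the tweet.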
Akos Kadar retweeted
Judea Pearl @yudapearl
We are notified of a unique event in the history of AI-investment: Yann LeCun's AMI Labs launches with $1.03 Billion to build AI "that understand the world". frenchtechjournal.com/yann-lecuns-am… Comment: There is no "understanding the world" without causal modeling of the world and, strangely, LeCun has not shown any interest in causal modeling in the past. I do not know what to make of it except to repeat my comments when the WSJ article came out: archive.is/2025.11.16-234…. I said: "evidently, LeCun has just discovered "world models", I hope to see lots of funding pouring into CI soon."
15 replies · 17 reposts · 264 likes · 51.9K views
Akos Kadar retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
[image attached]
48 replies · 261 reposts · 1.7K likes · 244.7K views
Akos Kadar retweeted
DailyPapers @HuggingPapers
Flash-KMeans: achieves up to a 17.9x speedup over baselines and 200x over FAISS via IO-aware FlashAssign kernels that eliminate memory bottlenecks and atomic contention in GPU clustering.
[image attached]
4 replies · 42 reposts · 356 likes · 23.3K views
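For context, the assignment step such kernels accelerate looks like this in plain NumPy (this is the generic k-means computation, not the paper's kernel; `assign` is a name chosen for the example).

```python
import numpy as np

def assign(points, centroids):
    """Nearest-centroid label per point via squared Euclidean distance.

    Expands ||p - c||^2 = ||p||^2 - 2 p.c + ||c||^2; the GEMM-shaped
    points @ centroids.T term is what IO-aware GPU kernels tile to avoid
    memory bottlenecks, and the per-point argmin is where naive kernels
    hit atomic contention when also accumulating cluster sums.
    """
    p2 = np.sum(points ** 2, axis=1, keepdims=True)    # (n, 1)
    c2 = np.sum(centroids ** 2, axis=1)                # (k,)
    d2 = p2 - 2.0 * points @ centroids.T + c2          # (n, k)
    return np.argmin(d2, axis=1)

rng = np.random.default_rng(0)
points = rng.standard_normal((100, 8))
centroids = rng.standard_normal((4, 8))
labels = assign(points, centroids)
```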
Akos Kadar retweeted
Percy Liang @percyliang
Normally, replaying old data reduces forgetting, but it actually helps you learn on new data too! We finally put this paper out on arXiv, but had it up as a Marin GitHub issue ~1 year ago: github.com/marin-communit…
Suhas Kotha @kothasuhas

To improve fine-tuning data efficiency, replay generic pre-training data. Not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! Especially when fine-tuning data is scarce in pre-training. (w/ @percyliang)

12 replies · 25 reposts · 250 likes · 35.9K views
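The replay recipe above amounts to mixing a fraction of generic pre-training examples into each fine-tuning batch. A minimal sketch, with the 25% replay ratio and batch size chosen for illustration (they are not from the paper):

```python
import random

def mixed_batches(finetune_data, pretrain_data, batch_size=8,
                  replay_frac=0.25, seed=0):
    """Yield batches mixing fine-tuning examples with replayed pre-training data."""
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_frac)   # replayed generic examples
    n_ft = batch_size - n_replay               # fine-tuning examples
    while True:
        batch = rng.sample(finetune_data, n_ft) + rng.sample(pretrain_data, n_replay)
        rng.shuffle(batch)
        yield batch

ft = [("ft", i) for i in range(100)]     # stand-in fine-tuning set
pt = [("pt", i) for i in range(1000)]    # stand-in pre-training pool
batch = next(mixed_batches(ft, pt))
```

With these settings each batch of 8 carries 2 replayed pre-training examples; the tweet's claim is that this helps on the fine-tuning domain itself, not just against forgetting.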
Akos Kadar retweeted
Kianté Brantley @xkianteb
Does LLM RL post-training need to be on-policy?
10 replies · 45 reposts · 328 likes · 111.5K views
Akos Kadar retweeted
Grigory Sapunov @che_shr_cat
1/ We know Transformers fail at length extrapolation. But new research shows a deeper flaw: they fail at IN-DISTRIBUTION state tracking. They don't learn algorithmic rules, they just memorize isolated circuits per length. 🧵
[image attached]
7 replies · 47 reposts · 385 likes · 35.6K views
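A concrete instance of the state-tracking contrast: parity of a bit string is a minimal state-tracking task that a one-bit nonlinear recurrence solves exactly for any length, whereas the claim in the thread is that transformers tend to memorize per-length shortcuts instead of such a rule. (This toy is my illustration, not an example from the cited research.)

```python
def parity_rnn(bits):
    """Track running parity with the recurrence s <- s XOR x (one-bit state).

    The same update rule works for every sequence length, which is exactly
    the length-generalizing algorithmic behavior at issue.
    """
    state = 0
    for x in bits:
        state ^= x
    return state
```

Example: `parity_rnn([1, 0, 1, 1])` returns 1 (three ones), and the empty sequence returns 0.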