Akos Kadar

6.1K posts

@kadarakos

Machine learning researcher and developer.

Berlin, Germany · Joined October 2012
480 Following · 470 Followers
Akos Kadar retweeted
Oliver Prompts @oliviscusAI
🚨 BREAKING: NVIDIA proved backpropagation isn't the only way to build an AI. They trained billion-parameter models without a single gradient.

Every AI you use today relies on backpropagation. It requires complex calculus, exploding memory, and massive GPU clusters. Meanwhile, an ancient, gradient-free method called Evolution Strategies (ES) was written off as impossible to scale. Until now.

NVIDIA and Oxford just dropped EGGROLL. Instead of generating massive, full-rank matrices for every mutation, they split them into two tiny ones. The AI mutates. It tests. It keeps what works. Like biological evolution. But now, it does it with hundreds of thousands of parallel mutations at once. Throughput is now as fast as batched inference.

They are pretraining models entirely from scratch using only simple integers. No backprop. No decimals. No gradients.

We thought the future of AI required endless clusters of precision hardware. It turns out, we just needed to evolve.
[image attached]
101 replies · 422 reposts · 2.4K likes · 153.7K views
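The core trick the tweet describes (replacing a full-rank Gaussian mutation with a product of two small matrices) can be sketched as below. This is an illustrative hill-climbing variant under assumed details, not EGGROLL's actual algorithm; sizes, the `fitness` objective, and the scaling are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 64, 32, 2            # illustrative sizes
W = rng.standard_normal((d_out, d_in))   # current weight matrix

def low_rank_perturbation(rng, d_out, d_in, rank, sigma=0.1):
    """Rank-`rank` perturbation A @ B.T instead of a full random matrix.

    Storing/communicating A and B costs (d_out + d_in) * rank floats
    versus d_out * d_in for a full-rank Gaussian perturbation.
    """
    A = rng.standard_normal((d_out, rank))
    B = rng.standard_normal((d_in, rank))
    return sigma * (A @ B.T) / np.sqrt(rank)

def fitness(W):
    # Stand-in objective: prefer weights close to zero.
    return -np.sum(W ** 2)

# One generation: sample many candidates in parallel, keep what works.
candidates = [W + low_rank_perturbation(rng, d_out, d_in, rank) for _ in range(16)]
best = max(candidates, key=fitness)
if fitness(best) > fitness(W):
    W = best
```

The memory argument is visible in the cost comparison: for these sizes, each mutation needs (64 + 32) * 2 = 192 numbers instead of 64 * 32 = 2048.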
Akos Kadar retweeted
(((ل()(ل() 'yoav))))👾
another remark on this: this can be seen as a technical complaint about academic credit ("we also use the JL transform when creating the codebook and they didn't acknowledge that"). but it is more than that. reading the TurboQuant paper, one gets the impression that JL / random projection is the major component. but since RaBitQ also uses JL, then if TurboQuant is indeed better, the thing that actually works for TurboQuant (the contribution) is not JL but something else. and currently, we the readers cannot know this without implementing and checking. so it is not only credit assignment. if the TurboQuant authors said "RaBitQ also did JL, but we differ from them by doing XYZ, which improves things from this to that", we as readers would get a much more informative paper, and the TurboQuant authors would have written about an actual contribution to the state of the art.
Jianyang Gao @gaoj0017

The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views. We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (openreview.net/forum?id=tO3AS…). We would greatly appreciate your attention and help in sharing it.

5 replies · 14 reposts · 161 likes · 27.8K views
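For readers unfamiliar with the JL step the thread keeps referring to, a generic random-projection sketch looks like the following. This is the textbook Johnson-Lindenstrauss construction, not RaBitQ's or TurboQuant's actual codebook procedure; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def jl_project(X, k, rng):
    """Project rows of X from d to k dimensions with a random Gaussian matrix.

    The 1/sqrt(k) scaling makes E||Px||^2 = ||x||^2, so pairwise distances
    are approximately preserved (the JL lemma makes this precise).
    """
    d = X.shape[1]
    P = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ P

X = rng.standard_normal((50, 256))
Y = jl_project(X, 64, rng)
```

The thread's point is that since both quantizers can use this same preprocessing, any quality difference must come from what happens after the projection.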
Akos Kadar retweeted
Olga Zaghen @olgazaghen
🔮 Working on ML on curved manifolds? Don't miss out on Jacobi Fields! 🔮 I wrote a quick, highly visual and hopefully accessible introduction to the topic: "Jacobi Fields in Machine Learning" 🤠 Check it out here: olgatticus.github.io/blog/jacobi-fi…!
[image attached]
12 replies · 68 reposts · 447 likes · 24.5K views
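As background for the post above: the Jacobi field equation is the standard Riemannian-geometry fact the blog's title refers to (stated here from general knowledge, not taken from the blog). Along a geodesic $\gamma(t)$ with velocity $\dot\gamma$, a Jacobi field $J$ satisfies

```latex
\frac{D^2 J}{dt^2} + R\bigl(J, \dot\gamma\bigr)\dot\gamma = 0,
```

where $D/dt$ is the covariant derivative along $\gamma$ and $R$ is the Riemann curvature tensor; Jacobi fields describe how nearby geodesics spread apart or converge, which is what makes them relevant to ML on curved manifolds.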
Akos Kadar retweeted
Nando de Freitas @NandoDF
This is another outstanding theoretical step towards understanding intelligence by Pedro, aka @AdaptiveAgents. Some people still like to try to explain it all through RL, but I feel other explanations that emphasise the role of the environment, entropy minimisation, (multi) agency and complexity, or the laws of physics can be far more compelling. One view I’m loving is the one of Michael Levin: youtu.be/XheAMrS8Q1c?si…
[YouTube video]
Pedro A. Ortega @AdaptiveAgents

Agency is usually formalized as utility maximization. But must it be? LLMs suggest a different foundation: intelligence as acquiring behavioral schemas from interaction structure. My new paper: "Universal AI as Imitation" investigates the limit-case of LLM-style models.

4 replies · 5 reposts · 64 likes · 15.6K views
Akos Kadar retweeted
Sepp Hochreiter @HochreiterSepp
xLSTM more expressive than Transformer, Mamba: arxiv.org/abs/2603.03612
* nonlinear RNNs: sLSTM, LSTM
* DPLR linear RNNs: mLSTM, RWKV, DeltaNet
* non-PNC1: Mamba, Transformer
“fundamental expressivity gaps between linear and nonlinear RNNs”
World models require nonlinear RNNs.
[2 images attached]
10 replies · 54 reposts · 353 likes · 24.9K views
Akos Kadar retweeted
Priyanka Vergadia @pvergadia
JUST DROPPED: Anthropic's research proves AI coding tools are secretly making developers worse. "AI use impairs conceptual understanding, code reading, and debugging without delivering significant efficiency gains." -- That's the paper's actual conclusion.

17% score drop learning new libraries with AI. Sub-40% scores when AI wrote everything. 0 measurable speed improvement.

→ Prompting replaces thinking, not just typing
→ Comprehension gaps compound — you ship code you can't debug
→ The productivity illusion hides until something breaks in prod

Here's why this changes everything: Speed metrics look fine on a dashboard. Understanding gaps don't show up until a critical failure, and when they do, the whole team is lost. Forcing AI adoption for "10x output" is slow-burning technical debt nobody is measuring.

Full paper: arxiv.org/abs/2601.20245
[image attached]
159 replies · 629 reposts · 2.3K likes · 281.3K views
Akos Kadar retweeted
Peter Holderrieth @peholderrieth
🚀 MIT Flow Matching and Diffusion Lecture 2026 Released (diffusion.csail.mit.edu)! We just released our new MIT 2026 course on flow matching and diffusion models! We teach the full stack of modern AI image, video, and protein generators - theory and practice.

We include:
📺 Videos: Step-by-step derivations
📝 Notes: Mathematically self-contained lecture notes
💻 Coding: Hands-on exercises for every component

We fully improved last year's iteration and added new topics: latent spaces, diffusion transformers, and building language models with discrete diffusion models. Everything is available here: diffusion.csail.mit.edu

A huge thanks to Tommi Jaakkola for his support in making this class possible and Ashay Athalye (MIT SOUL) for the incredible production! Was fun to do this with @RShprints! #MachineLearning #GenerativeAI #MIT #DiffusionModels #AI
[image attached]
14 replies · 394 reposts · 2.2K likes · 517.6K views
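The regression target at the heart of flow matching can be sketched in a few lines. This uses the common linear-interpolant form as an assumed simplification; the course's exact parameterization may differ, and the "model" here is a trivial stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x0, x1, t):
    """Linear interpolant x_t = (1 - t) x0 + t x1 with target velocity x1 - x0."""
    x_t = (1 - t)[:, None] * x0 + t[:, None] * x1
    v_target = x1 - x0
    return x_t, v_target

batch, dim = 8, 2
x0 = rng.standard_normal((batch, dim))   # noise samples
x1 = rng.standard_normal((batch, dim))   # data samples
t = rng.uniform(size=batch)              # random times in [0, 1]

x_t, v_target = flow_matching_targets(x0, x1, t)

# A network v_theta(x_t, t) would be trained to minimize ||v_theta - v_target||^2;
# here a dummy zero predictor just shows the loss shape.
loss = np.mean((0.0 - v_target) ** 2)
```

At t = 0 the interpolant sits at the noise sample and at t = 1 at the data sample, so integrating the learned velocity field transports noise to data.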
Akos Kadar retweeted
Jitendra MALIK @JitendraMalikCV
With Emmanuel Dupoux scp.net/persons/dupoux/ and Yann LeCun @ylecun, we consider a cognitive-science-inspired AI. We analyse how autonomous learning works in living organisms, and propose a roadmap for reproducing it in artificial systems. lnkd.in/eNWDmuqT
9 replies · 78 reposts · 447 likes · 62.8K views
Akos Kadar retweeted
Sepp Hochreiter @HochreiterSepp
xLSTM Distillation: arxiv.org/abs/2603.15590 Near-lossless distillation of quadratic Transformer LLMs into linear xLSTM architectures enables cost- and energy-efficient alternatives without sacrificing performance. xLSTM variants of instruction-tuned Llama, Qwen, & Olmo models.
[2 images attached]
5 replies · 59 reposts · 314 likes · 22.5K views
Akos Kadar retweeted
Kimi.ai @Kimi_Moonshot
Introducing Attention Residuals: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗 Full report: github.com/MoonshotAI/Att…
[image attached]
334 replies · 2.1K reposts · 13.6K likes · 4.9M views
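A minimal sketch of the idea in the announcement: instead of summing all previous layer outputs with fixed, uniform weights, each layer attends over the stack of preceding outputs with input-dependent softmax weights. Everything here (shapes, the tanh layer body, using the last output as the query) is illustrative, not Kimi's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_residual(history, query_vec):
    """Weight past layer outputs by input-dependent attention, then sum."""
    H = np.stack(history)                 # (num_prev_layers, dim)
    scores = H @ query_vec / np.sqrt(dim)
    weights = softmax(scores)             # one weight per preceding layer
    return weights @ H                    # convex combination, resists dilution

# Toy forward pass over 3 "layers" (each layer body is a fixed random map).
x = rng.standard_normal(dim)
history = [x]
for _ in range(3):
    W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    residual = attention_residual(history, history[-1])
    out = residual + np.tanh(W @ history[-1])
    history.append(out)
```

Because the weights form a convex combination, the aggregated residual stays on the scale of individual layer outputs rather than growing with depth, which is the dilution/hidden-state-growth point in the tweet.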
Akos Kadar retweeted
Judea Pearl @yudapearl
We are notified of a unique event in the history of AI-investment: Yann LeCun's AMI Labs launches with $1.03 Billion to build AI "that understand the world". frenchtechjournal.com/yann-lecuns-am… Comment: There is no "understanding the world" without causal modeling of the world and, strangely, LeCun has not shown any interest in causal modeling in the past. I do not know what to make of it except to repeat my comments when the WSJ article came out: archive.is/2025.11.16-234…. I said: "evidently, LeCun has just discovered "world models", I hope to see lots of funding pouring into CI soon."
15 replies · 17 reposts · 264 likes · 51.9K views
Akos Kadar retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
[image attached]
48 replies · 261 reposts · 1.7K likes · 244.7K views
Akos Kadar retweeted
DailyPapers @HuggingPapers
Flash-KMeans: achieves up to a 17.9x speedup over baselines and 200x over FAISS via IO-aware FlashAssign kernels that eliminate memory bottlenecks and atomic contention in GPU clustering.
[image attached]
4 replies · 42 reposts · 356 likes · 23.3K views
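For context, the assignment step such kernels accelerate looks like this in plain NumPy (this is the generic k-means computation, not the paper's kernel; `assign` is a name chosen for the example).

```python
import numpy as np

def assign(points, centroids):
    """Nearest-centroid label per point via squared Euclidean distance.

    Expands ||p - c||^2 = ||p||^2 - 2 p.c + ||c||^2; the GEMM-shaped
    points @ centroids.T term is what IO-aware GPU kernels tile to avoid
    memory bottlenecks, and the per-point argmin is where naive kernels
    hit atomic contention when also accumulating cluster sums.
    """
    p2 = np.sum(points ** 2, axis=1, keepdims=True)    # (n, 1)
    c2 = np.sum(centroids ** 2, axis=1)                # (k,)
    d2 = p2 - 2.0 * points @ centroids.T + c2          # (n, k)
    return np.argmin(d2, axis=1)

rng = np.random.default_rng(0)
points = rng.standard_normal((100, 8))
centroids = rng.standard_normal((4, 8))
labels = assign(points, centroids)
```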
Akos Kadar retweeted
Percy Liang @percyliang
Normally, replaying old data reduces forgetting, but it actually helps you learn on new data too! We finally put this paper out on arXiv, but had it up as a Marin GitHub issue ~1 year ago: github.com/marin-communit…
Suhas Kotha @kothasuhas

To improve fine-tuning data efficiency, replay generic pre-training data. Not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! Especially when fine-tuning data is scarce in pre-training. (w/ @percyliang)

12 replies · 25 reposts · 250 likes · 35.9K views
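The replay recipe above amounts to mixing a fraction of generic pre-training examples into each fine-tuning batch. A minimal sketch, with the 25% replay ratio and batch size chosen for illustration (they are not from the paper):

```python
import random

def mixed_batches(finetune_data, pretrain_data, batch_size=8,
                  replay_frac=0.25, seed=0):
    """Yield batches mixing fine-tuning examples with replayed pre-training data."""
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_frac)   # replayed generic examples
    n_ft = batch_size - n_replay               # fine-tuning examples
    while True:
        batch = rng.sample(finetune_data, n_ft) + rng.sample(pretrain_data, n_replay)
        rng.shuffle(batch)
        yield batch

ft = [("ft", i) for i in range(100)]     # stand-in fine-tuning set
pt = [("pt", i) for i in range(1000)]    # stand-in pre-training pool
batch = next(mixed_batches(ft, pt))
```

With these settings each batch of 8 carries 2 replayed pre-training examples; the tweet's claim is that this helps on the fine-tuning domain itself, not just against forgetting.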
Akos Kadar retweeted
Kianté Brantley @xkianteb
Does LLM RL post-training need to be on-policy?
10 replies · 45 reposts · 328 likes · 111.5K views
Akos Kadar retweeted
Grigory Sapunov @che_shr_cat
1/ We know Transformers fail at length extrapolation. But new research shows a deeper flaw: they fail at IN-DISTRIBUTION state tracking. They don't learn algorithmic rules, they just memorize isolated circuits per length. 🧵
[image attached]
7 replies · 47 reposts · 385 likes · 35.6K views
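A concrete instance of the state-tracking contrast: parity of a bit string is a minimal state-tracking task that a one-bit nonlinear recurrence solves exactly for any length, whereas the claim in the thread is that transformers tend to memorize per-length shortcuts instead of such a rule. (This toy is my illustration, not an example from the cited research.)

```python
def parity_rnn(bits):
    """Track running parity with the recurrence s <- s XOR x (one-bit state).

    The same update rule works for every sequence length, which is exactly
    the length-generalizing algorithmic behavior at issue.
    """
    state = 0
    for x in bits:
        state ^= x
    return state
```

Example: `parity_rnn([1, 0, 1, 1])` returns 1 (three ones), and the empty sequence returns 0.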