Marcin Sendera

247 posts


@MarcinSendera

Ph.D. student in deep learning @JagiellonskiUni. Teaching machines how to learn. Also working on enhancing existing generative models.

Joined February 2022
1.5K Following · 270 Followers
Pinned Tweet
Marcin Sendera@MarcinSendera·
I'm excited to share a personal update. I've just begun a research internship at MILA, where I'll be working on Generative AI and Bayesian DL with Kolya Malkin and Prof. Yoshua Bengio for the next few months. If you're in Montreal, please reach out! 🚀🌟 #MILA #MTL #generativeAI
(0 replies, 3 reposts, 31 likes, 3.1K views)
Marcin Sendera reposted
Floor Eijkelboom@FEijkelboom·
Flow-LLM Blogpost :D flow-based-llms.github.io In the last few weeks, a bunch of work on flows for language came out 🌊 That is exciting, because it makes truly parallel text generation feel real: generation where models can keep refining the whole response during inference, instead of committing token by token. I wrote an intuitive and animated introduction to the area — why autoregression has a structural ceiling, why discrete diffusion only partly escapes it, and why flows may be the first genuinely parallel alternative. Here's an overview of the key parts of the blog - and let's chat at #ICLR2026 :)
(5 replies, 63 reposts, 353 likes, 45K views)
Marcin Sendera reposted
ML in PL@MLinPL·
Happy Easter from ML in PL 🐣 May your gradients always descend, your losses converge, and the problems you care about turn out to be in P. And if they don't — well, at least you'll have good company in the open problems section.

Speaking of which: this week's recordings come from the Witold Lipski Award session, which honors outstanding young Polish researchers in computer science. Three talks, three genuinely hard questions.

Piotr Ostropolski-Nalewaja — My Favourite Problem in Database Theory and a Few Other Things
When does one database query contain another, under bag semantics? The problem has been open for decades. Piotr walks through the historical background, recent breakthroughs, and why it remains stubbornly unresolved — a good reminder that "basic" questions in theory are rarely basic.

Marcin Sendera — Beyond the Known: Probabilistic Inference for the AI Scientist
What would it actually take to build an AI that discovers genuinely new knowledge, rather than interpolating existing data? Marcin's answer runs through Bayesian inference, the intractability of MCMC at scale, and his own work on diffusion-style samplers developed during a research stay at Mila — building inference engines that are scalable, mode-covering, and controllable. The AI Scientist framing is ambitious, but he earns it.

Marek Sokołowski — Algorithmics of Dynamic Well-Structured Graphs
Graphs where edges and nodes change over time — social networks, communication systems, anything that shifts. The question is how to efficiently maintain useful structural properties (tree-likeness, small separators) as the graph evolves. Quietly important work for anyone building systems that need to reason about changing relationships.

Links in the thread ⬇️
(1 reply, 1 repost, 6 likes, 167 views)
Marcin Sendera reposted
Molei Tao@MoleiTaoMath·
Does GenAI create new knowledge? arxiv.org/abs/2602.06021 gives:
* 1st explicit characterization of diffusion model's generalization
* more precise than offered by classical stat. learning theory
* systematic integration of various inductive biases (training + architecture + inference)
(3 replies, 27 reposts, 170 likes, 12.2K views)
Marcin Sendera@MarcinSendera·
@niedakhPL @marcinnaps I actually went and looked at that thread. Honestly, words fail me… Well, actually they don't, but they would be considered rather unpleasant. At the very least, a community note would be in order there.
(0 replies, 0 reposts, 1 like, 29 views)
Fizyk matematyczny@marcinnaps·
The sad reality of the AI era: start-ups using mathematics to raise funding.
Daniel Litt@littmath

@prz_chojecki I'm happy to give you some time to check that the error I've flagged is real. But extremely bad behavior to claim to have solved this problem, given that neither you nor anyone else has checked the solution's correctness, and that someone has pointed to an error.

(4 replies, 0 reposts, 18 likes, 4.1K views)
Marcin Sendera reposted
Emiel Hoogeboom@emiel_hoogeboom·
You may think discrete distillation is fundamentally flawed, but you are (surprisingly) wrong. 🤯 Meet Discrete Moment Distillation (D-MMD). It is a new method that brings fast, few-step sampling to discrete diffusion models! 🧵👇
(6 replies, 39 reposts, 252 likes, 57.2K views)
Marcin Sendera reposted
Alex Tong@AlexanderTong7·
Ever wondered why we train masked diffusion LMs with uniform unmasking, but sample completely differently at inference? 🤔 We tackle this disconnect in PAPL. Thrilled to co-author this work with @bezemekz! Catch our oral at #ICLR2026, and check out the breakdown below: 👇🧵
Zachary Bezemek@bezemekz

(1/3) Excited to give an oral presentation of PAPL at #ICLR2026 ! Camera-ready: arxiv.org/abs/2509.23405 We ask: Why do we train masked diffusion LMs to match reference dynamics which unmask tokens uniformly at random, when we don’t sample that way at inference?

(0 replies, 4 reposts, 34 likes, 4.4K views)
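The train/inference disconnect PAPL asks about can be made concrete with a minimal sketch (hypothetical function names, not the paper's code): the training-time reference process reveals masked positions in uniformly random order, while a common inference heuristic reveals the model's most confident positions first.

```python
import random

def uniform_unmask_order(seq_len, rng=random):
    """Training-time reference dynamics: reveal masked positions
    in a uniformly random order."""
    order = list(range(seq_len))
    rng.shuffle(order)
    return order

def confidence_unmask_order(confidences):
    """Common inference heuristic: reveal positions in decreasing
    order of the model's predicted confidence."""
    return sorted(range(len(confidences)), key=lambda i: -confidences[i])
```

The two orderings generally disagree, so the dynamics the model was trained to match are not the dynamics used to sample from it.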
Marcin Sendera reposted
Accepted papers at TMLR@TmlrPub·
From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and fa... Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin. Action editor: Valentin De Bortoli. openreview.net/forum?id=xLE3x… #bo
(0 replies, 1 repost, 3 likes, 308 views)
Marcin Sendera reposted
Quanta Magazine@QuantaMagazine·
A new proof reveals a surprising new link between graph theory and the Fourier transform. “It is a little bit like the moon landing or the 4-minute mile,” said Tom Sanders of the University of Oxford. “It’s not clear ahead of time what this is going to open up.” quantamagazine.org/networks-hold-…
(13 replies, 131 reposts, 805 likes, 187.8K views)
Marcin Sendera reposted
Tom Zahavy@TZahavy·
Can AI truly invent, or is it just compressing what we already know? 🤖🧠 In my position paper, LLMs can’t jump, I use Einstein’s happiest thought as a case study to show why LLMs are structurally incapable of the abductive "jump" needed for scientific discovery and how interactive environments like 🧞 offer a path forward Paper: philsci-archive.pitt.edu/28024/1/Scient…
(38 replies, 78 reposts, 495 likes, 53K views)
Marcin Sendera reposted
François Chollet@fchollet·
One of the best ways to contribute directly to the current frontier of AI research is to build agents that can solve ARC-AGI-3 environments with human-level efficiency. Today we're releasing a toolkit that lets you interact with all public environments locally, at 2000 FPS. You can run your first game with a super simple Python script (see our docs), and you can watch your agent interact with the environment in real-time.
ARC Prize@arcprize

Today we're launching the ARC-AGI-3 Toolkit Your agents can now interact with environments at 2,000 FPS, locally. We're open sourcing the environment engine, 3 human-verified games (AI scores <5%), and human baseline scores. ARC-AGI-3 launches March 25, 2026.

(33 replies, 77 reposts, 660 likes, 66.7K views)
Marcin Sendera reposted
Julius Berner@julberner·
🚀🎬We introduce TMD (Transition Matching Distillation): 480p videos generated from text prompts in < 3 NFEs! 1️⃣Main backbone for feature extraction and lightweight head for iterative refinement 2️⃣Distilled from Wan2.1 14B T2V combining MeanFlow & DMD2 🔗research.nvidia.com/labs/genair/tmd
(3 replies, 17 reposts, 64 likes, 13.5K views)
Marcin Sendera@MarcinSendera·
@niedakhPL And we're talking about a situation where one of them lasted three months, yet during my master's I went to work at the University of Cambridge (too little for the ministry), and later, during my PhD, to Mila for a year - supposedly fine, but still short. Lovely scholarship, regards to the minister
(0 replies, 0 reposts, 1 like, 14 views)
Marcin Sendera@MarcinSendera·
@niedakhPL One year, the NCN Preludium I received - "a very important and innovative topic, but the project runs short" (i.e., 10 months at the time of writing); the next year - "a topic of little importance" xD And the research internships, as usual - "average institutions, maybe good ones" (Cambridge and Mila), "but certainly short"
(1 reply, 0 reposts, 0 likes, 13 views)
Marcin Sendera reposted
Chen Sun 🤖@ChenSun92·
This paper "From Entropy to Epiplexity" was gorgeously clarifying on a number of important ideas on the quality of your training data, compression, OOD generalization, etc. Here's a deep dive 🏊‍♂️👇:

We are likely all aware of the Minimum Description Length (MDL) principle, which has long been theorized as a proxy for generalization: the model that compresses the training data most efficiently is likely capturing the true underlying mechanisms rather than memorizing noise. But since we cannot practically search the space of all possible programs to find the true Kolmogorov complexity, the central question becomes: what does it take to best approximate MDL in a way that actually predicts generalization? This paper proposes a novel method called Requential Coding. And yet, is this necessary? What is wrong with the prequential coding strategy that people have been using (arxiv.org/pdf/1802.07044) up till now to compute the description?

1. What is actually wrong with prequential coding?
Prequential coding (Sequential MDL) estimates complexity by simply summing the loss during training. It has served as an alternative to the classic "Two-Part Code". The issue is that this sum includes both the structure the model learns and the irreducible noise of the data. To isolate the model's complexity, one has to heuristically subtract the "final loss" (an estimate of noise) from the total area. This is mathematically loose: it relies on the symmetry of information, which breaks down under computational bounds, and if the model hasn't perfectly converged, the baseline is wrong, contaminating our measure of structure with noise.

2. What is the idea of epiplexity and requential coding?
Epiplexity separates "useful structure" from random noise by measuring the complexity of the learning process itself. The insight here is that the most efficient way to describe a smart model isn't to list its billions of weights, but to describe the instructions for training it!

3.
Why does requential coding need an observer?
This was the most subtle point. Requential coding requires a "Teacher" observer to act as a shared reference for the "noise." In the diagram below, the area under the Teacher Curve represents the shared noise/entropy — we don't pay bits for this because the sender and receiver share a random seed (a shared source of noise). The area between curves is the pure "surprise" of the structural update. We only pay bits to describe how the student deviates from the teacher's path, effectively isolating the structure cost from the data's inherent entropy.

4. Why doesn't prequential coding need an observer?
Prequential coding doesn't use an external observer because "reality" (the data stream) acts as the observer. The student predicts the next item, and reality reveals the actual one. The cost is the raw loss. Because reality mixes signal and noise indistinguishably, we cannot separate them without the heuristic subtraction mentioned in Question 1.

###############################

Empirical results: While Prequential and Requential estimates often correlate on natural data, they can mismatch entirely in cases of emergence. For example, in the "Game of Life," a brute-force model that models the laws of physics in the game looks better to Prequential coding, but a bounded model that learns concepts like "gliders" has higher epiplexity, which Requential coding correctly identifies as the transferable structure.

###############################

Ultimately, the Requential coding strategy works because information is relative to the observer's constraints. One can think of it in the following way:
- To God (Infinite Compute): Nothing is random. Everything is structural (low entropy).
- To a Rock (Zero Compute): Everything is random noise. Nothing is predictable.
- To an AI (Bounded Compute): We are in the middle.
Epiplexity measures exactly how much of that "apparent randomness" the AI has successfully converted into "predictable structure" by burning compute (training). So, to actually measure the meaningful structure in a model through its training process and rigorously remove the pure noise part, it almost innately requires an AI observer (the Teacher) to define what "noise" looks like. Pretty nice paper @m_finzi 🌹
Chen Sun 🤖 tweet media
Marc Finzi@m_finzi

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

(11 replies, 38 reposts, 343 likes, 31.8K views)
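The prequential (sequential MDL) cost described in point 1 of the thread can be sketched in a few lines. This is a toy illustration with an assumed Laplace-smoothed Bernoulli model, not the paper's implementation: the code length is the sum of online log-losses, each paid before the model updates on the item.

```python
import math

class LaplaceBernoulli:
    """Toy online model: add-one-smoothed estimate of P(x = 1) for a bit stream."""
    def __init__(self):
        self.ones = 0
        self.total = 0

    def predict(self, x):
        p_one = (self.ones + 1) / (self.total + 2)  # Laplace smoothing
        return p_one if x == 1 else 1.0 - p_one

    def update(self, x):
        self.ones += x
        self.total += 1

def prequential_bits(model, stream):
    """Sequential MDL: pay -log2 p(x) for each item, then train on it."""
    bits = 0.0
    for x in stream:
        bits += -math.log2(model.predict(x))  # loss paid before the update
        model.update(x)
    return bits

# On the all-ones stream of length 4 the cost telescopes to log2(5) ≈ 2.32 bits.
print(prequential_bits(LaplaceBernoulli(), [1, 1, 1, 1]))
```

The heuristic the thread criticizes then subtracts a final-loss baseline from this sum as a noise estimate; requential coding instead charges only the student's deviation from a teacher observer.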
Marcin Sendera reposted
Andrew Gordon Wilson@andrewgwils·
We introduce epiplexity, a new measure of information that provides a foundation for how to select, generate, or transform data for learning systems. We have been working on this for almost 2 years, and I cannot contain my excitement! 1/7
Marc Finzi@m_finzi

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

(34 replies, 191 reposts, 1.3K likes, 163K views)
Marcin Sendera reposted
Hugo Larochelle@hugo_larochelle·
A little under 4 years ago, @RaiaHadsell, @kchonyc and I launched TMLR. I am SO proud of what we've achieved since then, and I'm particularly happy to leave TMLR to a remarkable team of EICs, with @thegautamkamath, @NailaMurray, Nihar B. Shah and @lcharlin .
Transactions on Machine Learning Research@TmlrOrg

The end of 2025 marks the end of Hugo Larochelle's term as (Founding co-) Editor-in-chief of TMLR. It is an understatement to say that he was indispensable to making TMLR what it is today. Huge thanks to @hugo_larochelle for everything he's done!

(17 replies, 26 reposts, 269 likes, 42.4K views)
Marcin Sendera reposted
Yoshua Bengio@Yoshua_Bengio·
OpenReview is a pillar of progress in the AI research community. Now it needs our support. Along with several of my colleagues, I have pledged to help, and I encourage anyone who can to do the same. openreview.net/donate
(23 replies, 47 reposts, 354 likes, 61K views)
Marcin Sendera@MarcinSendera·
@niedakhPL Come on, let's not exaggerate - Michał did a great job, but during his stay at Princeton, so our spending rather had no influence on it, unfortunately :/
(0 replies, 0 reposts, 1 like, 18 views)
Piotr Miłoś@PiotrRMilos·
It's a perfect day to announce that I've joined Mistral as an AI scientist, just as our new flagship model has arrived :). Obviously, I did not contribute to this one, but I have high hopes for the next one :). I am very excited about this opportunity for a few reasons. On a personal level, it's going to be exciting - a lot of learning and cool stuff. More broadly, it is the first time a frontier lab has opened operations in Warsaw. I'm really proud of the speed of development the Polish AI ecosystem has shown, and I hope to see many more great things happening :).
Mistral AI@MistralAI

Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵

(50 replies, 58 reposts, 1.2K likes, 82.4K views)