Marcin Sendera

247 posts


@MarcinSendera

Ph.D. student in deep learning @JagiellonskiUni. Teaching machines how to learn. Also working on enhancing existing generative models.

Joined February 2022
1.5K Following · 270 Followers
Pinned Tweet
Marcin Sendera@MarcinSendera·
I'm excited to share a personal update. I've just begun a research internship at MILA, where I'll be working on Generative AI and Bayesian DL with Kolya Malkin and Prof. Yoshua Bengio for the next few months. If you're in Montreal, please reach out! 🚀🌟 #MILA #MTL #generativeAI
(0 replies, 3 reposts, 31 likes, 3.1K views)
Marcin Sendera reposted
Floor Eijkelboom@FEijkelboom·
Flow-LLM Blogpost :D flow-based-llms.github.io In the last few weeks, a bunch of work on flows for language came out 🌊 That is exciting, because it makes truly parallel text generation feel real: generation where models can keep refining the whole response during inference, instead of committing token by token. I wrote an intuitive and animated introduction to the area — why autoregression has a structural ceiling, why discrete diffusion only partly escapes it, and why flows may be the first genuinely parallel alternative. Here's an overview of the key parts of the blog - and let's chat at #ICLR2026 :)
(5 replies, 63 reposts, 353 likes, 45K views)
Marcin Sendera reposted
ML in PL@MLinPL·
Happy Easter from ML in PL 🐣 May your gradients always descend, your losses converge, and the problems you care about turn out to be in P. And if they don't — well, at least you'll have good company in the open problems section.

Speaking of which: this week's recordings come from the Witold Lipski Award session, which honors outstanding young Polish researchers in computer science. Three talks, three genuinely hard questions.

Piotr Ostropolski-Nalewaja — My Favourite Problem in Database Theory and a Few Other Things
When does one database query contain another, under bag semantics? The problem has been open for decades. Piotr walks through the historical background, recent breakthroughs, and why it remains stubbornly unresolved — a good reminder that "basic" questions in theory are rarely basic.

Marcin Sendera — Beyond the Known: Probabilistic Inference for the AI Scientist
What would it actually take to build an AI that discovers genuinely new knowledge, rather than interpolating existing data? Marcin's answer runs through Bayesian inference, the intractability of MCMC at scale, and his own work on diffusion-style samplers developed during a research stay at Mila — building inference engines that are scalable, mode-covering, and controllable. The AI Scientist framing is ambitious, but he earns it.

Marek Sokołowski — Algorithmics of Dynamic Well-Structured Graphs
Graphs where edges and nodes change over time — social networks, communication systems, anything that shifts. The question is how to efficiently maintain useful structural properties (tree-likeness, small separators) as the graph evolves. Quietly important work for anyone building systems that need to reason about changing relationships.

Links in the thread ⬇️
(1 reply, 1 repost, 6 likes, 167 views)
Marcin Sendera reposted
Molei Tao@MoleiTaoMath·
Does GenAI create new knowledge? arxiv.org/abs/2602.06021 gives:
* 1st explicit characterization of diffusion model's generalization
* more precise than offered by classical stat. learning theory
* systematic integration of various inductive biases (training + architecture + inference)
(3 replies, 27 reposts, 170 likes, 12.2K views)
Marcin Sendera@MarcinSendera·
@niedakhPL @marcinnaps I actually went and looked at that thread. Honestly, words fail me… Well, actually they don't, but they would be considered rather unpleasant. At the very least, a community note would be in order there.
(0 replies, 0 reposts, 1 like, 29 views)
Fizyk matematyczny@marcinnaps·
The sad reality of the AI era: start-ups using mathematics to raise funding.
Daniel Litt@littmath

@prz_chojecki I'm happy to give you some time to check that the error I've flagged is real. But extremely bad behavior to claim to have solved this problem, given that neither you nor anyone else has checked the solution's correctness, and that someone has pointed to an error.

(4 replies, 0 reposts, 18 likes, 4.1K views)
Marcin Sendera reposted
Emiel Hoogeboom@emiel_hoogeboom·
You may think discrete distillation is fundamentally flawed, but you are (surprisingly) wrong. 🤯 Meet Discrete Moment Distillation (D-MMD). It is a new method that brings fast, few-step sampling to discrete diffusion models! 🧵👇
(6 replies, 39 reposts, 252 likes, 57.2K views)
Marcin Sendera reposted
Alex Tong@AlexanderTong7·
Ever wondered why we train masked diffusion LMs with uniform unmasking, but sample completely differently at inference? 🤔 We tackle this disconnect in PAPL. Thrilled to co-author this work with @bezemekz! Catch our oral at #ICLR2026, and check out the breakdown below: 👇🧵
Zachary Bezemek@bezemekz

(1/3) Excited to give an oral presentation of PAPL at #ICLR2026 ! Camera-ready: arxiv.org/abs/2509.23405 We ask: Why do we train masked diffusion LMs to match reference dynamics which unmask tokens uniformly at random, when we don’t sample that way at inference?

(0 replies, 4 reposts, 34 likes, 4.4K views)
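The train/inference disconnect PAPL asks about can be made concrete with a minimal sketch (hypothetical function names, not the paper's code): the training-time reference process reveals masked positions in uniformly random order, while a common inference heuristic reveals the model's most confident positions first.

```python
import random

def uniform_unmask_order(seq_len, rng=random):
    """Training-time reference dynamics: reveal masked positions
    in a uniformly random order."""
    order = list(range(seq_len))
    rng.shuffle(order)
    return order

def confidence_unmask_order(confidences):
    """Common inference heuristic: reveal positions in decreasing
    order of the model's predicted confidence."""
    return sorted(range(len(confidences)), key=lambda i: -confidences[i])
```

The two orderings generally disagree, so the dynamics the model was trained to match are not the dynamics used to sample from it.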
Marcin Sendera reposted
Accepted papers at TMLR@TmlrPub·
From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and fa... Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin. Action editor: Valentin De Bortoli. openreview.net/forum?id=xLE3x… #bo
(0 replies, 1 repost, 3 likes, 308 views)
Marcin Sendera reposted
Quanta Magazine@QuantaMagazine·
A new proof reveals a surprising new link between graph theory and the Fourier transform. “It is a little bit like the moon landing or the 4-minute mile,” said Tom Sanders of the University of Oxford. “It’s not clear ahead of time what this is going to open up.” quantamagazine.org/networks-hold-…
(13 replies, 131 reposts, 805 likes, 187.8K views)
Marcin Sendera reposted
Tom Zahavy@TZahavy·
Can AI truly invent, or is it just compressing what we already know? 🤖🧠 In my position paper, LLMs can’t jump, I use Einstein’s happiest thought as a case study to show why LLMs are structurally incapable of the abductive "jump" needed for scientific discovery and how interactive environments like 🧞 offer a path forward Paper: philsci-archive.pitt.edu/28024/1/Scient…
(38 replies, 78 reposts, 495 likes, 53K views)
Marcin Sendera reposted
François Chollet@fchollet·
One of the best ways to contribute directly to the current frontier of AI research is to build agents that can solve ARC-AGI-3 environments with human-level efficiency. Today we're releasing a toolkit that lets you interact with all public environments locally, at 2000 FPS. You can run your first game with a super simple Python script (see our docs), and you can watch your agent interact with the environment in real-time.
ARC Prize@arcprize

Today we're launching the ARC-AGI-3 Toolkit Your agents can now interact with environments at 2,000 FPS, locally. We're open sourcing the environment engine, 3 human-verified games (AI scores <5%), and human baseline scores. ARC-AGI-3 launches March 25, 2026.

(33 replies, 77 reposts, 660 likes, 66.7K views)
Marcin Sendera reposted
Julius Berner@julberner·
🚀🎬We introduce TMD (Transition Matching Distillation): 480p videos generated from text prompts in < 3 NFEs! 1️⃣Main backbone for feature extraction and lightweight head for iterative refinement 2️⃣Distilled from Wan2.1 14B T2V combining MeanFlow & DMD2 🔗research.nvidia.com/labs/genair/tmd
(3 replies, 17 reposts, 64 likes, 13.5K views)
Marcin Sendera@MarcinSendera·
@niedakhPL And we're talking about a situation where one of them lasted three months, yet during my master's I went to work at the University of Cambridge (too little for the ministry), and later, during my PhD, to Mila for a year - supposedly fine, but still short. Lovely scholarship, regards to the minister
(0 replies, 0 reposts, 1 like, 14 views)
Marcin Sendera@MarcinSendera·
@niedakhPL One year, the NCN Preludium I received - "a very important and innovative topic, but the project runs short" (i.e., 10 months at the time of writing); the next year - "a topic of little importance" xD And the research internships, as usual - "average institutions, maybe good ones" (Cambridge and Mila), "but certainly short"
(1 reply, 0 reposts, 0 likes, 13 views)
Marcin Sendera reposted
Chen Sun 🤖@ChenSun92·
This paper "From Entropy to Epiplexity" was gorgeously clarifying on a number of important ideas on the quality of your training data, compression, OOD generalization, etc. Here's a deep dive 🏊‍♂️👇:

We are likely all aware of the Minimum Description Length (MDL) principle, which has long been theorized as a proxy for generalization: the model that compresses the training data most efficiently is likely capturing the true underlying mechanisms rather than memorizing noise. But since we cannot practically search the space of all possible programs to find the true Kolmogorov complexity, the central question becomes: what does it take to best approximate MDL in a way that actually predicts generalization? This paper proposes a novel method called Requential Coding. And yet, is this necessary? What is wrong with the prequential coding strategy that people have been using (arxiv.org/pdf/1802.07044) up till now to compute the description?

1. What is actually wrong with prequential coding?
Prequential coding (Sequential MDL) estimates complexity by simply summing the loss during training. It has served as an alternative to the classic "Two-Part Code". The issue is that this sum includes both the structure the model learns and the irreducible noise of the data. To isolate the model's complexity, one has to heuristically subtract the "final loss" (an estimate of noise) from the total area. This is mathematically loose: it relies on the symmetry of information, which breaks down under computational bounds, and if the model hasn't perfectly converged, the baseline is wrong, contaminating our measure of structure with noise.

2. What is the idea of epiplexity and requential coding?
Epiplexity separates "useful structure" from random noise by measuring the complexity of the learning process itself. The insight here is that the most efficient way to describe a smart model isn't to list its billions of weights, but to describe the instructions for training it!

3.
Why does requential coding need an observer?
This was the most subtle point. Requential coding requires a "Teacher" observer to act as a shared reference for the "noise." In the diagram below, the area under the Teacher Curve represents the shared noise/entropy — we don't pay bits for this because the sender and receiver share a random seed (a shared source of noise). The area between curves is the pure "surprise" of the structural update. We only pay bits to describe how the student deviates from the teacher's path, effectively isolating the structure cost from the data's inherent entropy.

4. Why doesn't prequential coding need an observer?
Prequential coding doesn't use an external observer because "reality" (the data stream) acts as the observer. The student predicts the next item, and reality reveals the actual one. The cost is the raw loss. Because reality mixes signal and noise indistinguishably, we cannot separate them without the heuristic subtraction mentioned in Question 1.

###############################

Empirical results: While Prequential and Requential estimates often correlate on natural data, they can mismatch entirely in cases of emergence. For example, in the "Game of Life," a brute-force model that models the laws of physics in the game looks better to Prequential coding, but a bounded model that learns concepts like "gliders" has higher epiplexity, which Requential coding correctly identifies as the transferable structure.

###############################

Ultimately, the Requential coding strategy works because information is relative to the observer's constraints. One can think of it in the following way:
- To God (Infinite Compute): Nothing is random. Everything is structural (low entropy).
- To a Rock (Zero Compute): Everything is random noise. Nothing is predictable.
- To an AI (Bounded Compute): We are in the middle.
Epiplexity measures exactly how much of that "apparent randomness" the AI has successfully converted into "predictable structure" by burning compute (training). So, to actually measure the meaningful structure in a model through its training process and rigorously remove the pure noise part, it almost innately requires an AI observer (the Teacher) to define what "noise" looks like. Pretty nice paper @m_finzi 🌹
Chen Sun 🤖 tweet media
Marc Finzi@m_finzi

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

(11 replies, 38 reposts, 343 likes, 31.8K views)
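The prequential (sequential MDL) cost described in point 1 of the thread can be sketched in a few lines. This is a toy illustration with an assumed Laplace-smoothed Bernoulli model, not the paper's implementation: the code length is the sum of online log-losses, each paid before the model updates on the item.

```python
import math

class LaplaceBernoulli:
    """Toy online model: add-one-smoothed estimate of P(x = 1) for a bit stream."""
    def __init__(self):
        self.ones = 0
        self.total = 0

    def predict(self, x):
        p_one = (self.ones + 1) / (self.total + 2)  # Laplace smoothing
        return p_one if x == 1 else 1.0 - p_one

    def update(self, x):
        self.ones += x
        self.total += 1

def prequential_bits(model, stream):
    """Sequential MDL: pay -log2 p(x) for each item, then train on it."""
    bits = 0.0
    for x in stream:
        bits += -math.log2(model.predict(x))  # loss paid before the update
        model.update(x)
    return bits

# On the all-ones stream of length 4 the cost telescopes to log2(5) ≈ 2.32 bits.
print(prequential_bits(LaplaceBernoulli(), [1, 1, 1, 1]))
```

The heuristic the thread criticizes then subtracts a final-loss baseline from this sum as a noise estimate; requential coding instead charges only the student's deviation from a teacher observer.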
Marcin Sendera reposted
Andrew Gordon Wilson@andrewgwils·
We introduce epiplexity, a new measure of information that provides a foundation for how to select, generate, or transform data for learning systems. We have been working on this for almost 2 years, and I cannot contain my excitement! 1/7
Marc Finzi@m_finzi

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

(34 replies, 191 reposts, 1.3K likes, 163K views)
Marcin Sendera reposted
Hugo Larochelle@hugo_larochelle·
A little under 4 years ago, @RaiaHadsell, @kchonyc and I launched TMLR. I am SO proud of what we've achieved since then, and I'm particularly happy to leave TMLR to a remarkable team of EICs, with @thegautamkamath, @NailaMurray, Nihar B. Shah and @lcharlin .
Transactions on Machine Learning Research@TmlrOrg

The end of 2025 marks the end of Hugo Larochelle's term as (Founding co-) Editor-in-chief of TMLR. It is an understatement to say that he was indispensable to making TMLR what it is today. Huge thanks to @hugo_larochelle for everything he's done!

(17 replies, 26 reposts, 269 likes, 42.4K views)
Marcin Sendera reposted
Yoshua Bengio@Yoshua_Bengio·
OpenReview is a pillar of progress in the AI research community. Now it needs our support. Along with several of my colleagues, I have pledged to help, and I encourage anyone who can to do the same. openreview.net/donate
(23 replies, 47 reposts, 354 likes, 61K views)
Marcin Sendera@MarcinSendera·
@niedakhPL Come on, let's not exaggerate - Michał did a great job, but during his stay at Princeton, so our spending rather had no influence on it, unfortunately :/
(0 replies, 0 reposts, 1 like, 18 views)
Piotr Miłoś@PiotrRMilos·
It's a perfect day to announce that I've joined Mistral as an AI scientist, just as our new flagship model has arrived :). Obviously, I did not contribute to this one, but I have high hopes for the next one :). I am very excited about this opportunity for a few reasons. On a personal level, it's going to be exciting - a lot of learning and cool stuff. More broadly, it is the first time a frontier lab has opened operations in Warsaw. I'm really proud of the speed of development the Polish AI ecosystem has shown, and I hope to see many more great things happening :).
Mistral AI@MistralAI

Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵

(50 replies, 58 reposts, 1.2K likes, 82.4K views)