Tomasz Sternal

31 posts

@TomaszSternal

Joined July 2016
492 Following · 19 Followers
Tomasz Sternal @TomaszSternal
Another inspiring piece of work from Bartosz! I vividly remember the talk he delivered 15 years ago at my high school, introducing us to this beautiful part of science where mathematics overlaps with computing, with concepts like Conway’s “Game of Life” and Langton’s Ant :)
Bartosz Naskręcki @nasqret

I am happy to share that I have finally finished the big project of properly formalizing all the claims in Andrzej Odrzywołek’s paper on the EML(x, y) = exp(x) - log(y) function in Lean 4. The project took me about two weeks of work, and I think it was a very refreshing experience. I will describe here, in an informal way, what I actually did, while deferring the technical details to the GitHub repo, which contains everything and is fully reproducible.

1. I decided that the work arXiv:2603.21852 should finally get a full Lean 4 formalization. This is an ideal task, since the scope and breadth of the work depend entirely on foundations laid out in Mathlib.
2. My plan was to use this project as a test of agentic engineering and design. I put a lot of effort into designing an intricate system based on Claude, Mathematica, Aristotle, and GPT Pro: Claude for orchestration, Mathematica for specific identity chasing, Aristotle for formalizing the many parts of the paper (including very crucial negative feedback), and GPT Pro as a critical feedback model that re-steered the Claude orchestrator whenever it got stuck. Finally, Codex was used to informalize some of the Lean statements.
3. I did the work in several batches. My supervision was based on insights and on gaining a deeper understanding of how the combinators work on specific domains.
4. The hard aspect of this work was that we wanted to have a full domain definition. This turned out to be impossible at some isolated points.
5. In hindsight, I should say that I honestly learned the hard-to-write details of the EML theorem. The many identities between elementary functions gave extra depth to some of the choices. The Lean code feels light and structured.
6. In this project, I felt more like a "mathematical engineer" than a typical "tinkering mathematician". This is a very different feeling, but it is a cool type of job. If you orchestrate AI properly, you can get a lot of satisfaction from such work. If you know how to tinker with Lean and mathematics, it becomes much more than mere vibe-coding.
7. IMHO, future work in mathematics will rely on models doing a lot of the work, with humans helping to verify it. This is an emerging new type of activity: deeply mathematical, but with a lot in common with proper engineering.
8. I am still super curious about the result itself. I was very happy to see the structural design with the combinators, which gave me the impression of good taste and structural thinking on the part of the models. It was not merely a dull formalization run.
9. Looking forward to more projects like this in the future. You can be creative in such ventures in entirely new ways. It is not subpar compared to proper mathematical tinkering. It is different, and it is fun.
10. This project also shows how important it is to know all the top-tier AI tools on the market. Switching between models and using them against each other turns out to be very productive.

Links in the comments. Feel free to interact. Maybe there are other formalizations of this project, or similar scaffolds? Curious what you think!

Tomasz Sternal @TomaszSternal
[3] Memory-Efficient LLMs Training with Dynamic Sparsity: From Stability to Practical Scaling
Tomasz Sternal @TomaszSternal
[2] When Data Is Scarce: Scaling Sparse Language Models with Repeated Training
Tomasz Sternal @TomaszSternal
Thanks to my wonderful collaborators, our three papers got accepted to #ICML2026 🎉 Huge thank you to the team and see you in Seoul 🇰🇷!
Tomasz Sternal reposted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
I think it's incredible that, while obviously not as good as modern LLMs, this LLM is able to do in-context learning and write basic Python code. Really highlights the intelligence of LLMs.
David Duvenaud @DavidDuvenaud

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

Tomasz Sternal @TomaszSternal
Our new preprint is out! We introduce Process Reward Agents (PRA) - a new framework in which the reasoning capabilities of a frozen reasoning model are decoupled from the Reward Agent, steering the reasoning process at test time.
Michael Moor @Michael_D_Moor

Preprint: arxiv.org/abs/2604.09482 Page: process-reward-agents.github.io Code: github.com/eth-medical-ai… Big thanks to a stellar team of co-authors @de_Jiung @TomaszSternal @KStyppa @thoefler! @ETH_en 1/

Tomasz Sternal reposted
Pietro Monticone @PietroMonticone
AI is increasingly changing how we do mathematics. Erdős Problem #650, open for over 60 years, was solved a few weeks ago through a collaboration between human mathematicians, an informal reasoning model (GPT 5.4 Pro @OpenAI) and a formal one (Aristotle @HarmonicMath). 🧵
Tomasz Sternal @TomaszSternal
@kingofknowwhere Love this take! Matches my experience - never stops surprising me when people with 5 ML papers at ICML/NeurIPS can’t explain what precision and recall are.
Ankit Jxa @kingofknowwhere
I have been doing ML interviews for a friend's company, and over the past 2 months I have conducted over 50 interviews of candidates across the globe. Stats, GenAI, MLE, GPU kernels, you name it. 25 things you can do-
Tomasz Sternal @TomaszSternal
@KShevchenkoReal, can you please cite the source of the 90% estimate? I thought this particular route through the Polish-Belarusian border accounts for closer to 3% of the total freight movement.
Kyrylo Shevchenko @KShevchenkoReal
Poland refuses to reopen its border with Belarus, cutting China off from a €25B/year trade artery. Some 90% of China-EU rail freight moves through Poland, now frozen amid Zapad-2025 drills and a Russian drone incursion. Beijing asked Warsaw to restore the route (vital for platforms like Temu & Shein) but after 3 hours of talks, FM Sikorski said no. With sea routes slower and air transport up to 30% more expensive, Europe’s e-commerce supply chains risk serious disruption. #ChinaEconomy #EUEconomy Photo: ResearchGate
Will Manidis @WillManidis
the solutions to your problems are hidden throughout history, you just need to find them.
Tomasz Sternal reposted
sid @immasiddx
Don’t worry, our jobs are safe.
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Has anyone successfully done RL post-training of GPT-oss with meaningful performance gains? What libraries even support it? I guess technically TRL/axolotl, maybe Unsloth... but there are no good examples of doing it...
Tomasz Sternal @TomaszSternal
@Rainmaker1973 The picture is extremely misleading. Plastic bottles are made not of polyurethane but of polyethylene terephthalate (PET). Pestalotiopsis microspora has nothing to do with plastic bottles.
Massimo @Rainmaker1973
Scientists found a fungus in the Amazon called Pestalotiopsis microspora that literally snacks on plastic. Pestalotiopsis microspora is not your typical fungus. It can survive entirely on polyurethane, one of the most common (and most persistent) types of plastic — and it does so even in oxygen-free environments, like buried landfills. [Khan, Sehroon, et al. “Biodegradation of Polyester Polyurethane by Aspergillus tubingensis.” Environmental Pollution, vol. 225, Mar. 2017, pp. 469–480]
Tomasz Sternal reposted
Noam Brown @polynoamial
To all undergrads interested in learning about AI: be wary of taking “Intro to AI” as your first AI course. In many programs, the class you actually want first is “Intro to Machine Learning”. AI technology has exploded in the past 15 years thanks to deep neural networks. Yet at many schools, the “Intro to AI” curriculum has barely changed from what it was in 2010, and often spends only a few lectures on machine learning. Unfortunately, revamping “Intro to AI” is controversial at many universities, and inertia tends to dominate. Don’t decide which course to take based on the name alone. Instead, check the syllabus. Ideally, the course covers linear regression, gradient descent, backpropagation, and reinforcement learning. Each university is different and some “Intro to AI” courses will cover all these topics, but most don’t. If you plan to pursue AI as a career, I think it makes sense to take “Intro to AI” later for a broader perspective on intelligence. But if your goal is an intro to the technology powering modern chatbots, image recognition/generation tools, and coding assistants, the class you probably want first is “Intro to Machine Learning”.
Tomasz Sternal @TomaszSternal
@Hesamation @ShunyuYao12 Sounds exciting; however, the main claim that "RL finally generalises" seems overly optimistic. The paper at arxiv.org/pdf/2504.13837 sounds much more realistic: RL won't get us beyond the reasoning patterns the base model has already learned from the training data.
ℏεsam @Hesamation
this is one of the best blog posts of 2025 by the openai researcher @ShunyuYao12. "we're at AI's halftime," it's a playbook of what will matter the most in AI research and the startup ecosystem, and how to prepare best for it. for decades, AI research focused on algorithms and new models to beat the benchmarks. but something important has changed the game: "RL finally generalizes." the working “recipe”: massive language pretraining (priors) + scale + reasoning-as-action inside an RL loop. the result of this: benchmark climbing. the game shifts: from solving problems to defining the right problems. evaluation becomes center stage. the core benchmark now is the "utility problem". benchmarks don't really translate well to real-world tasks. so this is the second-half playbook: invent evaluation setups tied to real utility; then apply the recipe to win under those new rules. in RL the key trio is environment, algorithms, and priors. we've spent so much time on the best algos but algos overfit to the environment they are born in. for the “second half,” evaluation = environment design: build setups closer to reality (human-in-the-loop, non-IID, sequential/with memory) to drive real utility, not just benchmark wins.
Tomasz Sternal @TomaszSternal
Very cool post on the limitations of Sliding Window Attention! Together with the earlier post on Attention Sinks, it gives a great overview of both the strengths and weaknesses of SWA.
Guangxuan Xiao @Guangxuan_Xiao

It's a common belief that L SWA layers (size W) yield an L×W receptive field. My post shows why the effective range is limited to O(W), regardless of depth. The reasons are information dilution and the exponential barrier from residual connections: guangxuanx.com/blog/stacking-…
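
For readers who want to see where the "L×W" belief comes from, here is a minimal Python sketch of the theoretical (not effective) receptive field of stacked SWA layers. It assumes a causal window of W tokens that includes the current token, so the exact count is L·(W−1)+1, roughly L×W. The function names `swa_mask`, `compose`, and `theoretical_reach` are hypothetical and not taken from the linked post. The sketch only tracks which positions can reach the last token through some attention path; it says nothing about how much signal survives, which is the point of the post's O(W) argument.

```python
def swa_mask(n, w):
    """Causal sliding-window mask: token i may attend to tokens i-w+1 .. i.

    Assumption: the window of size w includes the current token.
    """
    return [[0 <= i - j < w for j in range(n)] for i in range(n)]


def compose(a, b):
    """Boolean matrix product: i reaches j if i reaches some k and k reaches j."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def theoretical_reach(n, w, layers):
    """Count input positions that can influence the last token after `layers` SWA layers."""
    mask = swa_mask(n, w)
    reach = mask
    for _ in range(layers - 1):
        reach = compose(reach, mask)
    return sum(reach[n - 1])


if __name__ == "__main__":
    # Sequence of 32 tokens, window of 4, three stacked layers.
    print(theoretical_reach(32, 4, 3))  # 3 * (4 - 1) + 1 = 10
```

The count grows linearly with depth, which is the common belief the post starts from; the claim that the usable range stays O(W) concerns attenuation along these paths, which a boolean reachability check like this cannot capture.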

Tomasz Sternal @TomaszSternal
A statement is true if and only if denying it would force you to give up something else you already believe is true.