Tomasz Sternal

31 posts

@TomaszSternal

Joined July 2016

492 Following · 19 Followers

Tomasz Sternal@TomaszSternal·13 May

Another inspiring work from Bartosz! I vividly remember the talk he delivered 15 years ago at my high school, introducing us to this beautiful part of science where mathematics overlaps with computing, with concepts like Conway’s „Game of Life” and Langton’s Ant :)

Bartosz Naskręcki@nasqret

I am happy to share that I have finally finished the big project of properly formalizing all the claims in Andrzej Odrzywołek’s paper on the EML(x, y) = exp(x) - log(y) function in Lean 4. The project took me about two weeks of work, and I think it was a very refreshing experience. I will describe here, informally, what I actually did, while deferring the technical details to the GitHub repo, which contains everything and is fully reproducible.

1. I decided that the work arXiv:2603.21852 should finally get a full Lean 4 formalization. This is an ideal task, since the scope and breadth of the work depend entirely on foundations laid out in Mathlib.
2. My plan was to use this project as a test of agentic engineering and design. I put a lot of effort into designing an intricate system based on Claude, Mathematica, Aristotle, and GPT Pro: Claude for orchestration, Mathematica for specific identity chasing, Aristotle for formalizing many parts of the paper (including very crucial negative feedback), and GPT Pro as a critical feedback model that re-steered the Claude orchestrator whenever it got stuck. Codex was also used to informalize some of the Lean statements.
3. I did the work in several batches. My supervision was based on insights and on gaining a deeper understanding of how the combinators work on specific domains.
4. The hard aspect of this work was that we wanted to have a full domain definition. This turned out to be impossible at some isolated points.
5. In hindsight, I should say that I honestly learned the hard-to-write details of the EML theorem. The many identities between elementary functions gave extra depth to some of the choices. The Lean code feels light and structured.
6. In this project, I felt more like a "mathematical engineer" than a typical "tinkering mathematician". This is a very different feeling, but it is a cool type of job. If you orchestrate AI properly, you can get a lot of satisfaction from such work. If you know how to tinker with Lean and mathematics, it becomes much more than mere vibe-coding.
7. IMHO, future work in mathematics will rely on models doing a lot of the work, with humans helping to verify it. This is an emerging new type of activity: deeply mathematical, but with a lot in common with proper engineering.
8. I am still super curious about the result itself. I was very happy to see the structural design with the combinators, which gave me the impression of good taste and structural thinking on the part of the models. It was not merely a dull formalization run.
9. Looking forward to more projects like this in the future. You can be creative in such ventures in entirely new ways. It is not subpar compared to proper mathematical tinkering. It is different, and it is fun.
10. This project also shows how important it is to know all the top-tier AI tools on the market. Switching between models and using them against each other turns out to be very productive.

Links in the comments. Feel free to interact. Maybe there are other formalizations of this project, or similar scaffolds? Curious what you think!

Tomasz Sternal@TomaszSternal·1 May

[3] Memory-Efficient LLMs Training with Dynamic Sparsity: From Stability to Practical Scaling

Tomasz Sternal@TomaszSternal·1 May

[2] When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

Tomasz Sternal@TomaszSternal·1 May

Thanks to my wonderful collaborators, our three papers got accepted to #ICML2026 🎉 Huge thank you to the team and see you in Seoul 🇰🇷!

Tomasz Sternal reposted

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·29 Apr

I think it's incredible that, while obviously not as good as modern LLMs, this LLM is able to do in-context learning and write basic Python code. Really highlights the intelligence of LLMs.

David Duvenaud@DavidDuvenaud

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

Tomasz Sternal reposted

AK@_akhaliq·13 Apr

Process Reward Agents for Steering Knowledge-Intensive Reasoning paper: huggingface.co/papers/2604.09…

Tomasz Sternal@TomaszSternal·13 Apr

It was a fantastic learning experience. Thank you so much to the amazing team for this collaboration! @de_Jiung @KStyppa @Michael_D_Moor @thoefler @ETH_en

Tomasz Sternal@TomaszSternal·13 Apr

Our new preprint is out! We introduce Process Reward Agents (PRA), a new framework in which the reasoning capabilities of a frozen reasoning model are decoupled from a Reward Agent that steers the reasoning process at test time.

Michael Moor@Michael_D_Moor

Preprint: arxiv.org/abs/2604.09482 Page: process-reward-agents.github.io Code: github.com/eth-medical-ai… Big thanks to a stellar team of co-authors @de_Jiung @TomaszSternal @KStyppa @thoefler! @ETH_en 1/

Tomasz Sternal reposted

Pietro Monticone@PietroMonticone·7 Apr

AI is increasingly changing how we do mathematics. Erdős Problem #650, open for over 60 years, was solved a few weeks ago through a collaboration between human mathematicians, an informal reasoning model (GPT 5.4 Pro @OpenAI) and a formal one (Aristotle @HarmonicMath). 🧵

Tomasz Sternal@TomaszSternal·15 Oct

@kingofknowwhere Love this take! It matches my experience: it never stops surprising me when people with 5 ML papers at ICML/NeurIPS can’t explain what precision and recall are.

Ankit Jxa@kingofknowwhere·14 Oct

I have been doing ML interviews for a friend's company, and over the past 2 months I have conducted over 50 interviews of candidates across the globe: stats, GenAI, MLE, GPU kernels, you name it. 25 things you can do:

Tomasz Sternal@TomaszSternal·18 Sep

@KShevchenkoReal, can you please cite where you took the 90% estimate from? I thought this particular route through the Polish-Belarusian border accounts for closer to 3% of total freight movement.

Kyrylo Shevchenko@KShevchenkoReal·17 Sep

Poland refuses to reopen its border with Belarus, cutting China off from a €25B/year trade artery. Some 90% of China-EU rail freight moves through Poland, now frozen amid Zapad-2025 drills and a Russian drone incursion. Beijing asked Warsaw to restore the route (vital for platforms like Temu & Shein) but after 3 hours of talks, FM Sikorski said no. With sea routes slower and air transport up to 30% more expensive, Europe’s e-commerce supply chains risk serious disruption. #ChinaEconomy #EUEconomy Photo: ResearchGate

Tomasz Sternal@TomaszSternal·10 Sep

@WillManidis How did you come across this quote?

Will Manidis@WillManidis·9 Sep

the solutions to your problems are hidden throughout history, you just need to find them.

Tomasz Sternal reposted

sid@immasiddx·6 Sep

Don’t worry, our jobs are safe.

Tomasz Sternal@TomaszSternal·5 Sep

@iScienceLuvr @jxmnop played with SFT; maybe he has some experience with RL as well? x.com/jxmnop/status/…

Jack Morris@jxmnop

OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·5 Sep

Has anyone successfully done RL post-training of GPT-oss with meaningful performance gains? What libraries even support it? I guess technically TRL/axolotl, maybe Unsloth... but there are no good examples of doing it...

Tomasz Sternal@TomaszSternal·2 Sep

@Rainmaker1973 The picture is extremely misleading. Plastic bottles are made not of polyurethane but of polyethylene terephthalate (PET). Pestalotiopsis microspora has nothing to do with plastic bottles.

Massimo@Rainmaker1973·1 Sep

Scientists found a fungus in the Amazon called Pestalotiopsis microspora that literally snacks on plastic. Pestalotiopsis microspora is not your typical fungus. It can survive entirely on polyurethane, one of the most common (and most persistent) types of plastic — and it does so even in oxygen-free environments, like buried landfills. [Khan, Sehroon, et al. “Biodegradation of Polyester Polyurethane by Aspergillus tubingensis.” Environmental Pollution, vol. 225, Mar. 2017, pp. 469–480]

Tomasz Sternal reposted

Noam Brown@polynoamial·28 Aug

To all undergrads interested in learning about AI: be wary of taking “Intro to AI” as your first AI course. In many programs, the class you actually want first is “Intro to Machine Learning”. AI technology has exploded in the past 15 years thanks to deep neural networks. Yet at many schools, the “Intro to AI” curriculum has barely changed from what it was in 2010, and often spends only a few lectures on machine learning. Unfortunately, revamping “Intro to AI” is controversial at many universities, and inertia tends to dominate. Don’t decide which course to take based on the name alone. Instead, check the syllabus. Ideally, the course covers linear regression, gradient descent, backpropagation, and reinforcement learning. Each university is different and some “Intro to AI” courses will cover all these topics, but most don’t. If you plan to pursue AI as a career, I think it makes sense to take “Intro to AI” later for a broader perspective on intelligence. But if your goal is an intro to the technology powering modern chatbots, image recognition/generation tools, and coding assistants, the class you probably want first is “Intro to Machine Learning”.

Tomasz Sternal@TomaszSternal·28 Aug

@Hesamation @ShunyuYao12 Sounds exciting; however, the main claim that "RL finally generalises" seems overly optimistic. This paper (arxiv.org/pdf/2504.13837) sounds much more realistic: RL won't get us beyond the reasoning patterns the base model has already learned from the training data.

ℏεsam@Hesamation·27 Aug

this is one of the best blog posts of 2025 by the openai researcher @ShunyuYao12. "we're at AI's halftime," it's a playbook of what will matter the most in AI research and the startup ecosystem, and how to prepare best for it. for decades, AI research focused on algorithms and new models to beat the benchmarks. but something important has changed the game: "RL finally generalizes." the working “recipe”: massive language pretraining (priors) + scale + reasoning-as-action inside an RL loop. the result of this benchmark climbing. the game shifts: from solving problems to defining the right problems. evaluation becomes center stage. the core benchmark now is the "utility problem". benchmarks don't really translate well to real-world tasks. so this is the second-half playbook: invent evaluation setups tied to real utility; then apply the recipe to win under those new rules. in RL the key trio is environment, algorithms, and priors. we've spent so much time on the best algos but algos overfit to the environment they are born in. for the “second half,” evaluation = environment design: build setups closer to reality (human-in-the-loop, non-IID, sequential/with memory) to drive real utility, not just benchmark wins.

Tomasz Sternal@TomaszSternal·26 Aug

Very cool post on the limitations of Sliding Window Attention! Together with the earlier post on Attention Sinks, it gives a great overview of both the strengths and weaknesses of SWA.

Guangxuan Xiao@Guangxuan_Xiao

It's a common belief that L SWA layers (size W) yield an L×W receptive field. My post shows why the effective range is limited to O(W), regardless of depth. The reasons are information dilution and the exponential barrier from residual connections: guangxuanx.com/blog/stacking-…

Tomasz Sternal@TomaszSternal·26 Aug

A statement is true if and only if denying it would force you to give up something else you already believe is true.