Ben Grimmer
@prof_grimmer
438 posts

Assistant Professor @JohnsHopkinsAMS, Optimization, PhD @Cornell_ORIE. Mostly here to share pretty maths/3D prints, sometimes sharing my research.

Baltimore, MD · Joined January 2015
445 Following · 3.1K Followers
Ben Grimmer retweeted
Damek @damekdavis
Excited to launch the first stage of this competition with Terry and the @sairfoundation!

If a student works through a math textbook and solves all exercises correctly, you'd expect that they've understood something. But what? They've likely learned a few major results, a few techniques, and they've gotten good at combining results and techniques in nontrivial ways. We then test their understanding with an exam. Often we even allow them to distill what they've learned into a single-page 'cheat sheet' and bring it to the exam. Years later, a student might return to the material and feel every problem is now straightforward because they understand the basic 'algorithm' for finding a solution.

Today frontier LLMs perform very well on problems from undergraduate textbooks in, e.g., linear algebra, analysis, or topology. They can even answer nontrivial extensions of such problems. But what algorithm are they applying? And can we distill that algorithm into a short human-readable 'cheat sheet' that we can use to teach other, weaker LLMs?

This is the motivation for the first stage of our competition: to boost the performance of weak LLMs via a short human-readable cheat sheet. Rather than work with a textbook, we start with a closed-world experiment: the equational theories project. This was a large-scale 'polymath'-style project that Terry ran a few years ago. Its goal was to take a large list of 'equational identities' (think associativity or commutativity) and determine the entire 'implication graph' (e.g., does associativity imply commutativity?). They succeeded, largely without any LLM help at the time.

In the present competition, we want to test the ability of LLMs to correctly classify whether one identity implies another. In our initial tests, frontier LLMs already do pretty well on this task. But we're more interested in the performance of weak LLMs, which score closer to 50% accuracy.
So the challenge is to design a ≤10KB cheat sheet that improves the classification accuracy of weak open-source LLMs as much as possible. The SAIR Foundation is generously hosting this competition and providing model credits (fairly limited in stage 1). We are currently planning the second stage. Please read Terry's post below for his thoughts and more information on the resources available to participants.
SAIR @SAIRfoundation

Our co-founder Terence Tao is announcing SAIR Foundation's inaugural competition: the Mathematics Distillation Challenge. Co-organized by @damekdavis, Terence Tao, and SAIR Foundation. competition.sair.foundation/competitions/m…
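The evaluation loop the thread describes (does a short cheat sheet lift a weak model's accuracy on implication classification?) can be sketched as follows. Everything here is an illustrative assumption, not the actual competition harness: the tiny labeled set, the prompt format, and the stub "weak model" are stand-ins.

```python
# Illustrative sketch only: data, prompts, and the stub "weak model"
# are assumptions, not the real competition setup.

# Each item: (identity A, identity B, does A imply B?)
EVAL_SET = [
    ("x*(y*z) = (x*y)*z", "x*y = y*x", False),  # associativity does not imply commutativity
    ("x*y = y*x", "x*y = y*x", True),           # every identity implies itself
    ("x = y", "x*y = y*x", True),               # the trivial law implies everything
]

def classify(model, cheat_sheet, a, b):
    """Ask `model` (a callable: prompt -> 'yes'/'no') whether A implies B."""
    prompt = f"{cheat_sheet}\nDoes the identity {a} imply {b}? Answer yes or no."
    return model(prompt).strip().lower().startswith("yes")

def accuracy(model, cheat_sheet=""):
    hits = sum(classify(model, cheat_sheet, a, b) == label
               for a, b, label in EVAL_SET)
    return hits / len(EVAL_SET)

# Stand-in for a weak LLM: answers "yes" unless the sheet warns it off.
def weak_model(prompt):
    if "associativity does not imply commutativity" in prompt and "x*(y*z)" in prompt:
        return "no"
    return "yes"

SHEET = "Hint: associativity does not imply commutativity."
assert len(SHEET.encode("utf-8")) <= 10_000  # the 10KB budget

base = accuracy(weak_model)             # 2/3 without the sheet
boosted = accuracy(weak_model, SHEET)   # 3/3 with it
```

The real contest presumably scores against a much larger labeled implication set; the point here is only the with/without-sheet comparison under the size budget.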

Ben Grimmer retweeted
Jeremias Sulam @Jere_je_je
Belated professional update 🔊 I’ve been promoted to Associate Professor with tenure at @JohnsHopkins @JHUBME. I’m incredibly thankful to my mentors over the past two decades, to my past and current collaborators (and friends!), and immensely proud of my students!
Ben Grimmer @prof_grimmer
For the second morning this week, one of my PhD students defended (successfully!) 🎉🎓🎉 Today, Alan Luner defended his excellent work "On Large-Scale Optimization: Optimal Methods and Computer-Assisted Algorithm Design". I promise this is the last such announcement for the year.
Ben Grimmer @prof_grimmer
This morning my PhD student Thabo Samakhoana defended his thesis (successfully!) 🎉🎓🎉 Was a great five years working with him towards his thesis "On Optimal Smoothings and their Applications to Optimization and Deep Learning"
Ben Grimmer retweeted
Johns Hopkins University @JohnsHopkins
Three @HopkinsEngineer faculty members have been named 2026 Sloan Research Fellows by the @SloanFoundation. Mateo Díaz, Yayuan Liu, and Soledad Villar are among 126 early-career scientists selected for the two-year, $75,000 fellowship, which recognizes strong potential for leadership in their fields. bit.ly/4aWws1b
Ben Grimmer @prof_grimmer
This paper also got me new office decorations. Below is the strongly convex set we designed that is provably hard for all Frank-Wolfe methods (at least for two steps). The paper builds this "evil" shape in d dimensions, able to counteract any d/2-step method.
Ben Grimmer @prof_grimmer

For anyone interested in our lower bound result for Frank-Wolfe methods in Nemirovski and Yudin "zero-chain" lower bounding style, a link: arxiv.org/abs/2602.22608

Ben Grimmer @prof_grimmer
For anyone interested in our lower bound result for Frank-Wolfe methods in Nemirovski and Yudin "zero-chain" lower bounding style, a link: arxiv.org/abs/2602.22608
Ben Grimmer @prof_grimmer
Smoothness and strong convexity are dual to each other, but here we have smoothness of the objective and strong convexity of the constraint set. Linear minimization examines the support function of the set, a dual object. I don't see the connection, but there ought to be symmetry!
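For readers unfamiliar with the method family: below is a minimal vanilla Frank-Wolfe loop, where the linear minimization oracle over the unit Euclidean ball (a strongly convex set) is exactly the support-function computation mentioned above. The objective and step-size rule are textbook illustrations, not the paper's hard construction.

```python
# Hedged sketch, not the paper's construction: vanilla Frank-Wolfe on
# min_x f(x) over the unit Euclidean ball. The linear minimization
# oracle over the ball is argmin_{|s|<=1} <g, s> = -g/|g|.
import math

def frank_wolfe(grad, x0, steps):
    x = list(x0)
    for k in range(steps):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
        s = [-gi / norm for gi in g]      # LMO over the unit ball
        gamma = 2.0 / (k + 2)             # classic open-loop step size
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

# Minimize f(x) = |x - c|^2 / 2 with c = (2, 0) outside the ball;
# the constrained optimum is the boundary point (1, 0).
c = (2.0, 0.0)
grad = lambda x: [xi - ci for xi, ci in zip(x, c)]
x = frank_wolfe(grad, [0.0, 0.0], 200)   # converges to [1.0, 0.0]
```

On this easy instance the method lands on the optimum immediately; the "evil" sets from the paper are built precisely so that no such short-horizon luck is possible.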
Ben Grimmer @prof_grimmer
There is (at least one) strange thing in accelerated convex optimization theory: since the 1980s, in unconstrained minimization by gradient methods, smoothness has been known to allow a fast O(1/T^2) convergence rate, due to Nesterov. Nemirovski and Yudin give matching lower bounds.
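A minimal sketch of the accelerated scheme behind that O(1/T^2) rate (the standard Nesterov/FISTA momentum sequence; the test problem and constants below are illustrative assumptions):

```python
# Hedged sketch: Nesterov's accelerated gradient method for an
# L-smooth convex f, achieving the O(1/T^2) rate mentioned above.
def nesterov(grad, L, x0, steps):
    x, y = list(x0), list(x0)
    t = 1.0
    for _ in range(steps):
        # gradient step at the extrapolated point y
        x_new = [yi - gi / L for yi, gi in zip(y, grad(y))]
        # standard momentum parameter update
        t_new = (1 + (1 + 4 * t * t) ** 0.5) / 2
        # extrapolation (the "acceleration")
        y = [xn + (t - 1) / t_new * (xn - xo)
             for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# f(x) = 0.5 * (x1^2 + 10*x2^2): smoothness L = 10, minimizer at 0.
grad = lambda x: [x[0], 10.0 * x[1]]
x = nesterov(grad, 10.0, [5.0, 5.0], 100)
```

The guarantee f(x_T) - f* <= 2L||x0 - x*||^2 / (T+1)^2 is what "fast O(1/T^2)" refers to; plain gradient descent only gets O(1/T).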
Ben Grimmer retweeted
Daniel Litt @littmath
Some thoughts on AI and mathematics, inspired by "First Proof."
Ben Grimmer retweeted
Courtney Paquette @cypaquette
Please check out my amazing PhD student's work! Theory-guided practice here! A decoder-only transformer scaled up to 2.6B params with a new optimizer that improves at scale: 40% compute saved over tuned AdamW by using the structure of language in the algorithm!
Damien Ferbach @damien_ferbach

1/10 We built ADANA, an optimizer that gets better as you scale. It extends AdamW with log-time schedules for momentum and weight decay — same hyperparameter count, no extra engineering. Scaled from 45M to 2.6B, it saves ~40% compute vs tuned AdamW, and the gap keeps growing.🧵
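As a rough illustration of what a "log-time schedule" could look like (this functional form is my guess from the tweet's wording; ADANA's actual schedules may differ), one can let a coefficient approach its ceiling on a logarithmic clock rather than a linear one:

```python
# Hypothetical sketch, NOT ADANA's code: a momentum coefficient that
# tightens like 1 - c/log(t), so effective averaging lengthens slowly
# over training instead of being fixed as in plain AdamW.
import math

def log_time_momentum(step, c=1.0, t0=10.0, beta_max=0.999):
    """Illustrative 'log-time' schedule: beta(t) = 1 - c / log(t + t0)."""
    beta = 1.0 - c / math.log(step + t0)
    return min(max(beta, 0.0), beta_max)

b_early = log_time_momentum(1)        # loose averaging early on
b_mid   = log_time_momentum(100)
b_late  = log_time_momentum(10**6)    # approaches beta_max late
```

The qualitative point matches the thread: the schedule changes with training horizon without adding hyperparameters beyond the usual ones.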

Ben Grimmer @prof_grimmer
The pattern continues. We can fractally build an N=31 self-dual pattern constructed from two N=15 patterns or four N=7 patterns, carefully sewn together (pun intended). I'll stop sewing after I finish N=63 :) Stay tuned for an upcoming paper where this has algorithmic/engineering value.
Ben Grimmer @prof_grimmer
The partition {1,2}{3} above is "self-dual". To see this, note that I numbered the 1', 2', 3' nodes counterclockwise. We can use this self-dual partition of size N=3 to build a self-dual partition of size N=7 recursively. Physically, self-dual == "dream-catcher" mirrors blue and green 3/4
Ben Grimmer @prof_grimmer
Lately, non-crossing partitions, which have a lovely duality structure, have shown up out of nowhere in my research. This inspired some good art and fractals :) Wanted to share the fun here (just sharing the pretty art for now; the research story will come in due time) 1/4
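A small aside for readers meeting these objects for the first time (not from the thread itself): a partition of {1,...,n} is non-crossing when no two blocks interleave, and a brute-force count confirms the well-known fact that there are Catalan-many of them.

```python
# Hedged aside: enumerate set partitions of {1,...,n}, discard the
# "crossing" ones (some a<b<c<d with a,c in one block, b,d in another),
# and check the count against the Catalan number C_n = C(2n,n)/(n+1).
from itertools import combinations
from math import comb

def partitions(elems):
    """Yield all set partitions of a list (insert first element everywhere)."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def crossing(part):
    for blk1, blk2 in combinations(part, 2):
        for a, c in combinations(sorted(blk1), 2):
            if any(a < b < c < d or b < a < d < c
                   for b, d in combinations(sorted(blk2), 2)):
                return True
    return False

def count_noncrossing(n):
    return sum(not crossing(p) for p in partitions(list(range(1, n + 1))))

catalan = lambda n: comb(2 * n, n) // (n + 1)
```

For n=4 the only crossing partition is {1,3}{2,4}, so the count drops from 15 set partitions to 14 = C_4.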
Ben Grimmer retweeted
Dmitriy Drusvyatskiy @ddrusvyat
Long overdue, but finally decided to become active on Twitter. Interested in conversations around machine learning/AI and related areas.
Ben Grimmer @prof_grimmer
Happy to share that my work "On optimal universal first-order methods for minimizing heterogeneous sums" just received the Optimization Letters Best Paper Prize: link.springer.com/journal/11590/… This work is part of a larger trend pushing back against the brittleness of classic smooth/nonsmooth theory.