Mark L. Stone

1.5K posts

@themarklstone

Interests: Nonlinear multivariate dynamic stochastic global optimization (everything else is a special case, an approximation, or a compromise)

Robust Nonlinear Multivariate Dynamic Stochastic Global Optimum · Joined March 2010

191 Following · 220 Followers

Pinned Tweet
Mark L. Stone@themarklstone·
What I do as an Operations Research professional: I figure out the best thing to do when you don't know what the hell's going on, and when you think you do but you really don't.
Mark L. Stone@themarklstone·
@TheSIAMNews Congratulations to all new Fellows. But as usual, almost all the Fellows are academics. Either remove Industrial from SIAM's name, or make SIAM more industrial.
Damek@damekdavis·
The way the theorem is later used in the cited paper adds an additional assumption that makes it true. Unfortunately, that assumption doesn't hold in my problem. I never would have caught this without codex. Now I need to develop a proof that doesn't rely on the cited result.
Damek@damekdavis·
'Proved' something new and had codex formalize it in Lean. It generated 41000 lines. I initially left all cited theorems that I relied on as axioms, but then I started formalizing them. Codex complained about one particular result. Turns out that result was false as stated.
Mark L. Stone@themarklstone·
@gabrielchua @OpenAIDevs Easy choice for me. ChatGPT 5.4 pro extreme reasoning and Codex 5.4 pro xhigh for everything. I don’t use these tools for pedestrian purposes, so I want the best quality they can deliver, period.
Gabriel Chua@gabrielchua·
Now with `gpt-5.4-mini` and `nano` out, I put together a simple cheat sheet of the latest OpenAI models by use case. Noticed at a few recent hackathons & meetups: some folks still default to `gpt-4o-mini` for LLMs and `whisper-1` for transcription. Newer options tend to fit better now with much better performance. If you’re running into issues switching, lmk!
Mark L. Stone@themarklstone·
@mmaaz_98 As of November, ChatGPT was up on the latest developments in second order optimality conditions for Nonlinear Semidefinite Programming. Claude was a disaster, and was confidently wrong on rather basic stuff in Semidefinite Programming.
Maaz@mmaaz_98·
It’s quite funny how much the models hand-wave on math topics they don’t know much about. In this case, a problem related to semidefinite programming.
Mark L. Stone@themarklstone·
In LLM/Reasoning era: Generation is Easy, Verification is Hard. Faux verification is easy. True verification is hard.

Claims are circulating that verified math (theorem provers) is being used for bullet-proof math in finance & engineering. But end to end (balanced) rigor is what matters in applied math, not ultra-rigor in some portion of the modeling and calculations combined with handwaving for the rest. Rigorously verified math derivations in a model whose assumptions don't match reality, or which ignore the effects of roundoff error in finite-precision floating-point calculations and the non-exact termination of iterative algorithms, do not provide end to end rigor, and do not make the applied math bullet-proof. Applied math is not the same as pure math.

In finance, end to end rigor (to the extent it’s even obtainable) must consider the models used, including the multivariate stochastic processes driving them (with all their dependencies across "space" and time, and their estimation and calibration), the vagaries of human actions, regime changes, force majeure, rule changes, etc. Similarly in engineering (financial or otherwise), where dependent failure modes matter: the dependency structure of failures and uncertainty can be crucial. And of course, so can the form of the probability distributions. Normal (Gaussian) distributions (and their moral equivalent in some parts of finance, the Lognormal distribution) are almost always a terrible model of tail (extreme) behavior, which is exactly what matters for risk calculations.

Faux verification is easy, and getting easier. True verification is hard.
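A quick sketch (my own construction, not from the tweet) of why Gaussian tails understate extreme risk: compare a standard normal tail probability with a hypothetical Pareto (power-law) tail, using only the standard library. The Pareto parameters are invented for illustration.

```python
import math

def normal_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def pareto_tail(x: float, alpha: float = 3.0) -> float:
    """P(X > x) for a Pareto tail with minimum 1 and tail index alpha (x >= 1)."""
    return x ** (-alpha)

for k in (2, 4, 8):
    # The Gaussian tail collapses super-exponentially as k grows;
    # the power-law tail shrinks only polynomially.
    print(k, normal_tail(k), pareto_tail(k))
```

At 8 "sigmas" the normal model assigns the event essentially zero probability, while the heavy-tailed model still treats it as a live possibility, which is the whole point for risk calculations.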
Mark L. Stone@themarklstone·
New name for an old idea: simulation optimization, which goes back more than 4 decades. (I was doing it before it had a name). All sorts of constraints can be imposed on the optimization, which focuses effort on where any “optimal” or at least feasible solution might lie. Proper control of random numbers is key to effective optimization (RL with simulation). And of course good modeling is vital. @joon_s_pk
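A minimal sketch of what "proper control of random numbers" buys you, using common random numbers (CRN), a standard variance-reduction technique in simulation optimization. The toy cost function and parameter values are invented for illustration.

```python
import random
import statistics

def run_sim(theta: float, rng: random.Random, n: int = 2000) -> list[float]:
    """Toy simulation: cost samples (z - theta)^2 with noise z ~ N(0, 1)."""
    return [(rng.gauss(0.0, 1.0) - theta) ** 2 for _ in range(n)]

# Compare two candidate designs, theta = 0.0 vs theta = 0.1.
# Independent streams: each design sees its own noise.
a = run_sim(0.0, random.Random(1))
b = run_sim(0.1, random.Random(2))
indep_diffs = [x - y for x, y in zip(a, b)]

# Common random numbers: the SAME seed, so both designs see the same noise
# and the noise largely cancels when we take differences.
a = run_sim(0.0, random.Random(1))
b = run_sim(0.1, random.Random(1))
crn_diffs = [x - y for x, y in zip(a, b)]

# CRN makes the *difference* estimate far less noisy.
print(statistics.stdev(indep_diffs), statistics.stdev(crn_diffs))
```

Since an optimizer only needs to rank designs, shrinking the noise in pairwise differences is often worth far more than extra simulation replications.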
Percy Liang@percyliang·
I think it’s pretty clear that simulation is the next frontier for AI. The most impressive feats of AI to date come when we have a clear environment + reward, whether it be beating Lee Sedol at Go, winning an IMO gold medal, or writing entire apps from scratch. In these cases, the RL algorithm can try different actions and observe the well-defined consequences in the safety of a docker container.

But what about messy real-world situations involving people? The rewards are unclear, the stakes are high, and you can’t experiment in the real world. But these situations are precisely where the next big opportunity in AI is. To crack this, we need to *simulate* society (“put society into a docker container”). Concretely, this means building a model that can predict what will happen in any given situation (real or hypothetical). If we can do this, we are limited only by our imagination: predict the future, optimize for better outcomes, answer hypothetical (“what if”) questions. Ultimately, this goes beyond making better decisions; it’s about giving us a better understanding of ourselves and the world. Simulation is the whole enchilada.

And this is exactly the research that @simile_ai is working on. Read more here: simile.ai/blog/simulatio…
Mark L. Stone@themarklstone·
@NVIDIAAIDev Can we please see cuOpt in comparison to the leading non-open source solvers, such as Gurobi and Xpress?
NVIDIA AI Developer@NVIDIAAIDev·
NVIDIA cuOpt is officially the #1 OSS solver on the Hans Mittelmann MIPfeas leaderboard. This is a massive win for GPU-accelerated Mixed Integer Programming, proving cuOpt is ready to power the next generation of complex, memory-intensive workloads. From complex fleet routing to supply chain planning, the possibilities for low-latency agentic systems just expanded. Hans Mittelmann Benchmark: plato.asu.edu/ftp/mipfeas.ht…
Mark L. Stone@themarklstone·
@JFPuget The only French you need to know when playing tennis is "merde" (sounds like "mer de"). At least that's all the French Stanford graduate students who had attended École Centrale des Arts et Manufactures seemed to say when we played tennis. It works for singles or doubles.
JFPuget 🇺🇦🇨🇦🇬🇱
I concur. If you ever visit France, and don't speak the language, then just learn to say "bonjour" with a light smile (similar to hi, or hello in the US) when you start talking to someone. It shows respect and buys you a lot, even if that's the only French word you know. I had a colleague from the Netherlands who came to work in Paris. He picked up French fairly quickly (Dutch people seem to learn any language in a few weeks, they are amazing at it). What impressed him the most was the difference he saw if he was saying "bonjour" first or not. For instance, at a bakery, a "bonjour" buys you much better service than without.
VEO@vrexec

I’ve spent a lot of time in Western Europe and I’m always so impressed by the French people. They’re really wonderful. Beautiful, kind, powerful, enterprising. The stereotype that the French are cold to outsiders and tourists and “hate it” when you don’t speak French to them is not real. It’s a trope repeated by people who don’t travel and/or don’t present themselves well. If you’re disheveled, disrespectful, or conceited… then yes… nobody will like you. The French don’t have a monopoly on this. You must look nice, presentable, and speak respectfully in any language. Plus, for extra points, it’s not difficult to say “bonne soirée” or “bonjour” or “merci.” But the French are practical, as are most Europeans. They don’t really care what language you are speaking. They’ll likely never see you again anyway.

Mark L. Stone@themarklstone·
Not the tendency in Semidefinite Programming (SDP), which optimizes an objective function subject to the constraint of a matrix being positive semidefinite (psd). The matrix in the optimal solution of a well-behaved model is often singular, which is right on the edge of being psd. It’s often the case that at optimality, the matrix has multiple zero eigenvalues, which is singular on steroids, and is the optimization model’s way of getting its money’s worth out of the matrix being singular.
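A tiny hand-solved instance of the phenomenon (my own construction, not from the tweet): minimize t subject to [[t, 1], [1, t]] being psd. The matrix is psd iff t >= 0 and det = t^2 - 1 >= 0, so the optimum is t = 1, and the sketch below checks that the optimal matrix sits exactly on the psd boundary.

```python
import math

# Optimal value of: minimize t  s.t.  [[t, 1], [1, t]] psd   =>   t = 1.
t_opt = 1.0

def sym2x2_eigs(a: float, b: float, c: float) -> tuple[float, float]:
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

# The optimal matrix [[1, 1], [1, 1]] is singular: one eigenvalue is 0,
# i.e. the solution lands right on the edge of the psd cone.
lo, hi = sym2x2_eigs(t_opt, 1.0, t_opt)
print(lo, hi)
```

This is the 2x2 version of the behavior described above; in larger well-behaved SDPs the optimal matrix often has several zero eigenvalues.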
Mark L. Stone@themarklstone·
Where, if at all, is the effect of round-off error in finite-precision floating point arithmetic addressed? Absent that, how can any code involving floating point arithmetic be verified in any practical sense?

Things only get more complicated when iterative algorithms which don’t converge exactly are considered, such as the optimization model types listed in the white paper (which, among other things, should probably distinguish between convex and non-convex optimization problems, and address how convexity is verified, as in, for example, CVX, YALMIP, and CVXPY). Also, outward-rounded interval arithmetic should be addressed; it is well-suited to formal correctness verification.

Lest you think this is “academic”: it is extremely common for users to provide poorly-scaled input data to optimization solvers, and for the solvers either to fail due to numerical error or to produce false results, sometimes with no warning to the user. This needs to be addressed for any verification to be meaningful. “Numerical Analysis” is (or at least was when I went to school) part of Computer Science, and a rather important part if finite-precision computation is performed or non-exactly-terminating iterative algorithms are used. @CarinaLHong @KenOno691
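A small self-contained demonstration of the kind of round-off failure described above, using the classic quadratic-formula cancellation (an illustrative example of mine, not taken from the white paper): code that is provably correct in exact arithmetic silently loses almost all accuracy in floating point.

```python
import math

# Quadratic x^2 - b*x + c with b = 1e8, c = 1: roots near 1e8 and 1e-8.
b, c = 1e8, 1.0
disc = math.sqrt(b * b - 4.0 * c)

# Textbook formula: subtracting two nearly equal numbers (b and sqrt(disc))
# causes catastrophic cancellation in the small root.
naive_small = (b - disc) / 2.0

# Stable rewrite: the roots satisfy r_big * r_small = c, so recover the
# small root from the accurately computed big one.
big = (b + disc) / 2.0
stable_small = c / big

print(naive_small, stable_small)
```

Both formulas are identities over the reals; only one survives finite precision, which is exactly the gap a verification effort that assumes exact arithmetic cannot see.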
Lean@leanprover·
The CSLib steering committee recently announced the official launch of CSLib — an open-source effort to formalize computer science in Lean, inspired by the impact of Mathlib in mathematics. CS researchers, practitioners, and enthusiasts are invited to get involved to support formalizing essential computer science concepts, and building infrastructure for reasoning about real-world code with Lean. Learn more at: 🌐 cslib.io 📄 White paper: arxiv.org/abs/2602.04846 🤝 Contribute: github.com/leanprover/csl… #LeanLang #LeanProver #CSLib #OpenSource #FormalVerification
Mark L. Stone@themarklstone·
Isn’t @axiommathai handwaving to say that Lean proves computer code which does finite-precision floating point arithmetic is correct? Lean’s verification is predicated on the incorrect assumption that arithmetic calculations are exact (infinite precision). Numerically unstable calculations which would be correct in exact arithmetic can lead to catastrophic failures, such as a bridge collapse or a missile defense interceptor design failure www-users.math.umn.edu/~arnold/disast… If outward-rounded floating point interval arithmetic were used, perhaps Lean could be used to verify correctness of the code. Barring that, @axiommathai’s AI APPLIED Mathematician would have to be expert in numerical analysis, verifying correctness, or at least goodness, of code while taking into account round-off errors and tolerances in termination criteria for floating point iterative calculations (such as equation solving and optimization). It would have to know the laws of floating point arithmetic, which are different from the laws of exact arithmetic. @CarinaLHong
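A two-line demonstration (mine, for illustration) that the laws of floating point arithmetic really do differ from exact arithmetic: addition is not associative, and summation order can change the answer entirely.

```python
import math

# Associativity fails in IEEE double precision.
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

# Summation order matters: left-to-right, the 1.0 is absorbed by 1e16
# and then lost; an exactly rounded sum (math.fsum) recovers it.
xs = [1e16, 1.0, -1e16]
print(sum(xs))        # 0.0
print(math.fsum(xs))  # 1.0
```

Any proof that silently treats `+` as the mathematical addition of reals is proving a property of a different program than the one the machine runs.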
Ken Ono@KenOno691·
Fair point, Mark! 🎯 That’s why we use tools like Lean to stop handwaving and fix assumptions at the source. Check out my former student Jerry Lu at the Winter Olympics today. He’s using AI to turn Ilia Malinin’s quads into digital twins for @NBCSports. Proof that when the logic is sound, it doesn't just solve conjectures; it helps win Olympic medals. 🏔️🏅 [Link: news.virginia.edu/content/callin…]
Mark L. Stone@themarklstone·
As an Applied Mathematician/Operations Researcher, I see beauty when math and theory are applied to formulate and solve real-world problems, to make something in the world better. Rigor in applications results from the quality of end to end modeling and analysis, not from the combination of 1) a rigorous proof about some piece in the middle and 2) invalid assumptions about the rest. Good Applied Math/Operations Research is mainly about questioning assumptions and framing problems to analyze, not about proving theorems predicated on invalid assumptions. But beauty is in the eye of the beholder, so I also appreciate the pure mathematician or aficionado enjoying a beautiful theorem and proof. I used to be one of those myself.
Ken Ono@KenOno691

3/ Reason 2: Deepening Understanding & Beauty 🧠 The goal isn't just code. For purists, the beauty of a proof is everything. Elegance is recovered in the synthesis of a paper from Lean where you find the "soul" of the argument without the handwaving.

Mark L. Stone@themarklstone·
@deepfates If @AnthropicAI wants to be less wrong, its bots should do inference consistent with the Law of Total Probability. All frontier AI bots would fail the midterm in an intro probability class. They can recite the Law of Total Probability, but violate it in their own calculations.
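A sketch of the consistency check being described: compute P(A) once via the Law of Total Probability and once by brute-force enumeration, and require the two to agree. The urn setup is my invented example; exact rationals avoid any floating-point excuse.

```python
from fractions import Fraction

# Two urns: urn A has 3 red balls out of 5, urn B has 1 red out of 5.
# Pick an urn uniformly at random, then a ball uniformly from that urn.
p_urn = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
p_red_given = {"A": Fraction(3, 5), "B": Fraction(1, 5)}

# Law of Total Probability: P(red) = sum_i P(red | urn_i) * P(urn_i)
p_red = sum(p_red_given[u] * p_urn[u] for u in p_urn)
print(p_red)  # 2/5

# Independent check by enumerating all equally likely (urn, ball) outcomes.
reds = {"A": 3, "B": 1}
outcomes = [(u, ball) for u in ("A", "B") for ball in range(5)]
p_enum = Fraction(sum(ball < reds[u] for u, ball in outcomes), len(outcomes))
assert p_enum == p_red  # any answer violating this is internally inconsistent
```

The complaint in the tweet is precisely that a model can recite the first line of this computation and still emit probabilities that fail the final assertion.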
🎭@deepfates·
Getting word that Anthropic has an internal version of LessWrong that's even more less wrong than the public one
Mark L. Stone@themarklstone·
Is @RicursiveAI working on (tools for) co-development (mutually optimized design) of chips and the numerical algorithms that run on them? Current numerical optimization algorithms are designed to shoehorn onto CPUs and GPUs that are not well-suited for them. Co-developing chips and algorithms optimized for each other could unlock greater performance, efficiency, and recursive-improvement potential than optimizing chip design on its own. @annadgoldie @Azaliamirh
Mark L. Stone@themarklstone·
LEAN assumes exact (infinite precision) arithmetic, not finite precision floating point calculations performed on computers. Numerical stability of the computer code matters in the real world. Code which would be correct in exact arithmetic can be unsound code which does not perform the intended math on a computer. Potentially, LEAN could be made to “prove” correctness of certain code which uses outward-rounded floating point interval arithmetic. Until interval arithmetic is implemented in hardware, code using it is unlikely to be used very widely.
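A minimal sketch of outward-rounded interval addition in software (my illustration; `math.nextafter` widens each endpoint by one ulp, a crude but sound stand-in for hardware directed-rounding modes): the returned interval is guaranteed to contain the exact real sum, which is the property a prover could actually certify.

```python
import math

def iv_add(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Outward-rounded interval addition: widen each endpoint outward by
    one ulp so the result provably encloses the exact real-number sum."""
    lo = math.nextafter(a[0] + b[0], -math.inf)
    hi = math.nextafter(a[1] + b[1], math.inf)
    return lo, hi

x = (0.1, 0.1)  # degenerate interval at the double nearest 0.1
y = (0.2, 0.2)
lo, hi = iv_add(x, y)
# The exact sum of those two doubles lies strictly inside [lo, hi],
# even though plain 0.1 + 0.2 rounds to 0.30000000000000004.
assert lo < 0.1 + 0.2 < hi
```

An enclosure like this turns "the computed answer is near the true answer" into a machine-checkable statement, at the cost of every operation producing (and usually growing) an interval.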
Arjun Narayan@narayanarjun·
I'm optimistic that formal verification is the solution to our current situation where LLMs are writing our code and nobody's reading it. Formal methods can give us a world where we write succinct specs and agent-generated code is proven to comply. But we have a long way to go. There are several open challenges that stand between our situation today and that future, but none appear insurmountable. I’ve written a brief overview of what I consider to be the big open problems, and some of the directions that researchers are taking today to address them: from verifying mathematics to building standard libraries of verified code that can be built upon. Here are a few highlights:

1) A Brief History of Formal Verification. Verification is fundamentally about understanding what your program can or can’t do, and verifying it with a proof. In order to verify, you must first have a specification that you are verifying your program against. Most of you leverage some formal verification day to day: namely, some of the compiler errors in statically-typed languages like C++ and Java are verification errors. Static type checking is the version of formal verification programmers are most familiar with. Type systems (and related formal verification tools) have gotten quite impressive, and they are becoming a lot more relevant in constraining the behavior of AI coding models.

2) Rust. Type checking represents a middle ground for verification. The hard part is choosing the right balance: reject too many good programs and it becomes hard to program in the language, as the programmer has to “guess what the type checker will permit”. Recently the language that has brought the most interesting advances from type systems to the real world is Rust. Its ownership type system and associated type checker is known as the “borrow checker”. The borrow checker is conservative, and “fighting with the borrow checker” is part and parcel of everyone’s Rust experience. This gives us the following lesson: we can prove more interesting things, but at a larger burden to the developer. Finding elegant middle points is hard, and Rust represents a real design breakthrough in navigating that tradeoff.

3) Mechanically verified math. Recently, groups of mathematical researchers have been writing mathematical proofs in a specialized programming language called a proof assistant. One such language, LEAN, comes with a powerful type checker capable of certifying complex mathematical proofs. LEAN is exciting, but working in LEAN can be frustrating: because of the nontermination properties of the type checker’s search, such languages rely heavily on programmer annotation. And this is why more complex type systems have stayed relatively academic; the Rust borrow checker sits at a genuinely elegant point in the design space: complex enough to reason about a complex property like memory references, yet simple enough to not need too much extra annotation. But this is a critically important point: mathematical proofs and type checking aren’t just analogous: they are the literal same task. They differ only in degree of complexity along two axes: the complexity of the underlying objects, and the complexity of the properties we are proving.

4) There is still a long way to go for proof assistants. While the world I describe is exciting, bluntly, we’re not anywhere close to that world yet. Proofs break easily when programs are modified, the standard library of proofs is too small, and specifications seldom capture everything about a program’s behavior. Overall there’s a long way to go before these techniques reach a mainstream programming language with broad adoption. But AI is a huge accelerant to proof assistants. Much of the energy towards AI-assisted mathematics is coming from AI researchers who see it as a very promising domain for building better reasoning models. Verified math is a domain rich in endless lemmas, statements, and proofs, all of which can be used as “ground truth”, which means we can use them as strong reward signals in our post-training workflows. There are several startups built by seasoned foundation model researchers, such as Harmonic and Math Inc, that are based on this premise. I’m no expert here, but it sure seems to me that formally verified code would lead to a clear domain of tasks with strong verifiable rewards, ripe for use in reinforcement learning to build better agents, period.

I’m excited about the efforts to use verified mathematics in reinforcement learning. But I’d love to see even more experiments in bringing verification to the agentic coding world. This is an exciting time in programming languages and formal methods research. There’s only one way out of the increasingly unwieldy mountain of LLM-generated code: We must prove. We will prove.
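The claim that proofs and type checking are "the literal same task" is the Curry–Howard correspondence. A minimal Lean 4 sketch of the idea (my own, illustrative): the same lambda term type-checks once as an ordinary program and once as a proof of an implication.

```lean
-- Curry–Howard in miniature: identical terms, read as program and as proof.

-- As a program: composing two functions.
def compose {α β γ : Type} (f : α → β) (g : β → γ) : α → γ :=
  fun a => g (f a)

-- As a proof: transitivity of implication. Same term, different universe.
theorem imp_trans {P Q R : Prop} (h₁ : P → Q) (h₂ : Q → R) : P → R :=
  fun p => h₂ (h₁ p)
```

Checking `imp_trans` is exactly the type checker verifying that the term `fun p => h₂ (h₁ p)` inhabits the type `P → R`, which is what "proof checking is type checking" means in practice.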