Justin Waugh

439 posts

@JustinWaugh

Founder @ Approximate Labs

Joined July 2009

605 Following, 268 Followers

Justin Waugh@JustinWaugh·2d

@LLMJunky @mweinbach Dell via call for quote (the video above is of the Dell version, hence the "them") I saw the cdw thing as well, but not clear that that is real / actually available. I haven't heard back directly from any other supplier (there are 6 that were on the NVIDIA website)

am.will@LLMJunky·2d

@JustinWaugh @mweinbach who is them? they are ~$97K generally cdw.com/product/msi-nv…

Max Weinbach@mweinbach·3d

Does anyone have pricing on this? A couple months ago I was betting like $50-75K

Matthew Berman@MatthewBerman

.@nvidia hand delivered a pre-production unit of the @Dell Pro Max with GB300 to my house. 100lbs beast with 750GB+ of unified memory to power the best open-source models in the world. What should I test first?

Justin Waugh@JustinWaugh·3d

@Ex0byt @MichaelDell Quoted at 169k + 14k tax, and 60 day to delivery date.

Eric@Ex0byt·3d

Take my money @MichaelDell

Justin Waugh@JustinWaugh·3d

@mweinbach (called them and got this number, precisely: 169,005.32 base, 183,029.39 with tax) i was seriously ready to buy at ~80k-100k price point, but was shocked to hear the price they are asking.

Justin Waugh@JustinWaugh·3d

@mweinbach I got quoted at 170k + 13k taxes (60 day delivery)

Justin Waugh@JustinWaugh·3d

@thsottiaux Codex stops early way too often, and i have no way to meta-prompt it to wake itself up on a schedule / keep iterating like i can with claude code

Tibo@thsottiaux·3d

What are we consistently getting wrong with codex that you wish we would improve / fix?

Justin Waugh@JustinWaugh·3d

@StirlingForge @NVIDIAAIDev I got a quote: 170k (+14k tax) w/ 2 month delivery time

Stirling Forge (Unsupervised)@StirlingForge·3d

@NVIDIAAIDev HOW MUCH?!

NVIDIA AI Developer@NVIDIAAIDev·3d

NVIDIA DGX Station is now available to order from select OEMs🔥 Powered by the GB300 Grace Blackwell Ultra Desktop Superchip, DGX Station brings data-center-class AI performance to the desk, enabling developers to build and run autonomous AI agents locally. ⚡ 748GB of coherent memory ⚡ Up to 20 petaflops of AI compute performance ⚡ Run large open models up to one trillion parameters. Together with NVIDIA NemoClaw, an open source stack that simplifies running OpenClaw always-on assistants, more safely, with a single command, we are delivering a full-stack platform for secure, long-running agentic AI. Learn more: blogs.nvidia.com/blog/gtc-2026-…

Justin Waugh@JustinWaugh·4d

@towheretobegin Tokyo transit isochrone I made

leon@towheretobegin·4d

When I moved to new york, I found it hard to visualize what commute times actually looked like. The same dilemma occurs every time you move, or even book a hotel: what's actually accessible in 20 minutes of public transit? Deployment link below

Justin Waugh@JustinWaugh·6d

@_chenglou Very cool~ I wonder if it would generalize well across more puzzle varieties (eg. those in pencil puzzle bench ppbench.com )

Cheng Lou@_chenglou·6d

I’m very happy to present my toy research project: Sotaku! It's a neural net that automatically discovered the rules of sudoku and learned to solve them, achieving a new state-of-the-art score of 98.9% on one of the hardest sudoku datasets, while being agnostic to the game, and beating all other sudoku-optimized neural net architectures* Read more for fun motivations, plus some extremely unconventional discoveries, e.g. reverse curriculum consistently beating curriculum (!), emergent reasoning-like capabilities, and the future of traditional programming

Justin Waugh@JustinWaugh·6d

@carterwmckay @scaling01 ppbench.com

Carter McKay@carterwmckay·6d

@scaling01 Where are you finding the puzzles, all the sites I tried had some broken puzzles, 0 and 1 points directly at each other.

Lisan al Gaib@scaling01·6d

felt good about myself after I solved 3 Yajilins in 5-9 minutes each for puzzles that models couldn't solve then encountered thanos on the next one aaaand im down to 40 minutes again it's over

Lisan al Gaib@scaling01

I got absolutely mogged by the smartest human alive

Justin Waugh@JustinWaugh·13 Mar

@AndilesAnthony @scaling01 For this puzzle specifically, here's the trace: ppbench.com/replay.html?mo… It took ~30 minutes thinking and then one-shot it, haha.

Justin Waugh@JustinWaugh·13 Mar

I wrote the benchmark and ran the gpt-5.4-xhigh test shown here. The paper is available on arXiv: arxiv.org/html/2603.0211… The website has full traces of the model on all the puzzles: ppbench.com/model/gpt-5.4@… (full details available on Hugging Face). For your actual questions: GPT-5.4@xhigh was run against these puzzles using the API and with this basic agentic harness, defined here: github.com/approximatelab…

Lisan al Gaib@scaling01·12 Mar

just tried another level of this benchmark that only GPT-5.4-xhigh could solve i solved it, but it took me 44 minutes GPT-5.4-xhigh did it in 38 minutes ppbench.com/share/QVBZEEtF…

Lisan al Gaib@scaling01

this is like the 5th benchmark im discovering today very cool

Justin Waugh@JustinWaugh·12 Mar

@scaling01 That was a fun one for sure! Took me 17:05 (and I consider myself pretty good at these / have done many yajilin before) (also, thanks for posting this, prompted me to try it, led to learning the open-graph preview for share had a bug on X!) ppbench.com/share/QVBZEEhD…

Justin Waugh reposted

Ethan Mollick@emollick·12 Mar

Exponential improvements* everywhere for those with the eyes to see them. This is a cool benchmark, and was impossible for early non-reasoner LLMs to do at all. * Okay, technically "logistic improvement" because the maximum score is bounded at 100 (and logistic has a lower AIC)

Justin Waugh@JustinWaugh

(1/N) Pencil Puzzle Bench is out! 51 LLMs tested on pencil puzzles (multi-step, logical reasoning, verifiable at each step) Dataset: 62k unique puzzles, 94 types. Evaluation: covers 300 puzzles across 20 types Best score: GPT 5.2 @xhigh 56%, half the puzzles are still unsolved

Justin Waugh@JustinWaugh·12 Mar

@ChristosTzamos I recently released pencil-puzzle-bench. Awesome to see so many steps / decoding as a computer. Would be interested to see if it can adapt solutions for many puzzle types, not just sudoku as shown. ppbench.com

Christos Tzamos@ChristosTzamos·12 Mar

1/4 LLMs solve research grade math problems but struggle with basic calculations. We bridge this gap by turning them to computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds solving even the hardest Sudokus with 100% accuracy

Justin Waugh@JustinWaugh·10 Mar

@YafahEdelman pencil puzzle bench recently crossed 30%, but each LLM success is still expensive/slow ($5+, 10min+), especially compared to CPU-based SAT solvers (custom to problem) that are ~7-8 orders of magnitude cheaper and ~3 orders of magnitude faster. Huge efficiency gains still remain. ppbench.com
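
A rough sketch of the arithmetic behind this comparison, assuming "orders" means orders of magnitude (powers of ten) and taking the quoted lower bounds ($5, 10 minutes) as the LLM baseline. The variable and function names are illustrative, and the implied solver figures are derived here, not reported in the post.

```python
# Lower bounds quoted in the post for one successful LLM solve.
llm_cost_usd = 5.0       # "$5+"
llm_time_s = 10 * 60     # "10min+", expressed in seconds


def scale_down(value, orders):
    """Divide value by 10**orders (orders of magnitude)."""
    return value / 10 ** orders


# "~7-8 orders cheaper": implied per-puzzle solver cost range (assumption-based).
sat_cost_high_usd = scale_down(llm_cost_usd, 7)
sat_cost_low_usd = scale_down(llm_cost_usd, 8)

# "~3 orders faster": implied per-puzzle solver time (assumption-based).
sat_time_s = scale_down(llm_time_s, 3)

print(f"Implied solver cost: ${sat_cost_low_usd:.1e} to ${sat_cost_high_usd:.1e} per puzzle")
print(f"Implied solver time: {sat_time_s:.1f} s per puzzle")
```

Because the post gives only lower bounds and approximate ratios, these numbers indicate scale rather than measured solver performance.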

Yafah Edelman@YafahEdelman·10 Mar

Okay, what benchmarks are still under 30%?

Justin Waugh@JustinWaugh·10 Mar

Just released the harness and code I used for Pencil Puzzle Bench! (ppbench.com) Also has a gym environment and a verifiers environment Check it out on github github.com/approximatelab…

Justin Waugh@JustinWaugh·8 Mar

@eigenform Took this picture the other day

meta@eigenform·8 Mar

latest installment in "is this a factorio screenshot or old intel p-core layout"

Justin Waugh@JustinWaugh·8 Mar

@hardmaru there's still time

hardmaru@hardmaru·8 Mar

In an alternate timeline we’d be using Evangelion GUI designs rather than CLIs

Justin Waugh@JustinWaugh·8 Mar

@j_dekoninck Saw something similar with pencil puzzle bench: gpt-5.2@xhigh outperformed gpt-5.2-pro. (gpt-5.2-pro scored closer to 5.2@medium) ppbench.com

Jasper Dekoninck@j_dekoninck·7 Mar

I know about some other benchmarks where GPT-5.4-Pro seemingly does not outperform GPT-5.4 by all that much, but this clearly shows it's at least better in some areas :)

Jasper Dekoninck@j_dekoninck·7 Mar

One more: We now added GPT-5.4-Pro to ArxivMath February and Apex. Extremely expensive, but it is SoTa on both by quite a significant margin

Justin Waugh@JustinWaugh·6 Mar

Just added GPT-5.4 to pencil-puzzle bench results! Large uplift (70% solved now). Longest single success for a puzzle took 95 minutes and $17.90 in inference alone. See the full breakdown, play the puzzles, and look at the 5.4 traces here: ppbench.com
