Will Jessup

528 posts

@wjessup

Entrepreneur and technologist. Sold my company in 2022, then started https://t.co/pbcariufxb for fun. Now working on AI products.

Boca Raton, Florida · Joined July 2007
371 Following · 1K Followers
Pinned Tweet
Will Jessup@wjessup·
I was tired of copy-pasting PR review comments into Cursor, so I wrote GitHub MCP tools for getting PR review comments, making review comments, and resolving review threads. This is awesome.
2 · 0 · 11 · 742
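For readers curious what such a tool looks like under the hood, here is a minimal sketch — the tool surface is hypothetical and not from Will's repo; the dict fields (path, line, body, user.login) mirror GitHub's REST review-comments payload (GET /repos/{owner}/{repo}/pulls/{number}/comments), and fetching is omitted:

```python
# Hypothetical sketch, not the actual MCP tool: format GitHub PR review
# comments into text an agent can act on. Field names follow GitHub's
# REST review-comments payload; the network call is omitted.
def format_review_comments(comments: list) -> str:
    lines = []
    for c in comments:
        loc = f"{c['path']}:{c.get('line', '?')}"  # file:line anchor
        lines.append(f"[{c['user']['login']}] {loc}\n  {c['body']}")
    return "\n".join(lines)
```

An MCP server would expose this as a tool and pair it with write endpoints for replying to and resolving review threads.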
Will Jessup@wjessup·
@openfangg Just tried this out, but the twitter hand setup only asks for 1 secret token, and it seems to need 4 to work with OAuth 1.0. Anyway, just letting you know it's broken.
0 · 0 · 0 · 17
OpenFang@openfangg·
6/ the twitter hand autonomous x account manager. creates content in 7 rotating formats, schedules posts for optimal engagement, responds to mentions, tracks performance metrics. has an approval queue - nothing posts without your ok. yes i use it for my own account.
2 · 0 · 8 · 899
OpenFang@openfangg·
i built an open source agent that manages my entire twitter account autonomously. it creates content, schedules posts, responds to mentions, and tracks performance. it's one of 8 agent HANDs that come built into @openfangg. here's what each one does (thread) github.com/RightNow-AI/op…
33 · 52 · 501 · 27K
Andrey Kurenkov@andrey_kurenkov·
This story about @openblocklabs blatantly cheating on terminal bench, and the founder's reply after being exposed, was so wild I couldn't help but be curious about this 'prev. ML research @stanford' note on his profile... And oh man, let's just say that might be a bit misleading.

His LinkedIn says "ML Research Scientist Stanford University Aug 2016 - Sep 2018 · 2 yrs 2 mos Youngest author at NeurIPS 2017 (ML4H); 200+ citations Working on GANs @ MRSRL Lab Mentors: J. Cheng, M. Mardani, J. Pauly"

That appears to come down to a single paper, 'Synthetic Medical Images from Dual Generative Adversarial Networks', which he co-authored with 2 other students while in high school. But none of the listed 3 mentors are co-authors, nor is anyone from the MRSRL Lab. It appears to have been presented as a poster at the Machine Learning for Health workshop, though oddly only his co-authors are listed there (neurips.cc/virtual/2017/w…). And having just read it (arxiv.org/abs/1709.01872), I think it's safe to say it would not have been accepted to NeurIPS proper (though still very cool of high school students to do this level of research).

So his LinkedIn has 'ML Research Scientist @ Stanford University' **seemingly** because he had or sought unofficial mentorship from some people at Stanford while doing research in high school with some schoolmates. So maybe having 'prev. ML research @stanford' in his twitter bio is just a tiny bit misleading...
Andrey Kurenkov tweet media
Monk Zero@NoCommas

x.com/i/article/2032…

14 · 5 · 221 · 37.4K
Will Jessup@wjessup·
Running MiniMax M2.5 locally for things like checking gmails.
Will Jessup tweet media
0 · 0 · 1 · 60
Will Jessup@wjessup·
@dangreenheck Your project looks amazing! I tried to build this about 5-6 years ago and getting the shaders to work properly through multiple waves was really challenging to debug.
0 · 0 · 1 · 334
Dan Greenheck@dangreenheck·
Well it took an all nighter (isn't all the best coding done at 4AM?!), but Three.js Water Pro v2.0 has launched! 🚀 threejsroadmap.com/threejs-water-… Super excited about this launch—I've added a new wave system, sky/water transitions, improved foam effects, reflections, and so much more. Here's a little cinematic trailer I put together (sound on🫧). Enjoy!
30 · 45 · 577 · 27.5K
witcheer ☯︎@witcheer·
what could be better on a Saturday than trying out the creations of the 🐐? I ran @karpathy’s autoresearch on my mac mini m4. 16GB RAM. no CUDA. no GPU cluster. here’s my full debrief:

found a macOS fork that replaces FlashAttention-3 with PyTorch SDPA for Apple Silicon. setup took 3 hours. trained an 11.5M parameter GPT model, tiny compared to karpathy’s H100 baseline, but that’s what fits in 16GB. ran some manual experiments with claude opus as the researcher. me as the human in the loop, claude deciding what to try next.

- experiment 1: tried depth 8 (50M params). OOM crash.
- experiment 2: scaled down to depth 6, batch 8 (26M params). ran but val_bpb was worse than the tiny baseline. classic lesson: a small well-trained model beats a large undertrained one on limited compute.
- experiment 3: halved batch to 32K. first real win. val_bpb dropped to 1.5960.
- experiment 4: batch 16K. best single decision of the entire run. quadrupled optimiser steps (102→370), val_bpb dropped to 1.4787. 15.7% improvement over baseline.

karpathy’s H100 hits 0.9979. the M4 is 2.5x slower per cycle but it’s a $600 desktop vs a $30K GPU.

then I made it fully autonomous. launchd starts a tmux session at 9PM, runs claude -p in a bash loop (read results → decide experiment → edit train.py → run → check → keep or revert → log → repeat). stops at 6AM. at 6:30AM my @openclaw bot sends me a telegram debrief with overnight stats. ~45 experiments per night. ~315 per week. I will update y’all on this experiment!
witcheer ☯︎ tweet media
Andrej Karpathy@karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:

- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor…

Part code, part sci-fi, and a pinch of psychosis :)

33 · 14 · 541 · 121.5K
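The keep-or-revert decision at the heart of that overnight loop can be sketched in a few lines (names and numbers are illustrative, not from the actual setup; lower val_bpb is better):

```python
# Illustrative sketch of the overnight keep-or-revert loop: each experiment's
# validation bits-per-byte is compared to the best so far; improvements are
# kept (committed in the real setup), regressions reverted.
def run_night(baseline_bpb, experiment_results):
    best, log = baseline_bpb, []
    for i, bpb in enumerate(experiment_results, 1):
        if bpb < best:  # lower val_bpb is better
            log.append(f"exp {i}: {bpb:.4f} -> keep")
            best = bpb
        else:
            log.append(f"exp {i}: {bpb:.4f} -> revert")
    return best, log
```

The real loop wraps this decision around git commits and reverts of train.py; the comparison itself is all the "researcher" needs to accumulate progress.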
Will Jessup@wjessup·
New MacStudio up and running (Screen Sharing from laptop). Ready to claw it up.
Will Jessup tweet media
0 · 0 · 3 · 73
Will Jessup@wjessup·
With this SKILL your AI will automatically set up Docker with a reverse proxy so every project gets its own URL like http://my-project.localhost — no more port conflicts when agents are spinning up dev servers across terminals. Complete port isolation and easy testing. github.com/wjessup/caddy-… `npx skills add wjessup/agent-skills@caddy-docker-proxy-local`
0 · 0 · 2 · 54
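For context, caddy-docker-proxy setups of this kind typically hang hostnames off container labels. A minimal compose fragment might look like this (illustrative only, not taken from the linked skill repo):

```yaml
# Illustrative docker-compose fragment: caddy-docker-proxy style labels give
# the service its own hostname; the Caddy container watches the shared network.
services:
  my-project:
    build: .
    labels:
      caddy: my-project.localhost
      caddy.reverse_proxy: "{{upstreams 3000}}"   # app listens on 3000
networks:
  default:
    name: caddy
```

Each project declares its own hostname, so nothing ever fights over host ports.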
Will Jessup@wjessup·
Data can be extracted from request logs; these are a gold mine.
0 · 0 · 0 · 17
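As a toy example of the kind of extraction meant here, assuming JSON-lines request logs with a `path` field (the field name is invented for illustration):

```python
import json
from collections import Counter

# Sketch: mine structured (JSON-lines) request logs for usage signals,
# skipping malformed lines. Field names are illustrative.
def top_endpoints(log_lines, n=3):
    counts = Counter()
    for line in log_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        counts[rec.get("path", "?")] += 1
    return counts.most_common(n)
```

The same pattern extends to latencies, error rates, and user-agent breakdowns.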
Will Jessup@wjessup·
Right now almost all knowledge is stored in model weights. Systems like agent “skills” begin to offload some of that, but this will get extreme: context is where nearly all domain-specific, expert analysis and memories will reside, not model weights.
1 · 0 · 0 · 18
Will Jessup@wjessup·
The next evolution in LLMs is in context management. Context manipulation will happen per request before the LLM even starts and will act like an expert system - prepping the LLM with exactly the right info.
1 · 0 · 2 · 27
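A toy sketch of that per-request prep (every name here is hypothetical): a layer in front of the LLM scores stored expert notes against the query and prepends only the relevant ones.

```python
# Hypothetical sketch of per-request context prep: pick the expert notes most
# relevant to the query (naive keyword overlap) and build the prompt from them.
def prep_context(query, knowledge, budget=2):
    scored = sorted(
        knowledge.items(),
        # negate the overlap count so the best-matching notes sort first
        key=lambda kv: -sum(w in query.lower() for w in kv[0].split()),
    )
    selected = [text for _, text in scored[:budget]]
    return "\n\n".join(selected + [f"User query: {query}"])
```

A real system would use embeddings or a retriever instead of keyword overlap; the shape of the loop, select then prepend before the model ever runs, is the point.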
Will Jessup@wjessup·
@witcheer Yep. Context is the only thing that matters, and doing it well has little to do with Agent.md files and more to do with synthetically creating expert system content as needed. This is really hard.
0 · 0 · 1 · 20
witcheer ☯︎@witcheer·
best piece i've read on agentic engineering in months. core thesis: stop chasing harnesses and plugins, strip your setup to barebones CLI, and obsess over context management instead. rules, skills, and a clean CLAUDE.md as a logical directory, nothing more.

maps exactly to what i found building openclaw. 25 scripts, 10 daemons, 5 cron jobs running 24/7 on a mac mini. the breakthrough was learning to control what context the agent sees and when. separate research from implementation. keep instructions precise. strip everything else.

"your enthusiasm is likely doing more harm than good"
sysls@systematicls

x.com/i/article/2028…

11 · 18 · 483 · 101.3K
Will Jessup@wjessup·
The more I think about it, the default for any “query that generates code and runs it” should be: run it in an isolated VM—preferably cloud. If the startup / IO overhead is actually negligible (some systems claim sub‑100ms), the ergonomics are hard to argue with.

Agents need isolation even locally. Git branches don’t really solve it in an IDE workflow: you can’t switch between N agents’ active work without stashing, losing state, or constantly fighting your working directory. Worktrees are the minimum viable local answer because each agent needs its own filesystem view.

But once you go down the worktree path you hit the next problem: you now want N IDE contexts for N worktrees, or you end up with “open project” weirdness (indexing, language servers, build caches, window management). You’re basically rebuilding a coordination system inside your editor just to approximate “each agent gets its own box.”

Isolation also isn’t optional if you want the system to converge. Without it, even simple planning + coding fails because divergence is the default. What scales is a tight loop around diffs: “here’s what changed from a stable base,” and a control message like “since you finished your plan, the code changed—reconcile against this diff / branch.”

Once isolation is a must, cloud VMs become the obvious endpoint. You get reproducible images, different OS targets, clean dependency graphs, and the ability to scale CPU/network without turning your local machine into the bottleneck. Some workloads are compute-heavy (compression, lesson extraction, memory extraction, Playwright/test runs). Others are network-heavy (massively parallel LLM pipelines for summarization/memory processing). Local machines don’t scale to “thousands in parallel.” Cloud does: shard the dataset, launch 100 VMs, collect artifacts (diffs/logs/results), tear them down.

I need this tooling for the expert memory system I’m building. Excited to dig into it more tomorrow!
0 · 0 · 3 · 60
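The shard → launch → collect pattern described above can be sketched like so (the per-shard worker is a stand-in for a cloud VM launch; here shards run in local threads purely for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of shard -> launch -> collect: split the dataset into
# n shards, run a worker per shard (a cloud VM in the real version, a local
# thread here), and gather one artifact bundle per shard.
def run_sharded(items, n_shards, worker):
    shards = [items[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        return list(pool.map(worker, shards))
```

Swapping the thread pool for "boot image, run, pull artifacts, tear down" against a real cloud API is the only structural change the VM version needs.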
POM@peterom·
No modern LLM can handle a large scale refactoring of a messy codebase.

Yet with this harness they can do it autonomously to a standard that rivals world class professionals. It’s effectively like using Claude 6.

Don’t believe me? Then I have an easy way you can make $1k!
POM@peterom

Introducing Desloppify v0.9! I'm so convinced that this can make vibe code well-engineered that I'll put my money where my mouth is. If you can find something poorly engineered in its 91k+ lines of code, I'll give you $1,000. Details in Github issue, you have 48 hrs.

7 · 5 · 113 · 45.8K
Will Jessup@wjessup·
@willwashburn For example, making plans and writing code are not done in turns, they are done in parallel. By the time your super detailed plan is made, it’s invalid because other agents have updated parts of the system.
0 · 0 · 0 · 9
Will Jessup@wjessup·
@willwashburn Hey man, looks cool, but the code snippets have nothing to do with the problems you articulated. Can you share more?
1 · 0 · 0 · 12