Jeffrey Emanuel

15.6K posts

@doodlestein

Former Quant Investor | My Open Source Projects: https://t.co/9qbOCDlaqM | Try https://t.co/oCtjI2mBIl , my collection of agent coding tooling.

NYC · Joined April 2009
12.9K Following · 41.2K Followers
Jeffrey Emanuel@doodlestein·
@pvncher This is a classic slashdot nerd joke. Original jokes are still mostly out of reach of these models. Ironically I think GPT 4.5 was the best at it so far despite being so old now (reasoning doesn’t work with jokes, you need that instant gestalt shift): x.com/doodlestein/st…
Jeffrey Emanuel@doodlestein

More on GPT4.5 and humor. I asked it:

Can you explain why this is funny: A duck goes into a druggist and asks for some Chapstick. The druggist asks, "Will that be cash or charge?" The duck says, "Just put it on my bill."

It responded like this:

This joke’s humor emerges from a linguistic pun involving a word with two distinct meanings:

Step-by-step reasoning:

1. Contextual Setup: A duck enters a pharmacy and requests Chapstick. There’s a surreal, whimsical aspect because ducks don’t typically engage in commercial transactions or have need for Chapstick. This whimsical context primes the listener for something playful or absurd.

2. Punchline (“Just put it on my bill.”):
• Literal meaning: In commercial transactions, “put it on my bill” means to charge the item to one’s account.
• Pun meaning: Ducks literally have a bill (their beak). The humor arises because the phrase simultaneously refers to the duck’s physical bill and the financial meaning of bill.

3. Humorous tension and resolution: When first hearing the druggist’s question, the listener anticipates a conventional response (“cash” or “charge”). Instead, the duck cleverly references its own anatomy (“bill”), subverting expectations. This sudden semantic shift provides a “rewarding” cognitive surprise, creating amusement.

In short, it’s funny precisely because the phrase “put it on my bill” neatly combines a mundane human context (charging a purchase) and a playful animal-centric meaning (the duck’s physical bill), in one concise punchline.

This isn’t bad, but it’s not the whole picture. So I prodded it to look deeper:

“You missed an absolutely critical component to the joke that raises it to a far higher level of humor than a typical pun-based joke. Try to figure out what I’m talking about.”

Then it really *did* appreciate the joke fully:

You’re right: there’s a subtle yet critical layer of humor I overlooked. Let me carefully unpack it again:

Deeper Analysis: On the surface, this joke looks like a simple pun about the dual meaning of “bill,” as I previously mentioned. But what’s special is that the pun isn’t merely linguistic; it’s situationally justified and logically supported by the structure of the narrative itself.

The Hidden, Essential Component: The critical nuance: The joke explicitly sets up a scenario involving Chapstick, a product designed specifically for lips. But ducks don’t have lips; they have bills. Thus, the duck’s request is inherently absurd because a duck would logically apply Chapstick directly to its bill.

This small detail completely elevates the joke:
• If the duck had asked for something generic like aspirin, the pun would remain superficial, just a linguistic joke about “bill.”
• However, the duck specifically asks for Chapstick, thereby silently introducing the absurdity of a duck applying lip balm onto its bill rather than human lips.

When the duck says “put it on my bill,” it not only cleverly subverts payment expectations, it simultaneously delivers a hidden visual absurdity: the duck literally applying Chapstick on its own bill. This deeper level of humor, which I initially overlooked, arises from an extra cognitive leap required to notice that ducks literally replace “lips” with “bill,” making the pun perfectly tailored to the scenario.

Summarized Clearly: The key brilliance of the joke lies in the careful selection of Chapstick (a detail so subtle yet completely crucial) because it silently introduces the hilarious visual absurdity that the duck would literally put the Chapstick “on its bill,” adding a layer of physical humor and logical coherence that far surpasses a simple linguistic pun.

Finally, I wanted to test it to see if it could come up with another joke like that:

“Good job! Now my question to you is, can you think of something like that which works on all those levels at once? Try to think of 100 such jokes and then only tell me the very best one you think of.”

(Its joke in the next tweet!)

Jeffrey Emanuel@doodlestein·
I’m surprised that you don’t see OpenAI announcing major partnerships with FANUC and KUKA. These companies make extremely accurate and powerful robot arms that can do small-scale work all the way up to lifting and spinning entire car frames. The problem with them has always been that the programming is extremely tedious, finicky, and time consuming. So the cost of the equipment is just the beginning since you need integrator consultants to program them for you. And then they’re not flexible. Even after you finally dial everything in, you’re now stuck with that configuration. Good luck trying to optimize the overall layout of your plant or to change other work streams without needing to reprogram everything. But this is precisely where advanced AI, vision, RL, world models, etc. are so valuable. But it’s going to be hard for the big labs to match 30+ years of mechanical engineering excellence and learnings like the robot arm companies have. Not to mention 30+ year supplier relationships with every large Western auto OEM. And the arm companies certainly aren’t going to be training any frontier transformers soon! Anyway, that’s what I would be advising them to do now.
Shlok Khemani@shloked·
Jeffrey constantly (and for free) shares some of the highest value ideas on this app. He pushes me to work harder with agents, and has directly and indirectly influenced how I think about the future. High alpha follow.
Jeffrey Emanuel@doodlestein

People are constantly asking me about my planning and execution methodology for creating software using my Agent Flywheel system of tooling, prompts, and workflows. As a result, I find myself posting the same link, often multiple times in a day, to a post of mine that includes links to 5 other X posts and threads I've made about my methodology. While this "works," in that a motivated person can read through each post and understand my approach pretty well, I realize that it's far from optimal, and a lot of people see that and just give up quickly.

So I finally decided to gather together all my materials on my method and turn them into two different articles with different target audiences. Perhaps unsurprisingly, I was able to extensively leverage my own tools to do this effectively. For one, I was able to use my xf tool (for searching your personal X post archive that you can download from X) to pull in all the various posts and my replies to people in those threads into a single large markdown document. Then, I had agents use my cass tool to search for my real-world usage of my various tools and to gain insights into my planning process from firsthand observation. I also had a lot of materials in the tutorials section of the Agent Flywheel website, as well as in various agent skills I've created.

All of this was woven together and synthesized into a single comprehensive document, The Flywheel Approach to Planning and Bead Creation: agent-flywheel.com/complete-guide

This is the new canonical and complete guide to my approach, with everything in one place and synthesized into a coherent whole so that you don't need to scrounge around for all the different posts. I will also be updating the article as my methodology evolves and in response to reader feedback on what is confusing or unclear (so please let me know in the comments).

Incidentally, as I got to the final stages of preparing this document, I found this prompt to be extremely useful:

"Read the entire document again with fresh eyes all the way through, putting yourself in the position of a smart software developer who is new to agentic coding and doesn't know how to use the Flywheel or agent swarms effectively yet and who doesn't understand the planning process or beads, etc. What would be most confusing? How could we make it more engaging and intuitive without removing any content and without simplifying anything (think additively)?"

Beyond that big comprehensive guide, as the Flywheel system has grown to 20+ tools now, I've heard repeatedly from people that they find the entire system too overwhelming, because there are so many tools to understand. But the truth is, there is a "core" to the Flywheel approach which captures most of the value and just uses 3 tools:

* My Agent Mail project for coordination and communication of multiple agents of various types;
* beads_rust (br) for task management; and
* beads_viewer (bv) for automatically triaging the beads graph so that agents always work on the optimal next bead to maximize overall development velocity.

So to that end, I created a separate, shorter, more-focused article for beginners to the system, the Flywheel Core Loop Guide: agent-flywheel.com/core-flywheel

If you've previously been interested in the Flywheel but found it to be too hard to understand or had "information overload" (which is totally understandable... this stuff emerged organically over months of working on this stuff, so I'm sure it's a lot to take in all at once like that), I highly recommend checking it out.

Once you get the hang of it, you can then layer in additional utilities, starting with destructive_command_guard (dcg) to prevent agents from blowing up your projects or machine; coding_agent_session_search (cass) to search instantly across all your agent sessions, and give this power to your agents themselves; and ultimate_bug_scanner (ubs) for finding bugs and problems across most popular programming languages in a single tool that is heavily optimized for use by agents.
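
The triage idea in the core loop (always hand an agent the best unblocked bead) can be pictured with a toy dependency graph. This is only a minimal sketch and not how beads_viewer actually works: the `Bead` class, the `priority` convention, and the "prefer the bead that unblocks the most downstream work" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bead:
    """Hypothetical task record; not the real beads_rust schema."""
    id: str
    priority: int = 2  # assumed convention: lower number = more urgent
    deps: set = field(default_factory=set)  # ids that must be done first
    done: bool = False


def downstream_count(bead_id, beads):
    """Count beads that depend on bead_id, directly or transitively."""
    seen = set()
    stack = [bead_id]
    while stack:
        current = stack.pop()
        for b in beads.values():
            if current in b.deps and b.id not in seen:
                seen.add(b.id)
                stack.append(b.id)
    return len(seen)


def next_bead(beads):
    """Pick a ready bead: most urgent first, then the one unblocking the most work."""
    ready = [
        b for b in beads.values()
        if not b.done and all(beads[d].done for d in b.deps)
    ]
    if not ready:
        return None
    return min(
        ready,
        key=lambda b: (b.priority, -downstream_count(b.id, beads), b.id),
    )


beads = {
    b.id: b
    for b in [
        Bead("schema"),
        Bead("api", deps={"schema"}),
        Bead("ui", deps={"api"}),
        Bead("docs"),
    ]
}
print(next_bead(beads).id)  # prints "schema": it is ready and unblocks two beads
```

Once "schema" is marked done, the same call returns "api", since it now unblocks more work than "docs". Note that no time estimates appear anywhere: only ordering and priority drive the choice.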

Henrique Bastos@henriquebastos·
Killing time before an appointment, watching @doodlestein’s video to learn about his agent mail MCP. At one point he mentioned his agents use python-decouple, which I wrote years ago. I caught myself smiling. Things really do come full circle.
Henrique Bastos@henriquebastos·
@doodlestein Tks. I saw you’re deep into Rust. I’ve been following closely and learning a lot. Loved your development cycle manual. Thank you for sharing.
Jeffrey Emanuel@doodlestein·
@svpino It will continue just as before, except now it has funding in perpetuity.
Charlie Marsh@charliermarsh·
We've entered into an agreement to join OpenAI as part of the Codex team. I'm incredibly proud of the work we've done so far, incredibly grateful to everyone that's supported us, and incredibly excited to keep building tools that make programming feel different.
Jeffrey Emanuel@doodlestein·
@PaulSolt Probably, but Claude is a much better writer, so I thought it would do a better job with the documentation. I think it’s also better at making tooling more agent-intuitive. In any case, I probably would have done a similar number of fresh eye passes even with Codex.
Paul Solt@PaulSolt·
@doodlestein Would you have used Codex and GPT-5.4 with less effort? That’s a lot of fresh eyes bug fix passes. Or can you not get this result easily with Codex?
Jeffrey Emanuel@doodlestein·
I wanted to give a quick, self-contained example of how rapidly I can now develop useful, polished, well-documented software to solve a problem. In this case, about an hour or so, and that's with the overhead of trying to write it all up nicely and document everything for public consumption.

The problem: my friend @exhaze has been working on some cool formal verification stuff and had some issues he wanted to flag to me about asupersync and inquired as to the best way to do that. I mentioned that the easiest was to just post a GitHub issues submission in the repo so that my clankers can review and check it. But then he asked, what if it's security related? Maybe we can use encryption.

Well, sure, that shouldn't be so hard. But rather than try to find some finished project out there that may or may not work well for this use case, I start informally by asking ChatGPT the following:

"Suppose someone wants to post github issues for my project publicly but with encrypted messages and then they send to me out of band the decryption key. Then I would have my coding agents (using codex running on ubuntu) check the gh issues and detect when an issue is encrypted and use the decryption key. What's the best, easiest to use, off-the-shelf open source way to do this using ECC?"

This quickly yielded a good answer, so I then fired up Claude Code and pasted in the output from ChatGPT and asked it to actually build the thing.

OK, so what? Anyone could do this, and tons of people do it every day. Nothing very special about that, right? Well, what I think is interesting is how much further I pushed everything, resulting in a highly polished and well-documented script, an article explaining how to use it, and a skill for agents to help them use it. I added all this to my "misc_coding_agent_tips_and_scripts" repo which I use as a sort of grab-bag of random learnings, and you can see the final product here: github.com/Dicklesworthst…#encrypted-github-issues

I was going to just include the key prompts that I used in that session here, but then decided to use a slick feature in my cass (coding_agent_session_search) tool that lets you export agent sessions as nicely formatted, self-contained, static html files which are easy to serve via GitHub Pages. So now you can see the entire session start to finish here and see all the tricks and techniques I used to get really high-quality outputs that are bullet-proof and extremely agent-intuitive and agent-ergonomic (and also hopefully human intuitive, but I care less and less about that each passing day!): dicklesworthstone.github.io/misc_coding_ag…

And THAT is the difference between what anyone might whip up and what I can extract from Claude Code with a bit more time and effort and work on my part. Notice in the session log the sheer number of times and different ways I asked it to review its code and the writeup, and how many times that caused it to find new and different bugs and problems and edge cases and fix them. I see people complain about how Opus sucks at coding, but I am positive that they aren't doing this sort of thing, which is absolutely required to get good quality results.

And even then, after so many rounds of checks and reviews, it turned out that there was no substitute for deploying the whole thing to GitHub and then trying the entire installation process end-to-end on a fresh machine (luckily I have a ton of VPS boxes waiting around as remote compilation helpers for my rch tool!), where it discovered that the installer forgot to actually install the script itself! The lesson is clear: no amount of static code analysis is enough to catch everything.

Also notice how much time I spent making sure that everything was highly optimized for consumption by coding agents, and how I asked Claude Code to introspect about what IT would want to use and read if it had to approach this situation with zero knowledge. If you aren't extremely focused on this aspect of things in all your tooling, you're making a big mistake, since these tools are increasingly going to be used ONLY by agents, and their opinion matters!
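
The detection step in that workflow (an agent noticing that an issue body carries an encrypted payload) can be sketched without committing to a particular tool. The post does not say which format the finished script uses; the example below assumes ASCII-armored age output purely for illustration, and `find_encrypted_blocks` is a hypothetical helper name. Decryption is deliberately left out.

```python
import re

# Assumption: payloads are ASCII-armored age files. The original post does not
# name the format the finished script actually uses.
ARMOR_RE = re.compile(
    r"-----BEGIN AGE ENCRYPTED FILE-----\r?\n"
    r"[A-Za-z0-9+/=\r\n]+?"
    r"-----END AGE ENCRYPTED FILE-----"
)


def find_encrypted_blocks(issue_body: str) -> list[str]:
    """Return every complete armored block found in an issue body."""
    return [match.group(0) for match in ARMOR_RE.finditer(issue_body)]


# Made-up issue text; the base64 line is a placeholder, not a real ciphertext.
issue = (
    "Found a problem, details below.\n\n"
    "-----BEGIN AGE ENCRYPTED FILE-----\n"
    "YWdlLWVuY3J5cHRpb24ub3JnL3YxCg==\n"
    "-----END AGE ENCRYPTED FILE-----\n"
)
blocks = find_encrypted_blocks(issue)
print(len(blocks))  # prints 1
```

An agent could pass each returned block to whatever decryption command the project settles on, using the key shared out of band, and treat issues with no blocks as ordinary plaintext reports.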
Jeffrey Emanuel@doodlestein·
@clanker_ It’s because the average user is having the models do incredibly basic, formulaic stuff. But I agree with you that, for anything nontrivial, you should only ever use the best frontier models with reasoning effort cranked to the maximum.
clanker@clanker_·
there's this bizarre influenza psyop going around, endorsed by the frontier labs, that you should be using the mini version of a model, medium reasoning, & whatever other bs, when literally the only usable combo has only ever been the best model with the highest reasoning effort
Jeffrey Emanuel@doodlestein·
@NicholasBardy @tenobrus I’m sure that workflow makes sense for many people but I’m just not interested in it. You never know in advance exactly when you’re going to need the extra intelligence. Maybe there’s a subtle problem, a bug in a library, etc. I don’t want to take chances.
Nicholas Bardy@NicholasBardy·
@doodlestein @tenobrus You can get a lot out of gpt 5.4 spark though with the right plans. Some work is just dumb execution. Especially if you have big codex spec out the math and algos
Jeffrey Emanuel@doodlestein·
@harleytt Don’t see how that would help. The point is just that you need to actually run the code at some point to know if it really works.
Harley Trung@harleytt·
@doodlestein
> it turned out that there was no substitute for deploying the whole thing to GitHub

Would gh codespaces help? Too slow? Anyone using it anymore?
Jeffrey Emanuel@doodlestein·
@nohups I tell them to never bother with time estimates, just sequencing/order and priorities. You just create a bunch of beads and have them work on them.
nohups@nohups·
@doodlestein Do you have any tips for having a planning agent spec out multi hour packs of work? I find they always vastly overestimate how long it’ll take to complete work. “4-6 weeks” usually takes 90 mins. “7 days” takes around 20.
Jeffrey Emanuel@doodlestein·
I know people get really bent out of shape by the onslaught of low-quality "slop PRs" and GitHub issues in open-source projects, but when you automate the entire process, it doesn't bother you at all. Note the 2 "Spam/nonsensical issues" that Claude closed with extreme prejudice:
[image]
nick@BSLABS@F_AI_Mouse·
Today I turned a page and resigned from a 20 year career doing identity and access management for BofA. I'm so excited for what comes next.
Jeffrey Emanuel@doodlestein·
@BillyMar10 Because this is a tiny little side project that isn't valuable to me or others and not worth spending time on. I didn't even create a separate repo for it. And it still ended up coming out well.
Billy@BillyMar10·
@doodlestein Why didn't you use Agent Flywheel for this? It was my understanding that it brings ChatGPT and Claude into the same system so that research, planning, and implementation can be model agnostic and iterate on a loop for more optimal outcomes. Why didn’t you use that here?
Jeffrey Emanuel@doodlestein·
@osekkat It's just faster, and Claude can use tools to make it easier. Even if GPT Pro did it for me, I'd still need to manually copy and paste it in, so why not just have Claude Code do it for me locally?
Oussama Sekkat@osekkat·
@doodlestein When you ask ChatGPT Pro for diff-style changes (when reviewing a plan, for example), why do you then have Claude Code integrate the revisions? Why not let ChatGPT Pro integrate the revisions itself, since it already knows the original plan?
Jeffrey Emanuel@doodlestein·
@ManacasterBen @karpathy @chute75llc As a general purpose computer, yes. As an AI accelerator for running local models, I'm not so sure. I have a garage filled with GPUs from 2018 that I used to mine Ethereum, and I can't bring myself to liquidate them because I doubt I could get more than a few hundred bucks total.
Andrej Karpathy@karpathy·
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer@NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-…#dgx-station @DellTech
