Jonathan Low

353 posts

@jonathanlowhy

Building Mistle - Open source background agents

Joined April 2010
481 Following · 161 Followers
Jonathan Low (@jonathanlowhy)
@scottmcpherson @steipete Codex computer use is just an MCP. If you have Codex Desktop installed and computer use is supported, Codex CLI can call it too.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 42
Scott McPherson (@scottmcpherson)
@steipete Have you tried cmux with codex computer use? I haven't had any luck.
Replies: 2 · Reposts: 0 · Likes: 1 · Views: 2.7K
Jonathan Low (@jonathanlowhy)
@lubinho_k I get peace of mind with GPT 5.5. What do you think can be done well with Composer 2.5 where the difference doesn’t matter?
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 1.6K
Luckforest (@lubinho_k)
I am on the $200 Claude, $100 Codex, $20 Cursor Plan. After using Composer 2.5 for 8 hours straight while only using 8% of my $20 plan, I should reconsider my entire subscription stack. Maybe $100 Codex for complex stuff, and $60 Cursor for UI & Copy?
[image]
Replies: 181 · Reposts: 28 · Likes: 1.3K · Views: 115.4K
Jonathan Low (@jonathanlowhy)
@kunchenguid GPT has its own quirks and is heavily RL-ed in a specific manner, so it makes sense for there to be a Codex harness. E.g. apply_patch is unique to GPT models because of their RL.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 23
Kun Chen (@kunchenguid)
i'm strongly against model companies focusing too much on harness, but i would love to hear if anyone has a strong argument for it
my reason against it:
if openai didn't build GPT 5.5, no one else can. this is their core competence
if openai didn't build codex cli and app, we have opencode and t3code. building harness is NOT their core competence
this is not saying products like claude code, codex aren't good - i genuinely think these are top tier products built by really talented people
my point is - the world might be a better place if model companies focus more on their core capability and give us better, faster, safer and cheaper models, rather than competing with the ecosystem in the application layer
what do you think?
Greg Brockman (@gdb)

the model alone is no longer the product

Replies: 250 · Reposts: 19 · Likes: 516 · Views: 117.5K
Jonathan Low (@jonathanlowhy)
@mattlam_ Curious. What do you use that Codex Cloud cannot provide? The remote stuff?
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 41
Matthew Lam (@mattlam_)
@jonathanlowhy Codex is great too, but codex cloud is behind, and gpt5.5 is behind on fe/design still imo
Replies: 1 · Reposts: 0 · Likes: 2 · Views: 412
Matthew Lam (@mattlam_)
agreed. People are sleeping on Cursor and Composer 2.5 is just the start. If you actually try Cursor 3 you'll see:
- ux is superb, as good as any coding app
- harness tied with codex imo at the top
- allows any models, I mostly use Composer 2.5 and Opus
- cursor cli, I still need to try, but if it makes the cloud agent handoff seamless like Cursor 3 i'll for sure use
There's definitely improvements I'm looking for but they're way ahead of the cloud agents game, and their models are only just starting.
am.will (@LLMJunky)

People hate on Cursor, or even go as far to laugh at it.
"$60B for an IDE LOL"
"It's just a VSCode fork"
Yeah, and Tesla is just a car. Youtube is just a video player. and the iPhone is just a phone
It's honestly hysterical how wrong they are.
GPT 5.5 High Fast and Cursor 2.5 Fast feel unbelievably good in this harness. I have been building non-stop for the last week. Between Cursor and my Codex sub, I want for nothing.

Replies: 12 · Reposts: 4 · Likes: 227 · Views: 19.6K
Jonathan Low (@jonathanlowhy)
@apeatling @OpenAIDevs I think I managed to get it to work for me by running multiple sessions at the same time. The fatigue from context switching is real though. I can only maintain max-multi-sessioning for 5 hours or so.
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 41
Andy Peatling (@apeatling)
I find myself drifting back to Claude Code from Codex. Primarily for speed, it just feels faster to me. Even if I have Codex on fast mode. @OpenAIDevs
Replies: 1 · Reposts: 1 · Likes: 4 · Views: 456
Jonathan Low (@jonathanlowhy)
@hunvreus Code correctness is a verifiable loop that can be RL-ed on. Hard to do that with creative work where it is taste that matters.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 56
Ronan Berder (@hunvreus)
2 reasons why AI may struggle to generate good design and copy, but not code:
- The generated artifact is what people consume. With code, they consume the executed version. If it works, they don't care what it looks like.
- Constraints are good for creative tasks. Programming has a LOT of constraints. In design, unless you're working within a design system, you have almost none. Same goes for writing.
Replies: 3 · Reposts: 0 · Likes: 4 · Views: 628
Jonathan Low (@jonathanlowhy)
3k+ lines of frontend code in a file. Is it a problem if Codex doesn't have an issue with it?
[image]
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 36
Jonathan Low (@jonathanlowhy)
@josevalim With Codex, once implementation is done, I will get Codex to focus on cleaning up the abstraction smells. It helps to have a doc in the repo to explain what the smells are and then point Codex to it.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 97
José Valim (@josevalim)
Here is a simple but good example of how Codex tends to handle tasks better than Claude Code.
A user reported that some actions in tidewave.ai had keyboard shortcuts but were not displaying them on mouseover. I asked Codex to find and fix the missing cases.
Codex found all of them and additionally introduced a small helper called shortcutLabel that maps a ShortcutAction to its label (see code screenshot). The benefits are two:
* It uses the ShortcutAction type to ensure we don't accidentally forget the label of any shortcut
* The helper was added to shortcut.ts, colocating labels with the shortcuts themselves, making future additions more foolproof
Codex also updated the other places where we listed shortcuts to use the new helper.
I didn't ask Codex to create the helper but it was the right call. Maybe we would have suggested it during code review, maybe we would not. But I'd say Codex left the codebase in a better state than it found it.
I gave Claude Code the exact same prompt three different times. In every case, it just inlined the shortcuts in the templates, duplicating information across multiple files. I rarely feel Claude Code improves the codebase unless I explicitly tell it to do so.
Now the flipside is that sometimes Codex is going to go ahead and create abstractions when they are not needed, but so far I have seen more hits than misses.
[2 images]
Replies: 13 · Reposts: 7 · Likes: 104 · Views: 10.3K
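A minimal TypeScript sketch of the pattern José Valim describes above: one helper, kept next to the shortcut definitions, that maps every ShortcutAction to its label. Only the names shortcutLabel and ShortcutAction come from the post. The action names, key combinations, SHORTCUT_LABELS, and tooltip are hypothetical, and this is not the actual tidewave.ai code.

```typescript
// Hypothetical set of actions; the real ShortcutAction type is not shown in the post.
type ShortcutAction = "openSearch" | "toggleSidebar" | "submitPrompt";

// Record<ShortcutAction, string> requires one entry per action, so adding a
// new action without a label becomes a compile-time error.
const SHORTCUT_LABELS: Record<ShortcutAction, string> = {
  openSearch: "Ctrl+K",
  toggleSidebar: "Ctrl+B",
  submitPrompt: "Ctrl+Enter",
};

// Single source of truth for labels, instead of inlining strings in templates.
function shortcutLabel(action: ShortcutAction): string {
  return SHORTCUT_LABELS[action];
}

// Hypothetical caller: builds mouseover text from the shared helper.
function tooltip(text: string, action: ShortcutAction): string {
  return `${text} (${shortcutLabel(action)})`;
}
```

Inlining the label strings in each template, as the post says Claude Code did, would still work, but nothing would flag a template that was missed when a shortcut is added or changed.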
Jonathan Low (@jonathanlowhy)
What’s better? Using PR review tools like CodeRabbit/Greptile, or building a custom one? Currently experimenting with a custom PR review agent on @mistledev. I like how I get full control over the review scope rather than working with a black box.
[image]
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 74
Jonathan Low (@jonathanlowhy)
@brianchew Depends on the use case. Some things can benefit from more cores (e.g. building Rust) or more memory (local LLMs). If you think you would be doing those, then you should definitely go for a better machine.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 73
Brian Chew (@brianchew)
on the fence about buying a maxed out MacBook for the start of my career - do I get it?
I currently have a 16GB RAM M1 MBP
thoughts?
PS: I’m quite thrifty so this is kind of a big decision
Replies: 26 · Reposts: 0 · Likes: 19 · Views: 3.3K
Jonathan Low reposted
Thomas Jiang (@thomasjiangcy)
Correct user attribution is pretty important when working with background agents (esp. in teams):
- who made the commit?
- who opened the PR?
- who left a comment?
This sounds easy on the surface: just ensure your credentials are scoped correctly and owned by the right person when these actions are performed.
Now, what if you're triggering these actions from Slack? or Linear?
Here's how @mistledev does it 👇
Replies: 1 · Reposts: 1 · Likes: 1 · Views: 54
Jonathan Low (@jonathanlowhy)
@did0f Models/thinking are tied to the configuration of a turn. A change only takes effect on your next submission, after the current turn ends. (Steering does not count, as it's still mid-turn.)
Replies: 1 · Reposts: 0 · Likes: 3 · Views: 642
Francesco Di Donato
In Codex, if I change the thinking mode mid-execution, will it:
> ignore it until my next prompt?
> complete the current internal thinking step and switch to new intelligence level on next one?
> immediately switch it mid-thinking (would that even be possible?!)
[image]
Replies: 70 · Reposts: 2 · Likes: 300 · Views: 81.7K
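The turn-scoped behavior described in this exchange can be modeled with a small TypeScript sketch. It only illustrates the semantics as stated in the reply and is not Codex's implementation: Session, TurnConfig, Effort, and every method name here are hypothetical.

```typescript
// Hypothetical types; none of these names come from Codex itself.
type Effort = "low" | "medium" | "high";

interface TurnConfig {
  model: string;
  effort: Effort;
}

class Session {
  // Settings the user has selected; applied to the NEXT submitted turn.
  private pending: TurnConfig;
  // Snapshot used by the turn in flight, or null when idle.
  private active: TurnConfig | null = null;

  constructor(initial: TurnConfig) {
    this.pending = { ...initial };
  }

  // Changing the setting only touches the pending config.
  setEffort(effort: Effort): void {
    this.pending = { ...this.pending, effort };
  }

  // Submitting a prompt starts a turn and freezes a copy of the pending config.
  submit(_prompt: string): TurnConfig {
    this.active = { ...this.pending };
    return this.active;
  }

  // Steering happens mid-turn, so it keeps using the frozen config.
  steer(_message: string): TurnConfig {
    if (this.active === null) {
      throw new Error("no turn in progress");
    }
    return this.active;
  }

  finishTurn(): void {
    this.active = null;
  }
}
```

In this model, calling setEffort("high") while a turn is running leaves that turn, including any steering messages, on its original effort level. The new level is picked up by the next submit.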
Jonathan Low (@jonathanlowhy)
Building background agents internally can take months to get right. You have to think about sandboxes, credential brokering, permissions, integrations and more. With @mistledev, we're building it to enable teams to get their own Stripe Minions up and running in minutes.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 30
Jonathan Low (@jonathanlowhy)
This is a forcing function to throw tokens at a problem instead of money.
Tyler Bosmeny (@bosmeny)

A mic drop moment @ycombinator tonight
@sama just offered $2M in OpenAI tokens to EVERY YC startup in the current batch in exchange for equity
Just like Yuri Milner offering to invest in every startup back when Sam was a YC partner
I can't wait to see what's unlocked when you let the most driven, creative and formidable founders tokenmaxx

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 40