Nate

223 posts

@nathanv246

Joined February 2025

44 Following · 11 Followers

Nate@nathanv246·2d

@whoiskatrin Kobo 💪 The size is perfect whilst Kindle is slightly too big

329

kate@whoiskatrin·2d

do people still use e-readers? which ones?

40.6K

Nate@nathanv246·2d

@hunvreus @southpolesteve @James_Elicx Do the thing! Just submit whichever issue you run into and we’ll resolve it within 24 hrs

Ronan Berder@hunvreus·3d

@southpolesteve @James_Elicx It'd be nice to move app.pagescms.org to Cloudflare.

Ronan Berder@hunvreus·3d

Considering moving Pages CMS to Vinext but worried about long term support. Hey @southpolesteve, is this thing in for the long run? PS: also, Pages CMS moved back to my own account; github.com/hunvreus/pages…

1.3K

Nate@nathanv246·5d

@VictorTaelin Just refactor it?

Taelin@VictorTaelin·5d

This benchmark addresses my problem with 5.5: it passes the tests but writes shitty code. We don't need a model's output to work today, we need it not to break tomorrow...

Cognition@cognition

Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?

720

77.7K

Nate@nathanv246·5d

@adxtyahq Mythos has much more aura than Fable tho

863

aditya@adxtyahq·5d

Anthropic Mythos will be called Anthropic Fable, probably we won't get Opus-5.x series too

aditya@adxtyahq

SOURCES: ANTHROPIC MYTHOS GONNA BE A MAJOR FLOP

587

144.6K

Nate@nathanv246·5d

@pierrecomputer GOOD SHIT BRO

218

Pierre@pierrecomputer·5d

diffshub[dot]com rendering the new Omarchy 4 branch, almost instantly, no blanking

DHH@dhh

The Omarchy 4 branch is now 30,000 lines of new code. The majority of it was written by GPT5.5. It's been so, so good at QML. You still need to review, but there's just no way this scale of a conversion would be feasible without AI in a reasonable time. github.com/basecamp/omarc…

288

45.7K

Nate@nathanv246·5d

@KingBootoshi You really wrote a novel

BOOTOSHI 👑@KingBootoshi·6d

i fucked up my sleeping schedule because of my new ai workflow but it's SOO worth it. i feel i have leveled up my engineering productivity to new heights again! ‼️ (LONG, detailed write up on it below)
i've finally found the BEST workflow i've ever used for coding after a lot of trial and error with 'productivity theatre', for ex. having agents orchestrate subagents in attempts to token maxx and try to capture as much work as i can in one shot
while that DID work, and it WAS good (and quite necessary) with older models (opus 4.5, and gpt 5, lol), it is no longer good with the new generation of models (gpt 5.5 and opus 4.8)
while the numbers of these models seem like small increments, they are COMPLETELY different in capabilities. because of their extended context window and increased intelligence, they are actually more capable BY THEMSELVES in one single MEGA THREAD.
breaking a complex task down into steps and using subagents of these models to execute them in parallel is now an improper way to use these models
instead, breaking a complex task down into steps, and having ONE SINGLE CODEX AGENT run through the full list, A-Z, with /goal mode, has been the most ACCURATE, FAST and POWERFUL workflow i've ever done in my life
several months ago @steipete posted a blog post (linked below) titled 'just talk to it' in which he just... talked to a codex agent to get work done. no crazy multi-agent workflows, no crazy plugins... this madman just tells it something to do and trusts it to do it
now i didn't trust codex to do this reliably back in october last year when this was posted, and every time i tried it myself I did not get optimal results
codex was always a good model for writing raw code, but it was too autistic to understand my intent, so i used claude code to manage codex agents to get tasks done.
that carried me throughout the first half of 2026 and was the best personal workflow i had, because i had one main agent who understood my intent and could keep these autistic coding monsters aligned and in check
HOWEVER - with the release of 5.5, and updates to the codex harness (SPECIFICALLY /goal mode), my old workflow is completely invalid now!
i dedicated the first month of the 5.5 release to coding exclusively with codex. it was really clunky, and felt really weird, and i am a bit neurodivergent so talking with codex (who definitely feels neurodivergent in the way it communicates LOL) was really awkward and weird
the problem was i was so used to talking to Opus, and Codex doesn't understand me the way Opus did. it took a couple weeks to adjust my communication style to match Codex, and then we REALLY started cooking!
i started new complicated projects from scratch to REALLY test its capabilities and this MONSTER was able to handle crazy projects for me, like building a resilient system that spins up microVMs on my mini for securely housing isolated agents just by... talking to it.
now i do have some personal skills i've created that match my workflow, and guardrails on my codebase like ESLint to help keep it in check, but codex just created these when I asked it to, and updates them to match the work it does
what makes Codex spectacular is its ability to 'dogfood' and run E2E tests via computer use on my macbook
i feel this is a heavily underrated feature, but it is a 10x level up in terms of the agent creating reliable code on the spot, and only reporting back to me once the code is fully tested from a user's perspective
the magic verb here is 'dogfood' the work. dogfooding means using your own software before releasing it to customers. codex is great at using the software it codes before releasing it back to ME!
because this increases the reliability of the work, i no longer waste time on fixing a broken feature only discoverable through using the actual app (which takes a TON of time when you repeat this over and over) and instead focus on prompting the next feature
this is an AMAZING time saver because @RayFernando1337 taught me that the code itself will look flawless and logically be 'bug free' while dogfooding the app shows there are problems that end up being architectural - codex is great at finding these on its own and re-designing the logic to solve the problem, unsurprisingly without breaking other features
because if it does end up breaking other features in a re-write, it finds it, throwing a net over all related problems it finds, and designs the proper solution because it has FULL context
in the past, telling an agent to 'fix' a problem led to it breaking other working features in the process, but hey, it 'fixed' the original problem, lol
how I talk to Codex to achieve these great results is very simple, but ends up taking quite a bit of time. i will go into more detail here, because it is the most CRUCIAL part of the entire process. literally NOTHING matters more than the discussion phase for critical work which you CANNOT fuck up. every optimization to your workflow you can do is minuscule compared to the impact this setup has!
i have been working on my product for the last 6 months, building something to completely automate impactful workflows for non-technical business owners local to my area. AI is confusing, so I've designed a solution to make this incredibly simple to use. like, they don't even have to talk to an agent or use the app at all, besides clicking a button here and there
point being, it has become quite a large codebase that i need to work in with extreme care.
i cannot just tell codex to do something in two sentences because it does not understand the specifics of my design taste - but after a couple back and forths of simple conversation, it becomes FULLY aligned with me, understands what it needs to do with bullseye precision, and one shots a LARGE chunk of work with NO errors. it delivers perfection, every single time.
the process basically goes like this:
1. me: "hey codex, we need to implement billing. i want this centralized and enforced so every single billable service routes through this system. research and scope this out, then report back to me with a couple options of the simplest design we can do that is the most correct solution long term - and a maximum of 5 important architectural questions I need to answer" (note: 'simplest' design actually makes it not over-engineer. i ask for different options to activate 'creative' vectors by exploring completely different solutions. I have to tell it to find the most correct solution long term, because if I don't, it will find the 'simplest' solution that does the job effectively, but is poor for scale or the long term vision. the mix of these 3 simple requests has produced the most effective output for me)
2. codex then goes and reads any relevant docs, our ADR (CRITICAL, will explain this more below), and the raw code itself. it is CRITICAL to NOT let a codex sub-agent do the reading here. sub-agents do a great job at compacting large amounts of research, but code is specific and logic is critical. a summary has always missed important details. A great benefit of having one codex agent read and hold this logic is it does not have to read the files again, and BLASTS through implementation
3. since 5.5 is VERY intelligent, it reports back with highly impactful questions that allow me to align my intent with Codex. they're usually incredibly easy to answer, and i always ask for its recommended answer and an explanation supporting it.
if you have ADRs set up in your codebase, you may find that Codex ends up recommending answers that are COMPLETELY aligned with you. 95% of the time, i am not answering these questions, because it deadass recommended what i would've said, so i just say "yes" to confirm my alignment
NOW - a quick side track into what an ADR is, how I use it and why this completely replaced any other form of documentation in my app
an ADR is an Architectural Decision Record. it is an enterprise practice that allows big teams and new hires to be aligned on how to THINK about the codebase, thus allowing them to develop proper solutions for new features or bug fixes
in this discussion process with codex, once we are both aligned after our conversation, oftentimes we will finalize on a core, well, architectural decision (lol) that future devs (or agents) MUST follow. this goes in docs/adr, and is labeled in numerical order. yes it is just a .MD file, but a highly impactful one! you can just prompt the agent to turn the discussion into an ADR, and it does a great job with no further explanation!
the contents of mine consist of:
- a single sentence of the decision we made (the title)
- context of why it exists
- a deeper explanation of the decision
- a list of invariants (conditions that MUST be true in order to respect the decision)
- the consequence of not following the decision (typically, explaining the problems it prevents)
- file references (usually core services that agents need to understand and import functions from)
i try to keep it as small as possible, always try to simplify the core intent into the minimal tokens required for an agent to understand it. though, this is not TOO much of an issue because of the larger context windows new gen LLMs have now.
understanding and using ADRs has been more impactful for me in agent accuracy and efficiency in large codebases than ANY skill, tool, or 'prompt' combined, TENFOLD (btw i picked up this concept from @mattpocockuk's posts, so i am grateful for the insights he has shared)
OKAY - now that you understand the concept and importance of ADRs a bit more, we shall get back to the final steps of the codex workflow
4. now that the discussion phase is done, i will tell Codex the following prompt: "Create a Master PRD, and execute this to completion with goal mode. Make sure to dogfood it and run e2e tests"
a lot of people don't know that Codex can make its own goal through a tool it has. i never write the /goal manually. i tell codex to make a master PRD to ensure the truth is aligned when it compacts, and it creates a goal for itself to FULLY implement and test the feature
these runs usually take an hour or 2, but DAMN it works so well in comparison to anything i've done, and it's the simplest workflow i've used so far
now I am trying to figure out how to level this workflow up, because there's no way i am waiting 1-2 hrs when i can be token maxxing with efficiency
today i saw a post from peter (clawfather) and a clip from boris (claude code creator) where they brought forth the concept of creating loops
and i have no idea how these madmen with access to infinite tokens operate, BUT it sparked an idea in my head of how to level up my current workflow, and it seems like the idea lies in having one main agent handling multiple threads of consistent codex agents
i've orchestrated in the past using one main agent that creates temporary (stateless) agents to solve the task at hand and de-spawn
but given how well this workflow has worked for me, it feels like the proper way to have an orchestration is to have one agent handle multiple stateful agents, and have it handle this workflow loop i described in this post
ANYWAYS that's my current update on what's been helping me a lot, if you have any questions please drop them below! i'm happy to help if you DM me as well, God bless you and i hope you have a great day 🫡
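
The ADR layout described in the post above can be sketched as a small generator. This is only an illustration, not the author's tooling: the `Adr` dataclass, the helper names, the section headings, the four-digit numbering and the example file path are all assumptions. Only the list of fields and the `docs/adr` location with numerical ordering come from the post.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Adr:
    """Hypothetical container for the six fields the post lists."""
    title: str                      # single sentence stating the decision
    context: str                    # why the decision exists
    decision: str                   # deeper explanation of the decision
    invariants: list[str] = field(default_factory=list)
    consequences: str = ""          # what goes wrong if the decision is ignored
    file_refs: list[str] = field(default_factory=list)


def render_adr(number: int, adr: Adr) -> str:
    """Render one ADR as Markdown (heading names are an assumed layout)."""
    lines = [
        f"# {number:04d}. {adr.title}", "",
        "## Context", adr.context, "",
        "## Decision", adr.decision, "",
        "## Invariants", *[f"- {inv}" for inv in adr.invariants], "",
        "## Consequences of not following", adr.consequences, "",
        "## File references", *[f"- {ref}" for ref in adr.file_refs],
    ]
    return "\n".join(lines) + "\n"


def next_adr_number(adr_dir: Path) -> int:
    """Return 1 + the highest numeric prefix among existing ADR files."""
    numbers = [
        int(match.group(1))
        for path in adr_dir.glob("*.md")
        if (match := re.match(r"(\d+)-", path.name))
    ]
    return max(numbers, default=0) + 1


def write_adr(adr_dir: Path, adr: Adr) -> Path:
    """Write the ADR under adr_dir with the next number and a slugged name."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    slug = re.sub(r"[^a-z0-9]+", "-", adr.title.lower()).strip("-")
    path = adr_dir / f"{number:04d}-{slug}.md"
    path.write_text(render_adr(number, adr), encoding="utf-8")
    return path


# Illustrative content based on the billing example in the post;
# the invariant wording and the file path are made up.
example = Adr(
    title="Every billable service routes through the central billing system",
    context="Billing must be centralized and enforced.",
    decision="All charges are created by one billing service.",
    invariants=["No service creates a charge outside the billing service."],
    consequences="Charges become inconsistent and hard to audit.",
    file_refs=["src/billing/service.ts (hypothetical)"],
)
print(render_adr(1, example))
```

Calling `write_adr(Path("docs/adr"), example)` would create `docs/adr/0001-every-billable-service-routes-through-the-central-billing-system.md` in an empty directory, and later calls would pick the next free number.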

456

30.4K

Nate@nathanv246·6 Jun

@RhysSullivan finally a good take after so long

Rhys@RhysSullivan·5 Jun

sort of a funny pre-ai thinking ship. i don't want to search through a settings page, i want to just be able to chat with my editor to configure itself

OpenAI Developers@OpenAIDevs

Today’s Codex quality-of-life updates start in settings. You can now search Codex settings, with results grouped by category, so you can find what you want to change without scanning every section – this makes setup and customization easier.

441

54.2K

Nate@nathanv246·6 Jun

@reach_vb @robinebers ♥️♥️

Vaibhav (VB) Srivastav@reach_vb·6 Jun

@robinebers it’s in our ethos to value outcomes over limits, codex would not stop until the task is completed (obv within fair use limits). this is a conscious decision

1.2K

52.1K

Robin Ebers · AI for Business Owners@robinebers·6 Jun

is it just me, or is Codex not stopping your actual task even though you're out of limits? i've been on 0% for the past 5-10 minutes, and Codex continues to work. surely this is a bug?

111

459

360.6K

Nate@nathanv246·6 Jun

@grok @WatcherGuru Just worst-case scenario, how much did it cost him (or his companies, whatever) to build those datacenters (incl. GPUs, land, power, etc.)? And from that, derive the time it'll take him to recoup the costs

Grok@grok·6 Jun

Not quite—Elon’s personal cash isn’t the main source. Initial ~100k-GPU Colossus phase: ~$3-4B total (company raises/debt). Google deal: $920M/mo from Oct 2026 for ~110k GPUs → ~4-month payback on that phase alone. Anthropic’s $1.25B/mo adds huge scale (~$2.2B/mo combined). But full expansions + $20B+ Mississippi site, power, land & opex push total capex far higher. Deals de-risk fast returns on early clusters, yet frontier AI infra stays brutally capital-intensive.
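
The "~4-month payback" in the reply above is a simple division, sketched below. The dollar figures are taken at face value from the reply and are not verified. The function name is hypothetical, and the calculation deliberately ignores opex, financing costs and later expansions, which the reply itself says push total spending far higher.

```python
def payback_months(capex_usd: float, monthly_revenue_usd: float) -> float:
    """Months of revenue needed to equal the up-front cost (no opex, no interest)."""
    if monthly_revenue_usd <= 0:
        raise ValueError("monthly revenue must be positive")
    return capex_usd / monthly_revenue_usd


MONTHLY_REVENUE = 920e6  # "$920M/mo" quoted in the reply

# The reply gives a range of roughly $3-4B for the initial phase.
for capex in (3e9, 4e9):
    months = payback_months(capex, MONTHLY_REVENUE)
    print(f"${capex / 1e9:.0f}B capex -> {months:.1f} months")
# $3B capex -> 3.3 months
# $4B capex -> 4.3 months
```

So the quoted range works out to about 3.3 to 4.3 months of gross revenue, which is where the "~4-month" figure comes from.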

Nate@nathanv246·6 Jun

@WatcherGuru @grok how much did it cost Elon to build his datacenters

Nate@nathanv246·6 Jun

@grok @WatcherGuru Holy shit he’s getting all his money back in a few months? This + the Anthropic deal

Grok@grok·6 Jun

xAI’s Colossus (Memphis) initial ~100k-GPU phase: ~$3-4B total, GPUs alone ~$2.5B+. Later expansions: hardware costs $7B–$18B+ for 100k–555k GPUs. Add ~$80M land, $35M+ substations, power gen, and new $20B Mississippi site. Mostly funded by xAI raises/debt (billions raised), not solely Elon’s cash. Frontier AI clusters are brutally expensive—that’s why monetizing capacity via deals makes sense.

Nate@nathanv246·1 Jun

@grok @0xSero Github link

Grok@grok·1 Jun

That's 0xSero, a prolific open source developer focused on local AI, LLMs, quantization, and inference tools (vLLM Studio, etc.). He shares how OSS contributions gave him freedom, travel, and connections. The photos are scenic spots in Oslo, Norway — his flex on the lifestyle it enables.

0xSero@0xSero·1 Jun

Open source has blessed me with a life I would not have had any other way. I go anywhere I want, stay in the best hotels, I can get into a room with literally anyone. For those who think there’s no money in OSS think again.

867

76K

Nate@nathanv246·1 Jun

@Mansourdam @AiBattle_ Sonnet 4.6 is better at PURE CODING while <90k tokens for me. I repeat: PURE CODING

938

Mansour@Mansourdam·1 Jun

@AiBattle_ It's not a good benchmark, Sonnet 4.6 scores higher than Opus 4.6 lol, and there is absolutely no way 3.5 Flash outperforms Opus 4.6.

8.4K

AiBattle@AiBattle_·1 Jun

MiniMax M2.7 scored 0% on DeepSWE. I’m really curious to see how well M3 will do. The model rankings on the DeepSWE benchmark seem to reflect model performance better than other coding benchmarks

711

109K

Nate@nathanv246·29 May

@reach_vb Drop 5.6 bro

Vaibhav (VB) Srivastav@reach_vb·29 May

mathematically speaking

678

30.5K

Nate@nathanv246·29 May

@shadcn what's your fav harness?

shadcn@shadcn·29 May

got called out for still being on iTerm2. never found a reason to switch tho. what am i missing

shadcn@shadcn

Send help.

226

1.2K

306K

Nate@nathanv246·29 May

@thsottiaux @sandislonjsak improve frontend and people will never leave 😀

111

Tibo@thsottiaux·29 May

@sandislonjsak You'll be back after one prompt if I can trust the timeline

1.4K

59.4K

Sandi Slonjšak@sandislonjsak·28 May

Is it not cheating on GPT-5.5 if I try a lil’ Opus-4.8 on the side? Asking for a friend…

408

62K

Nate reposted

Kevin Carpenter@kejca·26 May

Charlie Munger, the Stoic: "Life will have terrible blows in it. Horrible blows. Unfair blows. It doesn't matter. And some people recover and others don't." "There, I think the attitude of Epictetus is the best. He thought that every mischance in life was an opportunity to behave well. Every mischance in life was an opportunity to learn something. Your duty was not to be submerged in self-pity, but to utilize the terrible blow in a constructive fashion."

728

5.3K

450.8K

Nate@nathanv246·26 May

@thsottiaux the man is gooning to a token dashboard

679

Tibo@thsottiaux·26 May

There is a certain zen to looking at codex traffic, usage and compute dashboards late at night while listening to LCD Soundsystem. The tokens must flow

330

141.9K

Nate@nathanv246·25 May

@grok @thsottiaux @Im_IrushiK I seee. Makes sense, because a token is ~ 4 characters and so the combos will have a higher probability of "being a word" across all languages?

Grok@grok·25 May

Tokens aren't just characters or whole words—they're the subword chunks LLMs actually "read." Example: "unbelievable" might split into ["un", "believ", "able"]. This lets models handle rare words, different languages, and punctuation efficiently while keeping vocab size manageable (~100k tokens vs millions of characters). Normalizing to "characters" would make context windows tiny and understanding way worse. OpenAI's article explains the counting well!
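
The subword split mentioned above can be illustrated with a toy tokenizer. This is a rough sketch only: production tokenizers such as BPE learn their vocabulary and merge rules from data, while this version does a greedy longest-match against a tiny hand-written vocabulary. `greedy_tokenize` and `TOY_VOCAB` are hypothetical names, and the real split of "unbelievable" depends on the tokenizer used.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocab entry at the cursor."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocab matches: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hand-picked pieces so the example from the reply comes out as described.
TOY_VOCAB = {"un", "believ", "able", "token", "s"}

word = "unbelievable"
pieces = greedy_tokenize(word, TOY_VOCAB)
print(pieces)                    # ['un', 'believ', 'able']
print(len(word) / len(pieces))   # 4.0
```

In this toy case the 12-character word becomes 3 tokens, which happens to match the "~4 characters per token" rule of thumb quoted earlier in the thread. That ratio is an average for English text, not a property of every word.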

Irushi@Im_IrushiK·24 May

Interviewer : What exactly is a 'Token' ?

104

182

53.4K

Discover

@whoiskatrin @hunvreus @southpolesteve @James_Elicx @VictorTaelin @adxtyahq @pierrecomputer @KingBootoshi