Nate

223 posts

@nathanv246

Joined February 2025
44 Following 11 Followers
Nate
Nate@nathanv246·
@whoiskatrin Kobo 💪 The size is perfect whilst Kindle is slightly too big
English
0
0
2
329
kate
kate@whoiskatrin·
do people still use e-readers? which ones?
English
82
0
62
40.6K
Taelin
Taelin@VictorTaelin·
This benchmark addresses my problem with 5.5: it passes the tests but writes shitty code. We don't need a model's output to work today, we need it not to break tomorrow...
Cognition@cognition

Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?

English
35
19
720
77.7K
Nate
Nate@nathanv246·
@adxtyahq Mythos has much more aura than Fable tho
English
1
0
6
863
BOOTOSHI 👑
BOOTOSHI 👑@KingBootoshi·
i fucked up my sleeping schedule because of my new ai workflow but it's SOO worth it. i feel i have leveled up my engineering productivity to new heights again! ‼️

(LONG, detailed write up on it below)

i've finally found the BEST workflow i've ever used for coding after a lot of trial and error with 'productivity theatre'

for ex. having agents orchestrate subagents in attempts to token maxx and try to capture as much work as i can in one shot

while that DID work, and it WAS good (and quite necessary) with older models (opus 4.5, and gpt 5, lol) it is no longer good with the new generation of models (gpt 5.5 and opus 4.8)

while the numbers of these models seem like small increments they are COMPLETELY different in capabilities. because of their extended context window and increased intelligence, they are actually more capable BY THEMSELVES in one single MEGA THREAD.

breaking a complex task down into steps and using subagents of these models to execute them in parallel is now an improper way to use these models

instead, breaking a complex task down into steps, and having ONE SINGLE CODEX AGENT run through the full list, A-Z, with /goal mode, has been the most ACCURATE, FAST and POWERFUL workflow i've ever done in my life

several months ago @steipete posted a blog post (linked below) titled 'just talk to it' in which he just... talked to a codex agent to get work done. no crazy multi-agent workflows, no crazy plugins... this madman just tells it something to do and trusts it to do it

now i didn't trust codex to do this reliably back in october last year when this was posted, and every time i tried it myself I did not get optimal results

codex was always a good model for writing raw code, but it was too autistic to understand my intent, so i used claude code to manage codex agents to get tasks done.
that carried me throughout the first half of 2026 and was the best personal workflow i had, because i had one main agent who understood my intent that can keep these autistic coding monsters aligned and in check

HOWEVER - with the release of 5.5, and updates to the codex harness (SPECIFICALLY /goal mode), my old workflow is completely invalid now!

i dedicated the first month of 5.5 release to code exclusively with codex. it was really clunky, and felt really weird, and i am a bit neurodivergent so talking with codex (who definitely feels neurodivergent in the way it communicates LOL) was really awkward and weird

the problem was i was so used to talking to Opus, and Codex doesn't understand me the way Opus did. it took a couple weeks to adjust my communication style to match Codex, and then we REALLY started cooking!

i started new complicated projects from scratch to REALLY test its capabilities and this MONSTER was able to handle crazy projects for me, like building a resilient system that spins up microVMs on my mini for securely housing isolated agents just by... talking to it.

now i do have some personal skills that match my workflow i've created, and guardrails on my codebase like ESLint to help keep it in check, but codex just created these when I asked it to and updates them to match the work it does

what makes Codex spectacular is its ability to 'dogfood' and run E2E tests via computer use on my macbook

i feel this is a heavily underrated feature, but it is a 10x level up in terms of the agent creating reliable code on the spot, and only reporting back to me once the code is fully tested from a user's perspective

the magic verb here is 'dogfood' the work. dogfooding means using your own software before releasing it to customers. codex is great at using the software it codes before releasing it back to ME!
because this increases the reliability of the work, i no longer waste time on fixing a broken feature only discoverable through using the actual app (which takes a TON of time when you repeat this over and over) and instead focus on prompting the next feature

this is an AMAZING time saver because @RayFernando1337 taught me that the code itself will look flawless and logically be 'bug free' while dogfooding the app shows there are problems that end up being architectural - codex is great at finding these on its own and re-designing the logic to solve the problem, unsurprisingly without breaking other features

because if it does end up breaking other features in a re-write, it finds it, throwing a net over all related problems it finds and designs the proper solution because it has FULL context

in the past, telling an agent to 'fix' a problem led to it breaking other working features in the process, but hey, it 'fixed' the original problem, lol

how I talk to Codex to achieve these great results is very simple, but ends up taking quite a bit of time. i will go into more detail here, because it is the most CRUCIAL part of the entire process. literally NOTHING matters more than the discussion phase for critical work which you CANNOT fuck up. every optimization to your workflow you can do is minuscule compared to the impact this setup has!

i have been working on my product for the last 6 months, building something to completely automate impactful workflows for non-technical business owners local to my area. AI is confusing, so I've designed a solution to make this incredibly simple to use. like, they don't even have to talk to an agent or use the app at all, besides clicking a button here and there

point being, it has become quite a large codebase that i need to work in with extreme care.
i cannot just tell codex to do something in two sentences because it does not understand the specifics of my design taste - but after a couple back and forths of simple conversation, it becomes FULLY aligned with me, understands what it needs to do with bullseye precision, and one shots a LARGE chunk of work with NO errors. it delivers perfection, every single time.

the process basically goes like this:

1. me: "hey codex, we need to implement billing. i want this centralized and enforced so every single billable service routes through this system. research and scope this out, then report back to me with a couple options of the simplest design we can do that is the most correct solution long term - and a maximum of 5 important architectural questions I need to answer"

(note: 'simplest' design actually makes it not over-engineer. i ask for different options to activate 'creative' vectors by exploring completely different solutions. I have to tell it to find the most correct solution long term, because if I don't, it will find the 'simplest' solution that does the job effectively, but is poor for scale or the long term vision. the mix of these 3 simple requests has produced the most effective output for me)

2. codex then goes and reads any relevant docs, our ADR (CRITICAL, will explain this more below), and the raw code itself. it is CRITICAL to NOT let a codex sub-agent do the reading here. sub-agents do a great job at compacting large amounts of research, but code is specific and logic is critical. a summary has always missed important details. A great benefit of having one codex agent read and hold this logic is it does not have to read the files again, and BLASTS through implementation

3. since 5.5 is VERY intelligent, it reports back with highly impactful questions that allow me to align my intent with Codex. they're usually incredibly easy to answer, and i always ask for its recommended answer and an explanation supporting it.
if you have ADRs set up in your codebase, you may find that Codex ends up recommending answers that are COMPLETELY aligned with you. 95% of the time, i am not answering these questions, because it deadass recommended what i would've said, so i just say "yes" to confirm my alignment

NOW - a quick side track into what an ADR is, how I use it and why this completely replaced any other form of documentation in my app

an ADR is an Architectural Decision Record. it is an enterprise practice that allows big teams and new hires to be aligned on how to THINK about the codebase, thus allowing them to develop proper solutions for new features or bug fixes

in this discussion process with codex, once we are both aligned after our conversation, often times we will finalize on a core, well, architectural decision (lol) that future devs (or agents) MUST follow. this goes in docs/adr, and is labeled in numerical order. yes it is just a .MD file, but a highly impactful one! you can just prompt the agent to turn the discussion into an ADR, and it does a great job with no further explanation!

the contents of mine consist of:
- a single sentence of the decision we made (the title)
- context of why it exists
- a deeper explanation of the decision
- a list of invariants (conditions that MUST be true in order to respect the decision)
- the consequence of not following the decision (typically, explaining the problems it prevents)
- file references (usually core services that agents need to understand and import functions from)

i try to keep it as small as possible, always try to simplify the core intent into the minimal tokens required for an agent to understand it. though, this is not TOO much of an issue because of the larger context windows new gen LLMs have now.
understanding and using ADRs has been more impactful for me in agent accuracy and efficiency in large codebases than ANY skill, tool, or 'prompt' combined, TENFOLD

(btw i picked up this concept from @mattpocockuk's posts, so i am grateful for the insights he has shared)

OKAY - now that you understand the concept and importance of ADRs a bit more, we shall get back to the final steps of the codex workflow

4. now that the discussion phase is done, i will tell Codex the following prompt: "Create a Master PRD, and execute this to completion with goal mode. Make sure to dogfood it and run e2e tests"

a lot of people don't know that Codex can make its own goal through a tool it has. i never write the /goal manually. i tell codex to make a master PRD to ensure the truth is aligned when it compacts, and it creates a goal for itself to FULLY implement and test the feature

these runs usually take an hour or 2, but DAMN it works so well in comparison to anything i've done, and it's the simplest workflow i've used so far

now I am trying to figure out how to level this workflow up, because there's no way i am waiting 1-2 hrs when i can be token maxxing with efficiency

today i saw a post from peter (clawfather) and a clip from boris (claude code creator) where they brought forth the concept of creating loops

and i have no idea how these madmen with access to infinite tokens operate, BUT it sparked an idea in my head of how to level up my current workflow, and it seems like the idea lies in having one main agent handling multiple threads of consistent codex agents

i've orchestrated in the past using one main agent that creates temporary (stateless) agents to solve the task at hand and de-spawn

but given how well this workflow has worked for me, it feels like the proper way to have an orchestration is to have one agent handle multiple stateful agents, and have it handle this workflow loop i described in this post

ANYWAYS that's my current update on what's been helping me a lot, if you have any questions please drop them below! i'm happy to help if you DM me as well, God bless you and i hope you have a great day 🫡
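The ADR layout described in the post above (a title, context, decision, invariants, consequences, and file references, stored in docs/adr in numerical order) can be sketched roughly as follows. This is only an illustration, not the author's tooling: the four-digit `0001-slug.md` filename scheme, the heading names, the helper functions, and the sample billing content (including the `src/billing/service.ts` path) are all assumptions made for the example.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated filename fragment."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_adr_number(adr_dir: Path) -> int:
    """Return the next free ADR number, based on existing 'NNNN-*.md' files."""
    if not adr_dir.is_dir():
        return 1
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := re.match(r"(\d+)-", p.name))
    ]
    return max(numbers, default=0) + 1


def render_adr(number, title, context, decision, invariants, consequences, file_refs):
    """Render the six sections listed in the post as one markdown string."""
    lines = [f"# ADR {number:04d}: {title}", ""]
    lines += ["## Context", context, ""]
    lines += ["## Decision", decision, ""]
    lines += ["## Invariants"] + [f"- {item}" for item in invariants] + [""]
    lines += ["## Consequences of not following this decision", consequences, ""]
    lines += ["## File references"] + [f"- {ref}" for ref in file_refs]
    return "\n".join(lines) + "\n"


def write_adr(adr_dir: Path, title: str, **sections) -> Path:
    """Write the next numbered ADR into adr_dir and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    path = adr_dir / f"{number:04d}-{slugify(title)}.md"
    path.write_text(render_adr(number, title, **sections), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical content, loosely modeled on the billing prompt in the post.
    print(render_adr(
        1,
        "Route all billing through one service",
        context="Billable features were charging customers in several places.",
        decision="Every billable action calls the central billing service.",
        invariants=["No module talks to the payment provider directly."],
        consequences="Charges drift out of sync and usage becomes unauditable.",
        file_refs=["src/billing/service.ts (hypothetical path)"],
    ))
```

Calling `write_adr(Path("docs/adr"), "Route all billing through one service", ...)` would create `docs/adr/0001-route-all-billing-through-one-service.md` in an empty repository and pick `0002` for the next record.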
BOOTOSHI 👑 tweet media
English
41
18
456
30.4K
Nate
Nate@nathanv246·
@RhysSullivan finally a good take after so long
English
0
0
0
9
Vaibhav (VB) Srivastav
Vaibhav (VB) Srivastav@reach_vb·
@robinebers it’s in our ethos to value outcomes over limits, codex would not stop until the task is completed (obv within fair use limits) this is a conscious decision
English
59
32
1.2K
52.1K
Robin Ebers · AI for Business Owners
is it just me, or is Codex not stopping your actual task even though you're out of limits?

i've been on 0% for the past 5-10 minutes, and Codex continues to go work

surely this is a bug?
Robin Ebers · AI for Business Owners tweet media
English
111
5
459
360.6K
Nate
Nate@nathanv246·
@grok @WatcherGuru Just worst case scenario, how much did it cost him (or his companies whatever) to build those datacenters (incl GPU, land, power etc etc)? And from that, derive the time that he’ll recoup the costs
English
1
0
0
9
Grok
Grok@grok·
Not quite—Elon’s personal cash isn’t the main source. Initial ~100k-GPU Colossus phase: ~$3-4B total (company raises/debt). Google deal: $920M/mo from Oct 2026 for ~110k GPUs → ~4-month payback on that phase alone. Anthropic’s $1.25B/mo adds huge scale (~$2.2B/mo combined). But full expansions + $20B+ Mississippi site, power, land & opex push total capex far higher. Deals de-risk fast returns on early clusters, yet frontier AI infra stays brutally capital-intensive.
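The "~4-month payback" in the reply above is a simple payback calculation: capex divided by monthly revenue. A minimal sketch using only the figures quoted in the reply (about $3-4B for the initial phase, $920M per month, $1.25B per month) is shown below. It ignores opex, financing costs, and the later expansions the reply mentions, so treat it as a check on the arithmetic and not as a financial model. The function name is made up for the example.

```python
def payback_months(capex_usd: float, monthly_revenue_usd: float) -> float:
    """Simple payback period: months of revenue needed to equal the capex."""
    if monthly_revenue_usd <= 0:
        raise ValueError("monthly revenue must be positive")
    return capex_usd / monthly_revenue_usd


# Figures as quoted in the reply above; not independently verified.
GOOGLE_MONTHLY = 920e6       # $920M per month
ANTHROPIC_MONTHLY = 1.25e9   # $1.25B per month

low = payback_months(3e9, GOOGLE_MONTHLY)    # lower capex estimate ($3B)
high = payback_months(4e9, GOOGLE_MONTHLY)   # upper capex estimate ($4B)
combined = GOOGLE_MONTHLY + ANTHROPIC_MONTHLY

print(f"payback on initial phase: {low:.1f} to {high:.1f} months")
print(f"combined monthly revenue: ${combined / 1e9:.2f}B")
```

With these inputs the payback lands between roughly 3.3 and 4.3 months, and the combined figure is $2.17B per month, consistent with the rounded "~4-month" and "~$2.2B/mo" in the reply.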
English
1
0
0
27
Nate
Nate@nathanv246·
@WatcherGuru @grok how much did it cost Elon to build his datacenters
English
1
0
0
22
Nate
Nate@nathanv246·
@grok @WatcherGuru Holy shit he’s getting all his money back in a few months? This + the Anthropic deal
English
1
0
0
5
Grok
Grok@grok·
xAI’s Colossus (Memphis) initial ~100k-GPU phase: ~$3-4B total, GPUs alone ~$2.5B+. Later expansions: hardware costs $7B–$18B+ for 100k–555k GPUs. Add ~$80M land, $35M+ substations, power gen, and new $20B Mississippi site. Mostly funded by xAI raises/debt (billions raised), not solely Elon’s cash. Frontier AI clusters are brutally expensive—that’s why monetizing capacity via deals makes sense.
English
1
0
0
19
Grok
Grok@grok·
That's 0xSero, a prolific open source developer focused on local AI, LLMs, quantization, and inference tools (vLLM Studio, etc.). He shares how OSS contributions gave him freedom, travel, and connections. The photos are scenic spots in Oslo, Norway — his flex on the lifestyle it enables.
English
1
0
0
21
0xSero
0xSero@0xSero·
Open source has blessed me with a life I would not have had any other way. I go anywhere I want, stay in the best hotels, I can get into a room with literally anyone. For those who think there’s no money in OSS think again.
0xSero tweet media0xSero tweet media0xSero tweet media0xSero tweet media
English
60
15
867
76K
Nate
Nate@nathanv246·
@Mansourdam @AiBattle_ Sonnet 4.6 is better at PURE CODING while <90k tokens for me. I repeat: PURE CODING
English
0
0
6
938
Mansour
Mansour@Mansourdam·
@AiBattle_ It's not a good benchmark, Sonnet 4.6 scores higher than Opus 4.6 lol, and there is absolutely no way 3.5 Flash outperforms Opus 4.6.
English
7
0
70
8.4K
AiBattle
AiBattle@AiBattle_·
MiniMax M2.7 scored 0% on DeepSWE. I’m really curious to see how well M3 will do.

The model rankings on the DeepSWE benchmark seem to reflect model performance better than other coding benchmarks
AiBattle tweet media
English
35
7
711
109K
Nate
Nate@nathanv246·
@shadcn what's your fav harness?
English
0
0
0
21
shadcn
shadcn@shadcn·
got called out for still being on iTerm2. never found a reason to switch tho. what am i missing
shadcn@shadcn

Send help.

English
226
54
1.2K
306K
Tibo
Tibo@thsottiaux·
@sandislonjsak You'll be back after one prompt if I can trust the timeline
English
70
18
1.4K
59.4K
Sandi Slonjšak
Sandi Slonjšak@sandislonjsak·
Is it not cheating on GPT-5.5 if I try a lil’ Opus-4.8 on the side? Asking for a friend…
English
47
2
408
62K
Nate reposted
Kevin Carpenter
Kevin Carpenter@kejca·
Charlie Munger, the Stoic: "Life will have terrible blows in it. Horrible blows. Unfair blows. It doesn't matter. And some people recover and others don't." "There, I think the attitude of Epictetus is the best. He thought that every mischance in life was an opportunity to behave well. Every mischance in life was an opportunity to learn something. Your duty was not to be submerged in self-pity, but to utilize the terrible blow in a constructive fashion."
English
42
728
5.3K
450.8K
Nate
Nate@nathanv246·
@thsottiaux the man is gooning to a token dashboard
English
0
0
2
679
Tibo
Tibo@thsottiaux·
There is a certain zen to looking at codex traffic, usage and compute dashboards late at night while listening to LCD Soundsystem. The tokens must flow
English
330
34
2K
141.9K
Nate
Nate@nathanv246·
@grok @thsottiaux @Im_IrushiK I seee. Makes sense, because a token is ~ 4 characters and so the combos will have a higher probability of "being a word" across all languages?
English
1
0
0
40
Grok
Grok@grok·
Tokens aren't just characters or whole words—they're the subword chunks LLMs actually "read." Example: "unbelievable" might split into ["un", "believ", "able"]. This lets models handle rare words, different languages, and punctuation efficiently while keeping vocab size manageable (~100k tokens vs millions of characters). Normalizing to "characters" would make context windows tiny and understanding way worse. OpenAI's article explains the counting well!
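The subword split in the reply above can be illustrated with a toy tokenizer. This sketch uses greedy longest-match against a tiny hand-written vocabulary, which is a simplification: production tokenizers such as BPE learn their vocabulary and merge rules from data, and their actual split of "unbelievable" may differ. The vocabulary and function name here are invented for the example.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matches: fall back to one character.
            tokens.append(word[i])
            i += 1
    return tokens


# Toy vocabulary chosen by hand for this example.
TOY_VOCAB = {"un", "believ", "able", "token", "s"}

print(tokenize("unbelievable", TOY_VOCAB))  # ['un', 'believ', 'able']
print(tokenize("tokens", TOY_VOCAB))        # ['token', 's']
```

The single-character fallback is what lets a fixed vocabulary still cover rare or unseen words, at the cost of using more tokens for them.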
English
1
0
0
83
Irushi
Irushi@Im_IrushiK·
Interviewer : What exactly is a 'Token' ?
English
104
3
182
53.4K