LeGOAT

7.9K posts

@xGoatJames

Joined December 2023

1.9K Following 135 Followers

LeGOAT@xGoatJames·7h

@MParakhin yeah you can kinda sidedoor into Pro via browser automation tools in Codex to prompt Pro (or @browser for the in-app one), but it's obviously limited in scope. think that + real-time are the next big value levers for OAI. let's see

132

Mikhail Parakhin@MParakhin·7h

@xGoatJames Totally agree if you can bring Pro into the mix (through Pi, for example).

780

Mikhail Parakhin@MParakhin·8h

OK, going to call it. Spent a lot of time with Opus 4.8:
1) It is a big step forward. The base model is still inferior to GPT-5.5, but they dramatically upped the thinking budget (for Max) - makes all the difference
2) Instruction following is still worse than GPT-5.5 xhigh
3) Coding, math, reasoning - better! It's not at the Pro level (of course), but the first Anthropic model I can genuinely use for math/ML.
Codex app is much better (especially on Windows), but, until 5.6 arrives, I switched to Claude Code as the main system. Hearing great things about 5.6 though!

526

46.9K

LeGOAT reposted

Max Marchione@maxmarchione·9h

Retatrutide is so potent it's risky. Doctors closest to the trial are saying this is not for someone trying to lose 15 or 20 pounds, but more for those with a BMI over 40, under medical management. With greater potency comes greater responsibility, and far less room for error.

146

31.5K

LeGOAT reposted

Ryan Carson@ryancarson·11h

Once you internalize the auto-research concept and use /goal with it, you unlock ridiculously fast app improvement on anything that has a numeric rubric. We’re all still too rooted in the “labor is expensive” world.

459

34.7K

LeGOAT reposted

Nicolas Bustamante@nicbstme·12h

To provide more context: with each new model, I typically delete most instructions in my system prompt, most tools, and most skills to observe if the model naturally understands how to perform key actions without all that scaffolding. For instance, I had a skill on how to use Gmail, GDrive, and other services with gocli. But now, the model possesses this knowledge, so I deleted everything. My skills are now more focused on providing instructions on how I prefer things to be done, my preferred style, and my tone...

Nick@nickbaumann_

Great read -- all it really takes is:
- a harness
- connectors to your data/tools
- reliable, always-accessible agent(s)
The models have reached the inflection point where it's not more complicated than this

12.3K

LeGOAT reposted

Coach James🇭🇹@TheJamesEdrick·1d

Bum ass point guard clanking both free throws with a minute left in the game and be walking around like Pac

🕸️ brick@NotLikeBrick

Game 7 used to have real wars man

494

6.1K

639.6K

LeGOAT reposted

manuel@manwelllb_·1d

coming to terms with the fact that I might be insane

3.9K

17.2K

252.5K

LeGOAT reposted

Justin Skycak@justinskycak·1d

The dangerous part of passive learning is that it improves your vocabulary faster than your ability.

287

5.4K

LeGOAT@xGoatJames·1d

@scaling01 worth it for sure. Codex limits are too good imo

161

Lisan al Gaib@scaling01·1d

I'm thinking about getting a ChatGPT Pro sub. Limits are unfortunately trash with Claude

428

52.7K

LeGOAT reposted

GREG ISENBERG@gregisenberg·1d

I didn't cover Claude Opus 4.8 on my pod because I don't think it's MEANINGFULLY better than GPT 5.5 as of May 29th.
We're entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone was a genuine leap? Now it's a slightly better camera and you can't really tell the difference.
That's where models are heading. 4.6 to 4.7 to 4.8. Each one is a little different. Nobody can agree if it's better or worse. The benchmarks say one thing, the vibes say another.
The thing that actually matters right now is what's happening around the models. Claude Code shipped dynamic workflows this same week and that genuinely changes what one person can build. Codex shipped a desktop app with an in app browser that combines coding and knowledge work in one surface. Those are the releases that move the needle for people.
The model underneath is becoming interchangeable. I think we're maybe 6 months from nobody caring which model they're using the way nobody cares which engine is in their Uber. You just want to get where you're going.
When something genuinely changes the game for builders, I'll cover it on @startupideaspod. Opus 4.8 wasn't that. Dynamic workflows was. I'd rather save you the hour.

309

124

2.1K

539.2K

LeGOAT reposted

Edward Lando@edwardlando·2d

Like many VCs, I really do believe Anthropic and OpenAI are the last startups. There is no point to building anything else. Everything has been solved or will be shortly.

248

684

223.4K

LeGOAT@xGoatJames·1d

@bridgemindai How do you make sure all instances share context? Is there something you have to support multi accounts? Hasn’t ANT said that could get people banned?

BridgeMind@bridgemindai·1d

This is what software development looks like in 2026. 6 Claude Opus 4.8 agents. All in ultracode mode. All running in parallel. Today, live on stream. Not 1 agent. Not 2. A fleet. 3 Max plans powering it. If I hit the limits, I buy a 4th live. One engineer orchestrating a swarm of frontier models is the new baseline. Come watch.

201

11.3K

LeGOAT reposted

Matt Turck@mattturck·2d

A day in the life of a VC in 2026:
9am: Board meeting. My main value-add is aggressively pushing for Anthropic/OpenAI usage in non-engineering functions. Briefly ponder how I became an SDR for foundation model companies.
1pm: Lunch with another VC. We discuss how startups can find "blue ocean" away from Anthropic/OpenAI. We conclude we should probably just invest the rest of our funds directly into Anthropic/OpenAI.
3pm: Pitch meeting. Me: "Do you run on Anthropic or OpenAI?" Founder: "Both." Debating internally whether a company reselling Anthropic/OpenAI with a 10% gross margin is a good investment but hey, at least they're in the "token flow".
4:30pm: Deep due diligence. I ask Claude if it plans to build this exact startup natively in its next release. Same to ChatGPT. They both say yes. I pass on the deal.
6pm: Urgent call from a portco CTO: "We need an intro to upgrade our Anthropic tier!" I immediately agree to help them spend more of the venture dollars we just invested in them, on Anthropic.
8pm: Brainstorming next guests for the podcast. Thinking I should probably just try to get some folks from Anthropic and OpenAI.

968

268.9K

LeGOAT reposted

Viv@Vtrivedy10·2d

some cracked ppl (ie: @_duplessis) saying this 4.8 is actually rlly good
One thing to watch: Anthropic and OpenAI moving towards “Goal APIs” - looks like /goal and Dynamic Workflows are steps to make this the default user experience
Humans just want to specify a task and have it done and verified end to end. Looks like Opus 4.8 is trained especially for this behavior and the Labs are doubling down on that UX and ability

Claude@claudeai

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

14.6K

LeGOAT@xGoatJames·2d

@sudoingX Do you think Hermes as a harness beats Cursor's own harness?

209

Sudo su@sudoingX·2d

picking cursor auth in hermes agent opens every model cursor routes to. 100+ options. composer-2.5, claude opus, gpt-5.5, the whole stack cursor's inference layer covers.
sign in once with your cursor account, hermes agent shows the model picker, you pick what you want. most hermes users today juggle 5+ api keys for the various providers. one cursor subscription via hermes agent now covers most of them.
composer-2.5 is the standout because it's cursor exclusive and frontier class at a fraction of the cost. but the menu is wider than that.
real task i ran. asked it to check availability across some gpu hosts i'm tracking. composer worked through the queries, parsed the results, returned a clean answer. 14 minutes of agent time, 17% of the context window used, three and a half minutes on the actual reasoning.
a comparable opus 4.7 run would have burned roughly ten times the cost.
frontier intelligence at a fraction of the cost isn't a tagline. it's the line on the status bar after a real run.

Sudo su@sudoingX

composer 2.5 is opus 4.7 class coding at 1/10 the cost. but it was cursor only. that just changed.
i just shipped cursor as a hermes agent provider tonight. PR open upstream to nousresearch/hermes-agent, available from my fork right now while it merges.
what this means: composer 2.5 + hermes memory + hermes skills + cron + acp subagents + multi-platform delivery, all in one harness. cheapest frontier coding model + deepest agent runtime. neither alone gets you here.
the math:
- composer 2.5: $0.50 input / $2.50 output per 1M
- opus 4.7: $5.00 / $25.00 (10x cost)
- gpt-5.5: $5.00 / $30.00 (12x cost)
- gpt-5.5 pro: $30.00 / $180.00 (70x cost)
same coding benchmark band (79.8% swe-bench multilingual vs opus 4.7's 80.5%, 63.2% cursorbench v3.1 vs 61.6%) at a fraction of the budget.
PR: github.com/NousResearch/h…
fork: github.com/sudoingX/herme…
article with full receipts drops sat ~9pm ICT.
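
The cost multipliers in the post above can be checked with a few lines of arithmetic. This is only a rough sketch: the prices are copied from the post, while the function names `run_cost` and `cost_ratio` and the equal split of input and output tokens are assumptions made for illustration, not anything the author states.

```python
# Prices in USD per 1M tokens as (input, output), copied from the post above.
PRICES = {
    "composer 2.5": (0.50, 2.50),
    "opus 4.7": (5.00, 25.00),
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5 pro": (30.00, 180.00),
}


def run_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one run for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_ratio(model, baseline="composer 2.5",
               input_tokens=1_000_000, output_tokens=1_000_000):
    """Return how many times more expensive `model` is than `baseline`.

    The default 1:1 input/output token mix is an assumption; real agent
    runs are often input-heavy, which changes the ratio.
    """
    return (run_cost(model, input_tokens, output_tokens)
            / run_cost(baseline, input_tokens, output_tokens))


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name}: {cost_ratio(name):.1f}x")
```

With an equal mix of input and output tokens, the ratios come out at roughly 10x, 12x, and 70x, which lines up with the multipliers quoted in the post. A different token mix would shift the gpt-5.5 and gpt-5.5 pro figures.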

LeGOAT reposted

Artificial Analysis@ArtificialAnlys·2d

Anthropic just launched Claude Opus 4.8, and it is the new leader on our GDPval-AA benchmark for agentic real-world work tasks Opus 4.8 scored 1890 on GDPval-AA at launch with its 'max' effort setting, +137 points from Opus 4.7 and +121 points ahead of the next-best model, GPT-5.5 xhigh. Compared head-to-head on the GDPval task set, this implies a ~67% win rate against GPT-5.5 xhigh. @AnthropicAI shared access with us ahead of the public release to benchmark this model and we’re glad to see our benchmarks referenced in today’s launch. The rest of the Artificial Analysis Intelligence Index is in progress - we’ll share final results soon!
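
The step from a 121-point lead to a ~67% win rate can be reproduced if the scores are read as Elo-style ratings. That reading is an assumption: the post does not state the formula, and the function name `expected_win_rate` and the 400-point logistic scale below are the standard Elo convention rather than something confirmed by the source.

```python
def expected_win_rate(rating_gap, scale=400.0):
    """Expected win probability for the higher-rated side under the
    standard Elo logistic model (assumed here, not stated in the post)."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / scale))


if __name__ == "__main__":
    # 121 points is the gap to GPT-5.5 xhigh quoted in the post.
    print(f"{expected_win_rate(121):.1%}")  # prints 66.7%
```

Under that assumption a 121-point gap gives about 66.7%, consistent with the ~67% figure in the post.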

101

1.1K

65.8K

LeGOAT reposted

Claude@claudeai·2d

3.5K

8.6K

66.9K

14.4M

LeGOAT reposted

Tiago Forte@fortelabs·5d

I think the main thing AI has taught me, through all the time savings it brings, is that I’m not a very interesting person
Faced with a surplus of free time, I realize I don’t really have hobbies besides content consumption
I’m forced to conclude that I don’t have very deep friendships, and am not a core member of any particular community
I’m not very cultured, I’m finding, and don’t have abiding interests in art or literature or history or much that isn’t directly related to my work
I have a work-centric life, in other words. AI pulls back the curtain on just how impoverished such an existence is, by disabusing me of its necessity
Given the freedom I’ve always said I wanted, I’m at a loss as to what to do with it, except plow myself even harder into work, thus exacerbating the lesson
There’s nothing more confronting to humans than freedom

393

239

4.3K

404.5K

LeGOAT@xGoatJames·5d

@gfodor What reasoning level were the subagents? IMO a good config is
- xhigh orchestrates
- sub agents can be low or medium
- xhigh then checks the work

230

gfodor.id@gfodor·5d

Instead of using an orchestrator, which performed terribly, I put 5.5 xhigh in a /goal to complete my plan but asked it to delegate to a subagent for code review at every milestone, and avoid proceeding until the review passes, and got *much* better results.

gfodor.id@gfodor

GPT 5.5 turned out a steaming pile overnight and wow is anyone actually good at this yet? This is starting to feel like programming again, that feeling it’s impossibly hard and painful

1.2K

59.3K

LeGOAT reposted

Disclose.tv@disclosetv·5d

NOW - Pope Leo XIV says the church and Anthropic will work together to "find the way for humanity, in this time of artificial intelligence."

1.2K

1.8K

16.2K

5.6M
