LeGOAT

7.9K posts

@xGoatJames

Joined December 2023
1.9K Following · 135 Followers
LeGOAT (@xGoatJames)
@MParakhin yeah you can kinda sidedoor into Pro via browser automation tools in Codex to prompt Pro (or @browser for the in-app one), but it's obviously limited in scope. Think that + real-time are the next big value levers for OAI. Let's see.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 132
Mikhail Parakhin (@MParakhin)
OK, going to call it. Spent a lot of time with Opus 4.8:
1) It is a big step forward. The base model is still inferior to GPT-5.5, but they dramatically upped the thinking budget (for Max) - makes all the difference
2) Instruction following is still worse than GPT-5.5 xhigh
3) Coding, math, reasoning - better! It's not at the Pro level (of course), but the first Anthropic model I can genuinely use for math/ML.
Codex app is much better (especially on Windows), but, until 5.6 arrives, I switched to Claude Code as the main system. Hearing great things about 5.6 though!
Replies: 26 · Reposts: 17 · Likes: 526 · Views: 46.9K
LeGOAT reposted
Max Marchione (@maxmarchione)
Retatrutide is so potent it's risky. Doctors closest to the trial are saying this is not for someone trying to lose 15 or 20 pounds, but more for those with a BMI over 40, under medical management. With greater potency comes greater responsibility, and far less room for error.
[Image]
Replies: 45 · Reposts: 11 · Likes: 146 · Views: 31.5K
LeGOAT reposted
Ryan Carson (@ryancarson)
Once you internalize the auto-research concept and use /goal with it, you unlock ridiculously fast app improvement on anything that has a numeric rubric. We’re all still too rooted in the “labor is expensive” world.
Replies: 25 · Reposts: 18 · Likes: 459 · Views: 34.7K
LeGOAT reposted
Nicolas Bustamante (@nicbstme)
To provide more context: with each new model, I typically delete most instructions in my system prompt, most tools, and most skills to observe if the model naturally understands how to perform key actions without all that scaffolding. For instance, I had a skill on how to use Gmail, GDrive, and other services with gocli. But now, the model possesses this knowledge, so I deleted everything. My skills are now more focused on providing instructions on how I prefer things to be done, my preferred style, and my tone...
Nick (@nickbaumann_)

Great read -- all it really takes is:
- a harness
- connectors to your data/tools
- reliable, always-accessible agent(s)
The models have reached the inflection point where it's not more complicated than this

Replies: 9 · Reposts: 7 · Likes: 80 · Views: 12.3K
LeGOAT reposted
manuel (@manwelllb_)
coming to terms with the fact that I might be insane
[Image]
Replies: 25 · Reposts: 3.9K · Likes: 17.2K · Views: 252.5K
LeGOAT reposted
Justin Skycak (@justinskycak)
The dangerous part of passive learning is that it improves your vocabulary faster than your ability.
Replies: 3 · Reposts: 30 · Likes: 287 · Views: 5.4K
LeGOAT (@xGoatJames)
@scaling01 worth it for sure. Codex limits are too good imo
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 161
Lisan al Gaib (@scaling01)
I'm thinking about getting a ChatGPT Pro sub. Limits are unfortunately trash with Claude.
Replies: 70 · Reposts: 2 · Likes: 428 · Views: 52.7K
LeGOAT reposted
GREG ISENBERG (@gregisenberg)
I didn't cover Claude Opus 4.8 on my pod because I don't think it's MEANINGFULLY better than GPT 5.5 as of May 29th.
We're entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone was a genuine leap? Now it's a slightly better camera and you can't really tell the difference. That's where models are heading. 4.6 to 4.7 to 4.8. Each one is a little different. Nobody can agree if it's better or worse. The benchmarks say one thing, the vibes say another.
The thing that actually matters right now is what's happening around the models. Claude Code shipped dynamic workflows this same week and that genuinely changes what one person can build. Codex shipped a desktop app with an in-app browser that combines coding and knowledge work in one surface. Those are the releases that move the needle for people.
The model underneath is becoming interchangeable. I think we're maybe 6 months from nobody caring which model they're using the way nobody cares which engine is in their Uber. You just want to get where you're going.
When something genuinely changes the game for builders, I'll cover it on @startupideaspod. Opus 4.8 wasn't that. Dynamic workflows was. I'd rather save you the hour.
Replies: 309 · Reposts: 124 · Likes: 2.1K · Views: 539.2K
LeGOAT reposted
Edward Lando (@edwardlando)
Like many VCs, I really do believe Anthropic and OpenAI are the last startups. There is no point to building anything else. Everything has been solved or will be shortly.
Replies: 248 · Reposts: 22 · Likes: 684 · Views: 223.4K
LeGOAT (@xGoatJames)
@bridgemindai How do you make sure all instances share context? Is there something you have to support multi accounts? Even tho, hasn’t ANT said that could get people banned?
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 39
BridgeMind (@bridgemindai)
This is what software development looks like in 2026. 6 Claude Opus 4.8 agents. All in ultracode mode. All running in parallel. Today, live on stream. Not 1 agent. Not 2. A fleet. 3 Max plans powering it. If I hit the limits, I buy a 4th live. One engineer orchestrating a swarm of frontier models is the new baseline. Come watch.
[GIF]
Replies: 43 · Reposts: 7 · Likes: 201 · Views: 11.3K
LeGOAT reposted
Matt Turck (@mattturck)
A day in the life of a VC in 2026:
9am: Board meeting. My main value-add is aggressively pushing for Anthropic/OpenAI usage in non-engineering functions. Briefly ponder how I became an SDR for foundation model companies.
1pm: Lunch with another VC. We discuss how startups can find "blue ocean" away from Anthropic/OpenAI. We conclude we should probably just invest the rest of our funds directly into Anthropic/OpenAI.
3pm: Pitch meeting. Me: "Do you run on Anthropic or OpenAI?" Founder: "Both." Debating internally whether a company reselling Anthropic/OpenAI with a 10% gross margin is a good investment but hey, at least they're in the "token flow".
4:30pm: Deep due diligence. I ask Claude if it plans to build this exact startup natively in its next release. Same to ChatGPT. They both say yes. I pass on the deal.
6pm: Urgent call from a portco CTO: "We need an intro to upgrade our Anthropic tier!" I immediately agree to help them spend more of the venture dollars we just invested in them, on Anthropic.
8pm: Brainstorming next guests for the podcast. Thinking I should probably just try to get some folks from Anthropic and OpenAI.
Replies: 54 · Reposts: 59 · Likes: 968 · Views: 268.9K
LeGOAT reposted
Viv (@Vtrivedy10)
some cracked ppl (ie: @_duplessis) saying this 4.8 is actually rlly good
One thing to watch: Anthropic and OpenAI moving towards “Goal APIs” - looks like /goal and Dynamic Workflows are steps to make this the default user experience
Humans just want to specify a task and have it done and verified end to end. Looks like Opus 4.8 is trained especially for this behavior and the Labs are doubling down on that UX and ability
[Image]
Claude (@claudeai)

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Replies: 11 · Reposts: 5 · Likes: 97 · Views: 14.6K
LeGOAT (@xGoatJames)
@sudoingX Do you think Hermes as a harness beats Cursor's own harness?
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 209
Sudo su (@sudoingX)
picking cursor auth in hermes agent opens every model cursor routes to. 100+ options. composer-2.5, claude opus, gpt-5.5, the whole stack cursor's inference layer covers.
sign in once with your cursor account, hermes agent shows the model picker, you pick what you want. most hermes users today juggle 5+ api keys for the various providers. one cursor subscription via hermes agent now covers most of them.
composer-2.5 is the standout because it's cursor exclusive and frontier class at a fraction of the cost. but the menu is wider than that.
real task i ran. asked it to check availability across some gpu hosts i'm tracking. composer worked through the queries, parsed the results, returned a clean answer. 14 minutes of agent time, 17% of the context window used, three and a half minutes on the actual reasoning. a comparable opus 4.7 run would have burned roughly ten times the cost.
frontier intelligence at a fraction of the cost isn't a tagline. it's the line on the status bar after a real run.
[4 images]
Sudo su (@sudoingX)

composer 2.5 is opus 4.7 class coding at 1/10 the cost. but it was cursor only. that just changed.
i just shipped cursor as a hermes agent provider tonight. PR open upstream to nousresearch/hermes-agent, available from my fork right now while it merges.
what this means: composer 2.5 + hermes memory + hermes skills + cron + acp subagents + multi-platform delivery, all in one harness. cheapest frontier coding model + deepest agent runtime. neither alone gets you here.
the math:
- composer 2.5: $0.50 input / $2.50 output per 1M
- opus 4.7: $5.00 / $25.00 (10x cost)
- gpt-5.5: $5.00 / $30.00 (12x cost)
- gpt-5.5 pro: $30.00 / $180.00 (70x cost)
same coding benchmark band (79.8% swe-bench multilingual vs opus 4.7's 80.5%, 63.2% cursorbench v3.1 vs 61.6%) at a fraction of the budget.
PR: github.com/NousResearch/h…
fork: github.com/sudoingX/herme…
article with full receipts drops sat ~9pm ICT.
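The cost multiples quoted in the post above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the per-1M-token prices listed in the post; the function names `cost_multiples` and `run_cost` are hypothetical helpers introduced here for illustration, not part of any tool mentioned in the thread.

```python
# Prices in USD per 1M tokens as (input, output), copied from the post above.
PRICES = {
    "composer 2.5": (0.50, 2.50),
    "opus 4.7": (5.00, 25.00),
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5 pro": (30.00, 180.00),
}


def cost_multiples(baseline="composer 2.5"):
    """Return {model: (input_multiple, output_multiple)} relative to baseline."""
    base_in, base_out = PRICES[baseline]
    return {
        name: (price_in / base_in, price_out / base_out)
        for name, (price_in, price_out) in PRICES.items()
    }


def run_cost(model, input_tokens, output_tokens):
    """USD cost of one run with the given token counts (hypothetical helper)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    for name, (m_in, m_out) in cost_multiples().items():
        print(f"{name}: {m_in:g}x input, {m_out:g}x output")
```

On these prices the input and output multiples differ: gpt-5.5 comes out at 10x input and 12x output, and gpt-5.5 pro at 60x input and 72x output, so the "12x" and "70x" labels in the post track the output side more closely than the input side.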

Replies: 5 · Reposts: 8 · Likes: 87 · Views: 8K
LeGOAT reposted
Artificial Analysis (@ArtificialAnlys)
Anthropic just launched Claude Opus 4.8, and it is the new leader on our GDPval-AA benchmark for agentic real-world work tasks.
Opus 4.8 scored 1890 on GDPval-AA at launch with its 'max' effort setting, +137 points from Opus 4.7 and +121 points ahead of the next-best model, GPT-5.5 xhigh. Compared head-to-head on the GDPval task set, this implies a ~67% win rate against GPT-5.5 xhigh.
@AnthropicAI shared access with us ahead of the public release to benchmark this model and we’re glad to see our benchmarks referenced in today’s launch.
The rest of the Artificial Analysis Intelligence Index is in progress - we’ll share final results soon!
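The ~67% figure matches what a standard Elo-style expected-score formula gives for a 121-point gap. The sketch below assumes GDPval-AA scores sit on a conventional Elo scale (base 10, divisor 400); the post does not state the scale, so treat that as an assumption, and `expected_win_rate` is a name made up for this illustration.

```python
def expected_win_rate(rating_gap, scale=400.0):
    """Expected win rate of the higher-rated model under the Elo logistic model.

    Assumes a conventional Elo scale (base 10, divisor `scale`), which the
    post does not confirm for GDPval-AA.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # Gap between Opus 4.8 (max) and GPT-5.5 xhigh quoted in the post.
    print(f"{expected_win_rate(121):.1%}")
```

Under that assumption, a 121-point lead works out to roughly a two-in-three win rate, in line with the number quoted in the post.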
[Image]
Replies: 41 · Reposts: 101 · Likes: 1.1K · Views: 65.8K
LeGOAT reposted
Claude (@claudeai)
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
[Image]
Replies: 3.5K · Reposts: 8.6K · Likes: 66.9K · Views: 14.4M
LeGOAT reposted
Tiago Forte (@fortelabs)
I think the main thing AI has taught me, through all the time savings it brings, is that I’m not a very interesting person
Faced with a surplus of free time, I realize I don’t really have hobbies besides content consumption
I’m forced to conclude that I don’t have very deep friendships, and am not a core member of any particular community
I’m not very cultured, I’m finding, and don’t have abiding interests in art or literature or history or much that isn’t directly related to my work
I have a work-centric life, in other words. AI pulls back the curtain on just how impoverished such an existence is, by disabusing me of its necessity
Given the freedom I’ve always said I wanted, I’m at a loss as to what to do with it, except plow myself even harder into work, thus exacerbating the lesson
There’s nothing more confronting to humans than freedom
Replies: 393 · Reposts: 239 · Likes: 4.3K · Views: 404.5K
LeGOAT (@xGoatJames)
@gfodor What reasoning level were the subagents? IMO a good config is
- xhigh orchestrates
- sub agents can be low or medium
- xhigh then checks the work
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 230
gfodor.id (@gfodor)
Instead of using an orchestrator, which performed terribly, I put 5.5 xhigh in a /goal to complete my plan but asked it to delegate to a subagent for code review at every milestone, and avoid proceeding until the review passes, and got *much* better results.
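The review-gated workflow described in the post above can be sketched as a simple loop. This is only an illustration of the control flow, not how /goal or any Codex feature is implemented: `run_with_review_gate`, `implement`, and `review` are hypothetical stand-ins for the main agent and the delegated code-review subagent, and the retry limit is an assumption.

```python
from typing import Callable, List


def run_with_review_gate(
    milestones: List[str],
    implement: Callable[[str, str], str],
    review: Callable[[str, str], str],
    max_attempts: int = 3,
) -> List[str]:
    """Complete milestones in order; advance only after the reviewer passes.

    implement(milestone, feedback) -> work product (stand-in for the main agent)
    review(milestone, work) -> "" if the review passes, else feedback text
    (stand-in for the delegated code-review subagent)
    """
    completed = []
    for milestone in milestones:
        feedback = ""
        for _ in range(max_attempts):
            work = implement(milestone, feedback)
            feedback = review(milestone, work)
            if not feedback:  # empty feedback means the review passed
                completed.append(work)
                break
        else:
            # Never proceed past a milestone whose review keeps failing.
            raise RuntimeError(f"review never passed for: {milestone}")
    return completed
```

The point of the gate is that a failed review feeds back into the same milestone instead of letting the plan move on with unreviewed work.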
gfodor.id (@gfodor)

GPT 5.5 turned out a steaming pile overnight and wow is anyone actually good at this yet? This is starting to feel like programming again, that feeling it’s impossibly hard and painful

Replies: 33 · Reposts: 30 · Likes: 1.2K · Views: 59.3K
LeGOAT reposted
Disclose.tv (@disclosetv)
NOW - Pope Leo XIV says the church and Anthropic will work together to "find the way for humanity, in this time of artificial intelligence."
Replies: 1.2K · Reposts: 1.8K · Likes: 16.2K · Views: 5.6M