Sean Ward

1.1K posts

@DNAEngineer

CEO & Co-founder @iGent_AI. Previously Founder @synthace, Relatable, Scale DX

London · Joined March 2011
895 Following · 2.1K Followers
Sean Ward
Sean Ward@DNAEngineer·
@ctjlewis It is more complicated than that, however: most providers brainfried their models with RL to go prompt -> outcome. Anthropic models can maintain competing hypotheses and tension for far longer, which allows thesis, investigation, correction, and autonomy.
English
0
0
0
8
Lewis 🇺🇸
Lewis 🇺🇸@ctjlewis·
@DNAEngineer I do absolutely believe those to be synonymous statements yeah. I believe reasoning is just intermediate computation and that’s what I wanted to present there. But this is controversial technically, though not really for any good reason.
English
1
0
0
23
Lewis 🇺🇸
Lewis 🇺🇸@ctjlewis·
I’m not on either team, I care about the best output, so I’ve always been using Opus for codegen, because it is better at mentally simulating programs than GPT or Codex. That has not changed.
aphrodiziac@affrodiziac_

@ctjlewis I am just interested: are you completely neutral? I mean, are you on team Anthropic or team OpenAI, or on the team "whoever makes it"?

English
2
0
19
2.4K
Sean Ward
Sean Ward@DNAEngineer·
@ctjlewis In its simplest terms: Anthropic models can reason. Hello from the days of Spellcraft btw.
English
1
0
2
16
Lewis 🇺🇸
Lewis 🇺🇸@ctjlewis·
Basically I think Anthropic is generally less of an asshole on a personal level, and also independently just so happens to be the best at simulating programs in terms of output since 2024. But that was not always the case on the second point, mostly was on the first.
English
2
0
15
736
Jonas Templestein
Jonas Templestein@jonas·
@dbreunig @karpathy @grok what other software engineering challenges exist that are extremely well specified in terms of conformance tests? what other impressive thing could agents just build now?
English
2
0
0
170
Drew Breunig
Drew Breunig@dbreunig·
Another example of “Spec Driven Development” along the lines of whenwords and just-bash, when conformance tests are the key driver. @karpathy
English
8
4
84
6.2K
Sean Ward reposted
Claude
Claude@claudeai·
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
English
1.1K
2.4K
19.2K
7.8M
Sean Ward
Sean Ward@DNAEngineer·
@sqs You are assuming those are at odds with each other
English
0
0
0
72
Quinn Slack
Quinn Slack@sqs·
I sense a coming schism in coding agents. “bicycle for the mind” human-in-the-loop vs. “hours of autonomy” human-out-of-the-loop
English
23
3
78
5.9K
Unit Accord
Unit Accord@unit_accord·
@Yuchenj_UW Does it matter how long it runs or what it achieves? Let's see one product it made.
English
1
0
0
76
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
Claude Sonnet 4.5 runs autonomously for 30+ hours of coding?! The record for GPT-5-Codex was just 7 hours. What’s Anthropic’s secret sauce?
English
200
176
2.8K
360.2K
Sean Ward
Sean Ward@DNAEngineer·
@_xjdr It natively wants you to be absolutely right (it is Sonnet after all). However, unlike 4.0, it will listen to style and tone instructions on a long horizon basis.
English
0
0
3
960
xjdr
xjdr@_xjdr·
sonnet 4.5 still thinks i am absolutely right . disappointment beyond measure . (no opinions on the model itself yet, i just started testing it)
English
32
10
563
36.2K
Sean Ward
Sean Ward@DNAEngineer·
@Yuchenj_UW Verifiable engineering tasks, mostly at the scale of entire features, primarily in compiled languages like Go or Rust, where it can validate the outputs and hill climb. One such public example: igent.ai/insights/produ… with repo of results and all commit/session history.
English
0
2
24
917
Yuchen Jin
Yuchen Jin@Yuchenj_UW·
@DNAEngineer Tell us more about the details! It’d be great if you could share:
- what was the task or prompt you gave it for coding
- result: the code it generated, and whether it actually runs, any bugs?
English
1
1
15
3.6K
Sean Ward
Sean Ward@DNAEngineer·
The world is rapidly bifurcating between those who have experienced the superhuman capabilities of even current generation models, used properly, and those stuck in the capabilities of the past.
iGent AI@iGent_AI

We're excited to share that our agent, Maestro, drafted solutions to all 12 problems from ICPC 2025 World Finals in ~2 hours - using current models, no human involvement, no internet access. We deeply respect the human teams' extraordinary dedication. Note: no official validation

English
0
0
0
577
Sean Ward
Sean Ward@DNAEngineer·
@SimonInmania @matthewclifford Yes, task length is in Replit’s case measured in agent runtime, which can slightly proxy for the metric that really matters: how many man-months (or years) of work the agent performs. igent.ai/insights/produ… is several man-years in the course of ~70 hours of agent time, however.
English
0
0
1
61
Simon Inman
Simon Inman@SimonInmania·
@matthewclifford It's not clear that Amjad is describing human-measured task length, right? He's just saying that's how long the agent runs for, which doesn't seem like a particularly interesting measure?
English
3
0
8
1.3K
Sean Ward
Sean Ward@DNAEngineer·
@simonw Strong recommendation: it takes a very different prompting strategy to maintain attention across that context: basic system prompt + context + request alone won’t cut it.
English
1
0
1
857
Sean Ward
Sean Ward@DNAEngineer·
@platonovadim @_xjdr Yes, tests are central, including leveraging the existing Redis client libraries and benchmarks. The entire commit history also shows every round of the AI workflow during building.
English
1
0
3
210
Vadim
Vadim@platonovadim·
@DNAEngineer @_xjdr Maestro looks interesting. Do you have a write up with more details on how the rewrite was done? Did you look at Redis source code or design? Was there a conformance test suite?
English
1
0
2
266
xjdr
xjdr@_xjdr·
Tested the NSA code overnight and after a few tweaks it trains. Wow, GPT5 and Opus4.1 wrote a 100% AI generated (human art directed) NSA implementation. I would not have guessed that was possible
English
16
22
433
58.8K
Sean Ward
Sean Ward@DNAEngineer·
It has become clear there is a massive performance and productivity delta growing between engineers who understand and embrace AI, with appropriate tooling and critical analysis, and those who have remained in the co-pilot era. Never before has it been so possible for those who know what they are doing to build so much.
iGent AI@iGent_AI

Tired of toy AI demos that fizzle in production? iGentAI built Ferrous: A Rust Redis-compatible server outperforming Valkey. 35KLOC, 100% test passing, beats benchmarks. Zero human code. Built in 70 hours of part-time direction. Toys vs. tools—here's the proof.

English
1
0
0
701
Sean Ward reposted
iGent AI
iGent AI@iGent_AI·
Our VibeCodeBench evaluations affirm what @Anthropic just announced: Claude Sonnet 4 excels at autonomous multi-feature development. We've seen codebase navigation errors drop from 20% to near zero and strategic refactoring that saves ~500k tokens on multi-stage, complex tasks. Proud to power Maestro with this breakthrough.
Anthropic@AnthropicAI

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

English
1
3
7
553
Sean Ward reposted
Louis Knight-Webb
Louis Knight-Webb@tokengobbler·
It's always great hosting @AITinkerers London meetups right after a new model drops... Huge thanks to @rebecca_harbeck from @AnthropicAI, as well as the @iGent_AI team @MSzummer and @samshapley for giving impromptu talks with tons of learnings from early access Claude Sonnet 3.7. We also got to see @HarryCoppock from the @AISecurityInst live demoing 3.7 hacking into a docker container 🫢 And as always, we had some fantastic product, behind-the-scenes and benchmark beating agent talks from Emma Burrows, @moeadham and Sergei Petrov. Huge thanks to team @localglobevc and @ferdisigona for making it happen!
English
9
3
14
885
Sean Ward
Sean Ward@DNAEngineer·
At @iGent_AI, we’ve found @AnthropicAI's new Sonnet 3.7 to be quite the powerhouse. Everything from debugging multi-language distributed systems, to comprehending and updating legacy codebases, to rapid prototypes or PoCs of new technologies. Agentic SWE is now here.
iGent AI@iGent_AI

"Agency > Intelligence" @karpathy nailed it, and after 18 months building Maestro, we agree. The real AI leap isn’t just smarts—it’s agency: the ability to act independently, turning assistants into partners.

English
1
1
2
250
Sean Ward
Sean Ward@DNAEngineer·
@simonw One of the top use cases is chaining differentiated capabilities. For example, GPT4 vision to drive scene understanding, then shifting to using that text in a different model. Another is cross-LLM critiques, such as using one model to supervise the outputs of another.
English
1
0
2
1.7K
Simon Willison
Simon Willison@simonw·
What are the simplest useful examples you've seen of prompt chaining, where the output of one LLM call is used as input to another? I have seen plenty of theoretical examples, but I'm looking for some concrete results
English
97
15
261
88.6K
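
The two chaining patterns from the reply to this question (one model's text output feeding a different model, and one model supervising another's output) could be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: a "model" here is assumed to be any function from prompt text to response text, and every name (`Model`, `chain`, `critique_loop`, `fake_vision`, `fake_text`, the `APPROVE` convention) is hypothetical. The stub functions stand in for real API calls.

```python
from typing import Callable

# Assumption: a "model" is any function from prompt text to response text.
# A real setup would wrap a provider's API call behind this signature.
Model = Callable[[str], str]


def chain(describe: Model, answer: Model, scene_input: str, question: str) -> str:
    """Feed the first model's description into a second, different model."""
    description = describe(scene_input)
    prompt = f"Scene description:\n{description}\n\nQuestion: {question}"
    return answer(prompt)


def critique_loop(worker: Model, supervisor: Model, task: str, max_rounds: int = 3) -> str:
    """Let one model review another's draft until approved or rounds run out."""
    draft = worker(task)
    for _ in range(max_rounds):
        verdict = supervisor(
            f"Task: {task}\nDraft: {draft}\nReply APPROVE or give feedback."
        )
        # Hypothetical convention: the supervisor starts its reply with APPROVE.
        if verdict.strip().upper().startswith("APPROVE"):
            break
        draft = worker(
            f"Task: {task}\nPrevious draft: {draft}\nFeedback: {verdict}\nRevise."
        )
    return draft


# Stub models so the sketch runs without network access or API keys.
def fake_vision(image_ref: str) -> str:
    return f"a red car parked outside ({image_ref})"


def fake_text(prompt: str) -> str:
    # Echo the description line (second line of the prompt built by chain()).
    return "ANSWER based on: " + prompt.splitlines()[1]
```

Calling `chain(fake_vision, fake_text, "photo.jpg", "What colour is the car?")` passes the stub description through to the second stub. Swapping the stubs for real client calls would keep the same shape.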