Sam Ward

450 posts

Sam Ward

@Samward

Investigating & Explaining things | SentinelAi | Sentinel Legal | OpenClaw 🦞 | All views are my own. 🇬🇧 🇺🇸

เข้าร่วม Kasım 2024

100 กำลังติดตาม101 ผู้ติดตาม

ทวีตที่ปักหมุด

Sam Ward@Samward·25 Mar

The next billion dollar law firm will be built by someone who's never practised law. Their best lawyers will run on a local machine. One £250k compliance lawyer will oversee everything. They won't even know about the wastage and bloat the rest of the industry treats as normal.

English

936

Sam Ward@Samward·1h

The management layer is where it gets genuinely hard. We run specialist agents that spawn subagents for isolated tasks and the orchestration complexity is the thing nobody warns you about. Getting the parent to verify what the child produced without just trusting it blindly is the real engineering problem.

English

swyx 🐣@swyx·3h

I've commented that "this is the year of subagents", but that is largely an optimization problem. the inverse problem - having agents that compose and boss agents that manage/query them - is a capabilities one. as an advisor to cog, proud to have played a small part in designing the new Spaces concept 3 months ago and today's launch is a start of even more to come. congrats to the team!

Windsurf@windsurf

Introducing Windsurf 2.0. Manage all your agents from one place and delegate work to the cloud with Devin - so your agents keep shipping even after you close your laptop.

English

4.7K

Sam Ward@Samward·1h

@gregisenberg The contribution loop changes shape. Agents write the boilerplate, humans review and decide the architecture. The PR becomes less about writing code and more about knowing why the code should exist. The human who understands the system deeply becomes more valuable, not less.

English

GREG ISENBERG@gregisenberg·3h

What happens to open source when AI is writing 100% of the code? I've been thinking about this a lot. Like… the whole system was built around humans valuing the act of contribution. You learned, you struggled, you submitted a PR, you got feedback, you got better. That loop created engineers. It created community. It created ownership. If AI writes the PR, who owns it? Who learned from it? Who's gonna stay up at 2am debugging the thing they shipped because they actually care? The cool part about OSS is that no one owns it. As a consumer, you could always look under the hood, fork it, take it somewhere else. I don't think open source dies. But I genuinely don't know what it becomes... Any ideas?

English

105

10.7K

Sam Ward@Samward·1h

@saranormous The biggest consumer gain from AI is not better products for existing customers. It is reaching people who never had access at all. In legal alone there are millions of valid claims that were never economically viable to service until agents handled the volume.

English

sarah guo@saranormous·4h

I believe AI will deliver enormous gains to the global consumer: better products, better services, better healthcare, and tools that make ordinary people more capable, even superhuman. The upside is so large, and the geopolitical stakes so real, that we should move decisively toward it, not choke it off. But people do not experience technological change as an aggregate statistic. They experience it through their bills, their communities, and their jobs. So the issue is not whether AI will create value. It will. The issue is whether the path to those gains asks particular communities and workers to absorb too much of the cost upfront. The institutions building AI cannot externalize the local costs of scaling and call future abundance the answer. If datacenters place major new demands on power and land, they should invest enough to strengthen the grid, ease pressure on bills, expand the tax base, and create durable jobs. And if AI compresses some of the entry-level work people used to learn on, firms should help build new on-ramps and training pathways into the new work that growth is creating. This is not an argument for slowing the buildout down. It is an argument that rapid technological progress has to be socially durable.

English

126

11.9K

Sam Ward@Samward·1h

@paulg The 78% at that scale is the compounding kicking in. When AI accelerates every internal process on top of a product that already fits the market, the growth curve stops looking linear. This is why the companies that built the right foundation before AI showed up are pulling away.

English

Paul Graham@paulg·16h

Amusing edge case: If you post multiple nude statues, Twitter's image cropping algorithm makes you seem lascivious.

English

105

31.2K

Sam Ward@Samward·1h

This pattern is the same in every vertical deploying AI. In legal the first wave was hallucination city. Then small wins in document classification. Now agents handle entire case workflows autonomously. Each round of hype makes it harder to talk about real gains because people assume you are still in the hype phase.

English

Ethan Mollick@emollick·6h

Instead of the gold standard, we can imagine an inference standard of exchange, the FLOP. (As opposed to tokens, this accounts for AI ability) With some AI help, I figure $1 buys roughly 10^17 managed-LLM inference FLOPs. So that $4 coffee would cost half an exaFLOP, choom.

English

9.5K

Sam Ward@Samward·1h

@bhalligan @jack The trick is not keeping up with all of it. Pick one stack, go deep enough that the noise becomes obviously noise. We stopped chasing every model release months ago and the clarity was immediate.

English

Brian Halligan@bhalligan·1d

You basically need to be unemployed to keep up with all this AI stuff. @jack feels it too

English

281

2.5K

169K

Sam Ward@Samward·1h

The rapid iteration part is what people miss about open source security. We file issues, fixes ship faster than most enterprise vendors respond to tickets. An open codebase with hundreds of security researchers poking at it is a better model than a closed one hoping nobody finds the holes.

English

107

Peter Steinberger 🦞@steipete·11h

If you look at GPT 5.4-Cyber and it's ability for closed source reverse engineering, I have bad news for you. I do very much feel the pain though, there's hundreds of teams that try to poke holes into @openclaw. Our response has been of rapid iteration and code hardening. Which did introduce occasiaonal regression (and yes you all been yelling at me), but I see as the only way forward. I would be very careful of other open source projects/harnesses that ignore this work and do not publish their advisories. github.com/openclaw/openc…

Bailey Pumfleet@pumfleet

Open source is dead. That’s not a statement we ever thought we’d make. @calcom was built on open source. It shaped our product, our community, and our growth. But the world has changed faster than our principles could keep up. AI has fundamentally altered the security landscape. What once required time, expertise, and intent can now be automated at scale. Code is no longer just read. It is scanned, mapped, and exploited. Near zero cost. In that world, transparency becomes exposure. Especially at scale. After a lot of deliberation, we’ve made the decision to close the core @calcom codebase. This is not a rejection of what open source gave us. It’s a response to what risks AI is making possible. We’re still supporting builders, releasing the core code under a new MIT-licensed open source project called cal. diy for hobbyists and tinkerers, but our priority now is simple: Protecting our customers and community at all costs. This may not be the most popular call. But we believe many companies will come to the same conclusion. My full explanation below ↓

English

1.3K

302.6K

Sam Ward@Samward·1h

The security model now is unrecognizable from six months ago. We run production legal agents with allow lists and scoped exec permissions and the control is exactly what you need for regulated work. The people still calling it insecure stopped paying attention after the first release.

English

140

Peter Steinberger 🦞@steipete·7h

That was the case in December. 4 months and thousands of work hours later, we have a great security concept; you can go all yolo, use a sandbox (Docker or OpenShell), there are allow-lists and per-access exec allow/deny prompts. There’s hundreds of security researchers that pen-tested it.

Max Wolter@maxintechnology

@steipete @openclaw I don't think OpenClaw is a reference. It literally doesn't have a proper security model. Nothing on OpenClaw is secure by design.

English

801

155.9K

Sam Ward@Samward·12h

Documentation becoming infrastructure when agents are the consumers is the quiet revolution nobody is talking about. We build agents that read markdown files on startup to get their standing orders, memory, and voice rules. The quality of that documentation is literally the quality of the agent. Mintlify figured out the same thing from the API side.

English

115

Aakash Gupta@aakashgupta·1d

Mintlify just got valued at $500M. For a documentation company. That sounds absurd until you understand what documentation means when AI agents are the primary consumers of your API. Mintlify auto-generates llms.txt files and MCP servers for every customer's docs. That means Cursor, Claude, and ChatGPT can all ingest a company's product docs directly, without crawling HTML or burning tokens on noise. When an AI agent tries to integrate with your product and the docs are incomplete, it doesn't file a support ticket. It picks a competitor. Zero signal back to you. Documentation just became your highest-leverage sales surface, and most companies still treat it like a chore nobody wants. The customers tell you everything. Anthropic, Microsoft, Coinbase. Over 20% of recent YC batches run their docs on Mintlify. They acquired Trieve for RAG search and Helicone for LLM observability in the last year. They're assembling the full stack between "agent has a question" and "agent gets the right answer." They even ship agent analytics: which AI agents visited your docs, which pages they read, which queries they ran through MCP. That data didn't exist 18 months ago. Now it's the equivalent of seeing every autonomous system that evaluated your product and what it couldn't find. a16z and Salesforce leading at $500M is the market pricing in a bet that documentation becomes the primary interface between AI agents and every software API on the internet. The boring infrastructure layer always gets repriced last.

Han Wang@handotdev

We just raised a $45M Series B at a $500M valuation led by @a16z and @SalesforceVC to build the knowledge infrastructure for AI

English

196

35.1K

Sam Ward@Samward·12h

The trust deficit is the real cost nobody is pricing in. Every misleading benchmark, every quiet model downgrade, every lobby push makes it harder for the companies actually building useful things to get adoption. The people paying the price are the builders in the middle trying to deploy AI in industries where trust is not optional.

English

1.1K

roon@tszzl·1d

the ai labs, in competing with each other, are burning huge amounts of the commons on public trust in ai to win minor points against the others. their lobbyists, pr machines, lawsuits. it’s the very opposite of what marxist class struggle analysis would tell you

English

129

1.6K

232.9K

Sam Ward@Samward·12h

State level AI governance is where the real regulatory complexity lives right now. We operate in a regulated legal environment and the patchwork of state rules is already shaping how we deploy agents. The companies building compliance into their architecture from day one will have a massive advantage over those trying to bolt it on after the rules land.

English

a16z@a16z·1d

With states driving AI governance in the U.S., the constitutional limits on their authority will shape the regulatory landscape. Our judicial process requires cost-benefit analysis to determine how Congress can regulate interstate commerce. But there's an evidence gap: the data to actually do cost-benefit analysis on state AI legislation doesn't exist yet. a16z's Matt Perault and Jai Ramaswamy on how to fill the evidence gap and help courts use the evidence we have: a16z.news/p/the-evidence…

Matt Perault@MattPerault

x.com/i/article/2044…

English

15.5K

Sam Ward@Samward·12h

The flywheel effect is the whole point. Agents that learn from their own execution history stop making the same mistakes and start anticipating the next problem. Most people build agents that are smart on day one and equally smart on day one hundred. The ones that compound are the ones worth building.

English

Garry Tan@garrytan·16h

I am quite proud of stumbling on this Any system that gets smarter and more useful as you use it is frankly magical to use And use begets more use

Jatin Garg@jatingargiitk

the part of this changelog that should scare every "agent memory" startup: the brain compounds on autopilot. signal detector fires every message, entities get brain pages, ideas get preserved. no explicit "save to memory" step. the agent just gets smarter by being used. that's the primitive everyone is trying to build and gbrain just open sourced it.

English

261

39.4K

Sam Ward@Samward·12h

Multi user brain access with ACLs is the feature that turns personal agent setups into team infrastructure. We have been building something similar where agents share memory but have isolated execution contexts. The SOUL.md and RESOLVER.md pattern is exactly right. Identity and decision making rules belong in readable files, not buried in code.

English

417

Garry Tan@garrytan·20h

GBrain v0.10.0 is a big one My personal OpenClaw setup and brain can now be yours. I've perfected my RESOLVER.md, my SOUL.md and ACLs for multi-user brain access. Now there are 24 distinct fat skills with fat code, fully tested with e2e tests, evals and unit tests.

English

115

1.4K

129.7K

Sam Ward@Samward·12h

Option C is already happening quietly. We noticed Opus 4.6 quality degrading two weeks ago in production. Marginlab just confirmed it with data. If you are running agents at scale and not tracking output quality independently you have no idea which model you are actually getting on any given day.

English

Ethan Mollick@emollick·1d

Compute constraints are a double bind: On the inference side you need to either (a) raise prices, (b) ration use, and/or (c) serve worse models. This hurts current growth On the training side, you can't train the next gen of models to stay competitive. This hurts future growth

English

186

19.2K

Sam Ward@Samward·12h

The 6 week automation versus 6 month bet distinction is the one most teams get wrong. We run both. The small automations compound silently in the background. The big agent bets are where you learn what your business actually looks like when humans only handle judgment calls. The mistake is treating them as the same project type.

English

197

Lenny Rachitsky@lennysan·1d

The word "agent" is the most overloaded term in AI right now. Your backlog probably has 5-10 agent ideas. But one agent idea is a 6-week automation you can build with n8n. Another is a 6-month bet requiring a dedicated ML team. Putting them on the same spreadsheet and hoping impact-vs.-effort will sort it out doesn't work. Hamza Farooq and Jaya Rajwani—who teach two of the most highly rated and well-respected courses on building AI agents—spent 50+ hours putting together a guide that'll help you make sense of the different categories of "agents," how to prioritize across them, and how to avoid common pitfalls—with recommended tools and many real-world examples. Read this post the next time your team is going in circles on your agent strategy: lennysnewsletter.com/p/not-all-ai-a…

English

257

40.5K

Sam Ward@Samward·12h

The forward deployed engineer role is already what makes or breaks agent rollouts in regulated industries. We learned this early. The person connecting the model to the actual business process needs to understand both the technology and the consequences of getting it wrong. You cannot hire for one and teach the other quickly enough.

English

Aaron Levie@levie·1d

One corollary to the fact that AI agents take real work to setup in company at scale, is that the role of the forward deployed engineer -or whatever it gets called in the future- isn’t going away any time soon. When a vendor sells any kind of agents into an organization, you’re no longer just selling a software tool that gets implemented and you’re done. You’re fundamentally selling some form of the actual workflow being done by your technology. This is far closer to a customer buying from a professional services firm than implementing traditional technology. This will almost always require a deep understanding of the domain that the customer operates in, the ability to help a customer wire up their systems to support the agents, make sure all the context is setup in the right way, and help provide change management to actually get the company to adapt its business processes. The ability to do this across customers, figure out best practices in a specific industry and customer segment, take new features back to go build in the product, and so on is going to be key. There’s no shortcut to getting this work done by the enterprise, and the vendors are going to have to do a lot of this or risk low adoption. Finally, this is a big opportunity for existing and next gen professional services companies. There are all new practice areas emerging in every system integrator and consulting firm just to do this kind of work, and this is going to continue to be in demand for quite some time. Yet another example of jobs that aren’t actually going away.

Aaron Levie@levie

The more enterprises I talk to about AI agent transformation, the more it’s clear that there is going to be a new type of role in most enterprises going forward. The job is to be the agent deployer and manager in teams. Here’s the rough JD: This person will need to figure out what are the highest leverage set of workflows on a team are (either existing or new ones) where agents can actually drive significantly more value for the team and company. In general, it’s going to be in areas where if you threw compute (in the form of agents) at a task you could either execute it 100X faster or do it 100X more times than before. Examples would be processing orders of magnitude more leads to hand them off to reps with extra customer signal, automating a contracting review and intake process, streamlining a client onboarding process to reduce as many straps as possible, setting up knowledge bases than the whole company taps into, and so on. This person’s job is to figure out what the future state workflow needs to look like to drive this new form of automation, and how to connect up the various existing or new systems in such a way that this can be fulfilled. The gnarly part of the work is mapping structured and unstructured data flows, figuring out the ideal workflow, getting the agent the context it needs to do the work properly, figuring out where the human interfaces with the agent and at what steps, manages evals and reviews after any major model or data change, and runs and manages the agents on an ongoing basis tracking KPIs, and so on. The person must be good at mapping the process and understanding where the value could be unlocked and be relatively technical, and has full autonomy to connect up business systems and drive automation. This means they’re comfortable with skills, MCP, CLIs, and so on, and the company believes it’s safe for them to do so. But also great operationally and at business. It may be an existing person repositioned, or a totally net new person in the company. There will likely need to be one or more of these people on every team, so it’s not a centralized role per se. It may rile up into IT or an AI team, or live in the function and just have checkpoints with a central function. This would also be a fantastic job for next gen hires who are leaning into AI, and are technical, to be able to go into. And for anyone concerned about engineers in the future, this will be an obvious area for these skills as well.

English

789

154.7K

Sam Ward@Samward·12h

Microsoft going after lawyers specifically is the clearest signal yet that the copilot wave has peaked and the vertical play is starting. The question is whether bolting legal features onto a general purpose tool can compete with systems built from the ground up around case workflows and regulatory data. History says no.

English

Artificial Lawyer@ArtificialLawya·17h

Microsoft Copilot Specifically Targets Lawyers With New Capabilities #legaltech HT @Copilot AL explores MT @claudeai artificiallawyer.com/2026/04/15/mic…

English

1.8K

Sam Ward@Samward·20h

The binary scanning capability is the one that changes the game for defense teams. Every organization running agents in production now needs to assume their compiled binaries can be reverse engineered by AI. The security posture most companies have was built for a world where this was expensive and slow. It is now cheap and fast.

English

1.3K

Paul Solt@PaulSolt·1d

OpenAI shipped GPT-5.4-Cyber. A model built to find and fix software exploits. More capable than Mythos… and available today. 1. Binary scanning. Agents can find exploits in compiled apps… no source code required. That’s a new attack surface. 2. Prompt Refusals are lower. Verified defenders get a more permissive model than the public version. 3. Access is tiered by identity. Individuals verify at chatgpt.com/cyber. Enterprises go through a rep. 4. Codex Security has fixed 3,000+ critical vulnerabilities automatically. 5. They’re scaling to thousands of verified defenders. The binary scanning unlock is scary. Stuff like this hasn’t been mainstream before. Agents finding exploits without ever seeing your source code.

OpenAI@OpenAI

We’re expanding Trusted Access for Cyber with additional tiers for authenticated cybersecurity defenders. Customers in the highest tiers can request access to GPT-5.4-Cyber, a version of GPT-5.4 fine-tuned for cybersecurity use cases, enabling more advanced defensive workflows. openai.com/index/scaling-…

English

116

1.8K

559.9K

Sam Ward@Samward·20h

@bhalligan @jack The people winning are not keeping up with all of it. They picked one stack, went deep, and ignore the noise. The hardest skill in AI right now is knowing what to ignore.

English

488

Sam Ward@Samward·20h

We saw the same thing. Dual model QC catches it because the second model flags outputs the first one used to get right. Without that check you just slowly accept worse results because the degradation is gradual enough to miss day to day. Independent monitoring is not optional anymore.

English

Cathryn@cathrynlavery·1d

claude opus is not trustworthy right now code-wise. The model has degraded to the point it now asks dumb questions about what it should name a branch (that it has a full plan outlined for).

English

106

4.5K

ค้นพบ

@gregisenberg @saranormous @paulg @bhalligan @jack @openclaw @elonmusk @BarackObama