James

222 posts

@jamwithai

Intelligence is never artificial when it produces compassion.

United States · Joined January 2022

261 Following · 27 Followers

Pinned Tweet

James@jamwithai·26 Oct

Introducing pullup.ai - a Claude "factory skill" that crawls your web app and generates a Playwright skill specific to your application. The generated skill significantly speeds up the QA flows Claude Code uses to debug or verify features and bug fixes. Will publish benchmark data soon.

103

James retweeted

Anuj@byanujpatel·2d

Great article on harness engineering. langchain.com/blog/the-anato…

330

30.4K

James retweeted

Vaishnavi@_vmlops·2d

ANTHROPIC JUST DROPPED A ZERO TRUST PLAYBOOK FOR AI AGENTS

and it's not theory it's architecture

frontier AI compresses vulnerability-to-exploit timelines from months to hours

your agents face threats traditional access controls were never built to handle:

▫️ prompt injection through external data sources
▫️ tool poisoning via MCP server metadata
▫️ memory-based privilege retention across sessions
▫️ multi-agent pivot attacks

the framework breaks it into 3 tiers: Foundation, Enterprise, Advanced

cdn.prod.website-files.com/6889473510b503…

200

1.3K

129.1K

James retweeted

dharmesh@dharmesh·9 May

I'm with @bchesky on this one. I think the future is not about apps, but about agents. But the shift to agents doesn't necessarily mean text-forward, chat-based UIs. That makes sense for some use cases -- but not all. The future is about agents that work on your behalf, often in the background, and let you interact in ways that make sense. Sometimes, that means typing text, but others it might be a personalized UI element. UI affordances are underrated. Sometimes humans need some guidance and nudges instead of an empty prompt box. I think hybrid agentic interfaces will be the future. And it's not just about B2C. Turns out, B2B users are people too. :)

TBPN@tbpn

"I do not think a chatbot is the right interface for travel or e-commerce." - @bchesky "I think the future is not apps. The future is agents, but I don't think they're going to be text-forward. I think they're going to be really rich user interfaces." "Imagine using iMessage to do everything, when in fact every other app has a unique interface." "With e-commerce, you want a very rich user interface. It would be agentic. You can have a conversation with it, but the point is that it has to be more visual."

383

73K

James@jamwithai·11 Apr

@Tecinc7 @gauntlet_xyz No updates about this?

mad freak@Tecinc7·6 Apr

@gauntlet_xyz a total of 1.5m$ of capital supplier funds on morpho is stuck in base network - Extrafi XLend USDC (v1.1 vault) and resolv vault of eth network. never trust gauntlet. you can go with fluid as they sold their own tokens to ensure liquidity.

248

Gauntlet@gauntlet_xyz·31 Mar

So far, Resolv has not issued a remediation plan following its exploit. We continue to pursue all avenues for full recovery.

To minimize the impact, we conducted market removal actions on the vaults below following timelocks. If we are able to realize recoveries from this incident, we expect to set up a claim contract for affected suppliers.

- USDC Core on mainnet (v1): wstUSR/USDC market removed, $7.6M liquidity

The following vaults will be deprecated with no new supply permitted:

- USDC Frontier (v1.1): wstUSR/USDC, PT-RLP-9APR2026/USDC, and RLP/USDC markets removed, $4.3M liquidity
- Resolv USDC (v1.1): RLP/USDC, USR/USDC, wstUSR/USDC markets will be removed after the 3-day timelock
- Seamless USDC (v1.1): USR/USDC market removed
- Extrafi XLend USDC (v1.1): USR/USDC market removed

42.5K

James retweeted

Essam Sleiman@essamsleiman·8 Apr

tldr: everyone is converging on the same product shape: a general harness that takes a goal, uses tools, and does knowledge work. once every product is a harness, the next frontier is the feedback loop that improves it after deployment.

Nicholas Charriere@nichochar

x.com/i/article/2039…

1.1K

278.8K

James retweeted

albs—@albfresco·6 Apr

there's a really interesting finding hiding in this picture, right? most of these tasks didn't get better over time. they got better on the first try meaning: performance was left on the table in the past system. does the team have the strength to avoid the ego hit, and just make progress, knowing most old stuff should be thrown out. interesting findings. nice writeup

867

James@jamwithai·12 Mar

@dzhng @morganlinton @denisyarats Curious why you say it should be split from MCP, if you would just end up recreating that part of MCP Apps. I could see political arguments for this if you want to own the standard, but I don't see the technical reason for it.

David@dzhng·12 Mar

@jamwithai @morganlinton @denisyarats yes, but that standard doesn't have to be tied to MCP. JSON UI should (and prob will) be split into a decoupled standard

169

Morgan@morganlinton·11 Mar

The cofounder and CTO of Perplexity, @denisyarats just said internally at Perplexity they’re moving away from MCPs and instead using APIs and CLIs 👀

329

367

5.1K

2.8M

James@jamwithai·12 Mar

@dzhng @morganlinton @denisyarats How do you think MCP Apps factors into this? Do you think that interactive embedded apps are one good use case for MCP, since they are substantially more involved than a typical MCP tool?

179

David@dzhng·11 Mar

@morganlinton @denisyarats Yup, we are doing the same x.com/i/status/20295…

David@dzhng

x.com/i/article/2029…

284

64.4K

James retweeted

Viv@Vtrivedy10·9 Mar

Harness Design Notes: Decoupling Agent Storage from Agent Compute

TLDR: You can give each Agent/Subagent dedicated compute while sharing storage (repo/filesystem) to self-organize work between them. Shared Compute can be a bottleneck especially with long running code execution.

Started writing up some harness design patterns over a very long flight this weekend, might make this a series if there's interest!

We're on the edge of using a massive amount of compute to orchestrate agents across long horizon work

Ex: for Agent Teams, an orchestrator organizes potentially many agents that fan out and do work on a project (like a large repo)

For anyone who runs many agents locally, you see your CPU usage skyrocket for even moderate runs with code exec

But Sandboxes to the rescue :)

There's a nice pattern of shared filesystems via Volumes that all agents access while getting their own sandbox environment. The coordination happens via writing to the right place in the filesystem. And using git makes it so you can track and roll back changes over time

good Harness Engineering on self-organizing agents via filesystems requires thinking about infra too. Many patterns work but you have to measure them for your work!

Harness Engineering is Systems Engineering

379

20.1K

James retweeted

Latent.Space@latentspacepod·3 Mar

🆕 How to Kill The Code Review latent.space/p/reviews-dead the volume and size of PRs is skyrocketing. @simonw called out StrongDM’s “Dark Factory” last month: no human code, but *also* no human review (!?) in this week’s guest post, @ankitxg makes a 5 step layered playbook for how this can come true.

100

769

605.4K

James retweeted

Matt Stockton@mstockton·3 Mar

An interesting aspect of these models and foundation model companies:

- Their internal teams know *a ton* about how to best use these models
- They are publishing things (e.g. skills) that let you essentially leverage that knowledge for free
- You should never really 'hand craft' a context at this point. It's much better for you to 'find the existing' bootstrapped context (or context generator) and use that instead, or have the model 'prompt it' out of you (e.g. AskUserQuestion tool all day)
- The skill-creator skill is a perfect example of this. It's essentially leading-edge knowledge of people *at foundation labs knowing what works* just available to you, for free
- It's kind of weird, but there's actually just incredible 'alpha' by finding existing skills that work versus trying to do your own thing.
- With the right set of skills loaded, I would make a bet that a large proportion (maybe a majority) of white-collar work could be accomplished by purely typing the key '/' followed by a word into a terminal, over and over again
- It still helps to have good taste, know what good looks like, take incremental approaches, and just generally be curious -- but the shape of what it even means 'to work' has totally shifted - and it's going to continue to shift even faster than it is now.

It's certainly weird, but we are here.

Lance Martin@RLanceMartin

check out the updated skill-creator. i esp like built-in support for test generation (e.g., to measure + optimize tricky things like skill trigger rate). available in Claude Code as plugin, Claude.ai, + Cowork.

551

88.6K

James retweeted

Thariq@trq212·27 Feb

Prompt caching can be surprisingly easy to regress. Read more on why prompt caching is so important for agents and how to design your agent around it here: x.com/trq212/status/…

Thariq@trq212

x.com/i/article/2024…

503

160.8K

James retweeted

Simon Willison@simonw·1 Mar

New chapter of my Agentic Engineering Patterns guide. This one is about having coding agents build custom interactive and animated explanations to help fight back against cognitive debt simonwillison.net/guides/agentic…

1.2K

84K

James retweeted

Massimo@Rainmaker1973·19 Feb

This turtle behavior, often called "claw fluttering", is a courtship ritual where a male turtle rapidly vibrates or waves his long front claws (or "jazz hands") near a female's face to attract her.

202

775

11K

792.6K

James retweeted

Thariq@trq212·30 Jan

x.com/i/article/2016…

127

298

862.4K

James retweeted

Harrison Chase@hwchase17·28 Jan

🧵 Context Management for DeepAgents We wrote an in depth blog on how we do context management in DeepAgents, our open source agent harness

Mason Daugherty@masondrxy

x.com/i/article/2015…

352

53.8K

James retweeted

Thariq@trq212·23 Jan

x.com/i/article/2014…

323

431

5.9K

2.2M

James retweeted

kitze · supermac.io 🐦‍🔥@thekitze·29 Dec

vibe coding in 2026

183

2.2K

208.6K

James retweeted

Connor Davis@connordavis_ai·26 Dec

Holy shit… this paper might be the most important shift in how we use LLMs this entire year.

“Large Causal Models from Large Language Models.”

It shows you can grow full causal models directly out of an LLM: not approximations, not vibes, but actual causal graphs, counterfactuals, interventions, and constraint-checked structures.

And the way they do it is wild:

Instead of training a specialized causal model, they interrogate the LLM like a scientist:

→ extract a candidate causal graph from text
→ ask the model to check conditional independencies
→ detect contradictions
→ revise the structure
→ test counterfactuals and interventional predictions
→ iterate until the causal model stabilizes

The result is something we’ve never had before: a causal system built inside the LLM using its own latent world knowledge.

Across benchmarks (synthetic, real-world, messy domains), these LCMs beat classical causal discovery methods because they pull from the LLM’s massive prior knowledge instead of just local correlations.

And the counterfactual reasoning? Shockingly strong.

The model can answer “what if” questions that standard algorithms completely fail on, simply because it already “knows” things about the world those algorithms can’t infer from data alone.

This paper hints at a future where LLMs aren’t just pattern machines. They become causal engines: systems that form, test, and refine structural explanations of reality.

If this scales, every field that relies on causal inference (economics, medicine, policy, science) is about to get rewritten.

LLMs won’t just tell you what happens. They’ll tell you why.