James

222 posts

@jamwithai

Intelligence is never artificial when it produces compassion.

United States · Joined January 2022
261 Following · 27 Followers
Pinned Tweet
James (@jamwithai)
Introducing pullup.ai - a Claude "factory skill" that crawls your web app and generates a Playwright skill specific to your application that significantly speeds up QA flows Claude Code uses to debug or verify features and bug fixes. Will publish benchmark data soon
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 103
James retweeted
Vaishnavi (@_vmlops)
ANTHROPIC JUST DROPPED A ZERO TRUST PLAYBOOK FOR AI AGENTS
and it's not theory, it's architecture
frontier AI compresses vulnerability-to-exploit timelines from months to hours
your agents face threats traditional access controls were never built to handle:
▫️ prompt injection through external data sources
▫️ tool poisoning via MCP server metadata
▫️ memory-based privilege retention across sessions
▫️ multi-agent pivot attacks
the framework breaks it into 3 tiers: Foundation, Enterprise, Advanced
cdn.prod.website-files.com/6889473510b503…
[Image]
Replies: 41 · Reposts: 200 · Likes: 1.3K · Views: 129.1K
James retweeted
dharmesh (@dharmesh)
I'm with @bchesky on this one. I think the future is not about apps, but about agents. But the shift to agents doesn't necessarily mean text-forward, chat-based UIs. That makes sense for some use cases -- but not all. The future is about agents that work on your behalf, often in the background, and let you interact in ways that make sense. Sometimes, that means typing text, but others it might be a personalized UI element. UI affordances are underrated. Sometimes humans need some guidance and nudges instead of an empty prompt box. I think hybrid agentic interfaces will be the future. And it's not just about B2C. Turns out, B2B users are people too. :)
Quoting TBPN (@tbpn):

"I do not think a chatbot is the right interface for travel or e-commerce." - @bchesky "I think the future is not apps. The future is agents, but I don't think they're going to be text-forward. I think they're going to be really rich user interfaces." "Imagine using iMessage to do everything, when in fact every other app has a unique interface." "With e-commerce, you want a very rich user interface. It would be agentic. You can have a conversation with it, but the point is that it has to be more visual."

Replies: 58 · Reposts: 28 · Likes: 383 · Views: 73K
mad freak (@Tecinc7)
@gauntlet_xyz a total of 1.5m$ of capital supplier funds on morpho is stuck in base network - Extrafi XLend USDC (v1.1 vault) and resolv vault of eth network. never trust gauntlet. you can go with fluid as they sold their own tokens to ensure liquidity.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 248
Gauntlet (@gauntlet_xyz)
So far, Resolv has not issued a remediation plan following its exploit. We continue to pursue all avenues for full recovery.
To minimize the impact, we conducted market removal actions on the vaults below following timelocks. If we are able to realize recoveries from this incident, we expect to set up a claim contract for affected suppliers.
- USDC Core on mainnet (v1): wstUSR/USDC market removed, $7.6M liquidity
The following vaults will be deprecated with no new supply permitted:
- USDC Frontier (v1.1): wstUSR/USDC, PT-RLP-9APR2026/USDC, and RLP/USDC markets removed, $4.3M liquidity
- Resolv USDC (v1.1): RLP/USDC, USR/USDC, wstUSR/USDC markets will be removed after the 3-day timelock
- Seamless USDC (v1.1): USR/USDC market removed
- Extrafi XLend USDC (v1.1): USR/USDC market removed
Replies: 43 · Reposts: 4 · Likes: 50 · Views: 42.5K
James retweeted
Essam Sleiman (@essamsleiman)
tldr: everyone is converging on the same product shape: a general harness that takes a goal, uses tools, and does knowledge work. once every product is a harness, the next frontier is the feedback loop that improves it after deployment.
Quoting Nicholas Charriere (@nichochar):

x.com/i/article/2039…

Replies: 37 · Reposts: 66 · Likes: 1.1K · Views: 278.8K
James retweeted
albs— (@albfresco)
there's a really interesting finding hiding in this picture, right? most of these tasks didn't get better over time. they got better on the first try meaning: performance was left on the table in the past system. does the team have the strength to avoid the ego hit, and just make progress, knowing most old stuff should be thrown out. interesting findings. nice writeup
[Image]
Replies: 1 · Reposts: 1 · Likes: 3 · Views: 867
James (@jamwithai)
@dzhng @morganlinton @denisyarats Curious why you say it should be split from MCP, if you would just end up recreating that part of MCP Apps. I could see political arguments for this if you want to own the standard, but I don't see the technical reason for it.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 25
Morgan (@morganlinton)
The cofounder and CTO of Perplexity, @denisyarats just said internally at Perplexity they’re moving away from MCPs and instead using APIs and CLIs 👀
[Image]
Replies: 329 · Reposts: 367 · Likes: 5.1K · Views: 2.8M
James (@jamwithai)
@dzhng @morganlinton @denisyarats How do you think MCP Apps factors into this? Do you think interactive embedded apps are one good use case for MCP, since they are substantially more involved than a typical MCP tool?
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 179
James retweeted
Viv (@Vtrivedy10)
Harness Design Notes: Decoupling Agent Storage from Agent Compute
TLDR: You can give each Agent/Subagent dedicated compute while sharing storage (repo/filesystem) to self-organize work between them. Shared Compute can be a bottleneck, especially with long running code execution.
Started writing up some harness design patterns over a very long flight this weekend, might make this a series if there's interest!
We're on the edge of using a massive amount of compute to orchestrate agents across long horizon work.
Ex: for Agent Teams, an orchestrator organizes potentially many agents that fan out and do work on a project (like a large repo).
For anyone who runs many agents locally, you see your CPU usage skyrocket for even moderate runs with code exec.
But Sandboxes to the rescue :)
There's a nice pattern of shared filesystems via Volumes that all agents access while getting their own sandbox environment. The coordination happens via writing to the right place in the filesystem. And using git makes it so you can track and roll back changes over time.
Good Harness Engineering on self-organizing agents via filesystems requires thinking about infra too. Many patterns work but you have to measure them for your work!
Harness Engineering is Systems Engineering
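The "coordinate by writing to the right place in the filesystem" idea can be sketched roughly as follows. This is a minimal illustration, not the author's harness: the `claims/` and `results/` directory layout and the helper names `claim_task` and `write_result` are hypothetical, and it assumes the shared volume honors exclusive-create semantics (some network filesystems may not). Each agent would run in its own sandbox and only share the directory.

```python
import os
import tempfile
from pathlib import Path


def claim_task(shared_dir: Path, task_id: str, agent_id: str) -> bool:
    """Try to claim a task by exclusively creating a claim file on shared storage.

    Returns True if this agent won the claim, False if another agent already holds it.
    """
    claims = shared_dir / "claims"  # hypothetical layout
    claims.mkdir(parents=True, exist_ok=True)
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY  # fail if the file already exists
    try:
        fd = os.open(claims / f"{task_id}.claim", flags)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(agent_id)  # record the owner so other agents can inspect it
    return True


def write_result(shared_dir: Path, task_id: str, agent_id: str, output: str) -> Path:
    """Write a task's output where other agents (or an orchestrator) can find it."""
    results = shared_dir / "results"  # hypothetical layout
    results.mkdir(parents=True, exist_ok=True)
    path = results / f"{task_id}.txt"
    path.write_text(f"{agent_id}: {output}")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the shared volume.
    volume = Path(tempfile.mkdtemp())
    for agent in ("agent-a", "agent-b"):
        if claim_task(volume, "lint-repo", agent):
            print(write_result(volume, "lint-repo", agent, "done").read_text())
```

Only the first agent to create the claim file does the work; the second sees the file and moves on. Putting the shared directory under git, as the post suggests, would add history and rollback on top of this.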
[Image]
Replies: 21 · Reposts: 33 · Likes: 379 · Views: 20.1K
James retweeted
Latent.Space (@latentspacepod)
🆕 How to Kill The Code Review
latent.space/p/reviews-dead
the volume and size of PRs is skyrocketing. @simonw called out StrongDM’s “Dark Factory” last month: no human code, but *also* no human review (!?)
in this week’s guest post, @ankitxg makes a 5 step layered playbook for how this can come true.
Replies: 50 · Reposts: 100 · Likes: 769 · Views: 605.4K
James retweeted
Matt Stockton (@mstockton)
An interesting aspect of these models and foundation model companies:
- Their internal teams know *a ton* about how to best use these models
- They are publishing things (e.g. skills) that let you essentially leverage that knowledge for free
- You should never really 'hand craft' a context at this point. It's much better for you to 'find the existing' bootstrapped context (or context generator) and use that instead, or have the model 'prompt it' out of you (e.g. AskUserQuestion tool all day)
- The skill-creator skill is a perfect example of this. It's essentially leading-edge knowledge of people *at foundation labs knowing what works* just available to you, for free
- It's kind of weird, but there's actually just incredible 'alpha' by finding existing skills that work versus trying to do your own thing.
- With the right set of skills loaded, I would make a bet that a large proportion (maybe a majority) of white-collar work could be accomplished by purely typing the key '/' followed by a word into a terminal, over and over again
- It still helps to have good taste, know what good looks like, take incremental approaches, and just generally be curious -- but the shape of what it even means 'to work' has totally shifted - and it's going to continue to shift even faster than it is now.
It's certainly weird, but we are here.
Quoting Lance Martin (@RLanceMartin):

check out the updated skill-creator. i esp like built-in support for test generation (e.g., to measure + optimize tricky things like skill trigger rate). available in Claude Code as plugin, Claude.ai, + Cowork.

Replies: 9 · Reposts: 20 · Likes: 551 · Views: 88.6K
James retweeted
Simon Willison (@simonw)
New chapter of my Agentic Engineering Patterns guide. This one is about having coding agents build custom interactive and animated explanations to help fight back against cognitive debt simonwillison.net/guides/agentic…
Replies: 62 · Reposts: 99 · Likes: 1.2K · Views: 84K
James retweeted
Massimo (@Rainmaker1973)
This turtle behavior, often called "claw fluttering", is a courtship ritual where a male turtle rapidly vibrates or waves his long front claws (or "jazz hands") near a female's face to attract her.
Replies: 202 · Reposts: 775 · Likes: 11K · Views: 792.6K
James retweeted
Connor Davis (@connordavis_ai)
Holy shit… this paper might be the most important shift in how we use LLMs this entire year.
“Large Causal Models from Large Language Models.”
It shows you can grow full causal models directly out of an LLM: not approximations, not vibes, but actual causal graphs, counterfactuals, interventions, and constraint-checked structures.
And the way they do it is wild. Instead of training a specialized causal model, they interrogate the LLM like a scientist:
→ extract a candidate causal graph from text
→ ask the model to check conditional independencies
→ detect contradictions
→ revise the structure
→ test counterfactuals and interventional predictions
→ iterate until the causal model stabilizes
The result is something we’ve never had before: a causal system built inside the LLM using its own latent world knowledge.
Across benchmarks (synthetic, real-world, messy domains), these LCMs beat classical causal discovery methods because they pull from the LLM’s massive prior knowledge instead of just local correlations.
And the counterfactual reasoning? Shockingly strong. The model can answer “what if” questions that standard algorithms completely fail on, simply because it already “knows” things about the world those algorithms can’t infer from data alone.
This paper hints at a future where LLMs aren’t just pattern machines. They become causal engines: systems that form, test, and refine structural explanations of reality.
If this scales, every field that relies on causal inference (economics, medicine, policy, science) is about to get rewritten.
LLMs won’t just tell you what happens. They’ll tell you why.
[Image]
Replies: 120 · Reposts: 352 · Likes: 2.2K · Views: 233.2K
James retweeted
Charly Wargnier (@DataChaz)
A must-bookmark for vibe-coders. @YCombinator’s guide to making the most of vibe coding:
[2 images]
Replies: 58 · Reposts: 266 · Likes: 2.5K · Views: 179.4K