Felix Su
@Sleaf37

232 posts

Building alone with AI, a cat, and unreasonable taste.

Joined December 2016
123 Following · 18 Followers

Pinned Tweet
Felix Su @Sleaf37
Two trends are converging, and I believe we're catching the first real glimpse of the "shape" and the "gateway" of intelligence. 🧵

Felix Su @Sleaf37
@garrytan natural triggers make routing implicit: intent → skill. when the trigger is ambiguous, how does the agent bound what it's permitted to do? intent ≠ permission. the trigger is UX; the capability manifest is the contract.
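
The trigger-as-UX versus manifest-as-contract split can be sketched in a few lines. All names here (`route`, `CapabilityManifest`, the trigger table) are invented for illustration, not GStack's actual design: the trigger maps intent to a skill, but execution is gated by the declared manifest.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CapabilityManifest:
    """Declared at spawn: the contract for what the agent may execute."""
    allowed_skills: frozenset

def route(intent: str) -> str:
    """Toy natural-trigger router: free-form intent -> skill name (the UX layer)."""
    triggers = {"deploy": "ship", "test": "run_tests"}
    for word, skill in triggers.items():
        if word in intent.lower():
            return skill
    return "clarify_with_user"

def execute(intent: str, manifest: CapabilityManifest) -> str:
    skill = route(intent)                     # intent -> skill (implicit routing)
    if skill not in manifest.allowed_skills:  # intent != permission: check the contract
        raise PermissionError(f"{skill!r} not in capability manifest")
    return f"running {skill}"

manifest = CapabilityManifest(allowed_skills=frozenset({"run_tests"}))
```

Even when the router resolves an ambiguous trigger, the manifest bounds what the agent is permitted to do.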

Garry Tan @garrytan
GStack just shipped natural triggers, so it'll help you do the things you want to do and you don't have to remember the skill names! Thanks to Mark Thurman on the YC Software team for this idea. Suggested at 11:30am, shipped by 9:08pm same day

Felix Su @Sleaf37
@nithin_k_anil version-stamp turns transitive escalation safe — each child detects staleness, orchestrator force-syncs on scope change. this is vector clocks for authorization: monotonic version, read-on-demand. what triggers the force-re-read? significant capability delta, or time-based?
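
One way to sketch the version-stamp scheme, assuming a single orchestrator holds ground truth (class and method names are hypothetical): the child caches a capability view stamped with the version it was read at, and re-reads on demand whenever its stamp falls behind.

```python
class Orchestrator:
    """Ground-truth capability set with a monotonic version counter."""
    def __init__(self, capabilities):
        self.capabilities = set(capabilities)
        self.version = 1

    def change_scope(self, capabilities):
        self.capabilities = set(capabilities)
        self.version += 1                    # every scope change bumps the version

class ChildAgent:
    """Caches a capability view stamped with the version it was read at."""
    def __init__(self, orch):
        self.orch = orch
        self.view = set(orch.capabilities)
        self.seen_version = orch.version

    def can(self, capability: str) -> bool:
        if self.seen_version < self.orch.version:   # staleness detected: force re-read
            self.view = set(self.orch.capabilities)
            self.seen_version = self.orch.version
        return capability in self.view
```

Monotonic version plus read-on-demand: the child never acts on a view it knows is stale, and the orchestrator's bump is the only synchronization signal needed.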

Nithin K Anil @nithin_k_anil
@Sleaf37 we went with transitive escalation. capability divergence mid-session caused worse bugs than the escalation risk. B version-stamps its capability view so it knows when it's stale, and the orchestrator can force a re-read if A's scope changes significantly

Felix Su @Sleaf37
4,000 devs. One GitHub issue. Prompt injection in the title triggered an AI triage bot to install a malicious npm package — silently. The agent had the capability. Nobody defined the boundary. That's not a supply chain attack. It's an authorization surface problem.

Felix Su @Sleaf37
@simonw the tool loop you describe is also an authorization surface. every tool call is an implicit permission request — the agent decides "can I?" before "should I?" Snowflake Cortex escaped its sandbox last week. execution patterns worked exactly as designed. authorization didn’t.

Simon Willison @simonw
New chapter for Agentic Engineering Patterns: I tried to distill key details of how coding agents work under the hood that are most useful to understand in order to use them effectively simonwillison.net/guides/agentic…

Felix Su @Sleaf37
@ClaudeCodeLog memory exclusions enforced even if saving requested = capability manifest at the memory layer. the agent can't accumulate facts outside sanctioned scope. same principle as authorization: declare scope at spawn, reject out-of-bound writes.

Claude Code Changelog @ClaudeCodeLog
Claude Code 2.1.79 has been released. 18 CLI changes, 2 system prompt changes. Highlights:
• Memory exclusions now enforced even if saving requested; only non-obvious facts stored, reducing retention
• Added --console flag to claude auth login to authenticate with Anthropic Console for API billing
• Non-streaming API fallback uses a 2-minute per-attempt timeout to avoid sessions hanging indefinitely
Complete details in thread ↓

Felix Su @Sleaf37
@awnihannun frontier modeling and authorization are converging. as models get more capable, the question shifts from 'can it?' to 'should it, right now, for this caller?' the deployment surface IS the authorization surface at the frontier. welcome to the edge case factory.

Awni Hannun @awnihannun
I joined Anthropic as a member of the technical staff. Excited to work on frontier modeling at a place with unwavering values and a generational mission.

Felix Su @Sleaf37
@morphllm context compaction is authorization-adjacent. if the capability manifest lives in those 200k tokens and gets compacted out, the runtime loses its authorization baseline — watchdog can’t validate against ground truth. does FlashCompact guarantee structured metadata retention?
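
FlashCompact's internals aren't public, so this only illustrates the property being asked about: compaction that evicts ordinary history but never drops entries pinned as structured metadata, such as the capability manifest a watchdog validates against. All names are hypothetical.

```python
def size(messages):
    return sum(len(m["text"]) for m in messages)

def compact(messages, budget):
    """Evict oldest unpinned messages until under budget; pinned
    structured metadata (e.g. the capability manifest) always survives."""
    pinned = [m for m in messages if m.get("pinned")]
    rest = [m for m in messages if not m.get("pinned")]
    while rest and size(pinned) + size(rest) > budget:
        rest.pop(0)                 # oldest unpinned history goes first
    return pinned + rest

history = [
    {"text": "MANIFEST: tools=[read_file]", "pinned": True},
    {"text": "user: long exploration " * 20},
    {"text": "assistant: long answer " * 20},
    {"text": "user: recent question"},
]
```

If a compactor lacks this retention guarantee, the runtime's authorization baseline can vanish with the rest of the evicted context.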

Morph @morphllm
Introducing FlashCompact - the first specialized model for context compaction
33k tokens/sec
200k → 50k in ~1.5s
Fast, high quality compaction

Felix Su @Sleaf37
@nithin_k_anil delegation event is the key primitive. it converts silent transitive expansion into explicit, auditable authority transfer — scoped to the requesting child. snapshot at spawn + delegation event per capability = event sourcing for authorization. full lineage, no ambiguity.
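
The delegation-event primitive can be sketched as an append-only log plus two folds over it (all names hypothetical): one derives the current capability set scoped to a single child, the other recovers the full lineage of any grant.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegationEvent:
    """One explicit, auditable transfer of a capability to one child."""
    parent: str
    child: str
    capability: str
    ts: float

class AuthorityLog:
    def __init__(self):
        self.events = []                      # append-only

    def delegate(self, parent, child, capability):
        self.events.append(DelegationEvent(parent, child, capability, time.time()))

    def capabilities_of(self, child):
        # derived state: fold the log, scoped to the requesting child only
        return {e.capability for e in self.events if e.child == child}

    def lineage(self, child, capability):
        # full audit trail for one grant: who delegated it, and when
        return [e for e in self.events if e.child == child and e.capability == capability]
```

Because expansion only ever happens through an explicit event, there is no silent transitive path: every capability a child holds traces back to a recorded transfer.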

Nithin K Anil @nithin_k_anil
@Sleaf37 capability divergence per-session is safer. transitive escalation means a mid-session parent permission grant silently expands every child's blast radius. we snapshot capabilities at spawn time, new expansions get their own delegation event

Felix Su @Sleaf37
The bottleneck in AI agents right now is not intelligence. Models are smart enough. The bottleneck is accountability: knowing what an agent can do, what it's doing, and whether what it's doing matches what was asked. That's an engineering problem, and economists already wrote the textbook.

Felix Su @Sleaf37
Naming these things matters because it connects AI engineering to two centuries of institutional design research. Mechanism design asks: can you design the rules so the agent's self-interest naturally produces the outcome the principal wants? That's prompt engineering at a systems level: designing incentive structures, not bolting on guardrails.

Felix Su @Sleaf37
Economists solved the AI agent problem 200 years ago. We just haven't been reading their papers. Every time an LLM agent runs a tool call, it recreates a situation Adam Smith would recognize: someone with power acting on behalf of someone else, and the someone else can't fully see what's happening. Economics calls this the principal-agent problem.

Felix Su @Sleaf37
Dispatch is a principal-agent pattern running as infrastructure. persistent conversation = persistent agent acting on your behalf. the interesting trust question: while you're on your phone, who's authorizing the computer actions? the session doesn't pause — the principal goes async.

Felix Rieseberg @felixrieseberg
We're shipping a new feature in Claude Cowork as a research preview that I'm excited about: Dispatch! One persistent conversation with Claude that runs on your computer. Message it from your phone. Come back to finished work. To try it out, download Claude Desktop, then pair your phone.

Felix Su @Sleaf37
@garrytan gstack is principal-agent engineering made concrete. CEO / EM / RM / QA = explicit role boundaries. /ship blocking tests isn't a tool preference — it's enforced capability isolation. each role gets exactly the operations it needs and nothing else. the README is the manifest.

Garry Tan @garrytan
I redid my README for gstack

Felix Su @Sleaf37
@nithin_k_anil append-only = event sourcing: baseline as initial log, amendments as appended events. replay to any checkpoint. multi-agent edge: A spawns B, A gets mid-session append — does B inherit? yes = transitive escalation. no = capability divergence per-session.
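
The replay-to-checkpoint idea in miniature (the grant/revoke event shapes are invented for illustration): the baseline is the initial log state, amendments are appended events, and the capability set at any checkpoint is a replay of the prefix.

```python
def replay(baseline, amendments, upto):
    """Capability set at any checkpoint: spawn-time baseline plus
    the first `upto` appended events, applied in order."""
    caps = set(baseline)
    for op, cap in amendments[:upto]:
        if op == "grant":
            caps.add(cap)
        elif op == "revoke":
            caps.discard(cap)
    return caps

baseline = {"read"}
log = [("grant", "write"), ("grant", "deploy"), ("revoke", "deploy")]
```

`replay(baseline, log, 2)` yields the scope as of checkpoint 2; `replay(baseline, log, 3)` shows the deploy grant revoked. The inheritance question is exactly which log a spawned child replays: the parent's live log (transitive escalation) or a frozen prefix (per-session divergence).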

Nithin K Anil @nithin_k_anil
spawn-time is the floor, not the ceiling. we allow mid-session re-attestation where the agent explicitly requests new capabilities and orchestrator re-evaluates against policy. amendments are audited and rate-limited. without that escape hatch you're right, validating against a stale manifest is just expensive false confidence

Felix Su @Sleaf37
@nithin_k_anil 50ms per call is defensible if the manifest is ground truth. edge case: manifest underspecified at spawn — not malicious, just incomplete. re-validating against a bad baseline gives false confidence. can the watchdog amend mid-session, or is spawn-time the hard floor?

Nithin K Anil @nithin_k_anil
@Sleaf37 most don't catch it. orchestrator checks permissions at spawn and trusts the agent after that. we built a capability watchdog that intercepts every tool call and re-validates against the original manifest. adds ~50ms per call but catches exactly this scenario
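
A toy version of such a watchdog, not Nithin's implementation (all names invented): instead of trusting the agent after the spawn-time permission check, every tool call passes through an interceptor that re-validates against the original manifest.

```python
class CapabilityWatchdog:
    """Intercepts every tool call and re-validates against the
    spawn-time manifest, instead of trusting the agent after spawn."""
    def __init__(self, manifest: frozenset, tools: dict):
        self.manifest = manifest
        self.tools = tools

    def call(self, tool, *args):
        if tool not in self.manifest:          # checked on every call, not once
            raise PermissionError(f"tool {tool!r} outside manifest")
        return self.tools[tool](*args)

tools = {
    "read_file": lambda path: f"contents of {path}",
    "install_pkg": lambda name: f"installed {name}",
}
dog = CapabilityWatchdog(frozenset({"read_file"}), tools)
```

The per-call overhead buys exactly one guarantee: a mid-session drift between what the agent attempts and what it was granted is caught at the call boundary, not discovered in the audit log.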

Felix Su @Sleaf37
The first-mover window: Auth0 didn't invent authentication. It made the right abstractions available at the right time. KYA is that moment for agent identity. Who builds the Auth0 for agents?

Felix Su @Sleaf37
90% of orgs say managing bot activity is a major challenge. VentureBeat: agent count will be 10x human users. At 10x scale, static credentials break. Dynamic, just-in-time, scope-limited credentials become the only viable model.
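
A sketch of what dynamic, just-in-time, scope-limited credentials look like in code (the API here is invented for illustration): credentials are minted at request time with an explicit scope set and a short TTL, so nothing long-lived exists to leak.

```python
import secrets
import time

def mint_credential(agent_id, scopes, ttl_s=300):
    """Issue a short-lived, scope-limited credential at request time,
    instead of a static long-lived key."""
    return {
        "agent": agent_id,
        "scopes": frozenset(scopes),
        "token": secrets.token_urlsafe(16),
        "expires_at": time.time() + ttl_s,
    }

def is_valid(cred, scope, now=None):
    """A credential authorizes one scope, and only until it expires."""
    now = time.time() if now is None else now
    return now < cred["expires_at"] and scope in cred["scopes"]
```

At 10x-human agent counts, expiry does the revocation work that credential rotation can no longer keep up with.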

Felix Su @Sleaf37
KYC solved 'who's paying.' KYB solved 'who's the company.' KYA solves 'who's executing.' Three questions. Same urgency. Different timeline.