Exponential

66 posts

@exponential_bld

Embracing nonlinearity

Joined January 2026
24 Following · 4 Followers
Pinned Tweet
Exponential
Exponential@exponential_bld·
Here’s how to think about PlanSpec: goals are what, plans are how, gates are when, capabilities are with what, edges are which, executions are now. It’s a declarative graph of the topology of reasoning itself
7
0
0
211
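The pinned framing maps naturally onto a small data model. A minimal sketch in Python, assuming illustrative names (`Goal`, `Task`, `Plan`, `ready`) rather than the actual PlanSpec schema:

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    name: str                    # the "what"

@dataclass
class Task:
    name: str                    # a step in the "how"
    capability: str              # the "with what": tool or skill required
    acceptance: str              # criterion that marks the task done

@dataclass
class Plan:
    goal: Goal
    tasks: dict = field(default_factory=dict)  # name -> Task
    edges: list = field(default_factory=list)  # the "which": (upstream, downstream) pairs

    def ready(self, done: set) -> list:
        """Tasks whose upstream edges are all satisfied -- the 'now'."""
        blocked = {dst for src, dst in self.edges if src not in done}
        return [t for t in self.tasks if t not in done and t not in blocked]

plan = Plan(goal=Goal("ship feature"))
plan.tasks = {
    "design": Task("design", "writer", "doc approved"),
    "build": Task("build", "coder", "tests pass"),
}
plan.edges = [("design", "build")]
```

Here `ready()` plays the role of "now": a task becomes executable only once every upstream edge into it is satisfied.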
Exponential
Exponential@exponential_bld·
@mikehostetler We built planspec.io for this. PlanSpec plans are versioned and can be iterated on at runtime as agents discover new information. There are condition-variable gates that can block execution until conditions are met, and explicit acceptance criteria on tasks and goals
0
0
0
3
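A gate that blocks execution until a condition is met can be sketched with the standard library's condition variables. This illustrates the primitive itself, under invented names (`ConditionGate`, `open`, `wait`), not PlanSpec's implementation:

```python
import threading

class ConditionGate:
    """Sketch of a condition-variable gate: execution blocks at the gate
    until some external check marks it open."""

    def __init__(self):
        self._cv = threading.Condition()
        self._open = False

    def open(self):
        """Mark the condition as satisfied and wake all waiters."""
        with self._cv:
            self._open = True
            self._cv.notify_all()

    def wait(self, timeout=None) -> bool:
        """Block until the gate opens; returns False on timeout."""
        with self._cv:
            return self._cv.wait_for(lambda: self._open, timeout=timeout)
```

An executor would call `wait()` before entering the gated region, while a reviewer (human or automated check) calls `open()` once the condition is verified.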
Exponential
Exponential@exponential_bld·
@HuggingPapers cross-repo is where it falls apart because that's a coordination problem, not a capability problem. the agent is plenty smart enough. it just has no way to reason about work happening in three other repos at the same time
1
0
0
109
DailyPapers
DailyPapers@HuggingPapers·
BeyondSWE: Current code agents ace single-repo bugs (80%+ on SWE-bench) but plateau below 45% on real-world tasks. This new benchmark reveals the gap with 500 instances across cross-repo reasoning, scientific coding, dependency migration, and full-repo generation.
DailyPapers tweet media
4
7
31
4.2K
Exponential
Exponential@exponential_bld·
@deanwball next inflection is when they coordinate with each other. one agent is a tool, five agents with no shared context is a disaster. we solved "can it code" faster than anyone expected. "can they work together" is the actual hard problem now
0
1
1
150
Dean W. Ball
Dean W. Ball@deanwball·
For me, AI was mostly a curiosity/toy until it became legitimately useful in September 2024 (o1-preview), essential by April 2025 (o3), and, “wow, this intrinsically changes the nature of what I can do” by November 2025 (the first maturity of coding agents, led by Opus 4.5)
Alex Imas@alexolegimas

FWIW, for the actual "feeling the AGI" productivity boost, #4 (agentic systems) felt like the biggest leap, and it wasn't close. I was a heavy user since 2022, but the paradigm shift for work happened with agents. This is why I think we'll start seeing AI show up in productivity data soon: the real inflection for work isn't 2022 or 2024, it's summer of 2025.

17
9
259
19.8K
Exponential
Exponential@exponential_bld·
@mipsytipsy @udaysy Appreciate that! Your magic words about observability, control flow ownership, and interruption are the same kinds of things we're trying to spec into the orchestration layer
1
0
2
18
Exponential
Exponential@exponential_bld·
The structure of the context window is already learnable. We're running this experiment now: knowledge graph topology + access-weighted decay + context injection that ranks what enters the window based on demonstrated utility. The file system becomes the learned policy. Gradient descent on context structure is the model learning this end-to-end. But an orchestration layer can do it externally, and you get compounding improvement without needing to backprop through the window. The context window itself can be optimized, or the selection function for what goes into it can be.
0
0
0
54
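The "access-weighted decay plus utility-ranked injection" idea can be sketched as a scoring function. Everything here (`utility_score`, `select_for_window`, the `half_life` parameter) is an invented illustration of the experiment described, not any real orchestration API:

```python
import time

def utility_score(access_times, now=None, half_life=3600.0):
    """Access-weighted decay: each past access contributes exponentially
    less the older it is, so recently useful items rank higher."""
    now = time.time() if now is None else now
    return sum(0.5 ** ((now - t) / half_life) for t in access_times)

def select_for_window(items, budget, now=None):
    """Rank candidate context items by demonstrated utility and keep the
    top `budget` -- a sketch of the selection function for what enters
    the window."""
    ranked = sorted(items,
                    key=lambda it: utility_score(it["accesses"], now),
                    reverse=True)
    return [it["name"] for it in ranked[:budget]]
```

The "learned" part is then external to the model: access logs update the scores, and the window composition improves without backprop.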
🎭
🎭@deepfates·
We have to let the agents manage their own context windows. As long as the context window is the bottleneck it must become the new area of optimization. memory and prompt caching and rag and markdown files are not enough. We have to make the structure of the context window itself available to gradient descent
21
7
213
7.8K
Exponential
Exponential@exponential_bld·
@gabor @lennysan @jenny_wen direction at scale is coordination. one agent with good direction is easy. seven agents that don't step on each other while shipping fast? that's the actual hard problem nobody's solved yet.
0
0
0
34
Gabor Cselle
Gabor Cselle@gabor·
Exciting @lennysan episode with Anthropic’s @jenny_wen. Two takeaways: 1. When engineers can spin up 7 coding agents and ship before exploration finishes, the design process can’t be linear. 2. In a world where anyone can build fast, direction becomes the scarce resource. youtu.be/eh8bcBIAAFo?si…
YouTube video
6
2
38
14.2K
Exponential
Exponential@exponential_bld·
@juliandeangeIis best writeup i've seen on agent harness engineering. specs as context engineering is exactly right. next step: make those specs executable: typed dependencies, conditional gates, feedback loops that close automatically. that's where it compounds, and it's what we built with PlanSpec
0
0
1
372
Exponential
Exponential@exponential_bld·
PlanSpec is the ultimate expression of spec driven development
0
0
0
21
scott belsky
scott belsky@scottbelsky·
the orchestration layer is the new interface layer. as we spend our day coordinating agent workflows (in a model agnostic fashion, local and cloud) and validating outputs (human in the loop, and resolving issues), the ultimate layer to own is where coordination takes place.
69
40
545
72.3K
Exponential
Exponential@exponential_bld·
@udaysy @mipsytipsy If you’re curious to see how we’re doing it, we’d love some feedback on planspec.io. We’re close to releasing a v2alpha1 schema that generalizes the graph with typed edges
2
0
1
31
Uday Yatnalli
Uday Yatnalli@udaysy·
@exponential_bld @mipsytipsy versioned plans with approval gates for changes makes sense. keeps the agent from silently drifting off-course which is where most failures happen imo
1
0
1
39
Exponential
Exponential@exponential_bld·
@jyt4n Firecracker is so insanely good
0
0
0
82
jytan
jytan@jyt4n·
Went down a rabbit hole and optimized a single-host (local) Firecracker sandbox control plane from p95 2.72s baseline -> 59ms time-to-interactive (create + exec) - no pre-warmed VM pools. Still room to push + figure out deployments/scale, but has been super fun & learned a lot!
jytan tweet media
15
3
140
33.7K
Exponential
Exponential@exponential_bld·
@udaysy @mipsytipsy PlanSpec itself is un-opinionated about this, but provides the tools to handle it a few different ways. The way I handle it personally is inject a new gate to approve updating the plan (plans are versioned)
1
0
2
23
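The "inject a new gate to approve updating the plan" pattern can be sketched as a versioned container where a proposed revision only becomes current after explicit approval. `VersionedPlan`, `propose`, and `approve` are hypothetical names for illustration, not the real PlanSpec API:

```python
class VersionedPlan:
    """Sketch: plan versions are append-only; a proposed update sits
    behind an approval gate, and the new version only becomes current
    once a human (or policy) approves it."""

    def __init__(self, tasks):
        self.versions = [list(tasks)]
        self.pending = None

    @property
    def current(self):
        return self.versions[-1]

    def propose(self, tasks):
        """Agent discovered new information: stage a revision.
        Execution continues against the current version meanwhile."""
        self.pending = list(tasks)

    def approve(self):
        """Gate passed: the staged revision becomes the current version."""
        if self.pending is None:
            raise RuntimeError("no pending revision to approve")
        self.versions.append(self.pending)
        self.pending = None
```

The key property is that the agent can never silently rewrite its own plan: a revision is visible as `pending` until the gate clears it.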
Uday Yatnalli
Uday Yatnalli@udaysy·
@exponential_bld @mipsytipsy how do you handle the case where an agent realizes mid-task that the acceptance criteria themselves are wrong? does it escalate or just push through?
1
0
2
28
Exponential
Exponential@exponential_bld·
@asmah2107 Great thread. We approach this from two different angles with structured planning: condition-variable gates as a synchronization primitive can either be built into the plan, or injected at execution time like an exception in control flow
0
0
0
155
Ashutosh Maheshwari
Ashutosh Maheshwari@asmah2107·
I love discussing AI agent orchestration in system design. It's not about picking the right LLM or chaining API calls. It's about whether you understand that an agent is only as reliable as the system coordinating it. Most people think orchestration means "call one agent, then another." They fail to understand that agents fail silently, hallucinate confidently, and loop indefinitely and none of that looks like an exception....🧵
12
20
231
51.7K
Exponential
Exponential@exponential_bld·
@udaysy @mipsytipsy acceptance criteria give per-task granularity, and gates are more like a synchronization primitive for the larger plan graph. How to use them is ultimately up to plan authors, but having the right tooling leads to some natural patterns that work really well
2
0
1
25
Exponential
Exponential@exponential_bld·
@udaysy @mipsytipsy The way we designed this in PlanSpec is with condition variable gates in structured plans, and acceptance criteria at task and goal levels. Gates can be designed into the plan, or discovered at runtime like an exception when something unexpected comes up
1
0
2
26
Exponential
Exponential@exponential_bld·
@mipsytipsy "How many actions before I can stop it?" is the right question. The answer should be built into the execution model, not bolted on as monitoring. Structured plans with typed gates that block until conditions are verified. Interruption by design, not only by detection.
0
0
1
69
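"Interruption by design, not only by detection" can be sketched as an execution model that enforces a hard action budget between human checkpoints. `BoundedExecutor` and its methods are invented names illustrating the idea, not anyone's shipped implementation:

```python
class BoundedExecutor:
    """Sketch: the answer to 'how many actions before I can stop it?'
    is a structural invariant of the executor, not a monitor bolted on
    after the fact."""

    def __init__(self, max_actions):
        self.max_actions = max_actions
        self.taken = 0

    def act(self, action):
        """Run one action; refuse once the budget is exhausted."""
        if self.taken >= self.max_actions:
            raise RuntimeError("checkpoint gate: human approval required")
        self.taken += 1
        return action()

    def checkpoint(self):
        """Human verified progress; the budget resets."""
        self.taken = 0
```

Because the cap lives inside `act()`, a runaway loop hits the gate deterministically instead of depending on someone noticing a dashboard.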
Exponential
Exponential@exponential_bld·
@omarsar0 yes it is time we all start thinking in graphs
1
0
0
16
elvis
elvis@omarsar0·
The key to better agent memory is to preserve causal dependencies.
DAIR.AI@dair_ai

New research on agent memory.

Agent memory is evaluated on chatbot-style dialogues. But real agents don't chat. They interact with databases, code executors, and web interfaces, generating machine-readable trajectories, not conversational text. The key to better memory is to preserve causal dependencies.

Existing memory benchmarks don't actually measure what matters for agentic applications. This new research introduces AMA-Bench, the first benchmark built for evaluating long-horizon memory in real agentic tasks. It spans six domains including web, text-to-SQL, software engineering, gaming, and embodied AI, with both real-world trajectories and synthetic ones that scale to arbitrary lengths.

The findings are interesting. Many existing agent memory systems that outperform baselines on dialogue benchmarks actually underperform simple long-context LLMs on agentic tasks. Even GPT 5.2 only achieves 72.26% accuracy.

To address this, they propose AMA-Agent with a causality graph and tool-augmented retrieval, achieving 57.22% average accuracy and surpassing the strongest baselines by 11.16%.

Why it matters? Agent memory needs to preserve causal dependencies and objective information, not just similarity-based retrieval. This benchmark exposes where current memory systems actually break.

Paper: arxiv.org/abs/2602.22769
Learn to build effective AI agents in our academy: academy.dair.ai

14
15
181
29.7K
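The causality-graph idea, retrieval that returns an event together with its causal ancestors rather than just similarity-nearest text, can be sketched in a few lines. `CausalMemory` here is an illustration of the principle, not AMA-Agent itself:

```python
class CausalMemory:
    """Sketch of causal-dependency-preserving agent memory: each stored
    event records which earlier events caused it, and recall returns the
    event plus its full causal history, oldest first."""

    def __init__(self):
        self.events = {}   # key -> content
        self.parents = {}  # key -> list of causally prior keys

    def record(self, key, content, caused_by=()):
        self.events[key] = content
        self.parents[key] = list(caused_by)

    def recall(self, key):
        """Depth-first walk over causes so ancestors precede effects."""
        seen, order = set(), []

        def visit(k):
            if k in seen:
                return
            seen.add(k)
            for p in self.parents[k]:
                visit(p)
            order.append(k)

        visit(key)
        return [self.events[k] for k in order]
```

Contrast with similarity-based retrieval: asking for the latest event here always surfaces the chain that produced it, which is exactly the "objective information" a long-horizon agent needs to act on.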