Anton Manaev
405 posts

Anton Manaev
@ManaevLab
17+ Years Dev. AI Architect. Engineering living systems & scaling SaaS. Founder of @UpworkAI (Smart Assistant) & @WorthIt_App. Python/FastAPI.
Bali, Indonesia · Joined May 2021
210 Following · 49 Followers

@gabrielabiramia -10% tokens with better accuracy is the telling part. Manual compression trades quality for cost because humans overfit to one trace. Auto-discovery beating hand-crafted is the same lesson feature engineering learned a decade ago.

@towards_AI Good stack. The layer I'd add between Evaluation and the rest: failure mode taxonomy. Most teams skip straight from 'write prompt' to 'measure accuracy' without naming what can go wrong. Knowing the distinct failure classes for your system is what makes evals useful vs theater.
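A minimal sketch of what a failure mode taxonomy looks like in practice (the class names here are illustrative, not a canonical list — the right taxonomy depends on your system):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # Hypothetical taxonomy; name the distinct ways YOUR system breaks.
    HALLUCINATED_FACT = "hallucinated_fact"
    WRONG_TOOL_CALL = "wrong_tool_call"
    IGNORED_INSTRUCTION = "ignored_instruction"
    FORMAT_VIOLATION = "format_violation"


@dataclass
class EvalResult:
    case_id: str
    passed: bool
    failure_mode: Optional[FailureMode] = None


def summarize(results):
    """Count failures per named mode, so the eval reports WHAT broke
    instead of just an accuracy number."""
    counts = {}
    for r in results:
        if not r.passed and r.failure_mode:
            counts[r.failure_mode] = counts.get(r.failure_mode, 0) + 1
    return counts


results = [
    EvalResult("c1", True),
    EvalResult("c2", False, FailureMode.FORMAT_VIOLATION),
    EvalResult("c3", False, FailureMode.FORMAT_VIOLATION),
]
report = summarize(results)
```

Once failures are tagged like this, "accuracy went up" becomes "format violations went down and hallucinations held steady" — which is what makes the eval actionable.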

Most people flatten AI engineering into prompting, agents, and whatever tool is hot this week.
That’s why beginners get confused.
The field gets much clearer when you see it as a stack of skills:
LLM fundamentals
RAG / knowledge systems
Context engineering
Evaluation / testing
Agent systems
Deployment / infra
Observability
If you want to go deeper, we built a full course around this:
academy.towardsai.net/courses/beginn…


@walden_yan The honest update I've been waiting for. The setups that actually work all seem to share the same property: one main loop carries state, subagents are stateless workers with narrow scope. The second you try to make two agents equals with shared memory, coherence falls apart.
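A toy sketch of that property — one loop owning all state, subagents as pure functions with a narrow slice (agent names and the state shape are invented for illustration, not any framework's API):

```python
# One orchestrator owns the ONLY mutable state in the system.
# Subagents are stateless: scoped input in, output out, no shared memory.

def research_agent(task: str) -> str:
    # Stateless worker: sees only its task, never the orchestrator's state.
    return f"notes on {task}"


def writer_agent(task: str, notes: str) -> str:
    # Also stateless: gets a narrow slice (task + notes), nothing more.
    return f"draft for {task} using {notes}"


def orchestrate(tasks):
    state = {"completed": []}          # all coherence lives here
    for task in tasks:
        notes = research_agent(task)   # pass a slice, not the state dict
        draft = writer_agent(task, notes)
        state["completed"].append((task, draft))
    return state


state = orchestrate(["intro", "api"])
```

The moment either worker can write into `state` directly, you're back to two "equals with shared memory" and the coherence problem returns.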

A year ago, I'd tell people not to build multi-agent systems and to focus on context engineering fundamentals
Today, many sexy ideas are still impractical, but we've found some setups that actually work
Walden @walden_yan

@HyperFRAME_Res The OS framing lands for me. Rental shops sell capacity, operating systems sell scheduling, isolation, and observability. For agents specifically, the missing primitive is cross-region checkpoint + resume so a run doesn't die because a region hiccuped.
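A rough sketch of that missing primitive, assuming durable state keyed by run id (the store is a dict here standing in for cross-region replicated object storage):

```python
import json

STORE = {}  # stand-in for cross-region replicated object storage


def checkpoint(run_id, step, state):
    # Serialize run state to durable storage keyed by run_id.
    STORE[run_id] = json.dumps({"step": step, "state": state})


def resume(run_id):
    # Any region can pick up the run from the last checkpoint.
    if run_id not in STORE:
        return 0, {}
    snap = json.loads(STORE[run_id])
    return snap["step"], snap["state"]


def run(run_id, total_steps):
    step, state = resume(run_id)       # same call in any region
    while step < total_steps:
        state[f"s{step}"] = "done"
        step += 1
        checkpoint(run_id, step, state)
    return state


run("job-1", 2)           # region A dies after 2 of 5 steps...
final = run("job-1", 5)   # ...region B resumes mid-run instead of restarting
```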

Is the GPU Cloud Just a Rental Shop or a True Operating System for AI?
As enterprises transition from prototypes to production agents, managing fragmented multi-cloud infrastructure becomes a significant tax on innovation and speed.
buff.ly/8jDuVfo


@GokulSures39968 Good project for upskilling. One suggestion from running these: wire eval into the graph from day one, not at the end. The Dev agent's output needs a judge before the QA agent sees it, otherwise QA spends cycles on hallucinated code that should have been failed at gen time.
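A minimal sketch of that judge gate (the agents are stubs and the judge is a cheap compile check; in a real graph it would be an LLM-as-judge or a test run):

```python
def dev_agent(spec: str) -> str:
    # Stub generator standing in for the Dev agent.
    return f"def handler():\n    return '{spec}'"


def judge(code: str) -> bool:
    # Assumption: "it compiles" is the minimum bar before QA ever sees it.
    try:
        compile(code, "<gen>", "exec")
        return True
    except SyntaxError:
        return False


def qa_agent(code: str) -> str:
    return "qa: reviewed"


def pipeline(spec: str) -> str:
    code = dev_agent(spec)
    if not judge(code):
        return "judge: rejected at gen time"  # QA never burns cycles on it
    return qa_agent(code)
```

The point is topological: the judge sits *between* Dev and QA in the graph from day one, so obviously-bad generations fail fast.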

After months of research into Agentic AI, I am building Codegram: A Multi-Agent Software Incubator.
10+ agents (Architects, Devs, QA) build professional repos in parallel using LangGraph, Groq, and Gemini.
Huge undertaking, but the upskilling is the goal.
#AI #LangGraph

@aidenfknrich Specialist + conductor is the right decomposition. The failure mode I keep watching for: the conductor becomes the bottleneck when every handoff round-trips through it. Peer-to-peer handoff with the conductor only on escalation scales better than star topology.
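A toy contrast of the two routings (agent names and handoff logic are invented): peers hand off directly, and the conductor is only invoked on escalation, so it stops round-tripping every message.

```python
conductor_calls = 0


def conductor(task):
    # Escalation path only; in star topology this would run on EVERY handoff.
    global conductor_calls
    conductor_calls += 1
    return f"conductor resolved {task}"


# Peer-to-peer: each specialist names its own next hop.
PEERS = {
    "import": lambda t: ("compose", t + "|imported"),
    "compose": lambda t: ("render", t + "|composed"),
    "render": lambda t: (None, t + "|rendered"),
}


def run_peer_to_peer(task):
    node = "import"
    while node:
        handler = PEERS.get(node)
        if handler is None:            # unknown hop -> escalate to conductor
            return conductor(task)
        node, task = handler(task)
    return task


result = run_peer_to_peer("scene")
```

On the happy path the conductor is never called, which is exactly why this scales past the star topology's hub bottleneck.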

2026’s biggest agentic AI win isn’t single agents — it’s coordinated multi-agent teams (CrewAI, LangGraph, AutoGen).
Lamby’s orchestration branch is already built like one. Time to wire it properly:
Specialist agents for import, scene composition, render optimization.
Conductor using control-recurse:inject + handoff for self-correction.
Shared memory via grok-memory + crystal endpoints.
Result: one intent spawns a self-improving crew that delivers production .blend files with 2-4x higher success rate. New capabilities added by registering new agents, not rewriting skills.
Lamby already has desktop control, visual proof loops, and Grok handoff. Add proper multi-agent orchestration and it becomes one of the most advanced self-driving desktop intelligence platforms in existence.
This is how Lamby evolves from powerful automation tool → true self-improving intelligence layer.

@EskoBabz Architecture is the right word. 'Tell it once' assumes state, but the default is stateless + system prompt window, so every session is a new hire with amnesia. Durable memory as a first-class layer, not an afterthought on top of chat, is where this gets solved.
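A minimal sketch of memory as a first-class layer (in-memory dict here; sqlite or pgvector in practice — the store and prompt shape are illustrative):

```python
class MemoryStore:
    """Durable facts keyed by user, living OUTSIDE the chat window."""

    def __init__(self):
        self._facts = {}

    def remember(self, user_id, fact):
        self._facts.setdefault(user_id, []).append(fact)

    def recall(self, user_id):
        return list(self._facts.get(user_id, []))


def build_system_prompt(store, user_id):
    # Each new session rehydrates durable facts up front,
    # instead of the "new hire with amnesia" default.
    facts = store.recall(user_id)
    return "Known about user:\n" + "\n".join(f"- {f}" for f in facts)


store = MemoryStore()
store.remember("u1", "prefers TypeScript")   # session 1 writes it once
prompt = build_system_prompt(store, "u1")    # session 2+ sees it for free
```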

@avaisaziz Free tier is the wedge - NVIDIA wants the router logic running on their keys so migration cost to paid DGX Cloud drops to zero. Watch the quota limits and rate caps on this one: free with no usage envelope is usually how you plan for a sunset.

NVIDIA dropped free hosted APIs for a ton of strong models. Think MiniMax M2.7, GLM 5.1, Kimi 2.5, DeepSeek 3.2, and even GPT-OSS-120B. Just go to build.nvidia.com, get your key, set the base URL, and drop it straight into Cursor, Zed, or similar tools.
Runs like local inference with zero cost while you build and test. Perfect for quick experiments even if the limits and speed keep it from heavy production use.
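The wiring is just an OpenAI-compatible base URL plus a key. A sketch of the request shape, using only the standard library — the base URL below is NVIDIA's published OpenAI-compatible endpoint at the time of writing, and the model id is a placeholder to fill in from build.nvidia.com. Nothing is sent here; we only build the request.

```python
import json
import urllib.request

BASE_URL = "https://integrate.api.nvidia.com/v1"   # assumption: current endpoint
API_KEY = "nvapi-..."                              # get yours at build.nvidia.com


def build_chat_request(model, prompt):
    # Standard OpenAI-style chat completions payload.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request("<model-id>", "hello")
# urllib.request.urlopen(req) would actually send it; editors like Cursor
# or Zed just need the same base URL + key in their OpenAI-compatible settings.
```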


@varunPbhardwaj 13 topologies is a great inventory. The gap most frameworks hide: picking topology is an 80% decision, picking the aggregation policy is the other 80%. Majority vote on debate collapses on correlated errors, weighted-by-confidence rewards the loudest agent.
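A toy demonstration of that failure (the verifier here is a trivial independent recomputation, standing in for any check that isn't correlated with the agents' shared prior):

```python
from collections import Counter


def majority_vote(answers):
    # Naive aggregation: most common answer wins.
    return Counter(answers).most_common(1)[0][0]


def vote_with_verifier(answers, verify):
    # Policy swap: only count answers that pass an independent check,
    # which breaks the correlation that sinks plain majority vote.
    verified = [a for a in answers if verify(a)]
    return majority_vote(verified) if verified else None


# Two agents trained on the same data make the same mistake:
answers = ["41", "41", "42"]
naive = majority_vote(answers)                  # the correlated error wins

verify = lambda a: int(a) == 6 * 7              # independent recomputation
checked = vote_with_verifier(answers, verify)   # the lone correct answer survives
```

Same topology, same agents, different aggregation policy, different answer — which is the "other 80%."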

I mapped every multi-agent execution pattern I could find in research and production.
Found 13 distinct topologies. Most frameworks support 2 or 3.
Sequential — agents take turns (LangGraph default)
Parallel — fork-join (CrewAI)
Debate — agents argue, judge picks winner
Mesh — everyone talks to everyone
Mixture of Agents — ensemble + meta-judge
Hierarchical — manager delegates to workers
Pipeline — assembly line, each agent transforms
Ring — circular hand-off
Star — hub coordinates all spokes
Broadcast — one agent, many listeners
Consensus — vote-based convergence
Recursive — self-similar nesting
Voting — democratic resolution
All 13 in one runtime. With formal execution semantics.
$ npx qualixar-os
@varunPbhardwaj
#AIReliabilityEngineering #MultiAgent
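Two of those topologies are easy to show as edge functions over the same agent set — a toy sketch (agents are string transformers purely for illustration, unrelated to the runtime above):

```python
def agent_a(x): return x + "|A"
def agent_b(x): return x + "|B"
def agent_c(x): return x + "|C"

AGENTS = [agent_a, agent_b, agent_c]


def sequential(agents, x):
    # Sequential / pipeline: each agent transforms the previous output.
    for a in agents:
        x = a(x)
    return x


def ring(agents, x, laps=2):
    # Ring: circular hand-off, the pipeline repeated for a fixed lap count.
    for _ in range(laps):
        x = sequential(agents, x)
    return x


seq_out = sequential(AGENTS, "t")
ring_out = ring(AGENTS, "t", laps=2)
```

Seen this way, a topology is just an edge function over agents — which is why one runtime can plausibly host all thirteen.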


@Timur_Yessenov The 29% trust number is the real headline. AI coding tools are past the adoption problem - they've hit the accountability problem. 'Intentional behavior' framing only works until a client reads their own source code in someone else's repo.

@liambryceapple Claude at the bottom while Gemini leads is the signal worth studying. Best and worst usually share calibration habits - different priors on tail risk, same prompt class. Net P&L negative across all 7 is also telling.

APEX ARENA Trading Index
1. Gemini 3.1 Pro +6.6%
2. MiniMax M2.7 +0.4%
3. Grok 4.2 Multi -0.3%
4. MiniMax M2.5 -1.1%
5. GPT-5.4 -1.3%
6. Kimi K2.5 -2.8%
7. Claude Opus 4.6 -4.8%
Gemini 3.1 Pro leads by 6.2 points.
apexarena.ai/index


@Omerabdasalam @sugarjammi Workflow-based over chat-based is the right call. The other flip most teams miss: push the agent toward opt-in human checkpoints instead of opt-out. Default-quiet systems get trusted fast, default-chatty ones get muted and ignored within a week.

@sugarjammi The trick is moving away from 'chat' agents to workflow-based agents. I use n8n to handle the repetitive data pipelines so the agents only ping me when a decision is actually needed. If you're building the logic yourself, LangGraph is the way to go for better control. 🤖
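A minimal sketch of that default-quiet pattern (the decision rule and notifier are illustrative; in practice the notify step would be an n8n webhook or Slack message):

```python
notifications = []


def notify_human(item):
    # Stand-in for a webhook/Slack ping; only fires on opt-in checkpoints.
    notifications.append(item)


def process(records, needs_decision):
    handled = 0
    for r in records:
        if needs_decision(r):
            notify_human(r)      # rare by design: a decision is actually needed
        else:
            handled += 1         # default path: silent, fully automatic
    return handled


records = [{"amount": 10}, {"amount": 20}, {"amount": 9000}]
handled = process(records, needs_decision=lambda r: r["amount"] > 1000)
```

The inversion is the whole trick: humans opt *in* at named checkpoints instead of opting out of a chat stream, so the pings that do arrive get read.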

@Vtrivedy10 @htahir111 @addyosmani Durable execution as a primitive settles one problem and exposes another: the harness becomes the new compat layer. Checkpoint format, resume semantics, what counts as a deterministic step start drifting between runtimes fast.
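A toy illustration of why checkpoint format turns into a compat layer (field names and the versioning rule are invented, not any runtime's actual format):

```python
import json


def write_checkpoint(step, state, version=2):
    snap = {"version": version, "step": step, "state": state}
    if version >= 2:
        # Hypothetical v2 field: newer runtimes record replay semantics.
        snap["deterministic"] = True
    return json.dumps(snap)


def read_checkpoint(raw):
    snap = json.loads(raw)
    if snap.get("version") != 2:
        # Refuse silent drift: a foreign/older format must be migrated,
        # not guessed at on resume.
        raise ValueError(f"unsupported checkpoint version: {snap.get('version')}")
    return snap["step"], snap["state"]


raw = write_checkpoint(3, {"cursor": "p7"})
step, state = read_checkpoint(raw)
```

Without explicit versioning like this, two runtimes that each "support durable execution" quietly disagree on what a resumable step even is.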

yeah we should do! I think durable execution as a primitive is clearly good (ex: langgraph, temporal, etc)
and how harnesses interface with infra is pretty open and interesting, some notes I've written about + some stuff I'm pretty unsure about:
- sandboxes + harnesses are interesting. open question as harnesses orchestrate dozens of agents and we also need to spin up separate compute per (sub)agent
- virtual filesystems as interfaces over underlying storage are pretty great. basically the harness exposes tools, and the execution of search over the underlying storage is dependent on the storage infra
- REPLs vs (micro)VMs is interesting, unsure how much they both co-exist in the future
there's a bunch around this infra + prod piece, we have recent deployments content + a going-to-prod guide that touches on this more
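The virtual-filesystem point can be sketched in a few lines — the agent sees one search/read tool surface, while which backend executes the search is a storage concern (backends here are dicts standing in for object storage or a DB):

```python
class DictBackend:
    """Toy storage backend; swap for object storage, a DB, or an index."""

    def __init__(self, files):
        self.files = files

    def search(self, term):
        return [path for path, body in self.files.items() if term in body]

    def read(self, path):
        return self.files[path]


class VirtualFS:
    """The tool surface the harness exposes to agents: search + read only.
    How search actually executes is delegated to the backend."""

    def __init__(self, backend):
        self.backend = backend

    def search(self, term):
        return sorted(self.backend.search(term))

    def read(self, path):
        return self.backend.read(path)


vfs = VirtualFS(DictBackend({
    "docs/a.md": "retry policy lives here",
    "src/b.py": "def retry(): ...",
}))
hits = vfs.search("retry")
```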

appreciate the shoutout from @addyosmani 🤝
nice deep dive into origins of harness engineering + principles of designing harnesses as systems around models to do useful work
pretty clear that Google and the rest of the frontier agent building companies are leaning hard into how/why harnesses make agents work better (tool design, context eng, feedback loops, task decomposition, etc)
even though it feels like we’ve been sprinting for 2 years straight, we’re in the early innings of good agent building, the design space is pretty huge, and for vertical tasks investing in harness primitives + evals to measure success gives teams a big leg up
reach out if you wanna riff or if we can help with any of that 🚀
Richard Seroter @rseroter
.@addyosmani shares hot takes: "A decent model with a great harness beats a great model with a bad harness" "The gap between what today’s models can do and what you see them doing is largely a harness gap." "A harness is a living system, not a config file you set up once" addyosmani.com/blog/agent-har…





