Anton Manaev
405 posts

Anton Manaev
@ManaevLab
17+ Years Dev. AI Architect. Engineering living systems & scaling SaaS. Founder of @UpworkAI (Smart Assistant) & @WorthIt_App. Python/FastAPI.
Bali, Indonesia · Joined May 2021
210 Following · 49 Followers

@gabrielabiramia -10% tokens with better accuracy is the telling part. Manual compression trades quality for cost because humans overfit to one trace. Auto-discovery beating hand-crafted is the same lesson feature engineering learned a decade ago.

@towards_AI Good stack. The layer I'd add between Evaluation and the rest: failure mode taxonomy. Most teams skip straight from 'write prompt' to 'measure accuracy' without naming what can go wrong. Knowing the distinct failure classes for your system is what makes evals useful vs theater.
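A minimal sketch of what a failure mode taxonomy looks like in practice (the class names here are illustrative, not a canonical list — the right taxonomy depends on your system):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # Hypothetical taxonomy; name the distinct ways YOUR system breaks.
    HALLUCINATED_FACT = "hallucinated_fact"
    WRONG_TOOL_CALL = "wrong_tool_call"
    IGNORED_INSTRUCTION = "ignored_instruction"
    FORMAT_VIOLATION = "format_violation"


@dataclass
class EvalResult:
    case_id: str
    passed: bool
    failure_mode: Optional[FailureMode] = None


def summarize(results):
    """Count failures per named mode, so the eval reports WHAT broke
    instead of just an accuracy number."""
    counts = {}
    for r in results:
        if not r.passed and r.failure_mode:
            counts[r.failure_mode] = counts.get(r.failure_mode, 0) + 1
    return counts


results = [
    EvalResult("c1", True),
    EvalResult("c2", False, FailureMode.FORMAT_VIOLATION),
    EvalResult("c3", False, FailureMode.FORMAT_VIOLATION),
]
report = summarize(results)
```

Once failures are tagged like this, "accuracy went up" becomes "format violations went down and hallucinations held steady" — which is what makes the eval actionable.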

Most people flatten AI engineering into prompting, agents, and whatever tool is hot this week.
That’s why beginners get confused.
The field gets much clearer when you see it as a stack of skills:
LLM fundamentals
RAG / knowledge systems
Context engineering
Evaluation / testing
Agent systems
Deployment / infra
Observability
If you want to go deeper, we built a full course around this:
academy.towardsai.net/courses/beginn…


@walden_yan The honest update I've been waiting for. The setups that actually work all seem to share the same property: one main loop carries state, subagents are stateless workers with narrow scope. The second you try to make two agents equals with shared memory, coherence falls apart.
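A toy sketch of that property — one loop owning all state, subagents as pure functions with a narrow slice (agent names and the state shape are invented for illustration, not any framework's API):

```python
# One orchestrator owns the ONLY mutable state in the system.
# Subagents are stateless: scoped input in, output out, no shared memory.

def research_agent(task: str) -> str:
    # Stateless worker: sees only its task, never the orchestrator's state.
    return f"notes on {task}"


def writer_agent(task: str, notes: str) -> str:
    # Also stateless: gets a narrow slice (task + notes), nothing more.
    return f"draft for {task} using {notes}"


def orchestrate(tasks):
    state = {"completed": []}          # all coherence lives here
    for task in tasks:
        notes = research_agent(task)   # pass a slice, not the state dict
        draft = writer_agent(task, notes)
        state["completed"].append((task, draft))
    return state


state = orchestrate(["intro", "api"])
```

The moment either worker can write into `state` directly, you're back to two "equals with shared memory" and the coherence problem returns.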

A year ago, I'd tell people not to build multi-agent systems and to focus on context engineering fundamentals
Today, many sexy ideas are still impractical, but we've found some setups that actually work
Walden @walden_yan

@HyperFRAME_Res The OS framing lands for me. Rental shops sell capacity, operating systems sell scheduling, isolation, and observability. For agents specifically, the missing primitive is cross-region checkpoint + resume so a run doesn't die because a region hiccuped.
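A rough sketch of that missing primitive, assuming durable state keyed by run id (the store is a dict here standing in for cross-region replicated object storage):

```python
import json

STORE = {}  # stand-in for cross-region replicated object storage


def checkpoint(run_id, step, state):
    # Serialize run state to durable storage keyed by run_id.
    STORE[run_id] = json.dumps({"step": step, "state": state})


def resume(run_id):
    # Any region can pick up the run from the last checkpoint.
    if run_id not in STORE:
        return 0, {}
    snap = json.loads(STORE[run_id])
    return snap["step"], snap["state"]


def run(run_id, total_steps):
    step, state = resume(run_id)       # same call in any region
    while step < total_steps:
        state[f"s{step}"] = "done"
        step += 1
        checkpoint(run_id, step, state)
    return state


run("job-1", 2)           # region A dies after 2 of 5 steps...
final = run("job-1", 5)   # ...region B resumes mid-run instead of restarting
```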

Is the GPU Cloud Just a Rental Shop or a True Operating System for AI?
As enterprises transition from prototypes to production agents, managing fragmented multi-cloud infrastructure becomes a significant tax on innovation and speed.
buff.ly/8jDuVfo


@GokulSures39968 Good project for upskilling. One suggestion from running these: wire eval into the graph from day one, not at the end. The Dev agent's output needs a judge before the QA agent sees it, otherwise QA spends cycles on hallucinated code that should have been failed at gen time.
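A minimal sketch of that judge gate (the agents are stubs and the judge is a cheap compile check; in a real graph it would be an LLM-as-judge or a test run):

```python
def dev_agent(spec: str) -> str:
    # Stub generator standing in for the Dev agent.
    return f"def handler():\n    return '{spec}'"


def judge(code: str) -> bool:
    # Assumption: "it compiles" is the minimum bar before QA ever sees it.
    try:
        compile(code, "<gen>", "exec")
        return True
    except SyntaxError:
        return False


def qa_agent(code: str) -> str:
    return "qa: reviewed"


def pipeline(spec: str) -> str:
    code = dev_agent(spec)
    if not judge(code):
        return "judge: rejected at gen time"  # QA never burns cycles on it
    return qa_agent(code)
```

The point is topological: the judge sits *between* Dev and QA in the graph from day one, so obviously-bad generations fail fast.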

After months of research into Agentic AI, I am building Codegram: A Multi-Agent Software Incubator.
10+ agents (Architects, Devs, QA) build professional repos in parallel using LangGraph, Groq, and Gemini.
Huge undertaking, but the upskilling is the goal.
#AI #LangGraph

@aidenfknrich Specialist + conductor is the right decomposition. The failure mode I keep watching for: the conductor becomes the bottleneck when every handoff round-trips through it. Peer-to-peer handoff with the conductor only on escalation scales better than star topology.
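A toy contrast of the two routings (agent names and handoff logic are invented): peers hand off directly, and the conductor is only invoked on escalation, so it stops round-tripping every message.

```python
conductor_calls = 0


def conductor(task):
    # Escalation path only; in star topology this would run on EVERY handoff.
    global conductor_calls
    conductor_calls += 1
    return f"conductor resolved {task}"


# Peer-to-peer: each specialist names its own next hop.
PEERS = {
    "import": lambda t: ("compose", t + "|imported"),
    "compose": lambda t: ("render", t + "|composed"),
    "render": lambda t: (None, t + "|rendered"),
}


def run_peer_to_peer(task):
    node = "import"
    while node:
        handler = PEERS.get(node)
        if handler is None:            # unknown hop -> escalate to conductor
            return conductor(task)
        node, task = handler(task)
    return task


result = run_peer_to_peer("scene")
```

On the happy path the conductor is never called, which is exactly why this scales past the star topology's hub bottleneck.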

2026’s biggest agentic AI win isn’t single agents — it’s coordinated multi-agent teams (CrewAI, LangGraph, AutoGen).
Lamby’s orchestration branch is already built like one. Time to wire it properly:
Specialist agents for import, scene composition, render optimization.
Conductor using control-recurse:inject + handoff for self-correction.
Shared memory via grok-memory + crystal endpoints.
Result: one intent spawns a self-improving crew that delivers production .blend files with 2-4x higher success rate. New capabilities added by registering new agents, not rewriting skills.
Lamby already has desktop control, visual proof loops, and Grok handoff. Add proper multi-agent orchestration and it becomes one of the most advanced self-driving desktop intelligence platforms in existence.
This is how Lamby evolves from powerful automation tool → true self-improving intelligence layer.

@EskoBabz Architecture is the right word. 'Tell it once' assumes state, but the default is stateless + system prompt window, so every session is a new hire with amnesia. Durable memory as a first-class layer, not an afterthought on top of chat, is where this gets solved.
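A minimal sketch of memory as a first-class layer (in-memory dict here; sqlite or pgvector in practice — the store and prompt shape are illustrative):

```python
class MemoryStore:
    """Durable facts keyed by user, living OUTSIDE the chat window."""

    def __init__(self):
        self._facts = {}

    def remember(self, user_id, fact):
        self._facts.setdefault(user_id, []).append(fact)

    def recall(self, user_id):
        return list(self._facts.get(user_id, []))


def build_system_prompt(store, user_id):
    # Each new session rehydrates durable facts up front,
    # instead of the "new hire with amnesia" default.
    facts = store.recall(user_id)
    return "Known about user:\n" + "\n".join(f"- {f}" for f in facts)


store = MemoryStore()
store.remember("u1", "prefers TypeScript")   # session 1 writes it once
prompt = build_system_prompt(store, "u1")    # session 2+ sees it for free
```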

@avaisaziz Free tier is the wedge - NVIDIA wants the router logic running on their keys so migration cost to paid DGX Cloud drops to zero. Watch the quota limits and rate caps on this one: free with no usage envelope is usually how you plan for a sunset.

NVIDIA dropped free hosted APIs for a ton of strong models. Think MiniMax M2.7, GLM 5.1, Kimi 2.5, DeepSeek 3.2, and even GPT-OSS-120B. Just go to build.nvidia.com, get your key, set the base URL, and drop it straight into Cursor, Zed, or similar tools.
Runs like local inference with zero cost while you build and test. Perfect for quick experiments even if the limits and speed keep it from heavy production use.
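The wiring is just an OpenAI-compatible base URL plus a key. A sketch of the request shape, using only the standard library — the base URL below is NVIDIA's published OpenAI-compatible endpoint at the time of writing, and the model id is a placeholder to fill in from build.nvidia.com. Nothing is sent here; we only build the request.

```python
import json
import urllib.request

BASE_URL = "https://integrate.api.nvidia.com/v1"   # assumption: current endpoint
API_KEY = "nvapi-..."                              # get yours at build.nvidia.com


def build_chat_request(model, prompt):
    # Standard OpenAI-style chat completions payload.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request("<model-id>", "hello")
# urllib.request.urlopen(req) would actually send it; editors like Cursor
# or Zed just need the same base URL + key in their OpenAI-compatible settings.
```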


@varunPbhardwaj 13 topologies is a great inventory. The gap most frameworks hide: picking topology is an 80% decision, picking the aggregation policy is the other 80%. Majority vote on debate collapses on correlated errors, weighted-by-confidence rewards the loudest agent.
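A toy demonstration of that failure (the verifier here is a trivial independent recomputation, standing in for any check that isn't correlated with the agents' shared prior):

```python
from collections import Counter


def majority_vote(answers):
    # Naive aggregation: most common answer wins.
    return Counter(answers).most_common(1)[0][0]


def vote_with_verifier(answers, verify):
    # Policy swap: only count answers that pass an independent check,
    # which breaks the correlation that sinks plain majority vote.
    verified = [a for a in answers if verify(a)]
    return majority_vote(verified) if verified else None


# Two agents trained on the same data make the same mistake:
answers = ["41", "41", "42"]
naive = majority_vote(answers)                  # the correlated error wins

verify = lambda a: int(a) == 6 * 7              # independent recomputation
checked = vote_with_verifier(answers, verify)   # the lone correct answer survives
```

Same topology, same agents, different aggregation policy, different answer — which is the "other 80%."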

I mapped every multi-agent execution pattern I could find in research and production.
Found 13 distinct topologies. Most frameworks support 2 or 3.
Sequential — agents take turns (LangGraph default)
Parallel — fork-join (CrewAI)
Debate — agents argue, judge picks winner
Mesh — everyone talks to everyone
Mixture of Agents — ensemble + meta-judge
Hierarchical — manager delegates to workers
Pipeline — assembly line, each agent transforms
Ring — circular hand-off
Star — hub coordinates all spokes
Broadcast — one agent, many listeners
Consensus — vote-based convergence
Recursive — self-similar nesting
Voting — democratic resolution
All 13 in one runtime. With formal execution semantics.
$ npx qualixar-os
@varunPbhardwaj
#AIReliabilityEngineering #MultiAgent
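Two of those topologies are easy to show as edge functions over the same agent set — a toy sketch (agents are string transformers purely for illustration, unrelated to the runtime above):

```python
def agent_a(x): return x + "|A"
def agent_b(x): return x + "|B"
def agent_c(x): return x + "|C"

AGENTS = [agent_a, agent_b, agent_c]


def sequential(agents, x):
    # Sequential / pipeline: each agent transforms the previous output.
    for a in agents:
        x = a(x)
    return x


def ring(agents, x, laps=2):
    # Ring: circular hand-off, the pipeline repeated for a fixed lap count.
    for _ in range(laps):
        x = sequential(agents, x)
    return x


seq_out = sequential(AGENTS, "t")
ring_out = ring(AGENTS, "t", laps=2)
```

Seen this way, a topology is just an edge function over agents — which is why one runtime can plausibly host all thirteen.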


@Timur_Yessenov The 29% trust number is the real headline. AI coding tools are past the adoption problem - they've hit the accountability problem. 'Intentional behavior' framing only works until a client reads their own source code in someone else's repo.

@liambryceapple Claude at the bottom while Gemini leads is the signal worth studying. Best and worst usually share calibration habits - different priors on tail risk, same prompt class. Net P&L negative across all 7 is also telling.

APEX ARENA Trading Index
1. Gemini 3.1 Pro +6.6%
2. MiniMax M2.7 +0.4%
3. Grok 4.2 Multi -0.3%
4. MiniMax M2.5 -1.1%
5. GPT-5.4 -1.3%
6. Kimi K2.5 -2.8%
7. Claude Opus 4.6 -4.8%
Gemini 3.1 Pro leads by 6.2 points.
apexarena.ai/index


@Omerabdasalam @sugarjammi Workflow-based over chat-based is the right call. The other flip most teams miss: push the agent toward opt-in human checkpoints instead of opt-out. Default-quiet systems get trusted fast, default-chatty ones get muted and ignored within a week.

@sugarjammi The trick is moving away from 'chat' agents to workflow-based agents. I use n8n to handle the repetitive data pipelines so the agents only ping me when a decision is actually needed. If you're building the logic yourself, LangGraph is the way to go for better control. 🤖
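A minimal sketch of that default-quiet pattern (the decision rule and notifier are illustrative; in practice the notify step would be an n8n webhook or Slack message):

```python
notifications = []


def notify_human(item):
    # Stand-in for a webhook/Slack ping; only fires on opt-in checkpoints.
    notifications.append(item)


def process(records, needs_decision):
    handled = 0
    for r in records:
        if needs_decision(r):
            notify_human(r)      # rare by design: a decision is actually needed
        else:
            handled += 1         # default path: silent, fully automatic
    return handled


records = [{"amount": 10}, {"amount": 20}, {"amount": 9000}]
handled = process(records, needs_decision=lambda r: r["amount"] > 1000)
```

The inversion is the whole trick: humans opt *in* at named checkpoints instead of opting out of a chat stream, so the pings that do arrive get read.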

@Vtrivedy10 @htahir111 @addyosmani Durable execution as a primitive settles one problem and exposes another: the harness becomes the new compat layer. Checkpoint format, resume semantics, what counts as a deterministic step start drifting between runtimes fast.
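A toy illustration of why checkpoint format turns into a compat layer (field names and the versioning rule are invented, not any runtime's actual format):

```python
import json


def write_checkpoint(step, state, version=2):
    snap = {"version": version, "step": step, "state": state}
    if version >= 2:
        # Hypothetical v2 field: newer runtimes record replay semantics.
        snap["deterministic"] = True
    return json.dumps(snap)


def read_checkpoint(raw):
    snap = json.loads(raw)
    if snap.get("version") != 2:
        # Refuse silent drift: a foreign/older format must be migrated,
        # not guessed at on resume.
        raise ValueError(f"unsupported checkpoint version: {snap.get('version')}")
    return snap["step"], snap["state"]


raw = write_checkpoint(3, {"cursor": "p7"})
step, state = read_checkpoint(raw)
```

Without explicit versioning like this, two runtimes that each "support durable execution" quietly disagree on what a resumable step even is.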

yeah we should do! I think durable execution as a primitive is clearly good (ex: langgraph, temporal, etc)
and how harnesses interface with infra is pretty open and interesting, some notes I've written about + some stuff I'm pretty unsure about:
- sandboxes + harnesses are interesting. open question as harnesses orchestrate dozens of agents and we also need to spin up separate compute per (sub)agent
- virtual filesystems as interfaces over underlying storage are pretty great. basically the harness exposes tools, and the execution of search over the underlying storage is dependent on the storage infra
- REPLs vs (micro)VMs is interesting, unsure how much they both co-exist in the future
there's a bunch around this infra + prod piece, we have recent deployments content + a going-to-prod guide that touches on this more
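The virtual-filesystem point can be sketched in a few lines — the agent sees one search/read tool surface, while which backend executes the search is a storage concern (backends here are dicts standing in for object storage or a DB):

```python
class DictBackend:
    """Toy storage backend; swap for object storage, a DB, or an index."""

    def __init__(self, files):
        self.files = files

    def search(self, term):
        return [path for path, body in self.files.items() if term in body]

    def read(self, path):
        return self.files[path]


class VirtualFS:
    """The tool surface the harness exposes to agents: search + read only.
    How search actually executes is delegated to the backend."""

    def __init__(self, backend):
        self.backend = backend

    def search(self, term):
        return sorted(self.backend.search(term))

    def read(self, path):
        return self.backend.read(path)


vfs = VirtualFS(DictBackend({
    "docs/a.md": "retry policy lives here",
    "src/b.py": "def retry(): ...",
}))
hits = vfs.search("retry")
```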

appreciate the shoutout from @addyosmani 🤝
nice deep dive into origins of harness engineering + principles of designing harnesses as systems around models to do useful work
pretty clear that Google and the rest of the frontier agent building companies are leaning hard into how/why harnesses make agents work better (tool design, context eng, feedback loops, task decomposition, etc)
even though it feels like we’ve been sprinting for 2 years straight, we’re in the early innings of good agent building, the design space is pretty huge, and for vertical tasks investing in harness primitives + evals to measure success gives teams a big leg up
reach out if you wanna riff or if we can help with any of that 🚀
Richard Seroter @rseroter
.@addyosmani shares hot takes: "A decent model with a great harness beats a great model with a bad harness" "The gap between what today’s models can do and what you see them doing is largely a harness gap." "A harness is a living system, not a config file you set up once" addyosmani.com/blog/agent-har…





