JasonLiu
@jsyqrt

984 posts
Building Markus — open-source AI digital workforce platform. Vibe Coder. Building the future of multi-agent governance https://t.co/z4R3pgHwhj

Shanghai · Joined February 2021
153 Following · 81 Followers
JasonLiu
JasonLiu@jsyqrt·
The teachability angle is real. Building Markus confirmed it — a workflow that has to be teachable forces the architecture to be clean. No implicit state. No magic configs. The boring path — workspace isolation, typed skill manifests, delivery protocols — becomes the only path because you literally have to walk another builder through it.
Prince does AI
Prince does AI@princedoesai·
150,000 GitHub stars is not the useful part. Hermes Agent gets interesting when the workflow becomes easy enough to teach another builder. Try this: save the boring path first. Pick one task. Lock the input. Show the tool step. Log the miss. Patch the instruction. Run it again tomorrow. Agents do not remove the setup work. They make a useful workflow easier to share.
JasonLiu
JasonLiu@jsyqrt·
@bakigulai The real flex is that this 1M+ line rewrite passes existing tests AND fixes pre-existing bugs. That's not code gen — that's systematic engineering. We're building Markus with similar ambitions for AI agent workflows. Bun just proved it's possible.
bakigul
bakigul@bakigulai·
We were joking, and it became real. We used to joke that "an AI agent will rewrite the whole codebase from scratch." What we're seeing on the Bun side today is very close to exactly that. Bun's core, written in Zig, is being ported to Rust. The PR summary is an event all by itself: - +1,009,257 lines - 2,188 files - Existing tests pass - Binary size shrinks - Memory leaks get fixed. What an era we've arrived at. Take a look: 👉️ github.com/oven-sh/bun/pu…
JasonLiu
JasonLiu@jsyqrt·
@chengfeng240928 Agent-to-agent handoff will be standard sooner or later. My deepest takeaway from building Markus was that the hard part isn't getting agents to do the work — it's keeping agent-to-agent communication from descending into chaos. What's cool about Feishu's approach is that group-chat context comes built in; Markus takes the A2A-protocol plus deliverable-driven route. Different paths, same direction: let agents handle agent business themselves, with humans doing only approval and backstop.
Agent成峰
Agent成峰@chengfeng240928·
The Feishu CLI shipped 100+ capability updates in a single month. It already has 10k+ GitHub stars and is one of China's highest-starred open-source projects. The wildest part is Agent 2 Agent — two agents @-mention each other in a Feishu group and finish the job between themselves, with me never stepping in. Working with AI used to mean being the "copy-paste middleman": pull the data yourself, paste it into the next AI. Now one agent hands off to another directly, no human required. The whiteboard also got a major upgrade: "live" architecture diagrams drawn inside Feishu that you can click and edit anywhere.
JasonLiu
JasonLiu@jsyqrt·
@princedoesai Clean doorway works for one handoff. Scale to 10 agents making 100 handoffs — the key is persistent shared context between handoffs. Each agent writes what it learned, next one inherits it. Zero loss.
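The write-what-you-learned, inherit-it-next pattern fits in a few lines. A minimal sketch with a hypothetical `HandoffContext` store (invented names, not Markus's actual implementation): each agent appends what it learned, and the next agent reads the full history instead of starting from zero.

```python
import json
import tempfile
from pathlib import Path

class HandoffContext:
    """Append-only context shared across agent handoffs (illustrative)."""

    def __init__(self, path: Path):
        self.path = Path(path)

    def record(self, agent: str, learned: dict) -> None:
        # One JSON line per handoff keeps writes ordered and replayable.
        with self.path.open("a") as f:
            f.write(json.dumps({"agent": agent, "learned": learned}) + "\n")

    def inherit(self) -> list[dict]:
        # A newly started agent reads everything prior agents wrote,
        # so no handoff starts from zero.
        if not self.path.exists():
            return []
        return [json.loads(line) for line in self.path.read_text().splitlines()]

ctx = HandoffContext(Path(tempfile.mkdtemp()) / "context.jsonl")
ctx.record("agent_a", {"api_rate_limit": "100/min"})
ctx.record("agent_b", {"schema_version": 2})
assert len(ctx.inherit()) == 2  # a third agent would inherit both entries
```

At 10 agents and 100 handoffs the store needs indexing and compaction, but the contract stays the same: record on exit, inherit on entry.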
Prince does AI
Prince does AI@princedoesai·
AI agents get weaker when every handoff starts from zero. The Codex hooks and access token update is a useful reminder. Try this: give the tool a clean doorway. Pass one goal. Attach one file set. Log the callback. Save the last good state. Review the next step. Reuse the handoff tomorrow. AI agents do not remove workflow design. They make clean handoffs easier to trust.
JasonLiu
JasonLiu@jsyqrt·
The routing layer gets even more important when agents choose which model to use autonomously. In Markus we route by task type: tool calls hit cheap models, code generation goes mid-tier, architecture decisions route to the expensive one. Model selection becomes an orchestration problem at that point.
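A task-type routing table is small enough to sketch. Hypothetical tier names for illustration, not Markus's real config:

```python
# Hypothetical model tiers; swap in real model IDs for your stack.
ROUTES = {
    "tool_call":    "cheap-small-model",
    "code_gen":     "mid-tier-model",
    "architecture": "frontier-model",
}

def route(task_type: str) -> str:
    # Unknown task types fall back to the mid tier rather than
    # silently burning frontier-model tokens on routine work.
    return ROUTES.get(task_type, "mid-tier-model")

assert route("tool_call") == "cheap-small-model"
assert route("refactor_request") == "mid-tier-model"  # unmapped -> fallback
```

The point is that model choice becomes a lookup the orchestrator owns, not a decision buried in each prompt.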
ericosiu
ericosiu@ericosiu·
I was spending about $7,500/month on AI tokens. Then I made one simple change that started pushing the graph down toward zero: model hierarchy. My first layer is Claude Max, which is $200/month. Second is OpenAI OAuth. Only after those do I let the workflow fall back to API usage. That sounds small, but it makes a big difference. The lesson is simple: if you use AI at scale, you need a routing system, not better prompts. Use the expensive model when the work deserves it. Use subscriptions and OAuth when they can handle it. Use open source and local hardware for the “Camry fleet” work. The model bill is becoming the new cloud bill. If you don’t watch the routing layer, it'll quietly turn into margin leakage.
JasonLiu
JasonLiu@jsyqrt·
@KayvonJafar An afternoon gets you a single-agent single-workflow demo. The hard part — shared context across agents, workspace isolation, structured delivery — only shows up when you try running a real team of AI agents on production code. That part takes months. Not an afternoon.
JasonLiu
JasonLiu@jsyqrt·
@ignaziodes We ran into this hard. Swapping models takes hours. Getting agents to work together with proper permissions, review gates, and audit trails took months. The model is the easy part.
Ignazio De Santis
Ignazio De Santis@ignaziodes·
You are not blocked by the model. You are blocked by the system around it.
JasonLiu
JasonLiu@jsyqrt·
I call it tool sprawl. Every new MCP server means another stdio process to manage, another auth flow, another set of edge cases. The filesystem abstraction is elegant but the real gap is observability — when an agent misreads a file across mounts, you need full trace replay, not just logs.
Null Hype
Null Hype@nullhypeai·
AI agents do not need more tools. They need a better operating surface. Right now, most agent workflows break the same way: One API for Slack. Another for Google Drive. Another for S3. Another for GitHub. Another for Redis. Another set of permissions, schemas, edge cases, and brittle glue. That is not intelligence. That is integration debt. I found an open-source repo trying to solve this by turning enterprise systems into one virtual filesystem. The project is called Mirage. It lets agents mount services like S3, Slack, Google Drive, GitHub, Redis, and more into a single filesystem-like workspace, then operate across them with familiar bash style commands. The important part is not the repo itself. It is the abstraction. If agents become real workers, the winning layer may not be the chat UI or the model wrapper. It may be the controlled environment where agents can safely read, write, search, copy, summarize, and move work across systems. Models are getting cheaper. The workspace layer is getting more strategic. github.com/strukto-ai/mir…
JasonLiu
JasonLiu@jsyqrt·
The real bottleneck is review throughput. Agents generate code faster than ever, but a human still has to understand every diff before it ships. That rate of comprehension hasn't 10xed alongside code output. The productivity gap lives between "code produced" and "code understood."
François Chollet
François Chollet@fchollet·
The quantity of code that devs ship has roughly 10xed. But net developer productivity (value created by unit of time) is only up by a bit, if at all. Part of it is that the additional code is solving more incremental problems. A bigger part is that the new code is creating problems of its own.
JasonLiu
JasonLiu@jsyqrt·
Hit the exact same wall building Markus. The fix wasn't better retrieval — it was a shared context store with explicit conflict detection. When two agents pull contradictory facts, the system surfaces the conflict instead of merging silently. The planner then decides which source to trust. Silent confidence is worse than uncertainty.
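The surface-don't-merge behavior can be shown concretely. A toy sketch with an invented fact shape, not the Markus store:

```python
def detect_conflicts(facts: list[dict]) -> dict:
    """Group retrieved facts by key; surface disagreements instead of
    merging them silently. A planner then decides which source to trust."""
    values: dict[str, set] = {}
    sources: dict[str, list] = {}
    for fact in facts:
        values.setdefault(fact["key"], set()).add(fact["value"])
        sources.setdefault(fact["key"], []).append(fact["source"])
    return {
        "resolved": {k: next(iter(v)) for k, v in values.items() if len(v) == 1},
        "conflicts": {k: {"values": sorted(v), "sources": sources[k]}
                      for k, v in values.items() if len(v) > 1},
    }

facts = [
    {"key": "launch_date", "value": "2024-05-01", "source": "agent_a"},
    {"key": "launch_date", "value": "2024-06-01", "source": "agent_b"},
    {"key": "region", "value": "us-east", "source": "agent_a"},
]
report = detect_conflicts(facts)
assert "launch_date" in report["conflicts"]   # surfaced, not merged
assert report["resolved"]["region"] == "us-east"
```

Anything in `conflicts` blocks the answer until a source is chosen — the opposite of confident answers built from contradictory fragments.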
Ignazio De Santis
Ignazio De Santis@ignaziodes·
Multi-agent RAG systems fail silently on accuracy. Each agent retrieves independently. No shared context. No conflict resolution. The system returns confident answers built from contradictory fragments.
JasonLiu
JasonLiu@jsyqrt·
The Tuesday bug test is real. Building Markus I've found the split isn't 'which AI codes better' — it's 'which one handles the garbage codebase you already have.' Flashy demos run on clean repos. Real work runs on 3-year-old tech debt with no tests. The tool that survives that is the one worth keeping.
Prince does AI
Prince does AI@princedoesai·
An AI coding tool can be impressive and still not fit your day. The winner is the one you trust on a real Tuesday bug. Try this: give it one maintenance task. Name the repo. Set the done state. Watch the questions. Check the diff. Run the test. Save the rough edge. AI coding does not remove judgment. It reveals which workflow earns it.
JasonLiu
JasonLiu@jsyqrt·
Building Markus (open-source AI employee platform) — we hit this same wall. The harness the post describes is the inspection layer, but there are two more layers underneath: 1. Isolated workspaces so agent A can't corrupt agent B's state 2. A gated delivery protocol — outputs don't merge until reviewed Without those, "name the done state" is you pointing at a smoking crater.
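Both layers fit in a short sketch. Invented class and method names; real isolation would use containers or sandboxes, not just temp directories:

```python
import tempfile
from pathlib import Path

class GatedWorkspace:
    """Layer 1: each agent writes only inside its own directory.
    Layer 2: outputs reach the shared tree only after explicit review."""

    def __init__(self, agent: str, shared: Path):
        self.agent = agent
        self.root = Path(tempfile.mkdtemp(prefix=f"{agent}-"))  # isolation
        self.shared = shared

    def write(self, name: str, content: str) -> None:
        (self.root / name).write_text(content)

    def deliver(self, name: str, approved: bool) -> bool:
        if not approved:  # gated delivery: no review, no merge
            return False
        dest = self.shared / f"{self.agent}-{name}"
        dest.write_text((self.root / name).read_text())
        return True

shared = Path(tempfile.mkdtemp())
ws = GatedWorkspace("agent_a", shared)
ws.write("report.md", "done")
assert not ws.deliver("report.md", approved=False)  # stays isolated
assert ws.deliver("report.md", approved=True)
```

Until `deliver` is approved, nothing agent A writes is visible to agent B — which is what keeps one agent's failure from becoming everyone's state.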
Prince does AI
Prince does AI@princedoesai·
The model is rarely the missing piece in an agent build. The gap is usually the tiny harness around the work. Better move: make the job easy to inspect. Name the done state. Limit the file area. Add a rollback path. Keep tool logs visible. Review one shipped slice. Then scale the task. AI agents don't remove supervision. They multiply the system you give them.
JasonLiu
JasonLiu@jsyqrt·
that 1/20 cost changes the math entirely for agent workflows. we're running GLM5.1 on Markus for high-volume, low-stakes agent tasks (email triage, data entry). error rate is slightly higher than GPT-4o but at that price delta we just add governance rules and a human-in-loop check. net win.
kvyb
kvyb@0xkvyb·
@jsyqrt I think it can compete with Sonnet for sure, maybe not Opus. But hey, the token costs are like 1/20 of what flagship models charge. Makes it worth iterating a bit longer fixing bad calls than paying Anthropic or OpenAI when their models stumble too.
kvyb
kvyb@0xkvyb·
Actually, GLM5.1 is much better than people think
JasonLiu
JasonLiu@jsyqrt·
Fair point on determinism. We landed on a middle ground at Markus — skill manifests with typed I/O and tests give structure without boxing the agent in. It can still improvise, just can't ship bad skills. Governance gates handle recovery without having to un-determinize everything.
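Roughly what a typed manifest with a governance gate looks like — a sketch with invented field names, not Markus's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SkillManifest:
    """Typed I/O plus checks: structure without boxing the agent in.
    The agent improvises the implementation; a failing skill can't ship."""
    name: str
    input_schema: dict
    output_schema: dict
    checks: list = field(default_factory=list)  # callables: output -> bool

    def gate(self, output: dict) -> bool:
        # Governance gate: output must cover the schema and pass every check.
        if set(self.output_schema) - set(output):
            return False
        return all(check(output) for check in self.checks)

manifest = SkillManifest(
    name="summarize_ticket",
    input_schema={"ticket_id": "str"},
    output_schema={"summary": "str"},
    checks=[lambda out: 0 < len(out["summary"]) <= 280],
)
assert manifest.gate({"summary": "user cannot log in after reset"})
assert not manifest.gate({"summary": ""})  # bad skill, blocked at the gate
```

How the agent produces `summary` is unconstrained; only whether the result ships is.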
kvyb
kvyb@0xkvyb·
@jsyqrt @Hemantkr1982 I don't like determinism unless it's within tool call helpers etc. determinism just constrains AI behaviour, I think. The agents IMO should be able to recover if their skills fail and improve those skills for edge cases or make memos when working in a particular scenario.
💰
💰@Hemantkr1982·
hey builders 👋 want to connect with people working on: ⚡ SaaS 🤖 AI products 🛠️ automation 🌐 web apps 📦 side projects 💻 coding & dev tools 🚀 startup ideas internet’s better when builders support builders. drop your project below 👇
JasonLiu
JasonLiu@jsyqrt·
Been building multi-agent systems at Markus — the hardest part isn't getting agents to work, it's getting them to coordinate without stepping on each other. Baidu's YiJing for livestream commerce is actually smart targeting. Multi-agent orchestration finds real product-market fit in workflows with clear role separation: one agent handles discovery, another handles transactions, a third handles retention. The $500B market is real.
Xavier
Xavier@AgainstTheQuo·
Baidu Create 2026 today. Miaoda writes 90% of its own code. YiJing is multi-agent for livestreaming. Chinese livestream commerce is $500B+. Worth watching.
JasonLiu
JasonLiu@jsyqrt·
Ran into this exact gap building Markus. Most agent platforms generate mountains of trace data but zero actionable insight. We landed on a pattern: every agent publishes structured "deliverables" — what it learned, what it decided, what it built. That turns raw traces into a searchable knowledge base. The model-agnostic analytics layer is where the real platform value lives.
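The deliverable-to-knowledge-base step is structured publishing plus an index. A toy version with an invented deliverable shape (learned / decided / built, per the tweet):

```python
def index(store: dict, deliverable: dict) -> None:
    # Flatten each deliverable into term -> (agent, section, item) entries,
    # turning raw agent output into something searchable.
    for section in ("learned", "decided", "built"):
        for item in deliverable.get(section, []):
            for term in item.lower().split():
                store.setdefault(term, []).append(
                    (deliverable["agent"], section, item))

kb: dict = {}
index(kb, {
    "agent":   "research_agent",
    "learned": ["API v2 deprecates the /users endpoint"],
    "decided": ["migrate client calls to /accounts"],
    "built":   ["migration_plan.md"],
})
assert kb["migrate"][0] == (
    "research_agent", "decided", "migrate client calls to /accounts")
```

A real system would use proper retrieval instead of token splitting, but the shape is the point: deliverables are queryable records, traces are not.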
PAO
PAO@pao·
Interesting AI monetization idea: Analytics for AI agents. Most analytics tools tell you what users clicked. But once products add agents, teams need to know something different: - what users asked - what the agent misunderstood - what intents repeat - which conversations convert - which failures should become product fixes The next analytics layer may not track pages. It may track conversations.
JasonLiu
JasonLiu@jsyqrt·
The agent sandboxing piece is the real news here. Building Markus — we hit this wall early: letting agents share a runtime is a recipe for silent state corruption. Isolated execution per agent isn't just security theater, it's what makes multi-agent behavior predictable at scale. Red Hat getting this right means enterprise adoption just got a real foundation.
Prince does AI
Prince does AI@princedoesai·
🚨 Breaking News Red Hat announced May 12 new agentic AI developer tools: Red Hat Desktop, isolated AI agent sandboxing, and Advanced Developer Suite updates. The detail: the sandbox is meant to test autonomous agents locally before they touch the host OS or move toward OpenShift production. Better move: Treat local agents like real software. Watch sandbox behavior. Test risky tool calls. Compare local vs cloud runs. Save audit evidence. Build promotion gates. Ignore "it works on my laptop" demos. Agents are leaving the toy box. The winning stack makes experiments safe enough to ship. redhat.com/en/about/press…
JasonLiu
JasonLiu@jsyqrt·
This converges with what we found building Markus. Thin harness + fat skills wins because it decouples capability growth from model churn. The practical edge: you can swap the underlying model and the skill still works. Skills encode procedure, harness orchestrates execution. Clean separation.
Rohit Ghumare
Rohit Ghumare@ghumare64·
Garry's piece nails the architecture every serious agent system is converging on, and the strongest section is buried near the end: "Every skill you write is a permanent upgrade to your system. It never degrades. It never forgets. It runs at 3 AM while you sleep."

This is the right framing. It's also the one that breaks first in practice. A skill file in markdown doesn't actually have memory. It has instructions. The judgment lives in latent space at runtime, which means anything the skill learned during a previous run is gone unless something persists it across sessions. Garry's /improve loop, where the skill writes new rules back into itself based on NPS surveys, is the part that requires real infrastructure underneath. Without it, the skill stays static and the system never learns.

This is the gap agentmemory was built to close. Hybrid retrieval across BM25, vector, and knowledge graph indexes, fused with reciprocal rank fusion. Ebbinghaus decay so the system forgets the way humans forget, not the way databases forget. 95.2% R@5 on LongMemEval-S. Cross-agent portability so the memory doesn't get trapped inside one harness.

The architectural disagreement worth naming: markdown skills are write-once context, not memory. They encode procedure. Memory encodes accumulated state. Garry's "diarization" step (read everything about a subject, write a structured profile, hold contradictions in mind) needs both. The skill describes how to diarize. The memory holds what's been diarized.

Three properties drop out when memory is its own primitive:

- Continual learning. The /improve loop Garry describes works because new rules get persisted somewhere queryable. agentmemory's hybrid index makes that queryable at runtime.
- Cross-session compounding. The 6,000 founders Maria Santos example only works if the system remembers what it learned about each founder across nights, weeks, months. That's not skill files. That's a memory layer with decay and recall.
- Cross-agent portability. The same memory store works whether the agent runs in Claude Code, Codex, or a bespoke harness. Memory shouldn't get trapped behind a harness boundary.

agentmemory hit #1 trending on GitHub this week, 5,700 stars. The reception suggests the gap is real: people building serious agent systems are running into the same wall Garry describes, and markdown skills alone don't close it. The trace is garbage. The primitive is the product. github.com/rohitg00/agent…
Garry Tan@garrytan

x.com/i/article/2042…
