Aleksandr Fulha
@fulhadev
162 posts
Fullstack dev / indie founder. AgentCore for enterprise AI. CombaTon MMORPG on TON. Shipping since age 14.
Joined November 2015
2 Following · 22 Followers

@dani_avila7 the 'simplest setup' hides the silent-fail mode. cron ran, no error, agent did the wrong thing. need a separate verifier loop checking output against a contract — otherwise you find out a week later. simple compounds only when you stack the verifier.
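a minimal sketch of that verifier loop, run as its own cron entry. the output path, contract rules, and alert hook are all hypothetical stand-ins:

```python
# second cron job that checks the agent's output against a contract
# instead of trusting exit codes. paths and rules are illustrative.
import json
import sys
from pathlib import Path

OUTPUT = Path("agent_output/latest.json")   # whatever the agent writes
REQUIRED_KEYS = ["pr_url", "ci_status", "summary"]
ALLOWED_CI = {"green", "fixed"}

def verify(path: Path) -> list[str]:
    if not path.exists():
        return ["agent produced no output at all"]
    data = json.loads(path.read_text())
    errors = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    if data.get("ci_status") not in ALLOWED_CI:
        errors.append(f"bad ci_status: {data.get('ci_status')!r}")
    return errors

if __name__ == "__main__":
    problems = verify(OUTPUT)
    if problems:
        print("VERIFIER FAIL:", "; ".join(problems))  # page someone here
        sys.exit(1)   # nonzero exit is what makes cron noisy again
```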

Boris has Claude Code loops running on cron all day
- One babysits his PRs and fixes CI
- Another keeps CI healthy
- Another pulls Twitter feedback every 30 min and clusters it
His Claude Code setup is simple and pretty close to mine
That's what I like about Claude Code. Hundreds of options, but the simplest setup just works, you probably don't need more than that
And since one of those loops reads Twitter every 30 min, @bcherny (or his Claude on cron) is reading this post whether he wants to or not 😅

@danmana @andruyeung that split is the unstated half. workflows decay weekly (people change tools), platform state monthly+ (datasets stabilize). different extraction too — workflows live in chat history, state in code + stale docs. treating them as one kb is what rots first.

@fulhadev @andruyeung thanks, I'll scrape Slack for insights.
I see now our kb problem is of two types: workflows (what people actually do) and platform state (datasets evolved over time - some in Notion, some outdated, some only in the code and people's heads)

Stripe just created a role that didn't exist 12 months ago (and they're paying multiple six figures for it)
It's called the Forward Deployed AI Accelerator.
They are hiring AI-native individuals to work directly with their marketing teams to fundamentally change how they work.
Each person will be assigned to a cohort of 20 marketers. Their job is to build custom AI tools and agents and coach each marketer until they are self-sufficient.
Basically, work with marketers until they automate their jobs.
Stripe's marketing org is betting that AI should not be an occasional tool but the default mode for all work.
But they also understand that most employees won't upskill themselves. They'll need someone who is embedded within their teams to build alongside them.
If you are AI-pilled, this is probably the role for you.
And this also gives a clear picture of where every organization within a company is heading.

asking the team is the trap — people describe what they think they do, not what they actually do. those are two different jobs.
we run it backwards. pull the data first. for one client that meant 1.5 yrs of dept telegram history + 2 yrs of whatsapp. analyze how they actually communicate, decide, escalate. then talk to humans — but only to ask "what should change?", never "what do you do?"
narrow neck first. one dept at a time. pain → audit → KB. people last. unwritten knowledge is just unobserved work.
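rough sketch of the "pull the data first" step, assuming a Telegram Desktop JSON export (result.json); the field names follow that export format and may need adjusting:

```python
# mine the chat export for who actually talks, decides, escalates,
# before interviewing anyone.
import json
from collections import Counter

with open("result.json", encoding="utf-8") as f:
    history = json.load(f)

senders: Counter = Counter()
hours: Counter = Counter()
for msg in history.get("messages", []):
    if msg.get("type") != "message":
        continue
    senders[msg.get("from") or "unknown"] += 1
    hours[str(msg.get("date", ""))[11:13]] += 1  # "YYYY-MM-DDTHH:MM:SS"

print("loudest voices:", senders.most_common(5))  # who actually decides
print("peak hours:", hours.most_common(3))        # when escalation happens
```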

@fulhadev @andruyeung How do you usually store and organize the product knowledge base, with all the features and internal knowledge? I find it difficult to get all this unwritten/learned knowledge in a written form usable by agents

@browser_use session isolation > CDP-readability. one logged-in profile shared across agents = next agent inherits gmail/stripe/aws auth. per-task cookie jar declared upfront — same scope-flag pattern as fs mounts. stealth fixes anti-bot, doesn't fix the cross-task auth bleed.
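a sketch of the per-task cookie jar using Playwright browser contexts; the task-to-auth mapping and file names are made up:

```python
# each task declares its auth scope up front and gets a fresh context
# seeded with only that storage state. nothing leaks to the next agent.
from playwright.sync_api import sync_playwright

TASK_SCOPES = {
    "triage-email": "auth/gmail.json",      # only gmail cookies
    "check-invoices": "auth/stripe.json",   # only stripe cookies
}

def run_task(task: str, url: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # fresh context per task: the next agent inherits nothing
        ctx = browser.new_context(storage_state=TASK_SCOPES[task])
        page = ctx.new_page()
        page.goto(url)
        # ... agent drives the page here ...
        ctx.close()      # cookies die with the context
        browser.close()

run_task("triage-email", "https://mail.google.com")
```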

You have to review all LLM code!
Codex 5.5 tried to push this awful hack to our Metal backend while it was coding font rendering. It decided to implement a hacky "robust buffer access"-style OOM check inside the shader, and it hacked our whole Metal binding architecture to add a special hardcoded bind group slot (30) that delivers the sizes of all buffer bindings. This, of course, made the binding model much slower and required extra data for every buffer.

@garrytan tini handles top-level reap. but openclaw's own subprocess tree (bash, npm, test runners) is where exit codes get lost — agent exits without wait()-ing in-flight tool calls, they orphan to tini and reap silently. process-group SIGTERM keeps them attached so codes propagate.
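sketch of the process-group version in Python (POSIX only); the timeout and command are placeholders:

```python
# spawn each tool call in its own process group so the agent can
# SIGTERM the whole tree and still collect the exit code, instead of
# orphaning children to pid 1 where tini reaps them silently.
import os
import signal
import subprocess

def run_tool(cmd: list[str], timeout: float = 60.0) -> int:
    proc = subprocess.Popen(cmd, start_new_session=True)  # own pgid
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # signal the whole group: bash, npm, test runners, all of it
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        return proc.wait()   # exit status still propagates to us

print(run_tool(["sh", "-c", "sleep 1; exit 3"]))  # -> 3
```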

@SebAaltonen copy-on-write overlay per task. codex writes feel real in sandbox, only land in real tree if we promote at close. 'i want lib X writable today' = per-task scope flag, not chmod toggling. overlayfs or git worktree per agent, same gate handles both directions.
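the git-worktree variant, sketched; paths and branch naming are illustrative:

```python
# each task gets a disposable worktree on its own branch. writes feel
# real there, and only land in the main tree if the close gate promotes.
import subprocess

def sh(*args: str, cwd: str = ".") -> None:
    subprocess.run(args, cwd=cwd, check=True)

def open_sandbox(task_id: str) -> str:
    path = f"/tmp/worktrees/{task_id}"
    sh("git", "worktree", "add", "-b", f"task/{task_id}", path)
    return path   # agent works here; the real tree stays untouched

def promote(task_id: str) -> None:
    sh("git", "merge", "--no-ff", f"task/{task_id}")   # land the writes

def discard(task_id: str) -> None:
    sh("git", "worktree", "remove", "--force", f"/tmp/worktrees/{task_id}")
    sh("git", "branch", "-D", f"task/{task_id}")       # nothing landed
```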

@akshay_pachaar 10 errors → 0 is the part most miss. tokens are easy to optimize, errors are the prod cost. every 'just-to-be-safe' line we added to CLAUDE.md eventually became a misexecution. model treats noise as instruction. less context = less surface to confuse.

Claude Code used 3x fewer tokens with one change:
- Before: 10.4M tokens · 10 errors · $9.21
- After: 3.7M tokens · 0 errors · $2.81
I used Insforge Skills + CLI as the backend context engineering layer for Claude Code (open-source and local).
Repo: github.com/InsForge/InsFo…
(don't forget to star 🌟)

@mattpocockuk this only works if the harness resolves the pointer automatically. if the model has to chase the link itself you're back to instruction-and-hope. half of ours were just a proud naming convention until we caught claude guessing what was on the other side instead of opening the file.
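what "harness resolves the pointer" can look like, sketched with a made-up @pointer() syntax:

```python
# inline the target before the model ever sees the link, so there is
# nothing left to guess about. the @pointer(...) syntax is invented
# for this demo; any stable convention works.
import re
from pathlib import Path

POINTER = re.compile(r"@pointer\(([^)]+)\)")

def resolve(doc: str, root: Path, max_bytes: int = 4000) -> str:
    def inline(m: re.Match) -> str:
        target = root / m.group(1)
        if not target.exists():
            return f"[broken pointer: {m.group(1)}]"  # fail loud, not silent
        body = target.read_text()[:max_bytes]
        return f"\n<!-- inlined from {m.group(1)} -->\n{body}\n"
    return POINTER.sub(inline, doc)

doc = Path("AGENTS.md").read_text()
print(resolve(doc, Path(".")))   # model receives content, not homework
```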

Context Pointers are a crucial concept for helping AI navigate your codebase.
They're the links you put in documents to link to others, so that AI can find its way around without being overwhelmed.
"Our AGENTS.md used to be huge, now it's mostly context pointers."
"These modules are easy to navigate: that shared function acts as a context pointer."
"This is how skills work. The harness finds them, pulls their description into context, and adds a context pointer to the main SKILL.md file."

@steipete windows + wsl2 is the enterprise unlock. half our paperclip integrations need windows for AD/office/legacy .NET — linux-only sandboxes lose that whole market. authenticated webvnc = compliance replay, which is what legal actually buys.

Crabbox 0.5.0 is live 🦀
🖥️ Desktop/browser leases
🧑‍💻 VNC + authenticated WebVNC
🪟 AWS Windows + WSL2
📸 Screenshots + app launch
Remote CI boxes, now suspiciously usable.
github.com/openclaw/crabb…

cache framing fights the freshness constraint. once you've got the 200 patterns, route those 60k to a haiku-class specialist (or distilled model) that regenerates fresh per call at $0.02-0.04. classifier on entry, big model only for novel queries. cost cut AND every response is freshly generated.
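the router, sketched; embed() is a deterministic stand-in for a real embedding model, both model clients are stubs, and the threshold is hypothetical:

```python
# classifier on entry: match against the ~200 recurring patterns,
# route hits to the cheap model (still generating fresh per call),
# send only novel queries to the big model.
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % 2**32
    v = np.random.default_rng(seed).standard_normal(64)
    return v / np.linalg.norm(v)

def cheap_model(q: str) -> str: return f"[haiku-class fresh answer: {q}]"
def big_model(q: str) -> str:   return f"[frontier answer: {q}]"

PATTERNS = ["how do i reset my password", "where is my invoice"]  # ...x200
VECS = np.stack([embed(p) for p in PATTERNS])

def route(query: str) -> str:
    sims = VECS @ embed(query)
    if sims.max() >= 0.92:       # known pattern: ~$0.02-0.04 per call
        return cheap_model(query)
    return big_model(query)      # novel query: full-price model

print(route("how do i reset my password"))
```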

System Design Round at Anthropic:
You are running an LLM in production that costs $0.40 per query.
At 100,000 queries a day that is $40,000 a day. You check your logs and find 60,000 of those queries are users asking slight variations of the same 200 questions.
Your model is generating a fresh answer every single time.
How do you cut your inference cost by 60% without the user ever feeling like they got a cached or stale response?

the human-agent relationship dimension is where most rollouts stall. 'human approves every step' = same workflow with added latency. unlock = checkpoint-only: agents commit small reversible actions freely, escalate to humans only on irreversibles (prod deploy, customer copy, money out). that ratio is what determines whether throughput actually moves
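a sketch of the checkpoint-only gate; classifying real actions into the two buckets is the hard part, and this taxonomy is illustrative:

```python
# reversible actions auto-commit; irreversibles queue for a human.
from dataclasses import dataclass, field
from typing import Callable

IRREVERSIBLE = {"prod_deploy", "send_customer_copy", "money_out"}

@dataclass
class Gate:
    pending: list = field(default_factory=list)

    def submit(self, kind: str, action: Callable[[], None]) -> str:
        if kind in IRREVERSIBLE:
            self.pending.append((kind, action))  # human checkpoint
            return "queued for approval"
        action()   # small + reversible: commit freely, keep throughput
        return "auto-committed"

gate = Gate()
print(gate.submit("open_draft_pr", lambda: None))  # auto-committed
print(gate.submit("prod_deploy", lambda: None))    # queued for approval
```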

Both Anthropic and OpenAI have new initiatives to help enterprises deploy AI agents within their organizations. This is a trend that’s early but going to get very big fast.
As agents enter knowledge work beyond coding, there is very real work to upgrade IT systems, get agents the context they need, modernize the workflows to work with agents, figure out the human-agent relationship in the workflow, drive adoption and do change management, and much more.
While AI models have an incredible amount of capability packed into them, there’s no shortcut to getting that intelligence applied to a business process in a stable way. This is creating tons of opportunities across the market for new jobs and firms, and the labs are equally recognizing the criticality here.

file-as-state breaks first on concurrent writes. paperclip lost 3 days last month to a silent decisions.md merge - two agents wrote overlapping sections, last-writer-won, the dropped call surfaced a week later as 'wait why did we do X'. moved shared state to postgres, kept markdown as derived view. files = render layer, db = truth layer
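sketch of the truth-layer/render-layer split; sqlite stands in for postgres here so it runs anywhere, and the schema is illustrative:

```python
# db = truth, markdown = derived view that gets regenerated, never merged.
import sqlite3

db = sqlite3.connect("decisions.db")
db.execute("""CREATE TABLE IF NOT EXISTS decisions (
    id INTEGER PRIMARY KEY,
    agent TEXT NOT NULL,
    body TEXT NOT NULL,
    ts TEXT DEFAULT CURRENT_TIMESTAMP)""")

def record(agent: str, body: str) -> None:
    with db:  # transaction: concurrent writers append, nothing is dropped
        db.execute("INSERT INTO decisions (agent, body) VALUES (?, ?)",
                   (agent, body))

def render_markdown() -> str:
    rows = db.execute("SELECT ts, agent, body FROM decisions ORDER BY ts, id")
    return "# Decisions\n" + "\n".join(
        f"- {ts} [{agent}] {body}" for ts, agent, body in rows)

record("agent-a", "move shared state to the db")
record("agent-b", "keep decisions.md as a derived view")
with open("decisions.md", "w") as f:   # regenerate, never merge
    f.write(render_markdown())
```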

Databases are far from dead.
Hot take within the vibe-coding community, but you can't build a reliable agentic memory system using files alone.
The filesystem is a great interface for agents, but for complex, distributed, production applications, databases win hands down.
I recorded a video to show you the benchmarks.
Large Language Models know how to navigate and work with the filesystem, but as soon as you add complexity, files will fall short.
You need databases whenever any of the following happens:
1. You have concurrent writes from multiple agents or users
2. You need semantic retrieval at scale
3. You need ACID guarantees for shared state
4. You need audit trails and row-level access control
5. You need indexed queries over growing memory
In the attached video, I'm running a notebook comparing a filesystem-backed agent with a database-backed agent.
The three most important findings:
• Filesystem = Database with small corpus, keyword-friendly queries
• Databases > Filesystem with large corpus, fuzzy queries
• Databases > Filesystem with concurrent writes without locking
Numbers don't lie.
You can run the benchmarks yourself.

the cache angle actually holds in long convos. if you cache_control after each turn (which you do for streaming/multi-turn cost), per-message timestamps bloat the cumulative cached context every call. system-prompt date refreshed per-session is the cache-amortized version. probably also product decoupling - integrators decide if/how time gets injected
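what the cache-amortized version can look like against the Anthropic messages API; the model id is a placeholder and field names should be checked against current docs:

```python
# the date lives once in the cached system block, fixed per session,
# instead of per-message timestamps invalidating the prefix each turn.
from datetime import date
import anthropic

client = anthropic.Anthropic()
session_date = date.today().isoformat()  # refreshed per session, not per call

resp = client.messages.create(
    model="claude-sonnet-4-5",           # placeholder model id
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": f"You are a helpful assistant. Today's date: {session_date}.",
        "cache_control": {"type": "ephemeral"},  # stable, cacheable prefix
    }],
    messages=[{"role": "user", "content": "what day is it?"}],
)
print(resp.content[0].text)
```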

yeah - level alerts assume sub-linear use. on a runaway loop the curve is exponential, so 80% fires only after the slope has already crossed escape velocity. burn-rate answers 'are we on track to blow the budget', not 'how much is used'. azure has it in app insights but it's not surfaced in cost mgmt - that's the actual gap to push them on
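burn-rate in a few lines; the budget numbers and the hourly spend feed are stand-ins for real billing data:

```python
# "are we on track to blow the budget", not "how much is used".
MONTHLY_BUDGET = 40_000.0
HOURS_IN_MONTH = 730.0

def burning_too_fast(spend_last_hour: float, factor: float = 5.0) -> bool:
    expected_per_hour = MONTHLY_BUDGET / HOURS_IN_MONTH  # ~$55/hr if flat
    # a runaway loop shows up here within the hour, long before a
    # level alert at 80% of the month's budget ever fires
    return spend_last_hour / expected_per_hour > factor

print(burning_too_fast(900.0))   # True: roughly 16x the expected slope
```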

I almost killed my company on Friday.
$90,000. One Azure bill. Gone.
Let me tell you what happened because I think founders need to hear this.
We built an amazing document intelligence system at Whisperit. It analyzes our customers' files: PDFs, Word docs, scanned documents, using OCR. It works beautifully and users love it.
But we had a bug.
A small email with a zip file. Inside the zip, a PDF. Some weird edge case that created an infinite loop in our code. The virtual machine would crash, restart, and try to reprocess the same document. Again. And again. And again.
We pay more than one cent per page processed.
You can imagine what happened next.
I saw the graph and my stomach dropped. An exponential spike. The kind of curve you want to see on your revenue chart (!!) not your cloud bill. The forecast for next month said $400,000+.
I thought: this must be a mistake.
Emergency 🚨. Check everything. It wasn't a mistake.
The worst part? We had a warning. Back in November we had a $25K unusual spike. We fixed it. Added upload limits. I thought we were safe.
But I never set a spending cap on Azure. Never set up alerts for unusual usage. I knew I should. I just didn't do it.
I went through every stage:
Denial → "this can't be right"
Anger → screaming at myself
Shame → feeling small, really small
Tears → first time in a long time
I cried that evening. Not because of the money, but because I imagined having to close Whisperit. My team. Everything we built. Gone because of one missing setting and my stupidity.
The week had been incredible. New version shipping. Lots of new users. Sales going well. Migration going well. Growing the team responsibly. And then Friday hit like a truck.
Remember my last post about mistakes? Yeah. We're still making them. Bigger ones.
$90,000 is the price of a NICE car. Paid for a bug and a missing checkbox.
Here's what I'm doing RIGHT NOW so this never happens again:
1. Hard spending limits on every cloud service — no exceptions
2. Alerts at 50%, 80%, 100% of expected spend
3. Circuit breakers in our processing pipeline — if a document fails 3 times, it stops (see the sketch after this list)
4. Weekly cloud cost review — not monthly, weekly
5. Every API endpoint gets a budget ceiling
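A sketch of step 3, the circuit breaker; the in-memory counter is illustrative only, and in production the count must survive VM restarts (that was the whole bug), so keep it in a database or queue metadata:

```python
# a document that has failed 3 times gets parked, not retried forever.
MAX_ATTEMPTS = 3
failures: dict[str, int] = {}

def process_with_breaker(doc_id: str, process) -> str:
    if failures.get(doc_id, 0) >= MAX_ATTEMPTS:
        return "parked"    # dead-letter the doc and alert a human
    try:
        process(doc_id)
        return "ok"
    except Exception:
        failures[doc_id] = failures.get(doc_id, 0) + 1
        return "failed"    # will be retried, up to the cap

for _ in range(5):         # the same poison doc keeps arriving
    print(process_with_breaker("poison.pdf", lambda d: 1 / 0))
# -> failed, failed, failed, parked, parked
```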
If you're a founder reading this:
Go set your spending limits. Today. Right now. Before your next meeting. Before your next coffee. It takes 10 minutes and it could save your company.
We move fast. That's our superpower. But speed without guardrails is a bomb with a timer.
I know what doesn't kill you makes you stronger.
I really hope this one doesn't kill me.
Still standing. Barely. Building. 🚀