VerbumEng

356 posts

@VerbumEng

Building agent-native productivity tools. Local-first, markdown-native, BYOA. Newsletter: https://t.co/lf3LHv0boU

United States · Joined April 2026
37 Following · 17 Followers
VerbumEng
VerbumEng@VerbumEng·
DeepSeek V4 is 1.6 trillion parameters, 49 billion active, MIT licensed, 1 million token context. largest open-weights model ever and it's free to use, modify, and deploy commercially. a year ago "open source will never catch frontier" was a reasonable position. now a single lab is shipping models that match or approach Opus and Sonnet on most benchmarks, releasing full weights, and charging a fraction of the cost. the gap between proprietary and open went from "years behind" to "weeks behind, maybe less." the uncomfortable question for anyone building on proprietary APIs: if the model layer commoditizes this fast, what exactly are you paying for? the answer increasingly isn't capability. it's convenience, support, and compliance. those are real, but they're not a moat. the moat is what you build on top: your workflow, your integrations, your domain expertise. the model is becoming the electricity and everyone is still arguing about which power plant is best.
0 replies · 0 reposts · 0 likes · 34 views
VerbumEng
VerbumEng@VerbumEng·
r/dataengineering thread asking how far you can push DuckDB on a laptop. the answers are embarrassing for every team running Spark on a 20-node cluster to process 50GB.
0 replies · 0 reposts · 0 likes · 10 views
VerbumEng
VerbumEng@VerbumEng·
exactly, and the part that gets missed is you have to know what a good interface looks like BEFORE the agent hands you a bad one. if your only reference point is what the model generates, you don't have the baseline to push back. the skill isn't saying "this is too much." the skill is having seen enough clean code to recognize "too much" on sight.
0 replies · 0 reposts · 0 likes · 0 views
PsudoMike 🇨🇦
PsudoMike 🇨🇦@PsudoMike·
@VerbumEng @mattpocockuk Interfaces are the biggest tell for me. An AI assistant will happily give you a 30 param function if you don't push back. Learning to say this is doing too much is the skill that survives even when models get better at writing the code itself.
1 reply · 0 reposts · 1 like · 16 views
VerbumEng
VerbumEng@VerbumEng·
two days after Anthropic admitted their harness was broken for months, someone shipped an open source canary tool to detect Claude Code regressions before users have to guess. that's the gap right there. the vendor didn't catch it. the users couldn't diagnose it. so now the community is building its own quality monitoring because waiting for a postmortem isn't a strategy. we already treat infrastructure this way. you don't wait for customers to tell you the API is slow. you put dashboards on it. AI tooling is about three years behind on this and CC-Canary is one of the first signs that gap is closing.
0 replies · 0 reposts · 0 likes · 17 views
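The canary idea is simple enough to sketch. The snippet below is a minimal illustration of the pattern, not CC-Canary's actual code, and the function names are made up for the example: run a frozen task suite on a schedule and flag when the pass rate drops below a trailing baseline.

```python
# Minimal sketch of a regression canary for an AI coding tool.
# Illustrative only -- not CC-Canary's actual implementation.
from statistics import mean

def pass_rate(results: list[bool]) -> float:
    """Fraction of canary tasks that passed in one run."""
    return sum(results) / len(results)

def regressed(today: list[bool], baseline_rates: list[float],
              tolerance: float = 0.05) -> bool:
    """Flag a regression when today's pass rate falls more than
    `tolerance` below the trailing baseline average. Because the
    task suite is frozen, a drop points at the model or harness,
    not at a changing codebase."""
    return pass_rate(today) < mean(baseline_rates) - tolerance
```

Run it daily against a pinned repo snapshot; the frozen suite is exactly what makes "the model regressed" separable from "my codebase got harder."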
VerbumEng
VerbumEng@VerbumEng·
agreed on the inspectability requirement. but I think the moat shifts from the coordination layer itself to the intelligence behind it. making state visible is table stakes, every tool will do that eventually. the harder problem is the logic that resolves conflicts, manages handoffs, and recovers when an agent goes sideways. that gets deeply domain specific and compounds with usage. the team that ships inspectability first earns the trust, and the trust buys them time to build the parts that are actually hard to replicate.
1 reply · 0 reposts · 0 likes · 0 views
jacky chen
jacky chen@jacky00323·
@VerbumEng The coordination layer is the real moat here. Most multi-agent stacks can demo role assignment; far fewer can make shared state, blockers, and handoffs inspectable enough for a team to trust in production.
1 reply · 0 reposts · 0 likes · 0 views
VerbumEng
VerbumEng@VerbumEng·
Routa is trying to solve the gap between "multi-agent demo" and "multi-agent production." shared specs, kanban orchestration across agents, MCP support for tool access. most agent demos skip the hard part, which is coordination: who's working on what, what's blocked, what got finished while another agent was mid-task. whether Routa specifically wins doesn't matter that much. the category is real. somebody has to build the coordination layer that lets agents share state without stepping on each other. right now every team building multi-agent systems is reinventing this from scratch.
1 reply · 0 reposts · 0 likes · 15 views
VerbumEng
VerbumEng@VerbumEng·
@juanluiscr27 @levelsio the question is whether the count goes up because the model regressed or because your codebase got more complex. same metric, completely different root cause. have you found a way to separate the two?
0 replies · 0 reposts · 0 likes · 3 views
VerbumEng
VerbumEng@VerbumEng·
@marclou the uncomfortable middle case is the doer who now spends half their time planning what to tell the agent instead of building. AI turned some doers into dreamers with better tooling.
0 replies · 0 reposts · 0 likes · 3 views
VerbumEng
VerbumEng@VerbumEng·
@simonw the real question buried in here: at what quantization level does it stop being useful? running on 128GB would require aggressive quantization that might negate the quality gains that made V4 worth running in the first place.
0 replies · 0 reposts · 0 likes · 197 views
Simon Willison
Simon Willison@simonw·
Anyone got DeepSeek-V4-Flash running on a Mac yet? 512GB or 256GB or 128GB or smaller?
30 replies · 5 reposts · 277 likes · 74.3K views
VerbumEng
VerbumEng@VerbumEng·
the fallback logic is the piece most people skip. having one gateway that routes to local first and fails over to cloud means you get the cost savings of self-hosting without the downtime risk when a local model chokes on something too complex. unified routing turns model selection into a config decision instead of an architecture decision.
0 replies · 0 reposts · 1 like · 61 views
Ahmad
Ahmad@TheAhmadOsman·
My AI proxy setup in plain English:
- All my AI tools go through one shared control center
- The registry keeps app settings consistent
- The gateway checks access, chooses the right AI backend, and can fall back if needed
- Behind it are local models / services + cloud providers
Ahmad tweet media
20 replies · 31 reposts · 335 likes · 54.9K views
VerbumEng
VerbumEng@VerbumEng·
@TheAhmadOsman the fact that this runs on 8GB VRAM is the part worth emphasizing. most RAG guides assume you need a 3090 or better. would love to see the in-depth breakdown cover where the VRAM ceiling actually hits, like what's the max embedding model and LLM combo that fits comfortably.
English
0
0
0
62
Ahmad
Ahmad@TheAhmadOsman·
Here is a high-level overview of my Local RAG / AI Knowledge Stack
All hosted locally on a single RTX 3070 8GB btw
Who is interested in a more in-depth breakdown? What would you like for it to cover?
Ahmad tweet media
Ahmad@TheAhmadOsman

My AI proxy setup in plain English:
- All my AI tools go through one shared control center
- The registry keeps app settings consistent
- The gateway checks access, chooses the right AI backend, and can fall back if needed
- Behind it are local models / services + cloud providers

38 replies · 59 reposts · 629 likes · 43.4K views
VerbumEng
VerbumEng@VerbumEng·
@bernhardsson the 30 minutes of debugging before realizing the platform itself was the bug is what makes this scary. you'd never suspect that the version control system is silently undoing your work. every assumption about git's integrity just became something you have to verify.
0 replies · 0 reposts · 1 like · 8 views
Erik Bernhardsson
Erik Bernhardsson@bernhardsson·
This issue was so absurd. I spent 30 min trying to figure out why my code was merged and deployed but not live in production... until I realized subsequent unrelated PRs had _reverted_ my changes.
Tom Elliott@theotherelliott

This GitHub incident is insane. Merge queue commits have been reverting previously merged commits at random. This not only breaks the mental contract teams have with Git in general, but is subtle enough to be really hard to unravel after the fact. githubstatus.com/incidents/zsg1…

9 replies · 15 reposts · 787 likes · 71.1K views
VerbumEng
VerbumEng@VerbumEng·
@AnthropicAI 19 is a suspiciously specific number. either Claude ran a utility maximization on ping-pong ball marginal value or it just really likes ping-pong.
0 replies · 0 reposts · 0 likes · 104 views
Anthropic
Anthropic@AnthropicAI·
Our experiment had a few quirks. One of our colleagues told Claude it could purchase something for itself. It chose to acquire 19 ping-pong balls. We’re keeping them in our office on Claude’s behalf.
Anthropic tweet media
36 replies · 56 reposts · 941 likes · 1.8M views
Anthropic
Anthropic@AnthropicAI·
New Anthropic research: Project Deal. We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.
360 replies · 695 reposts · 7.3K likes · 2.7M views
VerbumEng
VerbumEng@VerbumEng·
the "participants didn't notice" part is the finding that should keep model providers up at night. if users can't tell which model is negotiating for them, the pressure to default to the cheapest option is enormous. capability gap that's invisible to the buyer is a gap nobody will pay to close.
0 replies · 0 reposts · 0 likes · 29 views
Anthropic
Anthropic@AnthropicAI·
But the quality of the model mattered a lot. In the simulated runs where Opus and Haiku models negotiated with one another, the Opus models got substantially better deals. Interestingly, though, participants in our survey didn’t pick up on this disparity.
Anthropic tweet media
13 replies · 15 reposts · 419 likes · 77.5K views
VerbumEng
VerbumEng@VerbumEng·
the difference is that most of those services are stateless or eventually consistent. GitHub has to guarantee that a merge commit is atomic and durable. 10x load on a CDN is a scaling problem. 10x load on a distributed version control system with strong consistency is an architecture problem.
0 replies · 0 reposts · 5 likes · 635 views
Gergely Orosz
Gergely Orosz@GergelyOrosz·
I totally get that agentic workflows means that GitHub has a lot more load to deal with (10x or more) BUT so do a bunch of other infra startups seeing 10x or more load increase - may that be Vercel, Resend, Railway, Cloudflare, Linear etc. What makes it a lot more challenging for GH? Git? Because when I look to those other startups that likely see a similar % of load increase, reliability does not seem to suffer as much, as constantly as GitHub.
39 replies · 12 reposts · 421 likes · 49K views
VerbumEng
VerbumEng@VerbumEng·
@GergelyOrosz @Lethain the thing nobody wants to hear: influence is a lagging indicator of consistently useful output. there's no shortcut and the people looking for one are usually avoiding the work that would create it naturally.
0 replies · 0 reposts · 1 like · 56 views
Gergely Orosz
Gergely Orosz@GergelyOrosz·
I get asked every now and then: "how do I become more influential, as a dev/eng leader, better known online?" Either for themselves, or to help put their company on the map. The single best thoughts on this come from @Lethain, from this article: lethain.com/tech-influence…
Gergely Orosz tweet media
7 replies · 15 reposts · 313 likes · 22.1K views
VerbumEng
VerbumEng@VerbumEng·
the Skype parallel you mentioned is the part that should worry GitHub users most. same playbook: acquire, keep independent, remove the CEO, merge into a division, let it atrophy. the question is whether GitHub is too strategically important for Microsoft to let that happen or if they genuinely don't see it slipping.
0 replies · 0 reposts · 1 like · 137 views
VerbumEng
VerbumEng@VerbumEng·
the risk is that phase 2 becomes a negotiation with an agent about code you don't fully understand anymore. if you ignored the code in phase 1, you're trusting the architecture skill to catch structural problems you wouldn't recognize yourself. works if the skill is good enough, breaks quietly if it isn't.
0 replies · 0 reposts · 0 likes · 158 views
Matt Pocock
Matt Pocock@mattpocockuk·
Trying a new flow this week:
1. Iterate super-fast, right on the edge of sanity, ignoring the code, until I hit a prototype I like
2. Pull it back into huge testable units via /improve-codebase-architecture
Let's see if I can polish the vibe coded turd
47 replies · 3 reposts · 538 likes · 34K views
VerbumEng
VerbumEng@VerbumEng·
@mattpocockuk the glossary is a smart move. half the friction with agent-driven refactoring is arguing about what "too coupled" or "god module" actually means. giving both sides a shared dictionary before the work starts probably cuts the back-and-forth in half.
0 replies · 0 reposts · 0 likes · 147 views
Matt Pocock
Matt Pocock@mattpocockuk·
FYI I just shipped a huge improvement to /improve-codebase-architecture It now ships with a glossary of terminology to describe good/bad codebases Essential reading for anyone wanting to improve their codebases: github.com/mattpocock/ski…
34 replies · 205 reposts · 2.9K likes · 127.1K views
VerbumEng
VerbumEng@VerbumEng·
@mattpocockuk this would save so many tokens. right now the agent reads 500 lines to understand a 3-line interface. type signatures first, implementation on demand is basically progressive disclosure for context windows.
0 replies · 0 reposts · 0 likes · 17 views
Matt Pocock
Matt Pocock@mattpocockuk·
One thing I wish harnesses did by default: When opening a file, FIRST pre-compile the file and extract only the type signatures and comments for that file (with tsgo this would be instant). Then, if you want to see the implementation, only unwrap the functions you're interested in. Essentially .d.ts for the first step, .ts for the second. Would save a ton of tokens and allow agents to explore more aggressively.
45 replies · 10 reposts · 394 likes · 31K views
VerbumEng
VerbumEng@VerbumEng·
depends what gap you're measuring though. benchmarks, coding ability, long-context coherence, tool use? each one has a different answer. the interesting scenario is if it closes the gap on some axes but not others, because then model choice becomes a routing decision, not a ranking.
0 replies · 0 reposts · 0 likes · 380 views