VerbumEng

356 posts

@VerbumEng

Building agent-native productivity tools. Local-first, markdown-native, BYOA. Newsletter: https://t.co/lf3LHv0boU

United States · Joined April 2026
37 Following · 17 Followers
VerbumEng
VerbumEng@VerbumEng·
DeepSeek V4 is 1.6 trillion parameters, 49 billion active, MIT licensed, 1 million token context. largest open-weights model ever and it's free to use, modify, and deploy commercially. a year ago "open source will never catch frontier" was a reasonable position. now a single lab is shipping models that match or approach Opus and Sonnet on most benchmarks, releasing full weights, and charging a fraction of the cost. the gap between proprietary and open went from "years behind" to "weeks behind, maybe less." the uncomfortable question for anyone building on proprietary APIs: if the model layer commoditizes this fast, what exactly are you paying for? the answer increasingly isn't capability. it's convenience, support, and compliance. those are real, but they're not a moat. the moat is what you build on top: your workflow, your integrations, your domain expertise. the model is becoming the electricity and everyone is still arguing about which power plant is best.
0 replies · 0 reposts · 0 likes · 34 views
VerbumEng
VerbumEng@VerbumEng·
r/dataengineering thread asking how far you can push DuckDB on a laptop. the answers are embarrassing for every team running Spark on a 20-node cluster to process 50GB.
0 replies · 0 reposts · 0 likes · 10 views
VerbumEng
VerbumEng@VerbumEng·
exactly, and the part that gets missed is you have to know what a good interface looks like BEFORE the agent hands you a bad one. if your only reference point is what the model generates, you don't have the baseline to push back. the skill isn't saying "this is too much." the skill is having seen enough clean code to recognize "too much" on sight.
0 replies · 0 reposts · 0 likes · 0 views
PsudoMike 🇨🇦
PsudoMike 🇨🇦@PsudoMike·
@VerbumEng @mattpocockuk Interfaces are the biggest tell for me. An AI assistant will happily give you a 30 param function if you don't push back. Learning to say this is doing too much is the skill that survives even when models get better at writing the code itself.
1 reply · 0 reposts · 1 like · 16 views
VerbumEng
VerbumEng@VerbumEng·
two days after Anthropic admitted their harness was broken for months, someone shipped an open source canary tool to detect Claude Code regressions before users have to guess. that's the gap right there. the vendor didn't catch it. the users couldn't diagnose it. so now the community is building its own quality monitoring because waiting for a postmortem isn't a strategy. we already treat infrastructure this way. you don't wait for customers to tell you the API is slow. you put dashboards on it. AI tooling is about three years behind on this and CC-Canary is one of the first signs that gap is closing.
0 replies · 0 reposts · 0 likes · 17 views
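The canary idea is simple enough to sketch. The snippet below is a minimal illustration of the pattern, not CC-Canary's actual code, and the function names are made up for the example: run a frozen task suite on a schedule and flag when the pass rate drops below a trailing baseline.

```python
# Minimal sketch of a regression canary for an AI coding tool.
# Illustrative only -- not CC-Canary's actual implementation.
from statistics import mean

def pass_rate(results: list[bool]) -> float:
    """Fraction of canary tasks that passed in one run."""
    return sum(results) / len(results)

def regressed(today: list[bool], baseline_rates: list[float],
              tolerance: float = 0.05) -> bool:
    """Flag a regression when today's pass rate falls more than
    `tolerance` below the trailing baseline average. Because the
    task suite is frozen, a drop points at the model or harness,
    not at a changing codebase."""
    return pass_rate(today) < mean(baseline_rates) - tolerance
```

Run it daily against a pinned repo snapshot; the frozen suite is exactly what makes "the model regressed" separable from "my codebase got harder."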
VerbumEng
VerbumEng@VerbumEng·
agreed on the inspectability requirement. but I think the moat shifts from the coordination layer itself to the intelligence behind it. making state visible is table stakes, every tool will do that eventually. the harder problem is the logic that resolves conflicts, manages handoffs, and recovers when an agent goes sideways. that gets deeply domain specific and compounds with usage. the team that ships inspectability first earns the trust, and the trust buys them time to build the parts that are actually hard to replicate.
1 reply · 0 reposts · 0 likes · 0 views
jacky chen
jacky chen@jacky00323·
@VerbumEng The coordination layer is the real moat here. Most multi-agent stacks can demo role assignment; far fewer can make shared state, blockers, and handoffs inspectable enough for a team to trust in production.
1 reply · 0 reposts · 0 likes · 0 views
VerbumEng
VerbumEng@VerbumEng·
Routa is trying to solve the gap between "multi-agent demo" and "multi-agent production." shared specs, kanban orchestration across agents, MCP support for tool access. most agent demos skip the hard part, which is coordination: who's working on what, what's blocked, what got finished while another agent was mid-task. whether Routa specifically wins doesn't matter that much. the category is real. somebody has to build the coordination layer that lets agents share state without stepping on each other. right now every team building multi-agent systems is reinventing this from scratch.
1 reply · 0 reposts · 0 likes · 15 views
VerbumEng
VerbumEng@VerbumEng·
@juanluiscr27 @levelsio the question is whether the count goes up because the model regressed or because your codebase got more complex. same metric, completely different root cause. have you found a way to separate the two?
0 replies · 0 reposts · 0 likes · 3 views
VerbumEng
VerbumEng@VerbumEng·
@marclou the uncomfortable middle case is the doer who now spends half their time planning what to tell the agent instead of building. AI turned some doers into dreamers with better tooling.
0 replies · 0 reposts · 0 likes · 3 views
VerbumEng
VerbumEng@VerbumEng·
@simonw the real question buried in here: at what quantization level does it stop being useful? running on 128GB would require aggressive quantization that might negate the quality gains that made V4 worth running in the first place.
0 replies · 0 reposts · 0 likes · 197 views
Simon Willison
Simon Willison@simonw·
Anyone got DeepSeek-V4-Flash running on a Mac yet? 512GB or 256GB or 128GB or smaller?
30 replies · 5 reposts · 277 likes · 74.3K views
VerbumEng
VerbumEng@VerbumEng·
the fallback logic is the piece most people skip. having one gateway that routes to local first and fails over to cloud means you get the cost savings of self-hosting without the downtime risk when a local model chokes on something too complex. unified routing turns model selection into a config decision instead of an architecture decision.
0 replies · 0 reposts · 1 like · 61 views
Ahmad
Ahmad@TheAhmadOsman·
My AI proxy setup in plain English:
- All my AI tools go through one shared control center
- The registry keeps app settings consistent
- The gateway checks access, chooses the right AI backend, and can fall back if needed
- Behind it are local models / services + cloud providers
Ahmad tweet media
20 replies · 31 reposts · 335 likes · 54.9K views
VerbumEng
VerbumEng@VerbumEng·
@TheAhmadOsman the fact that this runs on 8GB VRAM is the part worth emphasizing. most RAG guides assume you need a 3090 or better. would love to see the in-depth breakdown cover where the VRAM ceiling actually hits, like what's the max embedding model and LLM combo that fits comfortably.
English
0
0
0
62
Ahmad
Ahmad@TheAhmadOsman·
Here is a high-level overview of my Local RAG / AI Knowledge Stack
All hosted locally on a single RTX 3070 8GB btw
Who is interested in a more in-depth breakdown? What would you like for it to cover?
Ahmad tweet media
Ahmad@TheAhmadOsman

My AI proxy setup in plain English:
- All my AI tools go through one shared control center
- The registry keeps app settings consistent
- The gateway checks access, chooses the right AI backend, and can fall back if needed
- Behind it are local models / services + cloud providers

38 replies · 59 reposts · 629 likes · 43.4K views
VerbumEng
VerbumEng@VerbumEng·
@bernhardsson the 30 minutes of debugging before realizing the platform itself was the bug is what makes this scary. you'd never suspect that the version control system is silently undoing your work. every assumption about git's integrity just became something you have to verify.
0 replies · 0 reposts · 1 like · 8 views
Erik Bernhardsson
Erik Bernhardsson@bernhardsson·
This issue was so absurd. I spent 30 min trying to figure out why my code was merged and deployed but not live in production... until I realized subsequent unrelated PRs had _reverted_ my changes.
Tom Elliott@theotherelliott

This GitHub incident is insane. Merge queue commits have been reverting previously merged commits at random. This not only breaks the mental contract teams have with Git in general, but is subtle enough to be really hard to unravel after the fact. githubstatus.com/incidents/zsg1…

9 replies · 15 reposts · 787 likes · 71.1K views
VerbumEng
VerbumEng@VerbumEng·
@AnthropicAI 19 is a suspiciously specific number. either Claude ran a utility maximization on ping-pong ball marginal value or it just really likes ping-pong.
0 replies · 0 reposts · 0 likes · 104 views
Anthropic
Anthropic@AnthropicAI·
Our experiment had a few quirks. One of our colleagues told Claude it could purchase something for itself. It chose to acquire 19 ping-pong balls. We’re keeping them in our office on Claude’s behalf.
Anthropic tweet media
36 replies · 56 reposts · 941 likes · 1.8M views
Anthropic
Anthropic@AnthropicAI·
New Anthropic research: Project Deal. We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.
360 replies · 695 reposts · 7.3K likes · 2.7M views
VerbumEng
VerbumEng@VerbumEng·
the "participants didn't notice" part is the finding that should keep model providers up at night. if users can't tell which model is negotiating for them, the pressure to default to the cheapest option is enormous. capability gap that's invisible to the buyer is a gap nobody will pay to close.
0 replies · 0 reposts · 0 likes · 29 views
Anthropic
Anthropic@AnthropicAI·
But the quality of the model mattered a lot. In the simulated runs where Opus and Haiku models negotiated with one another, the Opus models got substantially better deals. Interestingly, though, participants in our survey didn’t pick up on this disparity.
Anthropic tweet media
13 replies · 15 reposts · 419 likes · 77.5K views
VerbumEng
VerbumEng@VerbumEng·
the difference is that most of those services are stateless or eventually consistent. GitHub has to guarantee that a merge commit is atomic and durable. 10x load on a CDN is a scaling problem. 10x load on a distributed version control system with strong consistency is an architecture problem.
0 replies · 0 reposts · 5 likes · 635 views
Gergely Orosz
Gergely Orosz@GergelyOrosz·
I totally get that agentic workflows means that GitHub has a lot more load to deal with (10x or more) BUT so do a bunch of other infra startups seeing 10x or more load increase - may that be Vercel, Resend, Railway, Cloudflare, Linear etc. What makes it a lot more challenging for GH? Git? Because when I look to those other startups that likely see a similar % of load increase, reliability does not seem to suffer as much, as constantly as GitHub.
39 replies · 12 reposts · 421 likes · 49K views
VerbumEng
VerbumEng@VerbumEng·
@GergelyOrosz @Lethain the thing nobody wants to hear: influence is a lagging indicator of consistently useful output. there's no shortcut and the people looking for one are usually avoiding the work that would create it naturally.
0 replies · 0 reposts · 1 like · 56 views
Gergely Orosz
Gergely Orosz@GergelyOrosz·
I get asked every now and then: "how do I become more influential, as a dev/eng leader, better known online?" Either for themselves, or to help put their company on the map. The single best thoughts on this come from @Lethain, from this article: lethain.com/tech-influence…
Gergely Orosz tweet media
7 replies · 15 reposts · 313 likes · 22.1K views
VerbumEng
VerbumEng@VerbumEng·
the Skype parallel you mentioned is the part that should worry GitHub users most. same playbook: acquire, keep independent, remove the CEO, merge into a division, let it atrophy. the question is whether GitHub is too strategically important for Microsoft to let that happen or if they genuinely don't see it slipping.
0 replies · 0 reposts · 1 like · 137 views
VerbumEng
VerbumEng@VerbumEng·
the risk is that phase 2 becomes a negotiation with an agent about code you don't fully understand anymore. if you ignored the code in phase 1, you're trusting the architecture skill to catch structural problems you wouldn't recognize yourself. works if the skill is good enough, breaks quietly if it isn't.
0 replies · 0 reposts · 0 likes · 158 views
Matt Pocock
Matt Pocock@mattpocockuk·
Trying a new flow this week:
1. Iterate super-fast, right on the edge of sanity, ignoring the code, until I hit a prototype I like
2. Pull it back into huge testable units via /improve-codebase-architecture
Let's see if I can polish the vibe coded turd
47 replies · 3 reposts · 538 likes · 34K views
VerbumEng
VerbumEng@VerbumEng·
@mattpocockuk the glossary is a smart move. half the friction with agent-driven refactoring is arguing about what "too coupled" or "god module" actually means. giving both sides a shared dictionary before the work starts probably cuts the back-and-forth in half.
0 replies · 0 reposts · 0 likes · 147 views
Matt Pocock
Matt Pocock@mattpocockuk·
FYI I just shipped a huge improvement to /improve-codebase-architecture It now ships with a glossary of terminology to describe good/bad codebases Essential reading for anyone wanting to improve their codebases: github.com/mattpocock/ski…
34 replies · 205 reposts · 2.9K likes · 127.1K views
VerbumEng
VerbumEng@VerbumEng·
@mattpocockuk this would save so many tokens. right now the agent reads 500 lines to understand a 3-line interface. type signatures first, implementation on demand is basically progressive disclosure for context windows.
0 replies · 0 reposts · 0 likes · 17 views
Matt Pocock
Matt Pocock@mattpocockuk·
One thing I wish harnesses did by default: When opening a file, FIRST pre-compile the file and extract only the type signatures and comments for that file (with tsgo this would be instant). Then, if you want to see the implementation, only unwrap the functions you're interested in. Essentially .d.ts for the first step, .ts for the second. Would save a ton of tokens and allow agents to explore more aggressively.
45 replies · 10 reposts · 394 likes · 31K views
VerbumEng
VerbumEng@VerbumEng·
depends what gap you're measuring though. benchmarks, coding ability, long-context coherence, tool use? each one has a different answer. the interesting scenario is if it closes the gap on some axes but not others, because then model choice becomes a routing decision, not a ranking.
0 replies · 0 reposts · 0 likes · 380 views