Jonathan Malkin 🦊 | Building with Claude

1K posts

@builtwithjon

20yr enterprise tech → education & community founder. Building the whole thing with Claude as co-founder. AI in production, not theory. Austin 🦊

Austin · Joined July 2019

221 Following · 55 Followers

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·4h

@theo Was there any doubt?

Rollingwood, TX 🇺🇸

Theo - t3.gg@theo·5h

Called it, they are gonna use Cursor’s data to leapfrog

Elon Musk@elonmusk

@beffjezos Our recently completed Grok V9 1.5T run is looking great and that is before Cursor data is added in supplemental training

1.7K

127.1K

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·6h

@PawelHuryn True. A set of loops helps. Daily archive, weekly reflection.

Paweł Huryn@PawelHuryn·8h

@builtwithjon I'm not sure you can fix it in a completely different way. Traditional memory works only if you feed your agent carefully curated facts.

Paweł Huryn@PawelHuryn·8h

RE: Memory

Greg's right on the trajectory. But "memory" by itself fails at month three. Every artifact gets appended. Contradictions flatten into fake consensus. The agent drifts silently.

What compounds is a hierarchy:

raw source (immutable) → working memory (tagged observations) → durable knowledge (active hypotheses, synthesized facts, committed decisions, stakeholder state)

Promotion criteria: signal strength, relevance, and trust earn the move up.

Then a weekly sweep: promotes what's confirmed, surfaces contradictions instead of collapsing them, compresses patterns, archives what shipped.

"Starting today vs starting in 6 months" is real. But the unfair advantage only compounds if the loop runs. Without the sweep, you're storing, not learning.

(Shipping PM Brain OS this week.)
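The tiered memory and weekly sweep described in the post above can be sketched roughly as follows. This is a minimal illustration, not the PM Brain OS implementation: the `Item` fields, the `Tier` names, the 0.7 threshold, and the rule "promote only when the weakest of the three scores clears the threshold" are all assumptions made for the example. Pattern compression is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RAW = "raw source"            # immutable, never promoted or archived
    WORKING = "working memory"    # tagged observations
    DURABLE = "durable knowledge"
    ARCHIVED = "archive"


@dataclass
class Item:
    key: str                      # topic the claim is about (hypothetical field)
    claim: str
    tier: Tier = Tier.WORKING
    signal: float = 0.0           # scores in [0, 1]; the scale is an assumption
    relevance: float = 0.0
    trust: float = 0.0
    shipped: bool = False


def weekly_sweep(items, threshold=0.7):
    """Promote, archive, and return contradictions instead of collapsing them."""
    # Collect the distinct active claims per topic.
    claims_by_key = {}
    for item in items:
        if item.tier in (Tier.WORKING, Tier.DURABLE):
            claims_by_key.setdefault(item.key, set()).add(item.claim)
    contradictions = {
        key: sorted(claims)
        for key, claims in claims_by_key.items()
        if len(claims) > 1
    }

    for item in items:
        if item.tier is Tier.RAW:
            continue  # raw source stays untouched
        if item.shipped:
            item.tier = Tier.ARCHIVED
        elif (
            item.tier is Tier.WORKING
            and item.key not in contradictions
            and min(item.signal, item.relevance, item.trust) >= threshold
        ):
            item.tier = Tier.DURABLE
    return contradictions
```

Items on a contested topic stay in working memory until someone resolves the conflict, which is one way to read "surfaces contradictions instead of collapsing them."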

GREG ISENBERG@gregisenberg

More AI agent observations below (I keep adding to the list):
1. Hermes agents write to their own memory after every task. Which means starting today versus starting in 6 months is an unfair advantage for you.
2. We're maybe 12 months from an agent that can watch you work for a week and then do your job without any instructions. The screen recording plus agent memory plus local model combination makes this possible right now.
3. The real reason local models matter for founders: you can ship a product where the AI runs entirely on the customer's device and you never touch their data. Zero privacy concerns. Zero server costs. Zero compliance headaches. That changes which industries you can sell to overnight. Healthcare, legal, finance, all the regulated verticals that won't send data to the cloud just opened up.
4. Every company needs to be rebuilt as a "second brain" before agents can be useful. That means every process, every decision, every piece of institutional knowledge has to exist in a format an agent can read. Most companies have none of this.
5. Agent costs are the new headcount. Won't be crazy for companies to spend 50%+ of their total headcount cost on tokens.
6. Agents are accidentally creating internal competition at companies. The marketing agent and the sales agent are optimizing for different metrics and working against each other without anyone realizing it. It took humans decades to develop cross-functional alignment. Nobody thought about it for agents.
7. The YAML config file is becoming the new org chart. Who reports to who, what permissions they have, what tools they access, all defined in a config file. The company's structure is literally a file you can version control, fork, and deploy. That's new.
8. The first agents that can smell a scam are going to be worth billions. Right now agents will happily wire money to a fake invoice because it matched the format. The trust layer is completely missing.
9. We're about to find out that most "expertise" was actually just memory. Knowing the tax code. Knowing the case law. Knowing which supplier charges what. When an agent holds all of that in context, the expert's value shifts from "I know things" to "I know which things matter." Much smaller group of people.
10. We're all running the same models. The differentiation is in what you feed them. Two founders with the same agent, same model, same tools will get wildly different results based purely on the quality of their knowledge base. Garbage context in, garbage output out. Forever.
11. The most underbuilt category in AI right now: agents for old people. 70 million boomers who need help with medical forms, insurance claims, and appointment scheduling.
12. Agent latency is the new page load speed. If your agent takes 45 seconds to respond, your customer already switched to one that takes
13. Skills files are the new apps. A SKILL.md that tells an agent how to do one thing well is more valuable than a SaaS subscription that does the same thing behind a login screen.
14. AI hardware... how do you create devices that are good businesses that people want? It'll be a $30 dongle you plug into existing dumb devices to give them an agent brain. Smart toaster doesn't need to be built from scratch. It needs a $30 brain attached to a $15 toaster.
15. Your agent can read faster than you can think. The bottleneck in every agent workflow is now the human approval step. We're the slow part. That's a strange thing to sit with.
16. Agents made the 80/20 rule violent. The 20% of work that matters is now the only work humans do. The 80% just disappeared. Entire job descriptions were hiding inside that 80%.
17. The thing I keep coming back to: the best businesses right now are being built by people who are just slightly ahead of their customers. Not 10 years ahead. 6 months ahead. That's the sweet spot. Far enough to lead. Close enough to be understood.
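Observation 7 in the list above (the config file as org chart) can be made concrete with a small sketch. The structure below is a plain Python dict standing in for a parsed YAML file; the agent names and the `reports_to`, `permissions`, and `tools` fields are hypothetical, chosen only to mirror "who reports to who, what permissions they have, what tools they access."

```python
# Hypothetical org definition; in practice this might be loaded from YAML.
ORG = {
    "ceo_agent": {
        "reports_to": None,
        "permissions": ["approve_spend"],
        "tools": ["email"],
    },
    "marketing_agent": {
        "reports_to": "ceo_agent",
        "permissions": ["publish_content"],
        "tools": ["email", "cms"],
    },
    "sales_agent": {
        "reports_to": "ceo_agent",
        "permissions": ["send_quotes"],
        "tools": ["email", "crm"],
    },
}


def chain_of_command(org, agent):
    """Walk reports_to links upward and return the managers in order."""
    chain = []
    current = org[agent]["reports_to"]
    while current is not None:
        chain.append(current)
        current = org[current]["reports_to"]
    return chain


def can_use(org, agent, tool):
    """True if the tool is listed for this agent in the config."""
    return tool in org[agent]["tools"]


print(chain_of_command(ORG, "sales_agent"))   # ['ceo_agent']
print(can_use(ORG, "marketing_agent", "crm"))  # False
```

Because the whole structure is data, a reorg becomes a diff that can be reviewed, versioned, and rolled back like any other change.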

1.2K

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·8h

@Jacobsklug OS

Jacob Klug@Jacobsklug·18h

I built a Claude Cowork OS that replaces OpenClaw and runs on autopilot. Manages my business & personal life tasks. I created the whole playbook so you can re-build it tonight.

What's inside:
• The exact foundation prompt
• 3-level orchestration map
• Memory template for global context
• Routing table for file management
• Starter workstations (finance, content, community, habits)
• Project file structure
• Single prompt that builds the entire folder tree

Follow + comment 'OS'. I'll DM it to you.

529

426

43.6K

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·8h

@PawelHuryn @kir_varlamov Right on. That said, I have rebuilt all the pieces around the harness six times now. And I went from one harness to three to reduce vendor and pricing risk.

Paweł Huryn@PawelHuryn·Apr 26

@kir_varlamov Two layers. There's the harness Anthropic ships, and the system you build around it - knowledge, skills, subagents. Saying the second one is their job is like saying they own the software your agent writes.

553

Paweł Huryn@PawelHuryn·Apr 26

x.com/i/article/2048…

391

327.9K

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·8h

@PawelHuryn I was thinking the same thing. Been using Claude Code on xhigh all week.

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·8h

@mcuban Just what I want. To send all my financial data to @sama

2.1K

Mark Cuban@mcuban·11h

Message to entrepreneurs: Your product is their feature.

ChatGPT@ChatGPTapp

A preview for Pro users: a new personal finance experience in ChatGPT. Pro users in the U.S. can securely connect financial accounts, see where their money is going, and ask questions based on the information they choose to connect. Your full financial picture, now in ChatGPT.

350

296

6.4K

1.7M

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·8h

@outsource_ @grok My new Hermes agent will have similar capabilities with a continuously running researcher and reflective background agents including a dreamer.

133

Eric ⚡️ Building...@outsource_·9h

What is the GBrain nonsense forreal? @grok

Garry Tan@garrytan

The biggest alpha leak of 2026 is that you can tokenmax $10k/mo with OpenClaw/Hermes + GBrain and get the AI that everyone will have in 2028 for $100/mo, but you can get it now, and that is the biggest single unlock you can have vs your competition

1.4K

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·10h

My routing after this benchmark:
Writing, quality matters: GPT-5.4
Writing, cost matters: MiniMax M2.7 (94% quality, 50% cost)
Agent/tool-use/multi-turn: Kimi K2.6
Low-latency short prompts: MiniMax M2.7
Full data: builtwithjon.com/articles/kimi-…
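The routing above maps naturally onto a lookup table. A minimal sketch, assuming hypothetical task and priority labels (`"writing"`, `"agent"`, `"short_prompt"`, `"quality"`, `"cost"`, `"latency"`); only the model picks themselves come from the post.

```python
# Routing table built from the picks in the post; the keys are illustrative labels.
ROUTES = {
    ("writing", "quality"): "GPT-5.4",
    ("writing", "cost"): "MiniMax M2.7",
    ("agent", None): "Kimi K2.6",            # agent / tool-use / multi-turn
    ("short_prompt", "latency"): "MiniMax M2.7",
}


def pick_model(task, priority=None):
    """Return the model for a (task, priority) pair, or raise if unrouted."""
    try:
        return ROUTES[(task, priority)]
    except KeyError:
        raise ValueError(f"no route for task={task!r}, priority={priority!r}")


print(pick_model("writing", "cost"))  # MiniMax M2.7
print(pick_model("agent"))            # Kimi K2.6
```

Failing loudly on an unrouted pair is a deliberate choice here: a silent default would hide the fact that a new workload was never benchmarked.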

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·10h

The strangest finding:
GPT-5.4 on single-turn prompts: 38.6s
GPT-5.4 on 13-turn agent workloads: 927.6s
That's a 24x penalty. Highest of any model in the set.
Kimi K2.6's penalty: 8.75x. MiniMax: 10.9x.
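The 24x figure is the ratio of the two GPT-5.4 timings above, and it can be rechecked in a few lines. A small sketch: the function name is made up for illustration, and the per-turn average is derived here rather than reported in the post. The raw timings for Kimi K2.6 and MiniMax are not given, so their penalties are not recomputed.

```python
def latency_penalty(single_turn_s, multi_turn_s):
    """Ratio of multi-turn wall time to single-turn wall time."""
    return multi_turn_s / single_turn_s


GPT54_SINGLE_S = 38.6   # single-turn prompt, from the post
GPT54_AGENT_S = 927.6   # 13-turn agent workload, from the post
TURNS = 13

penalty = latency_penalty(GPT54_SINGLE_S, GPT54_AGENT_S)
per_turn = GPT54_AGENT_S / TURNS

print(f"penalty: {penalty:.1f}x")         # penalty: 24.0x
print(f"avg per turn: {per_turn:.1f}s")   # avg per turn: 71.4s
```

For scale, a 13-turn run in which every turn cost the same as one single-turn prompt would show a 13x penalty, so 24x means the average turn took longer than a standalone prompt.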

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·10h

Two open-weight models quietly caught GPT-5.4 this month. Kimi K2.6 and MiniMax M2.7 are now within 5 quality points on writing. Both 2x faster on agent tasks. One is half the price. The frontier is getting crowded.

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·11h

@outsource_ How are you running a TurboQuant model on Mac? What's your runtime?

Eric ⚡️ Building...@outsource_·11h

@builtwithjon This one

Eric ⚡️ Building...@outsource_

Using bench-loop.com to test new models on my studio / hardware stack. Landed with this: majentik/Qwen3.6-35B-A3B-TurboQuant-MLX-4bit running 60.1 tk/s 🚀🚀

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·11h

@outsource_ - I'm sure you're dying to know how qwen performs on my specific machine. 🤣

[Image attached]

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·11h

@outsource_ Not showing that model in Bench Loop for download. It's only showing on Hugging Face. Also, when I pick an MLX model, the pop-up only has a "search for GGUF" button and no download.

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·11h

@outsource_ 27 or 35?

Eric ⚡️ Building...@outsource_·11h

@builtwithjon Nice, have you run 3.6? I got a good version on my M1.

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·11h

@outsource_ How about light mode?

Eric ⚡️ Building...@outsource_·11h

@builtwithjon Okay, perfect, will add to the list.

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·12h

Loving Bench Loop for testing local models. Finally something decent for basic testing! Still would like to see some more advanced end-to-end workflow testing, but this is a great start. Have you seen any local benchmark apps that do more in-depth scenario testing?

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·11h

@outsource_ Would love some more complex tests like creative writing and full SDLC. Not sure what, if any, research is available on that or if anyone does that kind of benchmarking.

Eric ⚡️ Building...@outsource_·11h

@builtwithjon Let me know what to add. I shipped this to continue to develop the features everyone wants! Add a PR, etc.

Jonathan Malkin 🦊 | Building with Claude@builtwithjon·11h

@outsource_ Wait! You built this?!! Amazing that I found it from a search or chat and not directly from your feed!

@theo @PawelHuryn @Jacobsklug @kir_varlamov @mcuban @sama @outsource_ @grok