JP Wallhorn

3.9K posts

@jpwallhorn

Head of Engineering @GoSuppli | Prev. @attentiveHQ & Founder

Joined October 2011

1.3K Following · 432 Followers

Pinned Tweet

JP Wallhorn@jpwallhorn·7 May

x.com/i/article/2052…

JP Wallhorn@jpwallhorn·23h

The benchmark you can trust. At least it’s 10x more accurate than the rest.

Serena Ge (Datacurve)@serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

JP Wallhorn retweeted

Garry Tan@garrytan·1d

This is the new standard for engineering evals

Serena Ge (Datacurve)@serenaa_ge

JP Wallhorn@jpwallhorn·23h

@theo Much better.

Theo - t3.gg@theo·1d

This is the first code bench that actually aligns with how it feels to use these models coding.

Serena Ge (Datacurve)@serenaa_ge

JP Wallhorn@jpwallhorn·1d

@kushalbyatnal this looks very promising

Kushal Byatnal@kushalbyatnal·1d

Over 1 billion PDFs are created every day, but your agents still can’t read them reliably. Today we’re releasing Parse 2.0, the most accurate document parsing API in the world. Extend already processes millions of pages daily for leading AI teams like Brex, Mercury, Opendoor, Flatiron Health, and hundreds of others. Now, it’s even better. Parse 2.0 is SOTA quality on RealDoc-Bench, our open source benchmark that measures agent success rate on real-world docs that agents actually encounter in production. We trained Parse 2.0 on 1M+ pages of the hardest documents seen in production. Here’s how it stacks up:
- #1 in healthcare, real estate, logistics, and financial services
- 95.7% agent Q&A accuracy on 581 docs (next best: 92%)
- 0.847 F1 on layout (next best: 0.759)
Give it a try today and build production-ready document agents with Extend.

JP Wallhorn@jpwallhorn·2d

@SMB_Attorney More importantly stay active.

SMB Attorney@SMB_Attorney·2d

Guys, stay alive and healthy for as long as possible. The advances in medicine right now are crazy and accelerating. Over the next 20 years, they’re going to cure everything.

Crémieux@cremieuxrecueil

Eli Lilly has done it. They've gone and made what seems to be a powerful, permanent gene therapy for LDL cholesterol. That means they'll be able to effectively prevent most heart disease with a single infusion!

JP Wallhorn@jpwallhorn·2d

@jsawadd Don’t do a contractor. Somebody has to own the outcome long term.

Jonathan Awad@jsawadd·2d

We need an engineer at Baselayer (maybe contractor?) who AI pills our GTM org:
- sets up an AI forward system infra for our whole GTM org
- builds our “second brain”
- helps set up Claude co-work for everyone, from BDRs to sol Eng
We will pay top dollar - who’s down?

JP Wallhorn@jpwallhorn·2d

I’m so excited for @xai and @cursor_ai to ship the next model and agent harness. It’s going to be very difficult to beat this team. What a luxury to have so much competition. It forces everybody to ship 10X more.

JP Wallhorn@jpwallhorn·2d

Interesting.

Lenny Rachitsky@lennysan

My biggest takeaways from @danshipper:
1. The future of work will happen inside Codex or Claude Code. Instead of putting AI into your SaaS tool, you’ll use your SaaS tools inside your favorite AI agents' in-app browser. Dan spends all his time in Codex now: writing documents, managing email, doing research, everything. He's using Google Docs, PostHog, and everything he needs within the agent's in-app browser. The agent can see what he’s doing, and has all of his context, so he and his agent collaborate quickly and super effectively.
2. Automation is a lie: every automation needs a human. Dan's company doubled in size this year despite being incredibly AI-forward. Why? Because in order to make automation work well, you need humans making sure everything keeps working. This is why benchmarks are misleading: they measure AI on problems we’ve already framed and can score, but there’s always a higher frame.
3. PMs will win the AI era. Marcus, a former PM who previously ran Axios’s writing product, joined Every after getting super AI-pilled. Now he runs their product Spiral, and ships faster than anyone on the team. He pairs technical knowledge with spiky product sense, deep user empathy, and an eye for what matters. Dan thinks any PM who gets really AI-native will be incredibly dangerous because the building is done for you; what matters is figuring out what to build and if it’s great.
4. Full-stack designers are becoming superheroes. Designers used to make beautiful interactions that engineers didn’t want to build or couldn’t execute properly. Now designers don’t need to hand things off; they can build it themselves. Designers are naturally creative people, and AI is the perfect tool for them because it lets them bring their vision to life without the traditional bottlenecks.
5. SaaS is not dead. In fact, Dan is bullish on SaaS stocks. When users bring their own AI (via Codex or Claude Code) to use SaaS products, the user, not the SaaS company, pays for tokens. This protects SaaS companies’ margins. Since the agents need their own seats, Dan predicts that agents will create massive new demand for SaaS because there will be tons of agents using these products at high volume.
6. Every company will have one “super-agent” inside their Slack that every employee will use. Dan initially thought every employee would have their personal work agent, like a shadow AI org chart, but he’s completely flipped his view. He realized agents need humans who care about them. When someone gets tired of maintaining their personal agent, it becomes useless. The winning model is one forward-deployed engineer or AI-savvy person who maintains a company-wide agent (like Shopify’s River or Viktor), and then it trickles down to more specialized team agents as models improve and become less fiddly.
7. The AI job apocalypse is not happening, but you do need to evolve to stay relevant. Models make yesterday’s human competence cheap. But because everyone uses the same models, it all looks the same if you use it the default way; it becomes commoditized slop. Humans then take that frozen competence and use it to make something new and interesting for their specific situation. The key is to “ride the models”: use them for everything you do, try new models when they drop, keep turning over rocks.
8. We will read way more AI-generated writing, and we will like it. Human writing is incredibly important for things that matter, but for internal docs, planning, and email, AI-generated is often better because most people are bad at writing strategy documents.
9. Build software for humans and agents to use together. The current model is building a CLI that an agent uses independently. Instead, you and your agent should be using the app together. This creates new design challenges: agents can make a billion requests in three seconds, so you need approval flows, inboxes that summarize what happened, logs, and easy rollback.
10. Forward-deployed engineers are the new most essential role. The big model companies have teams of people managing their internal agents, and those teams aren’t going away. It’s different from traditional software building, and certain engineers love it. As models get better, this role will evolve: you’ll be managing more agents doing more things.

JP Wallhorn retweeted

Lee Robinson@leerob·3d

You might believe you should spend less time thinking about code because of AI. I strongly disagree! We’re watching this play out live where tons of AI generated code becomes a liability. At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time. Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change. I expect the trend to continue toward trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance. I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but also a lot more than people realize will stay the same.

JP Wallhorn@jpwallhorn·4d

One of the best. True insight

Dan Shipper 📧@danshipper

We’ve automated every single thing we can @every with AI agents. And yet there’s way more human work to do than ever. We’ve gone from 4 -> 30 human employees since GPT-3. I wrote a report on the structural reasons: how AI makes expert competence cheap, why that drives up demand for experts, and why the dynamic only intensifies as we approach AGI. After Automation: every.to/p/after-automa…

JP Wallhorn retweeted

David Marcus@davidmarcus·6d

Say what you want about @spencerpratt, but in my nearly 20 years in California, I’ve never seen a more concrete, common sense plan to end homelessness, make our streets safer and enable small businesses to thrive again. Watch for yourselves, and vote!

JP Wallhorn@jpwallhorn·5d

Very good post, worth the read.

Zeb Evans@DJ_CURFEW

Today we reduced headcount by 22%. The business is the strongest it's ever been. So I think it's important to be direct about what I'm seeing and why.
First, I made this decision and I own it. I did it because the way to operate at the highest level of productivity is changing, and to win the future, ClickUp needs to change with it.
Second, this wasn't about cutting costs. Most savings from this change will flow directly back into the people who stay. We'll be introducing million-dollar salary bands. If you create outsized impact using AI, you'll be paid outside of traditional bands.
Most importantly, I have the deepest gratitude for those affected. We're doing this from a position of strength specifically so we can take care of people properly. Everyone affected receives a package aimed at honoring their contributions and easing the transition.
I only see two options: wait for this to play out gradually in the market or be honest about what I'm seeing and act proactively.
THE 100X ORGANIZATION
The primary change is that we're restructuring around what I call 100x org. The goal is 100x output. The roles required to build at the highest level are fundamentally different than they were a year ago. Incremental improvements to existing systems won't get us there. We need new ones. That means creating enough disruption to rebuild rather than iterate on what's already broken.
The common narrative is that AI makes everyone more productive. It doesn't. Many of the workflows of today, if left unchanged, create bottlenecks in AI systems. These roles will evolve. But waiting for that to happen naturally means falling behind now.
The 100x org is actually heavily dependent on people - infinitely more than today. This is only possible with 10x people that have embraced and adopted new ways of working.
THE BUILDERS, AGENT MANAGERS, AND FRONT-LINERS
THE BUILDERS: 10X ENGINEERS
I don't think most companies have internalized what's actually happening with AI in engineering. The common narrative is that AI makes all engineers more productive. That may be true in isolation, but at an organization level - that is the farthest thing from reality.
Here's what we've validated recently at ClickUp: the great engineers, the ones who can orchestrate, architect, and review, are becoming 100x engineers. They're not writing code. They're directing agents that write code. The skill is judgment. AI makes the best engineers wildly more productive, and everyone else using AI slows these engineers down.
Think about it - the bottlenecks are (1) orchestration - telling AI what to do, and (2) reviewing - what AI did. Everything is leapfrogged and no longer needed. So who do you want orchestrating and reviewing code? And how do you want your best engineers to spend their time? If your best engineers are spending time reviewing other people's code, then this is inherently an inefficient bottleneck. These engineers can review their agent's code much faster than reviewing human code.
The new world is about enabling your 10x engineers to become 100x. The wrong strategy is to push every engineer to use infinite tokens. Companies doing this are celebrating 500% more pull requests. But customer outcomes don't match the volume of code being generated. I call this the great reckoning of AI coding, and every company will face this soon if not already. More code is just another bottleneck to the best engineers, and ultimately to your company's impact as well.
THE BUILDERS: 10X PRODUCT MANAGERS
Product management and design roles are merging. Designers that have customer focus, become more like product managers. And product managers that have intuition for UX become more like designers.
The bottleneck of user research is gone. It takes us just one mention of an agent to kickoff research and analyze results. The bottleneck of product <> design iteration is also gone. The product builder iterates on their own, along with agents and skills that ensure alignment with quality and strategy.
Also controversial today - I believe that the wrong strategy is to have your PMs shipping code - that just introduces another bottleneck that the best engineers will waste their time on. To be clear, PMs should be coding but they should do this in a playground to iterate, validate, and scope. That code should not go to production.
Everything outside of managing systems, orchestrating AI, and reviewing output becomes a bottleneck. That's why the other roles that are critical along with these are the systems managers (to reduce bottlenecks) along with a bottleneck you can't replace - customer meeting time.
THE SYSTEM MANAGERS
Ironically, the people that automate their jobs with AI will always have a job. They become owners of the AI systems - agent managers. We have many examples of these people at ClickUp.
The underlying systems in which we operate are absolutely critical to get right. I think most companies are delusional to think they can iterate on existing systems and compete in this new world. You must create enough disruption so that old systems are deprecated entirely. If there's any definition for 'AI native' that's what it is.
THE FRONT-LINERS
In a world that will become saturated with AI communication, the human touch will matter more than anything to customers. This is a bottleneck that you shouldn't replace - even when agents are high enough quality to do video meetings. One-on-one meeting time with customers is something that shouldn't be automated. The systems around the meetings should be - so that front-liners spend nearly 100% of their time with customers.
REWARDING 100X IMPACT
In a world where companies are able to do so much more with less, where does that excess money go? In our case, much of the savings in this new operating model will flow directly back to those that enabled it. We must reward people that create productivity accordingly. This aligns incentives on both sides.
Plus, in a world where your best people create 100x impact, you can't afford to lose them. You should aim to retain these employees for decades. The context they have and their ability to efficiently orchestrate and review will be nearly impossible to replace.
Compensation bands of today should be thrown out the door. We're introducing $1 million cash/year salary bands with a path available to nearly everyone in the company if they produce 100x impact by creating or managing AI systems.
THE FUTURE
Nearly every company will make changes like these. The ones that do it proactively will define what comes next. The future is not fewer people. It's different work, new roles, and better rewards for those who embrace it. We're already seeing entirely new roles emerge, like Agent Managers, that didn't exist a year ago.
ClickUp is positioning to lead this shift, not just internally, but for our customers too. I've never been more certain about where we're headed.

JP Wallhorn@jpwallhorn·20 May

this is an absolute banger project

Open Design@nexudotio

How to use Open Design inside Codex: Ask Codex to deploy Open Design locally, then tell Codex what you want to build. The real unlock: Codex can operate the design tool for you — choosing workflows, filling briefs, inspecting files, testing interactions, and iterating on the result. Design tools are becoming agent-operated.

JP Wallhorn@jpwallhorn·20 May

@joshm @browsercompany I’ve been a huge fan, but the latest change toward artifacts, which answers even simple questions with a whole HTML page, is painfully slow.

Josh Miller@joshm·19 May

👇 @browsercompany team has a mandate to make AI feel more playful, visual, alive -- case study #1:

Adam Stern@adamstern_

@diabrowser has a new 3D logo orb that appears throughout the product 🔮 Here’s a few details on how we built this...

JP Wallhorn@jpwallhorn·20 May

@mvanhorn Well deserved

Matt Van Horn@mvanhorn·20 May

Still can’t believe this got 1.2m views. Thanks, all! And what’s even more wild is how the library has more than doubled in size from the community making CLIs.

Matt Van Horn@mvanhorn

Introducing the Printing Press, a CLI-factory and a CLI-library. Built with @trevin. 🏭🖨📚 Most APIs suck for agents. Most MCPs suck for agents. Most official CLIs suck for agents. They waste tokens and time. @steipete started making his own because of this.
📚 A Library of agent-native CLIs you install today (Linear, ESPN, Flight GOAT (Google Flights + Kayak nonstop), Contact Goat (LinkedIn + Happenstance + Deepline more) +30+ more)
🏭 A factory that prints new ones for any service - just type /printing-press
CLIs are fast, local, SQLite-backed. Work in Claude Code, Codex, OpenClaw, Hermes.
🌐 printingpress.dev

JP Wallhorn@jpwallhorn·20 May

@PolymarketMoney It’s happening. They just want to avoid redoing the S1 - totally fine

Polymarket Money@PolymarketMoney·20 May

SpaceX is reportedly planning to acquire Cursor 30 days after its IPO.

JP Wallhorn@jpwallhorn·20 May

@mweinbach Sounds like a lawsuit. Bundling is where Microsoft got tripped up

Max Weinbach@mweinbach·19 May

The $20 Gemini sub is really hard to beat. It includes Google Health, YouTube Premium Lite (fewer ads, background play), Gemini app, Antigravity, 5TB of storage, and Google Home Premium. The value prop here is really hard to beat.

JP Wallhorn@jpwallhorn·20 May

@thesamparr What’s actually worked for me is AI agents that have access to your accounts and notify you about stuff that you believe is important. It takes some time to fine tune it but it allows you to disable all other notifications by default and just rely on the agent

Sam Parr@thesamparr·20 May

The number of notifications I receive daily is so distracting it's nearly debilitating. I have a social media following, so that adds to it, but even people who aren't active on social: are you feeling this? Slack, email, phone texts, iMessage on my computer, calls, Insta, LinkedIn, DMs on all those platforms... and then people in the office. Every month I do the same thing: delete them all from my phone/desktop. But I'm human - they're very hard to fully ignore. Retention/memory feels worse than before, flow is harder to get into, focus time is much shorter. The only solution that seems to work is to print the work I need and do it manually without phones/computers in the room. Feels like this generation's obesity will be notifications. Attention diabetes!

JP Wallhorn@jpwallhorn·20 May

@theo @OpenAIDevs It’s honestly really good. Banger

Theo - t3.gg@theo·20 May

Honestly I'm still really impressed with the Codex app. It works reliably. It adds useful features consistently. It has taste. The mobile integration is awesome. The git integration is solid. If you haven't used it yet, I highly recommend it.

JP Wallhorn@jpwallhorn·19 May

Composer 2.5 from @cursor_ai outperforms Claude Opus 4.7 on their benchmarks. At 10x lower cost. And it's faster. That's not an incremental update, that's a pricing and performance reset in one release. The actual capability shift: better at sustained work on long-running tasks + more reliable on complex instruction chains. For production AI systems, that's where the cost lived. Retries, broken flows, manual recovery. And this is just the bridge. @cursor_ai + SpaceXAI are training a new model from scratch on Colossus 2's million H100-equivalents with 10x total compute. This is a compute sovereignty play and the moat is just starting to get significant.
