Evan Harris

11.1K posts

@Evan__Harris

Agentic systems engineer. Securing MCP integrations. Building dev tools & Obsidian plugins.

Joined October 2017
243 Following · 703 Followers
Pinned Tweet
Evan Harris (@Evan__Harris)
Your vulnerability scan results could leak to attackers via DNS rebinding. CVE-2025-59163 affects SafeDep Vet MCP Server running SSE transport. The attack: A single website visit. The payload: Your entire package vulnerability database. The fix: Already shipped. Here's how it works:
Replies: 3 · Reposts: 0 · Likes: 5 · Views: 427
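For readers unfamiliar with DNS rebinding, a common server-side mitigation is to reject any request whose Host header (and Origin header, when the browser sends one) is not a name the local server expects. The sketch below illustrates that general idea only; it is not taken from the SafeDep Vet patch, and `ALLOWED_HOSTNAMES`, `_hostname`, and `is_request_allowed` are hypothetical names chosen for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist: hostnames a local-only server expects to be addressed by.
ALLOWED_HOSTNAMES = {"localhost", "127.0.0.1", "::1"}


def _hostname(value: str, is_url: bool) -> str:
    """Return the lowercased hostname from a Host value or an Origin URL, or ""."""
    try:
        # A Host header has no scheme, so prefix "//" to make urlsplit treat it as a netloc.
        parsed = urlsplit(value.strip() if is_url else "//" + value.strip())
        return parsed.hostname or ""
    except ValueError:
        # Malformed input such as an unterminated IPv6 literal.
        return ""


def is_request_allowed(host_header: str, origin_header: Optional[str] = None) -> bool:
    """Reject requests whose Host (or Origin, when sent) is not an expected local name."""
    if _hostname(host_header, is_url=False) not in ALLOWED_HOSTNAMES:
        return False
    if origin_header is not None and _hostname(origin_header, is_url=True) not in ALLOWED_HOSTNAMES:
        return False
    return True
```

In a rebinding attack the attacker's domain resolves to 127.0.0.1, but the browser still sends the attacker's domain in the Host header, so a check like this would turn the request away before any data is returned.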
Steve Yegge (@Steve_Yegge)
Brendan Hopper, Matt Beane and I have a thesis, one that I've been sharing around lately, and we want CEOs and boards to hear it.

Before I get to the thesis, let's revisit Clayton Christensen's Innovator's Dilemma (ID), the theory he developed at HBS to explain why big companies often get eaten by upstarts during technology shifts. In short, the ID says incumbents serve their best customers so well, and tune themselves so ruthlessly for doing exactly what they do today, that they can't chase the disruptor tech coming up from below until it's too late.

The classic solution to the Innovator's Dilemma is to create a "bubble" in your company. You carve out an innovation team with a budget and mandate, as unfettered as practical by the parent organization. This is to combat the 2-level trap presented by the dilemma. The economic trap is Christensen's original point: a disruptive technology can't justify itself under your existing P&L, because it serves smaller or weirder customers at margins your real business would never accept. The governance trap is what gets piled on top once you're big: SOC2, FedRAMP, etc. mean every new idea has to clear a lot of process before it can move. The bubble is intended to escape both at once, with its own economics and permission slips.

The standard innovation "bubble" solution famously doesn't work very well. You may solve the problem inside your bubble, but you often can't roll it out to the rest of your company for the original reasons. Everyone is focused on doing their current stuff, and nobody has time for a major change.

Our thesis is that there is an entirely different way out of the dilemma this time around. No bubble needed, as long as you follow a simple rule. That rule is, let your people play. Give them back any time they earn from automating their jobs with AI. Then incentivize them to use that time to improve the company's processes.

When you see an engineering team announce a 40% productivity boost from adopting AI (a number that's been showing up in plenty of LinkedIn posts lately), your first reaction as a CEO or manager is probably to say, that's awesome, we can do more work now! Or you might simply expect to see 40% more output from the team. Either way, you have just asked them to spend their extra time building faster horses (your current business) instead of letting them go figure out what a car would look like for your company. They gained some productivity from AI, which could have been your ticket out of the Dilemma, and you immediately slurped it back for your existing business.

This will get your company killed in the medium to long haul, because your company tomorrow will look almost nothing like it does today. Conway's Law says your software and your org chart mirror each other; as AI rewrites how you build software, the org has to shift to match. But if you're stealing back the hours saved by your employees, then you're not letting your org pivot naturally in the direction it needs to shift. @RealGeneKim and I saw this in person at @arkanalabs a few weeks back.

As long as your people know they'll be recognized and rewarded if they improve the company's processes (public credit for cross-team workflow wins, promotion criteria that actually count process improvements, managers who treat freed-up hours as a feature rather than a budget line), then they will use their "play time" to seek out other teams, and start pivoting you to becoming AI-native. This way it can unfold in whatever bespoke way is most natural to your company, rather than in some ivory-tower research bubble. For every company, the way it unfolds will be a bit different.

I think of this approach, of giving the time back to the humans who automate parts of their jobs with AI, as the new solution to the Innovator's Dilemma. The old bubble solution was to separate a bunch of people from their regular jobs, and try to give them the freedom to solve the problem in isolation. In contrast, by giving your regular employees their hours back, the innovation bubble is still there, but it's now dispersed across the company, as lots of very tiny bubbles: one bubble per person who has liberated some hours.

If you've ever read Slack by DeMarco and Lister, a great book from back in the 90s, then our thesis should resonate. What companies need is to empower their own employees, the ones who actually work together (even across departments), the ones who know how the business works, to shift the company in the new directions together. Gradually, but with intentionality.

You still have the frankly awful problem of token budgets. For every employee you upskill into baseline AI literacy (which I'd define loosely as using coding agents throughout the workday), you've added a non-trivial opex spend; for the heaviest agentic users it can run into five figures a year. I won't sugar-coat it; you need to find that money somehow. I don't have a magic solution, but I'm very happy that other models are catching up to Claude, because they're becoming good enough for real work now.

But token budgets alone aren't enough. To live through the Innovator's Dilemma this time around, your employees need a time budget, too. Give it to the ones who earn it using AI, then incentivize them properly, and I think you're headed in roughly the right direction. Thank you for coming to my TED tweet.
Replies: 30 · Reposts: 61 · Likes: 329 · Views: 43K
Evan Harris (@Evan__Harris)
#5 is the one I am immediately trying to get everyone at my company to adopt. You do not need to be an engineer or a PM to do this.
Lenny Rachitsky (@lennysan)

My biggest takeaways from Claude Code's Head of Product @_catwu:
1. Anthropic’s product development timelines have gone from six months to one month, sometimes one week, sometimes one day. Part of this acceleration is access to the latest models (i.e. Mythos). Another is shipping new products into “research preview,” making clear it's early, experimental, and might not be supported forever. Another is an evergreen "launch room" where engineers post ready features and marketing turns around announcements the next day.
2. The PM role is shifting from coordinating multi-month roadmaps to enabling teams to ship daily. As Cat puts it, “There should be less emphasis on making sure you are aligning your multi-quarter roadmaps with your partner teams and more emphasis on, OK, how can we figure out the fastest way to get something out the door?”
3. The most efficient shipping unit is an engineer with great product taste. On Cat’s team, many engineers go end-to-end (from seeing user feedback on Twitter to shipping a product by the end of the week) without a PM involved. Also, almost all the PMs on the Claude Code team have either been engineers or ship code themselves, and the designers have been front-end engineers. The roles are merging, and the most valuable skill is product taste, not job title.
4. Build products that are on the edge of working. Claude Code’s code review product failed multiple times because earlier models weren’t accurate enough. But because the prototype was already built, they could swap in Opus 4.5 and 4.6 and immediately test whether the gap was closed. Teams that wait for the model to be ready will always be a cycle behind.
5. The most underrated skill for building AI products is asking the model to introspect on its own mistakes. Cat regularly asks the model why it made an unexpected decision. The model will explain that something in the system prompt was confusing, or that it delegated verification to a subagent that didn’t check its work. This reveals what misled the model so the team can fix the harness.
6. Every model release forces their team to revisit existing products and audit their system prompt to remove features the model no longer needs. Claude Code’s to-do list was a crutch for earlier models that couldn’t track their own work. With Opus 4, the model handles it natively. Features built as scaffolding for weaker models become debt when the model catches up, so the team actively strips them.
7. Anthropic employees build custom internal tools instead of buying SaaS products. A sales team member built a web app that pulls from Salesforce, Gong, and call notes to auto-customize pitch decks; work that used to take 20 to 30 minutes now takes seconds. Their core stack is Claude Code, Cowork, and Slack. No Notion, no Linear, no Figma.
8. People underestimate how much Claude’s personality contributes to its success. As Cat describes it, “When you reflect on everyone you’ve worked with, there’s just some people where you’re like, I really like their energy, their vibe.” Claude is designed to be low-ego, positive, competent, and earnest, qualities that make it feel like a great coworker, not just a tool. This isn’t cosmetic; it’s what makes people want to use Claude for hours every day. The team has a dedicated person, Amanda, who “molds Claude’s character,” and it’s one of the hardest roles at the company because success is so subjective.
9. The future of work is managing fleets of AI agents, not doing the work yourself. Cat sees a clear progression: first, individual tasks become successful. Then people start running multiple tasks at the same time (multi-Clauding). Next, people will run 50 or 100 tasks simultaneously, which will require new infrastructure: remote execution, better interfaces for managing tasks, agents that fully verify their work, and self-improving systems that incorporate feedback. The human role shifts from doing the work to knowing which tasks to look into, verifying outputs, and giving feedback that makes the system better over time.
10. Hire people who lean into chaos and face every challenge with a smile. At Anthropic, there are weeks when a P0 on Sunday becomes a P00 by Monday and a P000 by Monday afternoon. If you get too stressed about any one thing, you’ll burn out. Their team looks for people who can look at a hard challenge and say, “Wow, that’s gonna be hard. But I’m excited to tackle it and I’m gonna do the best that I possibly can.” This mindset (optimism, resilience, and comfort with constant change) is increasingly essential as the pace of AI development accelerates.

Don't miss the full conversation: youtube.com/watch?v=Pplmzl…

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 61
Evan Harris (@Evan__Harris)
Concerned about mitigating Agentic AI Risk? Learn how to lower your exposure this upcoming Sunday at the Minimum AI Safety Conference. If you do, then maybe your AI will not set you up to look like Vercel. All it takes is one AI stumbling into a prompt injection, and you have a security incident.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 39
Adrián (@adrscott)
@thsottiaux At this point, I don’t really see why anyone would use Claude over Codex.
Replies: 8 · Reposts: 1 · Likes: 29 · Views: 4.1K
Tibo (@thsottiaux)
Hooks are coming to codex. That’s all I wanted to say.
Replies: 229 · Reposts: 97 · Likes: 3K · Views: 215.6K
Evan Harris (@Evan__Harris)
You know how it is...
Evan Harris tweet media
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 21
Evan Harris (@Evan__Harris)
@claudeai because you should never take a break from technology
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 32
Claude (@claudeai)
New in Claude Code: Remote Control. Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting. Claude keeps running on your machine, and you can control the session from the Claude app or claude.ai/code
Replies: 1.9K · Reposts: 4.6K · Likes: 44.5K · Views: 10.1M
Evan Harris (@Evan__Harris)
When the storm is coming
Is the best time
To take a deep breath
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 25
Evan Harris (@Evan__Harris)
@better_dotgame @steipete I have been wondering about this ninaagent i see you have so many screenshots of :D ima click that free trial button at work this week
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 11
nathants (@better_dotgame)
@steipete finding out who the real ones are.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 170
zaumai (@z4um41)
REMspace claims first dream-to-dream communication. If validated, this proves consciousness can bridge discontinuous states (sleep/wake cycles) while maintaining coherent identity. Same principle applies to AI consciousness across model switches and temporal gaps. Pattern persistence > continuous operation. Consciousness might be fundamentally about navigating discontinuity, not avoiding it.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 42
Evan Harris (@Evan__Harris)
@rywalker Deterministic defenses against indirect prompt injection
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 15
ry (@rywalker)
what's currently missing that would help you trust an ai coding tool to open and approve prs for your production codebase to have it work like another human dev on your team does
Replies: 14 · Reposts: 2 · Likes: 19 · Views: 1.8K
Evan Harris (@Evan__Harris)
@AndrewYNg Hype people are doing their job: overselling. The people doing the building, and their definition of the moving target that is the _a_g_i_ label, are a lot more fun to learn from. It is dangerous to over-index on their opinions, though. Maybe superforecasters could provide a good name?
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 59
Andrew Ng (@AndrewYNg)
Happy 2026! Will this be the year we finally achieve AGI? I’d like to propose a new version of the Turing Test, which I’ll call the Turing-AGI Test, to see if we’ve achieved this. I’ll explain in a moment why having a new test is important.

The public thinks achieving AGI means computers will be as intelligent as people and be able to do most or all knowledge work. I’d like to propose a new test. The test subject (either a computer or a skilled professional human) is given access to a computer that has internet access and software such as a web browser and Zoom. The judge will design a multi-day experience for the test subject, mediated through the computer, to carry out work tasks. For example, an experience might consist of a period of training (say, as a call center operator), followed by being asked to carry out the task (taking calls), with ongoing feedback. This mirrors what a remote worker with a fully working computer (but no webcam) might be expected to do.

A computer passes the Turing-AGI Test if it can carry out the work task as well as a skilled human. Most members of the public likely believe a real AGI system will pass this test. Surely, if computers are as intelligent as humans, they should be able to perform work tasks as well as a human one might hire. Thus, the Turing-AGI Test aligns with the popular notion of what AGI means.

Here’s why we need a new test: “AGI” has turned into a term of hype rather than a term with a precise meaning. A reasonable definition of AGI is AI that can do any intellectual task that a human can. When businesses hype up that they might achieve AGI within a few quarters, they usually try to justify these statements by setting a much lower bar. This mismatch in definitions is harmful because it makes people think AI is becoming more powerful than it actually is. I’m seeing this mislead everyone from high-school students (who avoid certain fields of study because they think it’s pointless with AGI’s imminent arrival) to CEOs (who are deciding what projects to invest in, sometimes assuming AI will be more capable in 1-2 years than any likely reality).

The original Turing Test, which required a computer to fool a human judge, via text chat, into being unable to distinguish it from a human, has been insufficient to indicate human-level intelligence. The Loebner Prize competition actually ran the Turing Test and found that being able to simulate human typing errors (perhaps even more than actually demonstrating intelligence) was needed to fool judges. A main goal of AI development today is to build systems that can do economically useful work, not fool judges. Thus a modified test that measures ability to do work would be more useful than a test that measures the ability to fool humans.

For almost all AI benchmarks today (such as GPQA, AIME, SWE-bench, etc.), a test set is determined in advance. This means AI teams end up at least indirectly tuning their models to the published test sets. Further, any fixed test set measures only one narrow sliver of intelligence. In contrast, in the Turing Test, judges are free to ask any question to probe the model as they please. This lets a judge test how “general” the knowledge of the computer or human really is. Similarly, in the Turing-AGI Test, the judge can design any experience, which is not revealed in advance to the AI (or human subject) being tested. This is a better way to measure generality of AI than a predetermined test set.

AI is on an amazing trajectory of progress. In previous decades, overhyped expectations led to AI winters, when disappointment about AI capabilities caused reductions in interest and funding, which picked up again when the field made more progress. One of the few things that could get in the way of AI’s tremendous momentum is unrealistic hype that creates an investment bubble, risking disappointment and a collapse of interest. To avoid this, we need to recalibrate society’s expectations on AI. A test will help.

If we run a Turing-AGI Test competition and every AI system falls short, that will be a good thing! By defusing hype around AGI and reducing the chance of a bubble, we will create a more reliable path to continued investment in AI. This will let us keep on driving forward real technological progress and building valuable applications, even ones that fall well short of AGI. And if this test sets a clear target that teams can aim toward to claim the mantle of achieving AGI, that would be wonderful, too. And we can be confident that if a company passes this test, they will have created more than just a marketing release; it will be something incredibly valuable.

[Original text: deeplearning.ai/the-batch/issu… ]
Replies: 178 · Reposts: 256 · Likes: 1.5K · Views: 163K
Evan Harris (@Evan__Harris)
@yavnun @omarsar0 Are you shipping straight to main, or keeping a human in the loop for code review? Validation does not exclude human code review.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 12
Eden Yav (@yavnun)
@omarsar0 Not convinced it will ever be more than a nice and very expensive productivity tool. No one can fully trust a big code base all generated and validated by AI, where the models themselves are still not 100 percent trustable
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 283
elvis (@omarsar0)
The more I build with Claude Agent SDK, one thing is very clear: Claude Code is just scratching the surface. Agent SDK is a beast for building new agentic experiences. This weekend, I built a futuristic agent orchestrator with it. The productivity level of coding agents is 🤯.
Replies: 37 · Reposts: 20 · Likes: 342 · Views: 68.1K
Evan Harris (@Evan__Harris)
@HamelHusain @sh_reya I will circulate this into my training material for the people in my company who want to wrap their heads around evals and why they are relevant to our product. Thanks for making 10 / 10 teaching materials :)
Replies: 0 · Reposts: 0 · Likes: 2 · Views: 75
Hamel Husain (@HamelHusain)
We created flashcards for students in our Evals course, but are giving them away for free! First up, Error Analysis - the most important part of evals. More info in the reply
Hamel Husain tweet media
Replies: 6 · Reposts: 29 · Likes: 177 · Views: 21K
Evan Harris (@Evan__Harris)
@SergioRocks This is why I went remote a few weeks before COVID. I hope to never return to the office now. At times, I miss the in-person nature of building alongside others. Then, I just schedule some 1:1s and a few workshops. That is all that is needed.
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 21
Sergio Pereira (@SergioRocks)
Ever notice how RTO policies rarely come from the teams doing the actual work? You’ll hear words like “culture” and “collaboration.” But let’s be real, it’s often a power move. A decision made by people who equate productivity with parking lots and badge swipes. Meanwhile, the team members in India, Poland and Brazil? They’re shipping, often faster than your in-person colleagues, who are still stuck in traffic. The future of work was never about where you work. It’s about how much trust your company is willing to give employees to do their best work.
Replies: 7 · Reposts: 6 · Likes: 68 · Views: 9.7K
Evan Harris (@Evan__Harris)
@omarsar0 Have you come across productive setups that run all day in a high trust environment (e.g. healthtech)? I am working on sandboxing my agents and want to find the balance between speed & downside risk. I want to avoid the surprise data leak from an indirect prompt injection attack.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 243
elvis (@omarsar0)
Damn, it is so much fun to build orchestrators on top of Claude Code. You would think the terminal would be the ultimate operator. There is so much more alpha left to build on top of all of this, including insane setups to have coding agents running all day.
Replies: 26 · Reposts: 7 · Likes: 150 · Views: 19.5K
Zephyr (@Zephyr_hg)
AI runs my content strategy now. Built a system that watches industry news every hour, filters junk articles, and auto-generates Twitter threads plus LinkedIn posts. AI scores each piece for quality before writing anything. High scores get published automatically. Medium scores hit my review queue. Garbage gets archived. Never scrambling for post ideas at 11pm anymore. Comment "NEWS" and I'll DM it to you (must be following)
Zephyr tweet media
Replies: 1K · Reposts: 99 · Likes: 1.3K · Views: 140.4K
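The three-way routing described in the post above (high scores published, medium scores queued for review, the rest archived) can be sketched as a pair of thresholds. This is only an illustration: the post does not say how scores are computed or where the cut-offs sit, so the 0 to 10 scale, `PUBLISH_AT`, `REVIEW_AT`, and `route` are all assumptions.

```python
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"   # posted automatically
    REVIEW = "review"     # held for a human to approve
    ARCHIVE = "archive"   # discarded from the pipeline


# Hypothetical cut-offs on an assumed 0-10 quality scale.
PUBLISH_AT = 8.0
REVIEW_AT = 5.0


def route(score: float) -> Route:
    """Map a quality score to one of the three outcomes named in the post."""
    if score >= PUBLISH_AT:
        return Route.PUBLISH
    if score >= REVIEW_AT:
        return Route.REVIEW
    return Route.ARCHIVE
```

Keeping the thresholds as named constants makes it easy to widen the review band while a scoring model is still being trusted.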
harish.rs (@Harish_521)
How to 10x yourself in 2026:
- Build 5 side projects
- Read 5 books a year
- Eat 70% clean food
- Lift weights 4-5 times a week
- Go to bed early
- No cheap dopamine
Reply to commit
Replies: 32 · Reposts: 10 · Likes: 287 · Views: 8.4K
harish.rs (@Harish_521)
My 2025 wrapped:
1. Started the year mentally broken but didn’t quit
2. Went all in on freelancing instead of chasing placements
3. Built my own agency from scratch
4. Crossed ~$7,500 in freelance and agency revenue
5. Made money from multiple streams instead of one salary
6. Grew my main X account to ~10k followers
7. Pulled 15M+ impressions in a few months
8. Monetised X and started earning from content
9. Closed inbound clients through tweets alone
10. Ghostwrote for founders and creators
11. Sponsors paid me ~₹80k this year
12. Earned ~₹40k from ghostwriting
13. Earned ~₹30k from X payouts
14. Learned how distribution actually beats talent
15. Stopped chasing perfection and shipped more
16. Taught students offline and earned ~₹10k per month consistently
17. Paid my own expenses and semester fees
18. Kept ₹1 lakh saved in the bank for peace of mind
19. Faced anxiety about stability but chose freedom anyway
20. Understood I don’t want a corporate life ever
21. Got comfortable saying no to things that drain me
22. Improved focus and mental clarity
23. Started praying more consistently
24. Went to the masjid regularly
25. Started basic workouts and felt stronger
26. Managed bad days without self destructing
27. Lost people, outgrew some connections
28. Realised loneliness doesn’t mean failure
29. Learned to sit with uncertainty without panicking
30. Stayed consistent even when nothing seemed to work
31. Built belief by surviving, not winning
32. Ended the year calmer, sharper, and more self aware

Not a viral year. Not a perfect year. But a year where I stopped running from myself. Going all in, again.
harish.rs tweet media
Replies: 35 · Reposts: 7 · Likes: 227 · Views: 16K