Ivan Bercovich
@neversupervised
Independent Researcher, Partner @ ScOp Venture Capital

A common dynamic I observe with AI: it feels most impressive when you don’t know much about the subject, don’t care, or don’t have a clear idea of what you want. This applies across design, code, legal, and more. If I don’t know code very well, every piece of code it writes feels very impressive. Once you know what something should feel or look like, it becomes almost impossible to guide AI there. And you definitely can’t one-shot it.

Over the last year, I've watched a rise in AI content on basically every internet platform. Seeing a viral AI-generated post used to be a rare find. Now it's a daily occurrence. Four months ago, we launched the @pangramlabs bot to help people check long posts and articles for AI slop without leaving the platform. And it blew up. We went from a niche tool used by academics to a core piece of cognitive security infrastructure. Today, we're taking it one step further. We're launching a Chrome extension that proactively scans all social content as you scroll, flagging AI content in real time so you can save your attention for what really matters: content authored by humans. At launch, the Pangram Chrome extension will proactively scan posts on X, LinkedIn, Reddit, Substack, and Medium. And we'll give you a feed health summary, so you can see exactly which accounts are putting AI slop on your feed. I'm so excited to share this with you all, and I hope you find it as useful as I do.

What Role Does Not Exist Today But Will Be So Common in Five Years' Time: "500K-1M jobs will be created for agent operators. This person will be somewhat technical. They will be deep in the AI world. They're gonna have to understand MCPs and CLIs and they are going to have to know how to write skills. It's going to be this group of people that will know how to go into your marketing team or your legal team, or your operations team, or your life sciences research team and this is the person that is basically going to enable that function to get leverage from agents." @levie Where is this right? Where is this wrong? @jasonlk @gregisenberg @amasad @AnjneyMidha

We’re releasing LongCoT, an incredibly hard benchmark to measure long-horizon reasoning capabilities over tens to hundreds of thousands of tokens. LongCoT consists of 2.5K questions across chemistry, math, chess, logic, and computer science. Frontier models score less than 10%🧵

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵