Ivan Bercovich

513 posts

@neversupervised

Independent Researcher / Terminal Bench, Partner @ ScOp VC

Santa Barbara · Joined January 2010

482 Following · 727 Followers

Pinned Tweet

Ivan Bercovich@neversupervised·3d

x.com/i/article/2075…

141

35.9K

Ivan Bercovich@neversupervised·9h

Thanks for the thoughts!
1. Outcome vs Process: it helps to have novel tasks that don't exist on the internet. In that case, it's unlikely the agent will guess a difficult answer without doing the work.
2. Deception vs Ambiguity: if the task is objectively ambiguous, the verifier should accept all solutions that ambiguity could lead to. The problem is when the instruction is ambiguous but the verifier expects one particular resolution.
3. Golden vs Derived Solutions: it's way more practical to use golden answers than to provide the full derivation, and soon it might become completely impractical otherwise. The problem is that we don't know if a task is solvable without a derivation. Because it's usually the same person who authors the entire task, they might not realize there are bits of privileged information somewhere without which you can't solve the task. Golden is fine, but if no SOTA agent can solve it, it's hard to prove it's not impossible.
4. Privileged Information: this has to do with TBench being a one-shot agentic benchmark. We don't allow the agent to consult with the instruction author. I concede there is a variant where, in addition to instructions, there are hints. Here's a paper by @marcotcr that exposed me to this idea (arxiv 2506.22405).
5 & 6. Domain Experts: I agree this might not be sustainable. One idea is for experts to become a real-world process instead of a person. For example, if an AI is supposed to mix chemicals to make a better glue, someone in a lab follows the instructions and takes measurements.
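The point about ambiguity (item 2 in the reply above) can be sketched in code. This is a minimal illustration and not Terminal Bench's actual verifier: the task, the sample data, and the functions `accepted_resolutions`, `verify`, and `unfair_verify` are all hypothetical names invented for this example. It assumes an instruction such as "report the average latency" that never states a unit, so both seconds and milliseconds are legitimate readings.

```python
import math

# Hypothetical task data: latency samples measured in seconds.
# The (hypothetical) instruction does not say which unit to report.
SAMPLES_SECONDS = [0.120, 0.080, 0.100]


def accepted_resolutions(samples):
    """Return every answer that a reasonable reading of the instruction allows."""
    mean_s = sum(samples) / len(samples)
    return {
        "seconds": mean_s,
        "milliseconds": mean_s * 1000.0,
    }


def verify(answer, samples=SAMPLES_SECONDS, rel_tol=1e-6):
    """Fair verifier: accept the answer if it matches ANY valid resolution."""
    return any(
        math.isclose(answer, expected, rel_tol=rel_tol)
        for expected in accepted_resolutions(samples).values()
    )


def unfair_verify(answer, samples=SAMPLES_SECONDS, rel_tol=1e-6):
    """The failure mode: the verifier silently expects one particular resolution."""
    expected = accepted_resolutions(samples)["seconds"]
    return math.isclose(answer, expected, rel_tol=rel_tol)
```

Under these assumptions, an agent that answers `100.0` (milliseconds) passes `verify` but fails `unfair_verify`, even though the instruction never ruled that reading out.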

Ivan Bercovich@neversupervised·2d

@krishnanrohit an expanded overview of some questions you've asked me before.

527

Ivan Bercovich@neversupervised·3d

x.com/i/article/2075…

141

35.9K

Ivan Bercovich@neversupervised·2d

@chrisbarber 15 minutes after meeting @bradley_emi from @pangram, I went from AI detection skeptic to believer, and voted to move forward with an investment. You can add my vote to Pangram.

7.8K

Chris Barber (in SF)@chrisbarber·2d

I asked people which companies have the highest density of talented people they know:
1st. Cognition - 9 votes
Equal 1st. Anthropic - 9 votes
2nd. Modal - 5 votes
3rd. OpenAI - 4 votes
4th. Standard Intelligence - 3 votes
4th. Cursor - 3 votes
Two votes each:
- Ramp
- Flapping Airplanes
- DeepMind
- Long Lake
- Applied Compute
One vote each:
- SpaceXAI
- SpaceX (treated separately, one vote was for the AI lab subsidiary and one was for the rocket team)
- American Terawatt
- Mechanize
- Olix
- Fluidstack
- Chai Discovery
- Sail Research
- Etched
- Core Automation
- Specter
- Clay
- Applied Intuition
- Sierra
- Hivemind
- Bitrig
- Retro
- Thinking Machines
- Decagon
- Precigenetics
- Pangram
- Reflect
- Thrive Holdings
- Adaption

1.1K

728.6K

Ivan Bercovich@neversupervised·2d

An interesting use for the @pangram API. I often ask an AI to edit content I've produced, with ideas distributed across messaging platforms, email, and transcripts. I want the agent to sleuth through all this material and help me produce content. But the agent will inevitably start using its own words, in spite of my requirement to use my voice verbatim. One fix: give the agent access to Pangram and ask it to verify none of the final content was AI-generated.
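A rough sketch of that check follows. It does not use or describe Pangram's real API: `ai_likelihood` is a stand-in for whatever detector call the agent is given, and `flag_ai_sounding_passages`, `toy_scorer`, and the `0.5` threshold are hypothetical choices made only for illustration.

```python
from typing import Callable, List


def flag_ai_sounding_passages(
    draft: str,
    ai_likelihood: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Return the paragraphs of `draft` that the detector scores at or above `threshold`.

    `ai_likelihood` is an assumed interface: it takes text and returns a score
    in [0, 1]. Swap in a real detector client behind the same signature.
    """
    paragraphs = [p.strip() for p in draft.split("\n\n") if p.strip()]
    return [p for p in paragraphs if ai_likelihood(p) >= threshold]


def toy_scorer(text: str) -> float:
    """Placeholder scorer for demonstration only; not a real detector."""
    return 0.9 if "delve" in text.lower() else 0.1


draft = "I wrote this line myself.\n\nLet us delve into the nuances."
flagged = flag_ai_sounding_passages(draft, toy_scorer)
```

The agent would then be asked to replace each flagged paragraph with the author's original wording and rerun the check until nothing is flagged.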

367

Ivan Bercovich@neversupervised·Jul 4

The people shaping the future of our economy and society are much closer to the singularity. You know it when you see it. I've been close enough to the event horizon to see it, even if I'm not worthy of crossing it.

191

Ivan Bercovich@neversupervised·Jul 3

There's going to be a lot of experimentation with domain specific LLMs in the next year or two. BUT.
1. A lot of value will accrue to the infra companies that actually do the training (e.g. Thinking Machines, Fireworks), rather than the model creators. Most models won't outperform Claude X enough to matter.
2. A lot of value will accrue to the open source base model creators, mostly in China, some domestic like Reflection AI. They'll have an advantage serving these models as a platform (see point 1). Like MongoDB + their cloud.
3. Some vertical AI companies will build models with a differentiated advantage in capability and/or cost, but I expect those companies to already have a strong market grip before they go into training, plus strong AI talent.
4. A lot of companies are going to waste tons of money and effort training their own models, for vanity, valuation, or poor understanding of the technology. Tokenmaxxing but for training. This will create a lot of noise in the market.
5. Open source's token share will increase at the expense of proprietary models. Not much needs to happen for this to be true. Maybe there's a cost or capability advantage. But mostly, most US usage comes from the big three labs, and an explosion in model diversity will create a lot of opportunities to shift market share.

Alex Imas@alexolegimas·Jul 1

The longer I’ve spent time with this paper the bigger of a deal it seems. The economic implications are quite significant. This is a frontier expert task. This is Qwen3-235B.

Mira Murati@miramurati

Bridgewater used their unique financial knowledge and partnered with us on @tinkerapi to fine-tune a model that helps their analysts focus on what's important. Experts improving AI that empowers experts. thinkingmachines.ai/news/learning-…

105

1.3K

234K

Ivan Bercovich@neversupervised·Jul 2

From time to time, a university I've been involved with reaches out with a request to provide industry viewpoints to improve the curriculum, particularly in the context of AI. I'm happy to meet with faculty or administration at my alma mater and share any useful insights. But I'm not interested in doing something performative just so I can be reminded of my glory days (which I already remember very fondly). If a university is systematically reviewing its curriculum and looking for concrete commentary, I'm more than happy to participate. I understand keeping alumni engaged is a proxy for charitable gifts, and I'm okay being involved, but I still want interactions to not be sales pitches. Mentoring students, being a guest speaker, and referring talented students/alumni to potential jobs are all interactions of substance in my experience.
As for how to improve the curriculum: I do believe curriculums have to adapt to AI, although it would take someone quite contrarian and forceful to get ahead of the median university. AI means schools need to be more rigorous with testing and sorting students. The current trend of not requiring SATs and inflating grades, coupled with AI doing all the homework, means the value of college converges to the value of being in a fraternity, which itself converges to the value of being admitted to an Ivy League school, ultimately undermining the idea that a talented young person can come from nowhere and earn their place in society through scholarship and merit. I empathize with the median educator's intentions, but making individual performance illegible was an easier way to show desired results than actually attaining the spirit of the mission.
AI exacerbates the issue. There is no more proof of work for anything. The outcome might very well be that human labor gets fully substituted, and I tend to lean in that direction. But if education is going to continue being an important phase in every person's development, and I believe it should, we have to use the scientific method for teaching instead of systematically eliminating evidence.

416

Ivan Bercovich@neversupervised·Jul 1

I was lucky to work with @anton_iades on AI for science. We explore the problem of exploration... good science depends on novel discoveries, so an agent must explore ideas between the lines and on the margins of existing knowledge. Like @AlbalakAlon showed in his awesome Hivemind paper, LLMs tend to converge to a surprisingly narrow distribution, and this goes against novelty. This paper tests different algorithms to coerce the agent onto novel paths.

Antonis Antoniades@anton_iades

Key to realizing Auto Research Agents that can make novel discoveries in AI is understanding and improving their exploration capabilities. To this end, we built Heuresis, a composable framework that combines coding agents with arbitrary search algorithms within a flexible loop.

799

Ivan Bercovich@neversupervised·Jun 24

The best tasks confront reality. You want to be somewhere where maybe 50% of your tasks are impossible, you don't know which, and you have to build them anyway. That's where this is going.

117

Ivan Bercovich@neversupervised·Jun 24

We used to know a task was hard because a human could do it and the AI couldn't. Now the tasks are hard for humans too. You stop knowing if they're even possible.

105

Ivan Bercovich@neversupervised·Jun 24

Is the instruction unfair, or is the internet wrong and that's why every agent gets it wrong? You can't reconcile that without an expert actually looking at the task.

Ivan Bercovich@neversupervised·Jun 24

"Hard" is usually interpreted as running a bunch of trials and having all fail. But trials fail for reasons that have nothing to do with the task: an unfair verifier, a vague instruction, privileged info in the solution, a near miss, a missing library.
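One way to make this concrete is to label each failed trial with a cause before calling a task "hard". The sketch below is illustrative only: the `FailureCause` enum and `summarize_failures` are hypothetical names, the five non-task causes are taken from the list above, and `TASK_DIFFICULTY` is an added category standing for "the agent genuinely could not solve a fair task". It assumes a human or a reviewer agent has already assigned the labels.

```python
from collections import Counter
from enum import Enum


class FailureCause(Enum):
    TASK_DIFFICULTY = "task_difficulty"      # assumed category: a fair task the agent could not solve
    UNFAIR_VERIFIER = "unfair_verifier"
    VAGUE_INSTRUCTION = "vague_instruction"
    PRIVILEGED_INFO = "privileged_info"
    NEAR_MISS = "near_miss"
    MISSING_LIBRARY = "missing_library"


def summarize_failures(labels):
    """Count failed trials by cause and report the share attributable to the task itself."""
    counts = Counter(labels)
    total = len(labels)
    genuine = counts[FailureCause.TASK_DIFFICULTY]
    return {
        "total": total,
        "genuine": genuine,
        "genuine_share": genuine / total if total else 0.0,
        "by_cause": {cause.value: n for cause, n in counts.items()},
    }


# Example: four failed trials, only one of which reflects real difficulty.
labels = [
    FailureCause.TASK_DIFFICULTY,
    FailureCause.UNFAIR_VERIFIER,
    FailureCause.UNFAIR_VERIFIER,
    FailureCause.NEAR_MISS,
]
summary = summarize_failures(labels)
```

In this toy example every trial failed, yet only a quarter of the failures say anything about how hard the task is.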

Ivan Bercovich@neversupervised·Jun 23

x.com/i/article/2069…

209

Ivan Bercovich@neversupervised·Jun 23

The best way to influence how models think is by building great benchmarks that encourage a certain type of reasoning. This can be done for good or evil. Today, labs are brute-force training on any task they get their hands on.

107

Ivan Bercovich@neversupervised·Jun 15

@fjzzq2002 What is the median year by which you will give up trying to read the codebase?

Ziqian Zhong@fjzzq2002·Jun 14

Screenshot from a side-project I vibe-coded partially with Fable. It works great but I don't really want to work in a codebase like this 😅

964

Ivan Bercovich@neversupervised·Jun 12

@egastfriend Made me think of how Alberta is the only non-island territory that is rat-free, because they put a policy in place before the population spread.

Eric Gastfriend@egastfriend·Jun 11

My new Op-Ed in The Well News: The opioid epidemic has important lessons for AI regulation, that American society has not fully grasped. A thread 🧵 thewellnews.com/opinions/what-…

37.2K

Ivan Bercovich@neversupervised·Jun 10

@natashajaques Do you think young people will effectively become LLM distillations?

Natasha Jaques@natashajaques·Jun 10

AI-generated text is already infiltrating our cultural and scientific institutions, subtly altering the collective decisions we make en masse. Our new position paper on the Epistemic Risks of AI was just released today!

Kellin Pelrine@KellinPelrine

Humanity's ability to know, reason, judge, and act well is the foundation of science, democracy, crisis response, & management of AI itself. AI poses serious risks to that foundation. New paper on epistemic risks by 30 experts calls for attention to this. Link in thread.

Ivan Bercovich@neversupervised·Jun 10

If you want to oversimplify the vertical AI industry: you're reselling tokens at a premium. Package a few tools and a harness that makes using Claude through you better than using it directly, and for the privilege you get to charge more. An extreme example would be an especially certified version of Claude that otherwise looks and behaves identically. There's no value add, but there's a requirement that only a 3rd party can meet. And in fact this is what Palantir did with Anthropic, or what AWS does with Bedrock. But you can layer more utility onto that. If you have an agent for a given domain, say antenna design, and that agent needs a specialized tool, and this tool is only available through you, then a customer that wants such an agent will have to pay a premium on tokens for the privilege. The business model could be to just sell a license to the tool, but it's a worse business model. There are at least a couple of substitutes for a SOTA model at any given time, and over time that number might grow. So it makes sense for some value capture to migrate to the product that interfaces directly with the customer. This is entirely predicated on AGI not eating every industry fairly soon.

158

Ivan Bercovich@neversupervised·Jun 10

Lots of companies are doing agent monitoring, but they miss the insight. What do you really want to see? I want to know which of my employees are adding value to the agent. Who is a useful human in the loop and who is just pressing enter? And for the valuable ones, why? What do they know? @MariusHobbhahn
