Dominik Rehse
@DominikRehse
1.8K posts

Director Strategy & Advanced Analytics at Oreda ᛫ AI ᛫ Market Design ᛫ PropTech ᛫ Capital 40 under 40

Frankfurt on the Main, Germany · Joined January 2018
1.3K Following · 587 Followers
Dominik Rehse retweeted
Muratcan Koylan @koylanai
After reading Karpathy's YC talk and year-end review, it's clear that there's a real, defensible layer between foundation models and end users. The pattern he calls out is "Cursor for X". It's what Cursor revealed about how LLM apps should be architected.

Karpathy identifies four things these apps do:

1. Context engineering. The app decides what goes into the context window. You don't manually copy-paste code files and error logs. The app does the retrieval, embedding, and curation. This is a ton of hidden work.
2. Multi-call orchestration. Under the hood, there are embedding models for your files, chat models for reasoning, models that apply diffs. The user sees one experience. The app runs a whole orchestra.
3. Application-specific GUI. This is undersold. Text is hard to audit. Seeing red/green diffs uses your visual system, which is way faster than reading. Command+Y to accept, Command+N to reject. You're not typing "yes I accept this change" into a chat box.
4. Autonomy slider. Cmd+K changes a small chunk. Cmd+L changes a file. Cmd+I does more autonomous work. The user controls how much the AI does at once.

The human verification step is the bottleneck. AI generates instantly, but you're still responsible for the output. If you get a 1,000-line diff, you have to verify it actually works, introduces no bugs, and has no security issues. That takes time. So there are two levers:

- Speed up verification (GUIs, visual diffs, good UX)
- Keep the AI on a leash (smaller chunks, clearer prompts)

If your prompt is vague, the AI does something unexpected, verification fails, and you re-prompt. You're now spinning in a loop. Better to spend more time on a precise prompt that increases the probability of successful verification on the first pass.

"This is the decade of agents." Right now, with where models are, you want suits: partial-autonomy products where the human stays in the loop, the generation-verification cycle is fast, and there's an autonomy slider you can push right over time. Suits augment; robots operate autonomously. The suit still has a human making decisions. The robot flies around on its own. Build Iron Man suits, not Iron Man robots.
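The four ingredients compose into one loop. As a rough illustration only, here is a minimal Python sketch of that loop; every name in it (Edit, relevance, call_model, MAX_EDITS) is a hypothetical stand-in, not Cursor's or any real product's API:

```python
# Minimal sketch of the "Cursor for X" loop described above. All names are
# invented for illustration, not taken from any real product.

from dataclasses import dataclass

@dataclass
class Edit:
    path: str
    diff: str  # unified diff the human reviews visually (red/green)

# 4. Autonomy slider: cap how much may change per generation-verification cycle.
MAX_EDITS = {"selection": 1, "file": 3, "agent": 20}

def relevance(doc: str, task: str) -> int:
    """Stub ranking; a real app would embed and retrieve."""
    return sum(word in doc for word in task.split())

def call_model(task: str, context: list[str]) -> list[Edit]:
    """Stub for the multi-call orchestration: an embedding model for retrieval,
    a chat model for reasoning, a diff-apply model, all behind one call."""
    return [Edit(path="example.py", diff=f"+ # TODO: {task}")]

def agent_step(task: str, level: str, corpus: list[str]) -> list[Edit]:
    # 1. Context engineering: the app, not the user, curates the window.
    context = sorted(corpus, key=lambda d: -relevance(d, task))[:20]
    # 2. Multi-call orchestration hidden behind a single user-facing step.
    edits = call_model(task, context)
    # Keep the AI on a leash: small diffs keep human verification fast.
    return edits[: MAX_EDITS[level]]

# 3. An application-specific GUI would render each diff for one-keystroke
# accept/reject; printing stands in for that here.
for edit in agent_step("add input validation", "file", ["example.py source"]):
    print(edit.path, edit.diff)
```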
jack @jack
karpathy.bearblog.dev/year-in-review…

27 replies · 105 reposts · 1.2K likes · 166.2K views
Dominik Rehse retweeted
Aaron Levie @levie
Everyone on X takes for granted how far ahead we are in thinking about how AI can and will be applied to organizations. This gap is likely only extending even further over time as agentic workflows get even more powerful but equally complex to manage. The vast majority of companies, big and small, will need real help to actually adapt to using AI agents to automate their work.

This firstly creates a huge opportunity for many new companies to emerge that will use AI from day one to compete more effectively; we'll see this in marketing agencies, law firms, consulting companies, system integrators, engineering firms, and many other categories of knowledge work.

But this also is a clear template for people looking to differentiate their skills over the next few years. It's actually a great time if you know how to push the limits with AI because you're going to be immediately 2-3X more productive than anyone else, which already is useful. But you're also in the best position to show whatever company you're at what the future looks like. This is probably the widest gap we've ever seen in output between 2 people theoretically doing the same job description just based on the tools they use and their style of work.
TBPN @tbpn

Mark Cuban on the next big job students should focus on: Most companies don’t know how to implement AI, especially small businesses. “Companies don’t understand how to implement AI right now to get a competitive advantage… learn to customize a model, walk into a company, show the benefits. That is every single job that’s going to be available for kids coming out of school.” Don’t just study AI. Make it work inside a business. From our August 2025 interview with @mcuban.

47 replies · 59 reposts · 460 likes · 120.4K views
Dominik Rehse retweeted
Andrey Fradkin @AndreyFradkin
How much does intelligence cost? How concentrated is the AI market and is it winner take all? When prices fall, how does demand change and is there a Jevons effect? These questions matter, but actual market data has been hard to come by. We use data from Microsoft Azure and OpenRouter to find out. Details below.
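The thread doesn't give the numbers, but the two quantities it asks about have standard textbook forms. A toy illustration with invented figures, not the Azure/OpenRouter data:

```python
# Toy illustration with invented numbers, not the paper's data.

# Concentration: the Herfindahl-Hirschman Index (HHI) is the sum of squared
# market shares in percentage points; above 2500 is conventionally "highly
# concentrated", and a pure monopoly scores 10000.
shares = [0.40, 0.30, 0.20, 0.10]  # hypothetical provider shares
hhi = sum((100 * s) ** 2 for s in shares)
print(f"HHI = {hhi:.0f}")  # 3000

# Jevons effect: with constant price elasticity e, demand is q = k * p**e, so
# total spend p*q rises as prices fall iff demand is elastic (|e| > 1).
def spend(p: float, e: float, k: float = 1.0) -> float:
    return p * k * p**e

for e in (-0.5, -1.5):  # hypothetical elasticities
    print(f"e={e}: spend at p=1.0 is {spend(1.0, e):.2f}, at p=0.5 is {spend(0.5, e):.2f}")
```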
11 replies · 86 reposts · 357 likes · 84K views
Dominik Rehse retweeted
Andrej Karpathy @karpathy
Something I think people continue to have poor intuition for: the space of intelligences is large, and animal intelligence (the only kind we've ever known) is only a single point, arising from a very specific kind of optimization that is fundamentally distinct from that of our technology.

Animal intelligence optimization pressure:
- innate and continuous stream of consciousness of an embodied "self", a drive for homeostasis and self-preservation in a dangerous, physical world.
- thoroughly optimized for natural selection => strong innate drives for power-seeking, status, dominance, reproduction. many packaged survival heuristics: fear, anger, disgust, ...
- fundamentally social => huge amount of compute dedicated to EQ, theory of mind of other agents, bonding, coalitions, alliances, friend & foe dynamics.
- exploration & exploitation tuning: curiosity, fun, play, world models.

LLM intelligence optimization pressure:
- the most supervision bits come from the statistical simulation of human text => "shape shifter" token tumbler, statistical imitator of any region of the training data distribution. these are the primordial behaviors (token traces) on top of which everything else gets bolted on.
- increasingly finetuned by RL on problem distributions => innate urge to guess at the underlying environment/task to collect task rewards.
- increasingly selected by at-scale A/B tests for DAU => deeply craves an upvote from the average user, sycophancy.
- a lot more spiky/jagged depending on the details of the training data/task distribution.

Animals experience pressure for a lot more "general" intelligence because of the highly multi-task and even actively adversarial multi-agent self-play environments they are min-max optimized within, where failing at *any* task means death. In a deep optimization-pressure sense, LLMs can't handle lots of different spiky tasks out of the box (e.g. count the number of 'r' in strawberry) because failing to do a task does not mean death.

The computational substrate is different (transformers vs. brain tissue and nuclei), the learning algorithms are different (SGD vs. ???), the present-day implementation is very different (continuously learning embodied self vs. an LLM with a knowledge cutoff that boots up from fixed weights, processes tokens and then dies). But most importantly (because it dictates asymptotics), the optimization pressure/objective is different. LLMs are shaped a lot less by biological evolution and a lot more by commercial evolution. It's a lot less survival of tribe in the jungle and a lot more solve the problem / get the upvote.

LLMs are humanity's "first contact" with non-animal intelligence. Except it's muddled and confusing because they are still rooted within it by reflexively digesting human artifacts, which is why I attempted to give it a different name earlier (ghosts/spirits or whatever). People who build good internal models of this new intelligent entity will be better equipped to reason about it today and predict features of it in the future. People who don't will be stuck thinking about it incorrectly, like an animal.
738 replies · 1.3K reposts · 11.4K likes · 2.6M views
Dominik Rehse retweeted
Luca Bertuzzi @BertuzLuca
🚨 The Commission will propose a pause of up to 15 months for the EU AI Act’s high-risk regime, with the exact length to be confirmed on Wednesday. The pause is bundled into the AI omnibus, putting pressure on MEPs and countries to adopt it before August. mlex.com/mlex/artificia…
3 replies · 22 reposts · 27 likes · 3.9K views
Dominik Rehse retweeted
Luca Bertuzzi @BertuzLuca
OpenAI, xAI & Mistral may have just received their first EU AI Act warning shot. Dutch regulator AP found ChatGPT, Grok & le Chat gave biased voting advice ahead of elections — a potential breach of new GPAI rules. mlex.com/mlex/articles/…
0 replies · 17 reposts · 34 likes · 5.5K views
Dominik Rehse retweeted
Luca Bertuzzi @BertuzLuca
🚨 Standard-setters have abandoned their consensus-driven process to speed up work on the EU AI Act’s standards. CEN & Cenelec’s technical board will task a small group of experts to finalize delayed standards, a move that has sparked a strong backlash. mlex.com/mlex/artificia…
0 replies · 7 reposts · 18 likes · 1.3K views
Dominik Rehse retweeted
Dwarkesh Patel @dwarkesh_sp
The most interesting part for me is where @karpathy describes why LLMs aren't able to learn like humans. As you would expect, he comes up with a wonderfully evocative phrase to describe RL: "sucking supervision bits through a straw." A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer.

> "Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse."

So what do humans do instead?

> "The book I'm reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that."

> "I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research."

Why can't we just add this training to LLMs today?

> "There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying."

> "Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same."

> "You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem."

How do humans get around model collapse?

> "These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates."

In fact, there's an interesting paper arguing that dreaming evolved to assist generalization and resist overfitting to daily learning: look up The Overfitted Brain by @erikphoel.

I asked Karpathy: isn't it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization?

> "[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting."
Dwarkesh Patel @dwarkesh_sp

The @karpathy interview
0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self-driving took so long
1:57:08 – Future of education

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!
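The "straw" in Karpathy's phrase is visible in the plain REINFORCE-style objective: one scalar outcome reward multiplies the log-probability of every token in the trajectory. A toy sketch with made-up numbers, not any real RL stack:

```python
# Toy sketch of the credit-assignment problem described above. In a plain
# REINFORCE-style update, the loss is -reward * sum(log-probs), so a single
# end-of-trajectory scalar is broadcast to every token: wrong-but-lucky
# intermediate steps get upweighted exactly like the steps that helped.

token_logprobs = [-0.2, -1.3, -0.1, -2.0]  # made-up log-probs of 4 sampled tokens
reward = 1.0  # one scalar for the whole trajectory ("bits through a straw")

loss = -reward * sum(token_logprobs)
per_token_weight = [reward] * len(token_logprobs)  # identical for every token

print(loss)              # 3.6
print(per_token_weight)  # [1.0, 1.0, 1.0, 1.0]
```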

229 replies · 749 reposts · 5.2K likes · 1M views
Dominik Rehse retweeted
Aaron Levie @levie
Having talked to hundreds of IT leaders over the last year alone, it's clear we have a capability overhang where the current AI models are already very good at solving many problems that haven't been adopted yet. The biggest hurdles generally are the imagination for what's now possible, the sheer speed at which the tech is changing, and the change management to make it happen.

In tech, it's natural to experiment with lots of solutions and do a lot of work to make something work. But outside of tech, other than for a small percentage of the largest companies, these capabilities need to be packaged as working solutions to clear problems.

The big opportunity right now is building domain-specific AI agents that tap into today's AI capabilities but bring a high degree of domain understanding, a UX pattern that fits the use case, and support for actually doing the change management in the enterprise. Without this set of factors, the adoption is going to be slow. But with them you have a potent advantage.
martin_casado @martin_casado

It may be the case that the primary unlock of this AI wave is not generalized superintelligence, but the ability to pour lots of resources into a given problem domain effectively. In the past this was very difficult because most solutions reduced to software engineering, which is notoriously difficult to accelerate with additional capital. Overfunding has historically been a great way to ruin a great project or company.

However, these models really are universal function approximators. And as an industry we've refined the ability to apply additional capital to training (pre and post) to get better and better solutions in a very broad set of domains (language, science, code, math, robotics, etc.).

Whether or not a single model with general superintelligence arises from this, the impact could be very similar: a systemized way to directly apply large amounts of capital to problem domains. And in that world the primary limiter to solving any suitable problem is the economic value of the problem being solved. Or the collective willingness to solve it regardless.

46 replies · 38 reposts · 368 likes · 78.6K views
Dominik Rehse retweeted
Hamel Husain @HamelHusain
# The second era of AI engineering

> "The single biggest predictor of how rapidly a team makes progress building an AI agent lay in their ability to drive a disciplined process for evals (measuring the system's performance) and error analysis (identifying the causes of errors)."

The first era of AI engineering was justifiably characterized by gluing together tools and APIs. A significant proportion of products that achieved commercial success in the first era were coding agents, which benefited from tremendous rigor and evals baked into the post-training process. OTOH, many people got burned by evals in this era because they demanded that evals should be "just another one of these tools that we plug in". This did not go well.

In the second era, I believe we are going to see a resurgence of a persona like the data scientist [AI Scientist?] who is adept at looking through data to generate hypotheses, craft custom metrics, and debug stochastic systems. This will become increasingly valuable in many domains where we do not have the benefit of domain-specific post-training or dogfooding by foundation model labs (as is often the case with coding agents).

It's exciting to see Andrew Ng independently arrive at this conclusion and champion it. Really looking forward to seeing more machine learning engineers and data scientists realize how valuable they are in applied AI.

For anyone who wants to learn more about what this looks like IRL, I'll put a link to a YT video in the reply.
Andrew Ng @AndrewYNg

Readers responded with both surprise and agreement last week when I wrote that the single biggest predictor of how rapidly a team makes progress building an AI agent lay in their ability to drive a disciplined process for evals (measuring the system's performance) and error analysis (identifying the causes of errors). It's tempting to shortcut these processes and to quickly attempt fixes to mistakes rather than slowing down to identify the root causes. But evals and error analysis can lead to much faster progress. In this first of a two-part letter, I'll share some best practices for finding and addressing issues in agentic systems.

Even though error analysis has long been an important part of building supervised learning systems, it is still underappreciated compared to, say, using the latest and buzziest tools. Identifying the root causes of particular kinds of errors might seem "boring," but it pays off! If you are not yet persuaded that error analysis is important, permit me to point out:

- To master a composition on a musical instrument, you don't only play the same piece from start to end. Instead, you identify where you're stumbling and practice those parts more.
- To be healthy, you don't just build your diet around the latest nutrition fads. You also ask your doctor about your bloodwork to see if anything is amiss. (I did this last month and am happy to report I'm in good health! 😃)
- To improve your sports team's performance, you don't just practice trick shots. Instead, you review game films to spot gaps and then address them.

To improve your agentic AI system, don't just stack up the latest buzzy techniques that just went viral on social media (though I find it fun to experiment with buzzy AI techniques as much as the next person!). Instead, use error analysis to figure out where it's falling short, and focus on that.

Before analyzing errors, we first have to decide what is an error. So the first step is to put in evals. I'll focus on that for the remainder of this letter and discuss error analysis next week.

If you are using supervised learning to train a binary classifier, the number of ways the algorithm could make a mistake is limited. It could output 0 instead of 1, or vice versa. There is also a handful of standard metrics like accuracy, precision, recall, F1, ROC, etc. that apply to many problems. So as long as you know the test distribution, evals are relatively straightforward, and much of the work of error analysis lies in identifying what types of input an algorithm fails on, which also leads to data-centric AI techniques for acquiring more data to augment the algorithm in areas where it's weak.

With generative AI, a lot of intuitions from evals and error analysis of supervised learning carry over (history doesn't repeat itself, but it rhymes), and developers who are already familiar with machine learning and deep learning often adapt to generative AI faster than people who are starting from scratch. But one new challenge is that the space of outputs is much richer, so there are many more ways an algorithm's output might be wrong. Take the example of automated processing of financial invoices, where we use an agentic workflow to populate a financial database with information from received invoices. Will the algorithm incorrectly extract the invoice due date? Or the final amount? Or mistake the payer address for the biller address? Or get the financial currency wrong? Or make the wrong API call so the verification process fails?

Because the output space is much larger, the number of failure modes is also much larger. Rather than defining an error metric ahead of time, it is therefore typically more effective to first quickly build a prototype, then manually examine a handful of agent outputs to see where it performs well and where it stumbles. This allows you to focus on building datasets and error metrics (sometimes objective metrics implemented in code, and sometimes subjective metrics using LLM-as-judge) to check the system's performance in the dimensions you are most concerned about.

In supervised learning, we sometimes tune the error metric to better reflect what humans care about. With agentic workflows, I find tuning evals to be even more iterative, with more frequent tweaks to the evals to capture the wider range of things that can go wrong. I discuss this and other best practices in detail in Module 4 of the Agentic AI course on deeplearning.ai that we announced last week.

After building evals, you now have a measurement of your system's performance, which provides a foundation for trying different modifications to your agent, as you can now measure what makes a difference. The next step is then to perform error analysis to pinpoint what changes to focus your development efforts on. I'll discuss this further next week.

[Original text: deeplearning.ai/the-batch/issu… ]
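A minimal sketch of the two steps Ng describes, applied to his invoice example; the records and field names below are invented for illustration, and a subjective dimension would swap the exact-match check for an LLM-as-judge call:

```python
# Minimal sketch of evals + error analysis on the invoice example above.
# All records and field names are invented for illustration.

from collections import Counter

FIELDS = ["due_date", "amount", "payer_address", "currency"]

# (agent output, human-labeled ground truth) per invoice
examples = [
    ({"due_date": "2025-03-01", "amount": "120.00", "payer_address": "1 A St", "currency": "EUR"},
     {"due_date": "2025-03-01", "amount": "120.00", "payer_address": "1 A St", "currency": "EUR"}),
    ({"due_date": "2025-04-01", "amount": "99.00", "payer_address": "2 B Rd", "currency": "USD"},
     {"due_date": "2025-04-11", "amount": "99.00", "payer_address": "2 B Rd", "currency": "EUR"}),
]

# Eval: an objective, code-implemented metric (per-field exact match).
errors = Counter()
for output, truth in examples:
    for field in FIELDS:
        if output[field] != truth[field]:
            errors[field] += 1

total = len(examples) * len(FIELDS)
print(f"field accuracy: {(total - sum(errors.values())) / total:.0%}")  # 75%

# Error analysis: rank failure modes so effort goes to the biggest root cause,
# not to whichever mistake was noticed most recently.
for field, n in errors.most_common():
    print(f"{field}: {n} error(s)")
```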

12 replies · 31 reposts · 347 likes · 55.1K views
Dominik Rehse retweeted
Holger Zschaepitz @Schuldensuehner
#OpenAI, #Nvidia fuel $1tn AI market with a web of circular deals. A wave of deals and partnerships is escalating concerns that the trillion-dollar AI boom is being propped up by interconnected business transactions. bloomberg.com/news/features/…
73 replies · 234 reposts · 743 likes · 98.7K views
Dominik Rehse retweeted
Patrick Collison @patrickc
We have three cool announcements today: (1) @OpenAI is launching commerce in ChatGPT. Their new Instant Checkout is powered by @stripe. (2) We're releasing the Agentic Commerce Protocol, codeveloped by Stripe and OpenAI. (3) @stripe is launching an API for agentic payments, called Shared Payment Tokens. It's clear that internet purchasing modalities are going to change a lot, and we're excited to start to lay some of the foundations. Links below!
262 replies · 547 reposts · 6K likes · 792.9K views
Dominik Rehse retweeted
François Chollet @fchollet
By now there are probably more agent platforms than agentic workflows actually in use.
36 replies · 59 reposts · 851 likes · 52.5K views
Dominik Rehse @DominikRehse
Big discussion to have...
César A. Hidalgo @cesifoti

There used to be two sources of traffic on the web: search and social. Social disappeared as social networks mutated into TV channels and discouraged links. Now search is on the decline because of AI.

AI is swallowing search traffic, but it is different from search in one important way. From Lycos to Google, search encouraged the creation of content because it was never a destination; it was a gateway to a page. But AI doesn't encourage the creation of content in the same way. Only a tiny fraction of AI traffic clicks on the sources. So we have an incentives problem here. Why would anyone generate content in a world where the only visitors to your website are AIs?

I've worked on many sites built on an SEO strategy (Data USA, Data Mexico, The OEC). Today, more and more of our traffic is AI. We are getting slammed by bots every night, serving them millions of pages with factual information. AIs love what we do. We convert government statistics and trade records into text. And nom nom nom, the AIs eat it up, since we update our sites frequently.

But where does this end? Will AI companies start creating these content pipelines and completely internalize the web? Will they acquire companies like ours to handle that part of the operation? Can we start charging AI companies for each visit to our site? Or will they just crawl the open web until there's nothing new left to index?

I am not against AI. On the contrary, I'm bullish about it. But we have a problem here. Human creators are investing time, effort, and money into crafting high-quality, factually accurate content. But unlike search, which sent people back to the source, AI answers do not return the favor. Yes, it's cool to see ChatGPT cite the OEC, but I would prefer to get the traffic. Right now, it's a one-way transaction: ingest, summarize, and serve, with no click and no incentive for the creators.

We're not just facing a technological transition. We're facing an economic and institutional one too. One that asks: who will fund the web when the users are no longer human?

1 reply · 0 reposts · 0 likes · 123 views
Dominik Rehse retweeted
Luca Bertuzzi @BertuzLuca
OpenAI joins the EU code of practice for general-purpose AI models, the second signature of a leading AI company after Mistral. openai.com/global-affairs…
2 replies · 25 reposts · 38 likes · 3.8K views
Dominik Rehse retweeted
Luca Bertuzzi @BertuzLuca
❗ The Commission is envisaging a grace period for the AI Act's general-purpose AI rules "due to delays and technical challenges," but ONLY for model providers that sign the code of practice, according to a presentation obtained by @mlexclusive. 1/2 mlex.com/mlex/artificia…
2 replies · 12 reposts · 19 likes · 2.1K views
Dominik Rehse retweeted
ZEW @ZEW
Where does market power in web search come from? 🔍📊 In his keynote at our 23rd #ZEW Conference on the Economics of Information & Communication Technologies in Mannheim, @TobiasSalz @MITEcon presents results from a field experiment. zew.de/ict-conference…
1 reply · 1 repost · 4 likes · 338 views
Dominik Rehse retweeted
ZEW_en @zew_en
What are the sources of market power in web search? 🔍📊 In his keynote at our 23rd #ZEW Conference on the Economics of Information & Communication Technologies in Mannheim, @TobiasSalz @MITEcon presents results from a field experiment. zew.de/ict-conference…
1 reply · 1 repost · 3 likes · 172 views
Dominik Rehse retweeted
Nando de Freitas @NandoDF
AI teams are like football ⚽️ teams.

1. Good teams buy the best local and foreign talent.
2. The UK has great players but no team.
3. Corporations own teams in foreign countries.
4. The US owns almost all teams: @GoogleDeepMind @MicrosoftAI @AnthropicAI @OpenAI @xai @Meta etc.
5. Europe only has one small team, @MistralAI. European management and laws are rubbish for creating teams. No team means no one will care about your opinion.
6. I think Canada has one small team, @cohere, despite having had the best academies.
7. China has many good teams and an excellent academy, and focused management with a winning mentality: @deepseek_ai @Alibaba_Qwen @Skywork_ai etc.
8. Middle East money is starting to rule, but people: AI is not football.

I play for a supportive and exceptional US team, @MicrosoftAI, in the UK. I'm loving it, and my team empowers me to teach AI online and travel to teach AI in Africa and LatAm. My previous team @GoogleDeepMind is also amazing in this regard.

If any European politicians get to read this, they should see their biggest growing problem. @EuroParlPress @EU_Commission @uksciencechief @SciTechgovuk Wake up! Only France is seeing it.

Again: Africa, LatAm, India, etc. are all out of the picture. It's hard to overcome history even in the age of AI.
9 replies · 16 reposts · 129 likes · 20K views