Vilhelm von Ehrenheim

447 posts

@while

Co-Founder & Chief AI Officer @QA.tech 🚀 Building the new verification layer for AI SDLC Based in Stockholm | Join us: https://t.co/oNXfpv4gtz

Stockholm · Joined July 2007
238 Following · 138 Followers
Vilhelm von Ehrenheim
@karpathy just described LLM knowledge bases. Ingest, compile into a wiki, query, lint, repeat. We’ve been running this pattern in production for QA agents. The agent builds a knowledge graph of your product. Pages, flows, elements, relationships. It compounds over time. The next step is making it evolutionary. Agents that don’t just accumulate knowledge but learn what matters. Prune what doesn’t. Adapt their testing strategy based on what they’ve seen break before. Not RAG. Not static memory. Something closer to a living understanding of your product.
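The "knowledge graph of pages, flows, elements, relationships" plus the evolutionary pruning step can be sketched as a tiny data structure. This is an illustrative assumption, not QA.tech's actual schema; all names are invented:

```python
# Illustrative sketch of a product "feature graph" for a QA agent.
# Node kinds, relation names, and the prune() policy are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                     # "page", "flow", or "element"
    name: str
    edges: list = field(default_factory=list)   # (relation, target_name)

class FeatureGraph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}

    def add(self, kind: str, name: str) -> Node:
        self.nodes[name] = Node(kind, name)
        return self.nodes[name]

    def link(self, src: str, relation: str, dst: str) -> None:
        self.nodes[src].edges.append((relation, dst))

    def prune(self, keep) -> None:
        # Evolutionary step: drop nodes the agent has learned don't matter,
        # and remove any edges left dangling.
        self.nodes = {n: v for n, v in self.nodes.items() if keep(v)}
        for v in self.nodes.values():
            v.edges = [(r, t) for r, t in v.edges if t in self.nodes]

g = FeatureGraph()
g.add("page", "checkout")
g.add("flow", "purchase")
g.add("element", "pay_button")
g.link("purchase", "visits", "checkout")
g.link("checkout", "contains", "pay_button")
g.prune(lambda n: n.kind != "element")   # e.g. prune stale element nodes
```

The point of `prune` is the "living understanding" part: the graph compounds, but it can also forget.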
0 replies · 0 reposts · 0 likes · 14 views
Vilhelm von Ehrenheim
This is how I use Claude Code for content, presentations and research already. Raw sources in a directory, LLM compiles and cross-references, I rarely edit directly. Same pattern powers our QA agents at QA.tech. The agent builds a knowledge graph of the product it’s testing. Pages, flows, element relationships. Not selectors. Understanding. The “wiki as compounding artifact” framing is right. We call it a feature graph but the architecture is very similar.
0 replies · 0 reposts · 0 likes · 12 views
Andrej Karpathy
Andrej Karpathy@karpathy·
Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.
Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
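The ingest → compile loop above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `summarize()` is a stand-in for an LLM call, and the `raw/`/`wiki/` paths and Obsidian-style `[[backlink]]` format are illustrative, not Karpathy's actual scripts:

```python
# Minimal sketch of the raw/ -> wiki/ "compile" step.
# summarize() is a placeholder for an LLM summarization call.
from pathlib import Path

RAW, WIKI = Path("raw"), Path("wiki")

def summarize(text: str) -> str:
    # Stand-in: a real version would send the text to an LLM.
    return text[:200]

def compile_wiki() -> list[str]:
    # Turn every raw/*.md source into a wiki/ page, then write an index.
    WIKI.mkdir(exist_ok=True)
    index = []
    for src in sorted(RAW.glob("*.md")):
        page = WIKI / src.name
        page.write_text(f"# {src.stem}\n\n{summarize(src.read_text())}\n")
        index.append(f"- [[{src.stem}]]")   # Obsidian-style backlink
    (WIKI / "index.md").write_text("\n".join(index) + "\n")
    return index
```

Re-running the compile incrementally over a growing `raw/` directory is what makes the wiki a compounding artifact.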

793 replies · 2.2K reposts · 22K likes · 4.8M views
Yann LeCun
Yann LeCun@ylecun·
@elonmusk Thinking in language has limited applications, largely in coding and mathematics where the language itself can help reasoning. But, as I've been saying for years, thinking manipulates mental models in abstract (continuous) representation space. Soooo, xAI gonna use JEPA now?
20 replies · 11 reposts · 257 likes · 25K views
Elon Musk
Elon Musk@elonmusk·
Hadamard thought in image space
3.1K replies · 3.3K reposts · 52K likes · 57.4M views
Vilhelm von Ehrenheim
Very much agree. This is still a pipe dream originating in extrapolation from hype. Anyone with more solid experience building these backbones knows it. If an ERP migration is still too hard for top tier engineers, we are pretty far from agents building something new being a reality. But the future might of course look different. Even if it does, I find it hard to believe you would want to spend your engineering effort on internal migrations and maintenance. Someone still needs to own it, understand everything you need to build, map out use cases and all custom logic. It is not as simple as it sounds just because you can vibe together an app in a weekend.
0 replies · 0 reposts · 0 likes · 2 views
Fredrik Hjelm
Fredrik Hjelm@FredrikHjelm4·
Most people who say “AI will replace SaaS” have not replaced a single system in reality. They are vibe coders, solopreneurs, and self-proclaimed experts talking theory. We have done real replacements at Voi, at scale, on a modern tech and data stack, with elite engineering resources. Here is the truth.

Critical systems like ERP and CRM are not getting ripped out in established companies. Forget it. You do not replace NetSuite, or Salesforce, lightly. Over years, millions of micro-improvements become embedded in finance, reporting, compliance, and operations. The data is too critical. The operational risk is too high. Net-new companies can go AI-native ERP from day one. Incumbents will not gamble their backbone.

Mid- and long-tail SaaS is different. Narrow tools with limited surface area and clear workflows can be replaced. But even there, it is not about writing a prompt and deleting a subscription. You must own lifecycle management: integrations, permissions, data models, edge cases, upgrades, monitoring, and governance. That requires real engineering resources.

Saving money on SaaS licenses is a nice headline. It is not the frontier. The frontier is replacing human labor. Cross-functional workflows: reporting, validation, translations, parsing, reconciliation, planning, coordination, and decision support. You do not just swap software. You redesign work.

We are now building customized, enterprise-grade software that replaces manual white-collar work, then standardizing the components so evolution and maintenance become automated. The real TAM is not SaaS spend. It is white-collar time spent on tasks computers are better at. That is where this goes.
Johan Roslund@futureinvesting

Lovable is really going after SaaS. From an event with Lovable's CTO today. Will be interesting to follow… $Lime $Vitec $Upsales

31 replies · 35 reposts · 291 likes · 97.2K views
elvis
elvis@omarsar0·
@karpathy I have also been obsessed with building LLM knowledge bases. Here is one example of the type of things you can do that Karpathy is alluding to: x.com/omarsar0/statu… LLMs are excellent at curating and searching (finding connections) once data is stored properly.
elvis@omarsar0

Been exploring a new way to explore AI research papers to discover deeper insights. Agents are at the center of it. So far, I've built this little interactive artifact generator in my orchestrator to visualize things. This allows me to change views and insights (on-demand) from 100s of papers. Just scratching the surface here. More to share soon.

6 replies · 26 reposts · 250 likes · 57.9K views
Andrej Karpathy
Andrej Karpathy@karpathy·
2.3K replies · 5.6K reposts · 48.3K likes · 15M views
Vilhelm von Ehrenheim
Have been using the same format for my own things the last few months. Keeping track of investor calls, trends, product feedback and ideas. Really powerful. A bit surprised that this is so new to ppl. @AnthropicAI was clearly inspired by use cases like this in cc when creating cowork.
0 replies · 0 reposts · 0 likes · 547 views
GitHub Projects Community
GitHub Projects Community@GithubProjects·
Google Stitch introduced a new concept: DESIGN.md. Like README.md but for design systems. A plain markdown file that LLMs read to generate consistent UI. An awesome collection of DESIGN.md files inspired by developer-focused websites like Stripe, Vercel, Linear, Notion, Figma and more. Drop one into your project. Your AI coding agent builds the rest.
45 replies · 335 reposts · 3.1K likes · 396K views
Daniel Langkilde
Daniel Langkilde@langkilde·
I've decided to become more active on X again after years of being away ✍️ Also: I was shocked to find that I've had an account on the platform for over 17 years 😱 I'm getting old... Anyway, why am I returning? Well, as complicated as my emotions about the new management might be, a lot of 𝗴𝗲𝗻𝘂𝗶𝗻𝗲𝗹𝘆 𝗶𝗻𝘁𝗲𝗿𝗲𝘀𝘁𝗶𝗻𝗴 people are on X, posting really insightful things. I also find that LinkedIn isn't great for high frequency stuff. My plan is to dual-post anything major to both platforms, and then post smaller, more high frequency things to X going forward. To get started, I cleaned up who I follow and organized everything into lists. You can see my starting set of 230 accounts in the pdf (if you are on LinkedIn) or through my profile (if you are on X). The script for cleaning my x account is here along with the account list: github.com/dlangk/x-admin Let's go 🚀
1 reply · 0 reposts · 2 likes · 59 views
Boris Cherny
Boris Cherny@bcherny·
@Rahatcodes 👋 This is one of the signals we use to figure out if people are having a good experience. We put it on a dashboard and call it the “fucks” chart
277 replies · 167 reposts · 4.7K likes · 303.6K views
rahat
rahat@Rahatcodes·
Claude Code has a regex that detects "wtf", "ffs", "piece of shit", "fuck you", "this sucks" etc. It doesn't change behavior...it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI Do with this information what you will
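The actual pattern Claude Code ships is not public; a minimal reconstruction of the behavior the tweet describes, using only the phrases it lists, might look like:

```python
# Illustrative reconstruction, NOT Anthropic's actual regex or telemetry
# code. It flags the message but changes no behavior, matching the tweet.
import re

NEGATIVE = re.compile(
    r"\b(wtf|ffs|piece of shit|fuck you|this sucks)\b",
    re.IGNORECASE,
)

def analytics_event(message: str) -> dict:
    # Silently recorded; the flag never alters the agent's response.
    return {"is_negative": bool(NEGATIVE.search(message))}
```

The word boundaries (`\b`) keep substrings like "aftermath of ffs" honest while still catching case variants like "WTF".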
548 replies · 766 reposts · 14.4K likes · 1.4M views
Vilhelm von Ehrenheim
Jade Rubick makes the case that QA should become “Automated Verification Engineers.” Fast, automated feedback on every PR. No gates. No handoffs. I’d go further. The AVE shouldn’t be a person. It should be an agent. rubick.com/should-qa-exis…
0 replies · 0 reposts · 0 likes · 13 views
Vilhelm von Ehrenheim
If your infrastructure can’t let a new hire push to production safely on day one, you can’t let an agent do it either. Good engineering practices aren’t replaced by AI. They’re amplified by it.
0 replies · 0 reposts · 0 likes · 17 views
Vilhelm von Ehrenheim
@hackernoon We built this at QA.tech. The shift is simple: stop telling agents which selectors to click. Tell them what should be true. Let them figure out the rest.
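The selector-vs-outcome shift can be made concrete with a toy sketch. Everything here is invented for illustration (it is not QA.tech's API): the agent gets a goal plus assertions about end state, and verification only checks outcomes:

```python
# Hypothetical sketch: declarative expectations instead of selector scripts.
# The agent decides *how* to act; names and structure are invented.
expectation = {
    "goal": "complete a purchase of two items",
    "should_be_true": [
        "cart shows quantity 2",
        "order confirmation is displayed",
    ],
}

def verify(observed_facts: set[str], expectation: dict) -> bool:
    # Pass only if every stated outcome was observed, however it was reached.
    return all(fact in observed_facts for fact in expectation["should_be_true"])
```

Because nothing references `#pay-button`-style selectors, the spec survives UI refactors that would break a scripted test.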
0 replies · 0 reposts · 0 likes · 49 views
Christian Byrne 一人
Christian Byrne 一人@c__byrne·
How can I get a reply back from momentic's sales team? Please, we want to use it @MomenticAI Maybe good alternatives exist? Anyone have suggestions?
3 replies · 0 reposts · 7 likes · 506 views
Lee Robinson
Lee Robinson@leerob·
You can now use Cursor with 30+ ACP clients, including OpenClaw 🦞 This means complete access to Composer 1.5, codebase indexing and semantic search, and more! Here's an example with avante.nvim
54 replies · 52 reposts · 762 likes · 144K views
Vilhelm von Ehrenheim
Stripe built "Minions." Ramp built "Inspect." Different companies, same architecture. Cloud sandboxes. Isolated environments. Agents running in parallel. 1,300 PRs/week (Stripe). 30% of all PRs (Ramp). Neither could do this on localhost.
0 replies · 0 reposts · 0 likes · 30 views