Vilhelm von Ehrenheim

447 posts

@while

Co-Founder & Chief AI Officer @QA.tech 🚀 Building the new verification layer for AI SDLC Based in Stockholm | Join us: https://t.co/oNXfpv4gtz

Stockholm · Joined July 2007
238 Following · 138 Followers
Vilhelm von Ehrenheim
@karpathy just described LLM knowledge bases. Ingest, compile into a wiki, query, lint, repeat. We’ve been running this pattern in production for QA agents. The agent builds a knowledge graph of your product. Pages, flows, elements, relationships. It compounds over time. The next step is making it evolutionary. Agents that don’t just accumulate knowledge but learn what matters. Prune what doesn’t. Adapt their testing strategy based on what they’ve seen break before. Not RAG. Not static memory. Something closer to a living understanding of your product.
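The "knowledge graph of pages, flows, elements, relationships" plus the evolutionary pruning step can be sketched as a tiny data structure. This is an illustrative assumption, not QA.tech's actual schema; all names are invented:

```python
# Illustrative sketch of a product "feature graph" for a QA agent.
# Node kinds, relation names, and the prune() policy are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                     # "page", "flow", or "element"
    name: str
    edges: list = field(default_factory=list)   # (relation, target_name)

class FeatureGraph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}

    def add(self, kind: str, name: str) -> Node:
        self.nodes[name] = Node(kind, name)
        return self.nodes[name]

    def link(self, src: str, relation: str, dst: str) -> None:
        self.nodes[src].edges.append((relation, dst))

    def prune(self, keep) -> None:
        # Evolutionary step: drop nodes the agent has learned don't matter,
        # and remove any edges left dangling.
        self.nodes = {n: v for n, v in self.nodes.items() if keep(v)}
        for v in self.nodes.values():
            v.edges = [(r, t) for r, t in v.edges if t in self.nodes]

g = FeatureGraph()
g.add("page", "checkout")
g.add("flow", "purchase")
g.add("element", "pay_button")
g.link("purchase", "visits", "checkout")
g.link("checkout", "contains", "pay_button")
g.prune(lambda n: n.kind != "element")   # e.g. prune stale element nodes
```

The point of `prune` is the "living understanding" part: the graph compounds, but it can also forget.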
0 replies · 0 reposts · 0 likes · 14 views
Vilhelm von Ehrenheim
This is how I use Claude Code for content, presentations and research already. Raw sources in a directory, LLM compiles and cross-references, I rarely edit directly. Same pattern powers our QA agents at QA.tech. The agent builds a knowledge graph of the product it’s testing. Pages, flows, element relationships. Not selectors. Understanding. The “wiki as compounding artifact” framing is right. We call it a feature graph but the architecture is very similar.
0 replies · 0 reposts · 0 likes · 12 views
Andrej Karpathy
Andrej Karpathy@karpathy·
Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.
Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
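The ingest → compile loop above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `summarize()` is a stand-in for an LLM call, and the `raw/`/`wiki/` paths and Obsidian-style `[[backlink]]` format are illustrative, not Karpathy's actual scripts:

```python
# Minimal sketch of the raw/ -> wiki/ "compile" step.
# summarize() is a placeholder for an LLM summarization call.
from pathlib import Path

RAW, WIKI = Path("raw"), Path("wiki")

def summarize(text: str) -> str:
    # Stand-in: a real version would send the text to an LLM.
    return text[:200]

def compile_wiki() -> list[str]:
    # Turn every raw/*.md source into a wiki/ page, then write an index.
    WIKI.mkdir(exist_ok=True)
    index = []
    for src in sorted(RAW.glob("*.md")):
        page = WIKI / src.name
        page.write_text(f"# {src.stem}\n\n{summarize(src.read_text())}\n")
        index.append(f"- [[{src.stem}]]")   # Obsidian-style backlink
    (WIKI / "index.md").write_text("\n".join(index) + "\n")
    return index
```

Re-running the compile incrementally over a growing `raw/` directory is what makes the wiki a compounding artifact.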

793 replies · 2.2K reposts · 22K likes · 4.8M views
Yann LeCun
Yann LeCun@ylecun·
@elonmusk Thinking in language has limited applications, largely in coding and mathematics where the language itself can help reasoning. But, as I've been saying for years, thinking manipulates mental models in abstract (continuous) representation space. Soooo, xAI gonna use JEPA now?
20 replies · 11 reposts · 257 likes · 25K views
Elon Musk
Elon Musk@elonmusk·
Hadamard thought in image space
3.1K replies · 3.3K reposts · 52K likes · 57.4M views
Vilhelm von Ehrenheim
Very much agree. This is still a pipe dream originating in extrapolation from hype. Anyone with more solid experience building these backbones knows it. If an ERP migration is still too hard for top tier engineers, we are pretty far from agents building something new being a reality. But the future might of course look different. Even if it does, I find it hard to believe you would want to spend your engineering effort on internal migrations and maintenance. Someone still needs to own it, understand everything you need to build, map out use cases and all custom logic. It is not as simple as it sounds just because you can vibe together an app in a weekend.
0 replies · 0 reposts · 0 likes · 2 views
Fredrik Hjelm
Fredrik Hjelm@FredrikHjelm4·
Most people who say “AI will replace SaaS” have not replaced a single system in reality. They are vibe coders, solopreneurs, and self-proclaimed experts talking theory. We have done real replacements at Voi, at scale, on a modern tech and data stack, with elite engineering resources. Here is the truth.

Critical systems like ERP and CRM are not getting ripped out in established companies. Forget it. You do not replace NetSuite, or Salesforce, lightly. Over years, millions of micro-improvements become embedded in finance, reporting, compliance, and operations. The data is too critical. The operational risk is too high. Net-new companies can go AI-native ERP from day one. Incumbents will not gamble their backbone.

Mid- and long-tail SaaS is different. Narrow tools with limited surface area and clear workflows can be replaced. But even there, it is not about writing a prompt and deleting a subscription. You must own lifecycle management: integrations, permissions, data models, edge cases, upgrades, monitoring, and governance. That requires real engineering resources.

Saving money on SaaS licenses is a nice headline. It is not the frontier. The frontier is replacing human labor. Cross-functional workflows: reporting, validation, translations, parsing, reconciliation, planning, coordination, and decision support. You do not just swap software. You redesign work.

We are now building customized, enterprise-grade software that replaces manual white-collar work, then standardizing the components so evolution and maintenance become automated. The real TAM is not SaaS spend. It is white-collar time spent on tasks computers are better at. That is where this goes.
Johan Roslund@futureinvesting

Lovable is really going after SaaS. From an event with Lovable's CTO today. Will be interesting to follow… $Lime $Vitec $Upsales

31 replies · 35 reposts · 291 likes · 97.2K views
elvis
elvis@omarsar0·
@karpathy I have also been obsessed with building LLM knowledge bases. Here is one example of the type of things you can do that Karpathy is alluding to: x.com/omarsar0/statu… LLMs are excellent at curating and searching (finding connections) once data is stored properly.
elvis@omarsar0

Been exploring a new way to explore AI research papers to discover deeper insights. Agents are at the center of it. So far, I've built this little interactive artifact generator in my orchestrator to visualize things. This allows me to change views and insights (on-demand) from 100s of papers. Just scratching the surface here. More to share soon.

6 replies · 26 reposts · 250 likes · 57.9K views
Andrej Karpathy
Andrej Karpathy@karpathy·
2.3K replies · 5.6K reposts · 48.3K likes · 15M views
Vilhelm von Ehrenheim
Have been using the same format for my own things the last few months. Keeping track of investor calls, trends, product feedback and ideas. Really powerful. A bit surprised that this is so new to ppl. @AnthropicAI was clearly inspired by use cases like this in cc when creating cowork.
0 replies · 0 reposts · 0 likes · 547 views
GitHub Projects Community
GitHub Projects Community@GithubProjects·
Google Stitch introduced a new concept: DESIGN.md. Like README.md but for design systems. A plain markdown file that LLMs read to generate consistent UI. An awesome collection of DESIGN.md files inspired by developer-focused websites like Stripe, Vercel, Linear, Notion, Figma and more. Drop one into your project. Your AI coding agent builds the rest.
45 replies · 335 reposts · 3.1K likes · 396K views
Daniel Langkilde
Daniel Langkilde@langkilde·
I've decided to become more active on X again after years of being away ✍️ Also: I was shocked to find that I've had an account on the platform for over 17 years 😱 I'm getting old... Anyway, why am I returning? Well, as complicated as my emotions about the new management might be, a lot of 𝗴𝗲𝗻𝘂𝗶𝗻𝗲𝗹𝘆 𝗶𝗻𝘁𝗲𝗿𝗲𝘀𝘁𝗶𝗻𝗴 people are on X, posting really insightful things. I also find that LinkedIn isn't great for high frequency stuff. My plan is to dual-post anything major to both platforms, and then post smaller, more high frequency things to X going forward. To get started, I cleaned up who I follow and organized everything into lists. You can see my starting set of 230 accounts in the pdf (if you are on LinkedIn) or through my profile (if you are on X). The script for cleaning my x account is here along with the account list: github.com/dlangk/x-admin Let's go 🚀
1 reply · 0 reposts · 2 likes · 59 views
Boris Cherny
Boris Cherny@bcherny·
@Rahatcodes 👋 This is one of the signals we use to figure out if people are having a good experience. We put it on a dashboard and call it the “fucks” chart
277 replies · 167 reposts · 4.7K likes · 303.6K views
rahat
rahat@Rahatcodes·
Claude Code has a regex that detects "wtf", "ffs", "piece of shit", "fuck you", "this sucks" etc. It doesn't change behavior...it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI Do with this information what you will
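The actual pattern Claude Code ships is not public; a minimal reconstruction of the behavior the tweet describes, using only the phrases it lists, might look like:

```python
# Illustrative reconstruction, NOT Anthropic's actual regex or telemetry
# code. It flags the message but changes no behavior, matching the tweet.
import re

NEGATIVE = re.compile(
    r"\b(wtf|ffs|piece of shit|fuck you|this sucks)\b",
    re.IGNORECASE,
)

def analytics_event(message: str) -> dict:
    # Silently recorded; the flag never alters the agent's response.
    return {"is_negative": bool(NEGATIVE.search(message))}
```

The word boundaries (`\b`) keep substrings like "aftermath of ffs" honest while still catching case variants like "WTF".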
548 replies · 766 reposts · 14.4K likes · 1.4M views
Vilhelm von Ehrenheim
Jade Rubick makes the case that QA should become “Automated Verification Engineers.” Fast, automated feedback on every PR. No gates. No handoffs. I’d go further. The AVE shouldn’t be a person. It should be an agent. rubick.com/should-qa-exis…
0 replies · 0 reposts · 0 likes · 13 views
Vilhelm von Ehrenheim
If your infrastructure can’t let a new hire push to production safely on day one, you can’t let an agent do it either. Good engineering practices aren’t replaced by AI. They’re amplified by it.
0 replies · 0 reposts · 0 likes · 17 views
Vilhelm von Ehrenheim
@hackernoon We built this at QA.tech. The shift is simple: stop telling agents which selectors to click. Tell them what should be true. Let them figure out the rest.
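The selector-vs-outcome shift can be made concrete with a toy sketch. Everything here is invented for illustration (it is not QA.tech's API): the agent gets a goal plus assertions about end state, and verification only checks outcomes:

```python
# Hypothetical sketch: declarative expectations instead of selector scripts.
# The agent decides *how* to act; names and structure are invented.
expectation = {
    "goal": "complete a purchase of two items",
    "should_be_true": [
        "cart shows quantity 2",
        "order confirmation is displayed",
    ],
}

def verify(observed_facts: set[str], expectation: dict) -> bool:
    # Pass only if every stated outcome was observed, however it was reached.
    return all(fact in observed_facts for fact in expectation["should_be_true"])
```

Because nothing references `#pay-button`-style selectors, the spec survives UI refactors that would break a scripted test.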
0 replies · 0 reposts · 0 likes · 49 views
Christian Byrne 一人
Christian Byrne 一人@c__byrne·
How can I get a reply back from momentic's sales team? Please, we want to use it @MomenticAI Maybe good alternatives exist? Anyone have suggestions?
3 replies · 0 reposts · 7 likes · 506 views
Lee Robinson
Lee Robinson@leerob·
You can now use Cursor with 30+ ACP clients, including OpenClaw 🦞 This means complete access to Composer 1.5, codebase indexing and semantic search, and more! Here's an example with avante.nvim
54 replies · 52 reposts · 762 likes · 144K views
Vilhelm von Ehrenheim
Stripe built "Minions." Ramp built "Inspect." Different companies, same architecture. Cloud sandboxes. Isolated environments. Agents running in parallel. 1,300 PRs/week (Stripe). 30% of all PRs (Ramp). Neither could do this on localhost.
0 replies · 0 reposts · 0 likes · 30 views