Anselm Hook

19.9K posts

Anselm Hook

@anselm

Whole systems, ecology, artist, father, alumni of @mozilla, @parcinc. 🇨🇦, alberta, sfo https://t.co/I8AT92TPHz

San Francisco, CA · Joined March 2007
3.2K Following · 2.5K Followers
Anselm Hook retweeted
Pablo Vela @pablovelagomez1
I've migrated the old MASt3R-SLAM example I made last year to the latest version of @rerundotio and made a bunch of improvements! I wanted to spend some time with agents to modernize it. Here's an example of me walking around with my iPhone and getting a dense reconstruction at about 10 FPS on a 5090. Here are the improvements I made.

Brought it into the monorepo with proper packaging:
• Used @prefix_dev pixi-build to replace all the mast3r/asmk/lietorch vendored code with just a few small patches. This let me remove some 60k lines of code from the repo!
• No longer have to build the lietorch code on my machine, which was taking ~10 minutes to compile (this also made it work on Blackwell when it previously did not)

Rebuilt the @Gradio interface:
• Fixed incremental updates, .MOV uploads, and stop behavior
• Made the CLI and Gradio interface share the same entry point so updates automatically propagate

Upgraded the @rerundotio integration:
• Switched to a multiprocessing async logging strategy
• Added video/pointmap/confidence logging
• Improved the blueprint layout and hid noisy entities from the 3D view
• The biggest perf win was the async background logger: I documented about a ~2.5x speedup from decoupling logging from tracking

The newest and most interesting part was my attempt to replace the CUDA kernels for Gauss-Newton ray matching with a @Modular Mojo backend. As a Python dev, every time I look at CUDA code I basically shy away, as it's pretty difficult for me to understand. Mojo let me rewrite the matching logic in a syntax I'm more comfortable with while still getting near-CUDA performance. Mojo is now the default matching backend with a CUDA fallback. One major piece that's still missing is the custom PyTorch op path, but I'll eventually do that as well.
I heavily leaned on Claude Code to do the CUDA → Mojo migration, and I have no doubt it's not the cleanest or most idiomatic, BUT it's way more readable for me and helps me better understand the underlying algorithm. This was a ton of work, and a large part of why I'm doing it is how the monorepo compounds. This becomes an artifact for the next example I want to build with Claude that I can point to, which will make it even faster to implement. The compounding nature of this is really interesting and part of why I'm spending so much time trying to make things nice and readable.
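The async background-logger idea above (decouple logging from the tracking loop so the hot path never blocks on I/O) can be sketched in a few lines of Python. This is a minimal sketch with a stand-in `log_fn` consumer, not Rerun's or the example's actual code:

```python
import queue
import threading

def start_background_logger(log_fn):
    """Run log_fn(item) on a worker thread so the hot loop never blocks on I/O."""
    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: shut down
                break
            log_fn(item)

    t = threading.Thread(target=worker, daemon=True)
    t.start()

    def log(item):
        q.put(item)  # returns immediately; the tracking loop keeps running

    def close():
        q.put(None)
        t.join()

    return log, close

# usage: the tracking loop calls log(frame) and never waits on the consumer
logged = []
log, close = start_background_logger(logged.append)
for frame_id in range(5):
    log({"frame": frame_id})
close()
```

The speedup comes purely from the hot loop no longer waiting on serialization and transport; the queue absorbs bursts.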
Pablo Vela @pablovelagomez1

We have HOT3D! I've started using Claude to port more datasets into @rerundotio and exoego-forge. I'd been meaning to bring in the HOT3D dataset from Meta for a while, but with Claude it's way easier. My goal is to take any egocentric, exocentric, or combined datasets and ingest them into a standardized schema. Getting everything into Rerun means we can easily query and transform data via the in-memory OSS server. This lets us run SQL-like queries such as: "Find me all frames that only contain left hands in the leftmost camera view." Most people think of Rerun as a viewer, but this is the actual superpower. So far we have:
1. HOT3D
2. Hocap
3. UmeTrack
4. Assembly101
5. EgoDex
Planning to add more, and with every addition it gets easier as we build up agent skills and better code examples. I'm hoping to make adding new datasets almost fully automatic. The next few I'm looking at are Harmony4D and Aria Pilot Gen2. After we have enough samples, I'll work on bringing in all the different algorithms I've worked on to transform the data 🙂
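Once everything shares a standardized schema, that example query is just a group-by. A sketch using a hypothetical flat schema in pandas (the schema and column names are illustrative, not Rerun's actual dataframe API):

```python
import pandas as pd

# Hypothetical standardized schema: one row per (frame, camera, detection).
frames = pd.DataFrame(
    [
        {"frame": 0, "camera": "cam_left", "hand": "left"},
        {"frame": 0, "camera": "cam_left", "hand": "right"},
        {"frame": 1, "camera": "cam_left", "hand": "left"},
        {"frame": 1, "camera": "cam_right", "hand": "left"},
        {"frame": 2, "camera": "cam_left", "hand": "left"},
    ]
)

# "Find me all frames that only contain left hands in the leftmost camera view":
left_cam = frames[frames["camera"] == "cam_left"]
only_left = left_cam.groupby("frame")["hand"].agg(lambda h: set(h) == {"left"})
matching_frames = sorted(only_left[only_left].index)
print(matching_frames)  # frame 0 is excluded because it also has a right hand
```

The point of a shared schema is exactly this: every newly ingested dataset becomes queryable with the same few lines.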

6 replies · 24 reposts · 212 likes · 16K views

Anselm Hook retweeted
Linus ✦ Ekenstam @LinusEkenstam
Architects are AI maxxing. Turning rubble into precious building material in no time at all. by Philmann Architects and Taaod Studio.
28 replies · 46 reposts · 456 likes · 48.1K views

Anselm Hook retweeted
How To AI @HowToAI_
🚨 MIT showed you can delete 90% of a neural network without losing accuracy. Researchers found that inside every massive model there is a "winning ticket": a tiny subnetwork that does all the heavy lifting. They showed that if you find it and reset it to its original initialization, it trains to match the accuracy of the full network.

But there was a catch that killed adoption instantly: you had to train the massive model first to find the ticket. Nobody wanted to train twice just to deploy once. It was a cool academic flex, but useless for production. The original 2018 paper was mind-blowing, but today, after 8 years, we finally have the silicon-level breakthrough we were waiting for: structured sparsity.

Modern GPUs (NVIDIA Ampere and later) don't just "simulate" pruning anymore. They have native support for block sparsity (2:4 patterns) built directly into the hardware. It's not theoretical, it's silicon-level acceleration. The math is compelling: a sparse network means less memory bandwidth and up to 2× compute throughput, with real speed and minimal accuracy loss.

Three things just made this production-ready in 2026:
- pruning-aware training (you train sparse from day one)
- native support in PyTorch 2.0 and the Apple Neural Engine
- the realization that AI models are heavily over-parameterized by design

Evolution over-parameterizes everything. We're finally learning how to prune. The era of bloated, inefficient models is ending. The tooling finally caught up to the theory, and the winners are going to be the ones who stop paying for weights they don't need. The future of AI is smaller, faster, and smarter.
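The 2:4 pattern is simple to state: in every contiguous group of four weights, exactly two are zeroed, which yields 50% sparsity (not 90%). A minimal pure-Python sketch of magnitude-based 2:4 pruning, not the cuSPARSELt or PyTorch API:

```python
def prune_2_4(weights):
    """Apply a 2:4 structured-sparsity pattern: in every contiguous block of
    four weights, zero the two with the smallest magnitude. This is the block
    pattern Ampere+ tensor cores accelerate natively (illustrative only)."""
    assert len(weights) % 4 == 0
    pruned = []
    for i in range(0, len(weights), 4):
        block = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this block of four
        keep = sorted(range(4), key=lambda j: abs(block[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(block))
    return pruned

print(prune_2_4([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]))
# keeps the two largest magnitudes per 4-block, zeroes the rest
```

In hardware, the zeros are not stored at all; the GPU keeps the two surviving values per block plus a tiny index, which is where the bandwidth savings come from.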
176 replies · 828 reposts · 5.4K likes · 329.3K views

Anselm Hook retweeted
Project PLATEAU @ProjectPlateau
[We've passed 300 cities!] The nationwide 3D city models developed under #PLATEAU in FY2025 are now published as open data on the G-Spatial Information Center. We will continue to expand and update the data this fiscal year. Please make use of it! ▼ Open data portal front.geospatial.jp/plateau_portal…
1 reply · 68 reposts · 449 likes · 20.9K views

Anselm Hook retweeted
himanshu @himanshustwts
Based on everything explored in the source code, here's the full technical recipe behind Claude Code's memory architecture [shared by Claude Code]. Claude Code's memory system is actually insanely well-designed. It isn't "store everything" but constrained, structured, self-healing memory. The architecture is doing a few very non-obvious things:

> Memory = index, not storage
+ MEMORY.md is always loaded, but it's just pointers (~150 chars/line)
+ actual knowledge lives outside, fetched only when needed

> 3-layer design (bandwidth aware)
+ index (always) + topic files (on-demand) + transcripts (never read, only grep'd)

> Strict write discipline
+ write to file → then update index
+ never dump content into the index
+ prevents entropy / context pollution

> Background "memory rewriting" (autoDream)
+ merges, dedupes, removes contradictions
+ converts vague → absolute
+ aggressively prunes
+ memory is continuously edited, not appended

> Staleness is first-class
+ if memory ≠ reality → memory is wrong
+ code-derived facts are never stored
+ index is forcibly truncated

> Isolation matters
+ consolidation runs in a forked subagent
+ limited tools → prevents corruption of main context

> Retrieval is skeptical, not blind
+ memory is a hint, not truth
+ model must verify before using

> What they don't store is the real insight
+ no debugging logs, no code structure, no PR history
+ if it's derivable, don't persist it
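The write discipline ("write to file, then update the index; never dump content into the index") can be sketched in a few lines. File names and layout here are hypothetical, inspired by the description above, not Claude Code's actual implementation:

```python
import os
import tempfile

def remember(root, topic, content, summary):
    """Write knowledge to a topic file, then add only a short pointer line
    to the always-loaded index (MEMORY.md)."""
    os.makedirs(os.path.join(root, "memory"), exist_ok=True)
    # full content goes to the on-demand topic file
    with open(os.path.join(root, "memory", f"{topic}.md"), "a") as f:
        f.write(content + "\n")
    # the index gets only a pointer, truncated to keep it bandwidth-cheap
    pointer = f"- {topic}: {summary}"[:150]
    with open(os.path.join(root, "MEMORY.md"), "a") as f:
        f.write(pointer + "\n")

root = tempfile.mkdtemp()
remember(root, "build", "Full notes: the project uses pixi; run `pixi run test`.",
         "build/test commands live in memory/build.md")
with open(os.path.join(root, "MEMORY.md")) as f:
    index = f.read()
print(index)  # only the pointer is in the index; the detail stays in the topic file
```

Keeping the index to pointers is what prevents the always-loaded file from growing into context pollution.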
154 replies · 690 reposts · 6.3K likes · 822.6K views

Anselm Hook retweeted
asim ᯅ @asimahmed
High-fidelity Gaussian splat I captured a few weeks ago. Less than 4 minutes of scanning with a 360° camera. Powered by @nianticspatial.
9 replies · 25 reposts · 385 likes · 38.3K views

Anselm Hook retweeted
Gabriele Romagnoli @GabRoXR
Big announcements from @NianticSpatial today 👇
• @scaniverse adds a web portal, cloud-based project management, multi-user collaboration, support for 360° camera inputs, and tighter integration with VPS and downstream workflows.
• VPS 2.0 now works globally without pre-scanning. It combines high-precision 6DoF in mapped areas with GPS correction elsewhere, making localization continuous instead of location-dependent.
• NSDK 4.0 (coming soon) connects directly to Scaniverse and VPS, expanding support to Unity, Swift, Android, and ROS 2 for both apps and robotics use cases.
0 replies · 11 reposts · 117 likes · 8.4K views

Anselm Hook retweeted
Felix Heide @_FelixHeide_
WorldFlow3D: Unbounded 3D World Generation 🌍 by Flow Through Hierarchical Distributions, without VAEs! We reformulate 3D generation as flowing through sequentially finer 3D distributions, cutting training time by more than half ⏱️ compared to existing approaches! Vectorized map layouts provide full scene controllability 🗺️, and a novel flow-field alignment process enables causally coherent, spatially unbounded generation 🌍. This generative method generalizes across both real and synthetic data distributions! Project: light.princeton.edu/worldflow3d Project led by @amogh7joshi and Julian Ost. Will be super fun to build on this! 🔥
2 replies · 27 reposts · 235 likes · 19K views

Anselm Hook retweeted
Philosophy Of Physics @PhilosophyOfPhy
Visualization of Hooke's Law (F = -kx) mapped onto human movement by treating joints as anchors in a spring-mass lattice: every extension generates real-time tension, radiating force-field vectors that turn the stage into a living physics engine.
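Hooke's law per joint pair is a one-liner. A minimal 2D sketch with hypothetical joint positions (the spring constant and coordinates are illustrative):

```python
def spring_force(k, rest_length, p1, p2):
    """Hooke's law F = -k*x along the segment between two joints, where x is
    the extension beyond the spring's rest length. Returns the force on p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = (dx * dx + dy * dy) ** 0.5
    extension = length - rest_length
    magnitude = -k * extension           # restoring force opposes extension
    ux, uy = dx / length, dy / length    # unit vector from p1 toward p2
    return (magnitude * ux, magnitude * uy)

# a shoulder-to-wrist "spring" stretched from rest length 1.0 to 1.5
fx, fy = spring_force(k=10.0, rest_length=1.0, p1=(0.0, 0.0), p2=(1.5, 0.0))
print(fx, fy)  # (-5.0, 0.0): pulls the wrist back toward the shoulder
```

Evaluating this for every joint pair each frame gives exactly the radiating force-field vectors the visualization describes.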
22 replies · 131 reposts · 789 likes · 61.1K views

Anselm Hook retweeted
Ben Sigman @bensig
30 second explanation of the MemPalace by Milla Jovovich. By day she’s filming action movies, walking Miu Miu fashion shows, and being a mom. By night she’s coding. She’s the most creative, brilliant, and hilarious person I know. I’m honored to be working with her on this project… more to come.
220 replies · 555 reposts · 5.3K likes · 1.4M views

Anselm Hook retweeted
sasaki@engineer @rsasaki0109
City2Graph: GeoAI with Graph Neural Networks (GNNs) and Spatial Network Analysis
- Graph Construction for GeoAI: build graphs from diverse urban datasets, including buildings, streets, and land use, to power GeoAI and GNN applications.
- Transportation Network Modeling: query GTFS feeds through DuckDB and construct detailed transit graphs for accessibility and service analysis.
- Proximity and Contiguity Analysis: create graphs based on spatial proximity and adjacency, including multi-center distance filtering and layered isochrones.
- Mobility Flow Analysis: model and analyze urban mobility patterns from data sources like bike-sharing, migration, and pedestrian flows.
- PyTorch Geometric Integration: seamlessly convert geospatial data into PyTorch tensors for GNNs.
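A proximity graph of the kind listed above is just "connect features closer than a threshold". A dependency-free sketch (illustrative, not city2graph's actual API):

```python
def proximity_graph(nodes, threshold):
    """Build an adjacency list connecting every pair of features whose
    centroids are within `threshold` of each other — the simplest
    proximity-based graph construction."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    edges = {name: [] for name in nodes}
    names = list(nodes)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if dist(nodes[u], nodes[v]) <= threshold:
                edges[u].append(v)
                edges[v].append(u)
    return edges

# hypothetical building centroids in arbitrary map units
buildings = {"library": (0, 0), "cafe": (1, 0), "station": (5, 5)}
print(proximity_graph(buildings, threshold=2.0))
```

In a GNN pipeline, this adjacency structure becomes the edge index and the per-building attributes become node features.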
2 replies · 56 reposts · 293 likes · 14.4K views

Anselm Hook retweeted
Yohan @yohaniddawela
A team of researchers just traced the supply chain of every building and bridge across 1,000 cities. The resulting carbon data makes our current housing targets look mathematically impossible.

For years, we lacked a way to calculate the true environmental cost of urban growth. Global economic models track the flow of money and ecological impacts at the national level. We knew exactly how much carbon a country emitted making steel and concrete. We just couldn't trace those materials down to the specific metropolitan areas consuming them.

So a team built a top-down allocation model to find out. They took 20 years of global input-output data and merged it with local economic proxies like construction employment and regional GDP. They effectively generated an itemised receipt for the embodied carbon of every highway, pipeline, and residential tower built in major global cities.

Then they calculated strict cumulative carbon budgets. They took the remaining global carbon allowance required to stay below 2°C of warming and divided it up. They allocated shares to specific sectors and then distributed those shares to individual cities based on population and historic emission rates. A city like Montréal gets a hard mathematical limit on how much carbon its construction sector can emit from 2020 onwards.

This is where the climate data collides with the housing crisis. Most major cities are projecting massive population growth and authorising immense residential construction programmes to match. Toronto plans to build hundreds of thousands of new units by 2031.

The researchers calculated the material intensity of this future housing stock. They pulled data on the concrete, steel, brick, and glass required for different building types. They then ran simulations to model the life-cycle emissions of manufacturing those exact materials. When you multiply the required new floor space by the embodied carbon of standard construction materials, the budgets immediately fail.

Building enough housing to meet projected population growth using our current supply chains guarantees a massive carbon overshoot. The gap between these two realities forces a brutal choice. We either accept that we will blow past our remaining global carbon budget, or we drastically change how we build. Meeting future housing demand within the required climate limits demands an immediate shift away from high-emission concrete and steel toward timber and radical material efficiency. The maths is completely unforgiving. You can have the necessary housing, or you can use traditional building methods. You can't have both. Link to article: nature.com/articles/s4428…
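The overshoot argument is a single multiplication: planned floor area times embodied carbon per square metre, compared against the sector budget. All numbers below are made-up placeholders to show the shape of the calculation, not figures from the paper:

```python
# Illustrative arithmetic only — unit counts, intensities, and the budget
# are hypothetical placeholders, not data from the study.
units_planned = 300_000          # new homes a city plans to build
floor_area_per_unit_m2 = 80      # average floor area per unit
embodied_kgco2_per_m2 = 500      # typical order for concrete/steel construction
budget_mtco2 = 5.0               # hypothetical construction-sector carbon budget

# total embodied emissions in megatonnes of CO2 (1 Mt = 1e9 kg)
emissions_mtco2 = units_planned * floor_area_per_unit_m2 * embodied_kgco2_per_m2 / 1e9
print(emissions_mtco2, "Mt vs budget", budget_mtco2, "Mt")  # 12.0 vs 5.0: overshoot
```

With these placeholder values the build-out emits more than double the budget, which is the structural shape of the result: the only free variables are the material intensity and the floor area.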
14 replies · 79 reposts · 325 likes · 29.4K views

Anselm Hook retweeted
Corey Ganim @coreyganim
Best breakdown of Karpathy's "second brain" system I've seen. My co-founder turned it into an actual step-by-step build. The 80/20:
1. Three folders: raw/ (dump everything), wiki/ (AI organizes it), outputs/ (AI answers your questions)
2. One schema file (CLAUDE.md) that tells the AI how to organize your knowledge. Copy the template in the article.
3. Don't organize anything by hand. Drop raw files in, tell the AI "compile the wiki." Walk away.
4. Ask questions against your own knowledge base. Save the answers back. Every question makes the next one better.
5. Monthly health check: have the AI flag contradictions, missing sources, and gaps.
6. Skip Obsidian. A folder of .md files and a good schema beats 47 plugins every time.
He includes a free skill that scaffolds the whole system in 60 seconds.
Nick Spisak @NickSpisak_

x.com/i/article/2040…
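The three-folder layout plus schema file can be scaffolded in a few lines. The schema text below is a placeholder sketch, not the article's actual template:

```python
import os
import tempfile

SCHEMA = """\
# CLAUDE.md — how to organize this knowledge base (placeholder sketch)
- raw/: dump sources here untouched
- wiki/: compiled articles with backlinks; only the agent edits these
- outputs/: saved answers, filed back into wiki/ when useful
"""

def scaffold(root):
    """Create the raw/wiki/outputs folders and a schema file the agent reads."""
    for folder in ("raw", "wiki", "outputs"):
        os.makedirs(os.path.join(root, folder), exist_ok=True)
    with open(os.path.join(root, "CLAUDE.md"), "w") as f:
        f.write(SCHEMA)

root = tempfile.mkdtemp()
scaffold(root)
print(sorted(os.listdir(root)))  # ['CLAUDE.md', 'outputs', 'raw', 'wiki']
```

Everything after scaffolding is the agent's job: it reads CLAUDE.md, compiles raw/ into wiki/, and files answers into outputs/.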

71 replies · 309 reposts · 3.3K likes · 578.5K views

Anselm Hook @anselm
@karpathy Simplifying data and increasing transparency is nice but we can do more. Governments and corporations have the power to model or simulate near term outcomes of land use law and policy. Civic models could let anybody playtest the future - bring more minds to bear on big issues.
0 replies · 0 reposts · 0 likes · 469 views

Anselm Hook retweeted
Andrej Karpathy @karpathy
Something I've been thinking about: I am bullish on people (empowered by AI) increasing the visibility, legibility and accountability of their governments. Historically, it is the governments that act to make society legible (e.g. "Seeing Like a State" is the common reference), but with AI, society can dramatically improve its ability to do this in reverse.

Government accountability has not been constrained by access (the various branches of government publish an enormous amount of data), it has been constrained by intelligence: the ability to process a lot of raw data, combine it with domain expertise and derive insights. As an example, the 4000-page omnibus bill is "transparent" in principle and in a legal sense, but certainly not in a practical sense for most people. There's a lot more like it: laws, spending bills, federal budgets, freedom of information act responses, lobbying disclosures... Only a few highly trained professionals (investigative journalists) could historically process this information. This bottleneck might dissolve: not only are the professionals further empowered, but a lot more people can participate.

Some examples to be precise: detailed accounting of spending and budgets, diff tracking of legislation, individual voting trends w.r.t. stated positions or speeches, lobbying and influence (e.g. graph of lobbyist -> firm -> client -> legislator -> committee -> vote -> regulation), procurement and contracting, regulatory capture warning lights, judicial and legal patterns, campaign finance... Local governments might be even more interesting because the governed population is smaller so there is less national coverage: city council meetings, decisions around zoning, policing, schools, utilities...

Certainly, the same tools can easily cut the other way and it's worth being very mindful of that, but I lean optimistic overall that added participation, transparency and accountability will improve democratic, free societies.
(the quoted tweet is half-ish related, but inspired me to post some recent thoughts)
Harry Rushworth @Hrushworth

The British Government is a complicated beast. Dozens of departments, hundreds of public bodies, more corporations than one can count... Such is its complexity that there isn't an org chart for it. Well, there wasn't... Introducing ⚙️Machinery of Government⚙️

391 replies · 701 reposts · 5.8K likes · 849.4K views

Anselm Hook retweeted
Andrej Karpathy @karpathy
Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet as an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need in sharing the specific code/app; you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it, etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And of course, people can adjust the idea or contribute their own in the Discussion, which is cool.
Andrej Karpathy @karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web UI), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.

1K replies · 2.7K reposts · 25.7K likes · 6.4M views

Anselm Hook retweeted
Alex Prompter @alex_prompter
Holy shit. Stanford just showed that the biggest performance gap in AI systems isn't the model, it's the harness: the code wrapping the model. And they built a system that writes better harnesses automatically than humans can by hand.

> +7.7 points. 4x fewer tokens.
> #1 ranking on an actively contested benchmark.

The harness is the code that decides what information an AI model sees at each step: what to store, what to retrieve, what context to show. Changing the harness around a fixed model can produce a 6x performance gap on the same benchmark. Most practitioners know this empirically. What nobody had done was automate the process of finding better harnesses.

Stanford's Meta-Harness does exactly that: it runs a coding agent in a loop, gives it access to every prior harness it has tried along with the full execution traces and scores, and lets it propose better ones. The agent reads raw code and failure logs, not summaries or scalar scores, and figures out why things broke.

The key insight is about information. Every prior automated optimization method compressed feedback before handing it to the optimizer:
> Scalar scores only.
> LLM-generated summaries.
> Short templates.

Stanford's finding is that this compression destroys exactly the signal you need for harness engineering. A single design choice about what to store in memory can cascade through hundreds of downstream steps. You cannot debug that from a summary. Meta-Harness gives the proposer a filesystem containing every prior harness's source code, execution traces, and scores, up to 10 million tokens of diagnostic information per evaluation, and lets it use grep and cat to read whatever it needs. Prior methods worked with 100 to 30,000 tokens of feedback; Meta-Harness works with 3 orders of magnitude more.

The TerminalBench-2 search trajectory reveals what this actually looks like in practice. The agent ran for 10 iterations on an actively contested coding benchmark. In iterations 1 and 2, it bundled structural fixes with prompt rewrites, and both regressed. In iteration 3, it explicitly identified the confound: the prompt changes were the common failure factor, not the structural fixes. It isolated the structural changes, tested them alone, and observed the smallest regression yet. Over the next 4 iterations it kept probing why completion-flow edits were fragile, citing specific tasks and turn counts from prior traces as evidence. By iteration 7 it pivoted entirely: instead of modifying the control loop, it added a single environment snapshot before the agent starts, gathering what tools and languages are available in one shell command. That 80-line additive change became the best candidate in the run and ranked #1 among all Haiku 4.5 agents on the benchmark.

The numbers across all three domains:
→ Text classification vs the best hand-designed harness (ACE): +7.7 points accuracy, 4x fewer context tokens
→ Text classification vs the best automated optimizers (OpenEvolve, TTT-Discover): matches their final performance in 4 evaluations vs their 60, then surpasses them by 10+ points
→ Full interface vs scores-only ablation: median accuracy 50.0 vs 34.6; raw execution traces are the critical ingredient, and summaries don't recover the gap
→ IMO-level math: +4.7 points average across 5 held-out models that were never seen during search
→ IMO math: the discovered retrieval harness transfers across GPT-5.4-nano, GPT-5.4-mini, Gemini-3.1-Flash-Lite, Gemini-3-Flash, and GPT-OSS-20B
→ TerminalBench-2 with Haiku 4.5: 37.6%, #1 among all reported Haiku 4.5 agents, beating Goose (35.5%) and Terminus-KIRA (33.7%)
→ TerminalBench-2 with Opus 4.6: 76.4%, #2 overall, beating all hand-engineered agents except one whose result couldn't be reproduced from public code
→ Out-of-distribution text classification on 9 unseen datasets: 73.1% average vs ACE's 70.2%

The math harness discovery is the cleanest demonstration of what automated search actually finds. Stanford gave Meta-Harness a corpus of 535,000 solved math problems and told it to find a better retrieval strategy for IMO-level problems. What emerged after 40 iterations was a four-route lexical router: combinatorics problems get deduplicated BM25 with difficulty reranking, geometry problems get one hard reference plus two raw BM25 neighbors, number theory gets results reranked toward solutions that state their technique early, and everything else gets adaptive retrieval based on how concentrated the top scores are. Nobody designed this. The agent discovered that different problem types need different retrieval policies by reading through failure traces and iterating on what broke.

The ablation table is the most important result in the paper:
> Scores only: median 34.6, best 41.3.
> Scores plus LLM-generated summary: median 34.9, best 38.7.
> Full execution traces: median 50.0, best 56.7.

Summaries made things slightly worse than scores alone. The raw traces (the actual prompts, tool calls, model outputs, and state updates from every prior run) are what drive the improvement. This is not a marginal difference: the full interface outperforms the compressed interface by 15 points at median. Harness engineering requires debugging causal chains across hundreds of steps. You cannot compress that signal.

The model has been the focus of the entire AI industry for the last five years. Stanford just showed the wrapper around the model matters just as much, and that AI can now write better wrappers than humans can.
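The search loop described above reduces to a skeleton: keep every prior harness with its full trace and score, and let the proposer read all of it. The function and toy scoring below are hypothetical illustrations, not the paper's code (the real proposer is a coding agent reading a filesystem of traces with grep and cat):

```python
def meta_harness_search(evaluate, propose, iterations):
    """Skeleton of an uncompressed-feedback search loop: the full history of
    (harness, score, trace) is handed to the proposer, never summarized."""
    history = []  # every prior attempt, with its raw trace — no compression
    best = None
    for _ in range(iterations):
        harness = propose(history)          # proposer reads all prior traces
        score, trace = evaluate(harness)
        history.append((harness, score, trace))
        if best is None or score > best[1]:
            best = (harness, score)
    return best

# toy run: a "harness" is just an integer, and scoring rewards approaching 7
evaluate = lambda h: (-abs(h - 7), f"ran harness {h}")
propose = lambda history: len(history)      # trivially enumerate candidates 0, 1, 2, ...
best_harness, best_score = meta_harness_search(evaluate, propose, 10)
print(best_harness, best_score)
```

The contrast with prior methods lives in that `history` argument: passing `[(h, s, None) for ...]` instead, i.e. scores without traces, is the ablated interface the paper shows performing far worse.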
38 replies · 100 reposts · 767 likes · 109.2K views

Anselm Hook retweeted
Science girl @sciencegirl
Creative games in Japan helping elderly residents stay active and engaged
161 replies · 2.9K reposts · 19.7K likes · 1.8M views

Anselm Hook retweeted
Julien Chaumond @julien_c
Just do this:

brew install llama.cpp --HEAD

Then:

llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M
52 replies · 227 reposts · 2.4K likes · 191.1K views

Anselm Hook retweeted
andy nguyen @kevinnguyendn
Karpathy just validated the exact architecture we open-sourced today. Markdown vaults are the endgame for AI memory. But instead of running manual LLM compilation steps, we built ByteRover to handle it automatically. It gives you the human-readable files of Obsidian, but the backend automatically creates nodes, links, and context graphs for agents (OpenClaw, Claude Code, etc.) to use natively. If you want this "second brain" out of the box, we just open-sourced it: x.com/kevinnguyendn/…
Andrej Karpathy @karpathy

(Andrej Karpathy's "LLM Knowledge Bases" post, quoted in full earlier in this timeline.)

33 replies · 42 reposts · 828 likes · 112.7K views