Poincaré

2.6K posts

Poincaré

@diffset

Do you know my header picture?

Joined October 2014

460 Following · 38 Followers

Pinned Tweet

Poincaré@diffset·Apr 23

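Zweig was a homebody whose spirit lived in a bygone era, and he died at the dawn of a new one. His outlook trails the ancient Chinese by a full level of spiritual attainment: the Chinese ideal is to read ten thousand books and travel ten thousand miles. Reading without practice is spiritual suicide for a knowledge worker.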

停雲@tingyun97

#世界读书日 (World Book Day): fellow readers, let us encourage one another.

877

Poincaré@diffset·6h

@ying18474850 No rush, give it a little longer 😉

101

坏婆娘🇺🇸@ying18474850·9h

At the same point under Biden, California gas prices peaked at over $8; now they peak at just over $6. I'm quite pleased 😌

未来昔日Future Past@NOone43475505

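@ying18474850 @WeipingQin These idiots normally go on and on about fairness, justice, freedom, democracy, and love of country, people, and the world, all righteous indignation. But the moment their own interests are touched even slightly, they start howling. What a bunch!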

4.6K

Poincaré retweeted

mohit@mohitwt_·14h

x.com/i/article/2036…

167

9.8K

Poincaré retweeted

Tw93@HiTw93·15h

x.com/i/article/2040…

135

735

108.8K

Poincaré retweeted

Andrej Karpathy@karpathy·1d

LLM Knowledge Bases
Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:
Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.
IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).
Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.
Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.
Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.
Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.
Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.
TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
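
For readers wondering what a "small and naive search engine over the wiki" might look like, here is a minimal sketch. It is not the author's tool: the function names (`tokenize`, `build_index`, `search`) are hypothetical, and ranking by raw term frequency is an assumption chosen for brevity. It only assumes, as the post states, that the wiki is a directory tree of .md files.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(wiki_dir: Path) -> dict[str, Counter]:
    """Map each .md file (path relative to wiki_dir) to its term counts."""
    index: dict[str, Counter] = {}
    for path in sorted(wiki_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        index[str(path.relative_to(wiki_dir))] = Counter(tokenize(text))
    return index


def search(index: dict[str, Counter], query: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank files by the summed frequency of the query terms (assumed scoring)."""
    terms = tokenize(query)
    scored = []
    for name, counts in index.items():
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((name, score))
    # Highest score first; ties broken by file name for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Wrapped in a small CLI, something like this could be handed to an LLM agent as a tool, which is the usage the post describes; a real version would likely want TF-IDF or BM25 weighting rather than raw counts.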

1.8K

3.8K

35.2K

8.6M

Poincaré@diffset·1d

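@fromorient2023 In 2023 this man pushed a bill in the Senate requiring a supermajority to approve any US withdrawal from NATO, precisely to stop certain scoundrels like himself from stirring up trouble. 😎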

東方來@fromorient2023·3d

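Now this is the biggest news! US Secretary of State Rubio, in an interview with Fox News, dropped a bombshell: "For decades we have poured in billions, even hundreds of billions of dollars, and stationed troops, only to end up defending Europe. Yet when we needed them, we didn't even ask them to join the airstrikes, just to let us use their bases, and they refused. So why should we stay in NATO?" Don't hesitate: withdrawing from NATO is imperative!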

182

429

3.4K

296.1K

Poincaré@diffset·1d

@A77Z0J8D3WbRAVt @Lgreezx @MarshWatt776 So your dad paid your mom? Your family is pretty open-minded, huh 🤭

席主习🇨🇳🇹🇼@A77Z0J8D3WbRAVt·1d

@Lgreezx @MarshWatt776 Your poor mom, your dad slept with her for free and never paid

715

北京小影子（演员@MarshWatt776·2d

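This is ridiculous. Today my family set me up on a blind date. We added each other on WeChat, and his very first message was: "I have a thing about virginity. May I ask if you're a virgin? If not, we needn't go any further…" I nearly told him that the men I've slept with are who knows how many times better than him, and none of them ever demanded that I be a virgin. Who do you think you are???? I really wanted to tear into him, but thinking of how awkward it would be for the go-between, I just blocked him.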

334

195

198.6K

Poincaré@diffset·1d

@afterheater Look into why Xu left CCTV back then 🤭

2.2K

老A8集散地@afterheater·1d

In his younger days, brother Wentao wasted a golden opportunity. Such a beautiful person.

156

115.4K

Poincaré@diffset·1d

@wen36881471 @a2013bc That's going a bit far, isn't it? Back then he wasn't even a MAGA guy yet. 🤭

wen🇺🇦🇺🇦🇺🇦@wen36881471·1d

@a2013bc Rubio, that turncoat, has no backbone at all

662

波斯沙阿卢比奥🇺🇦@a2013bc·1d

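Withdrawing from NATO has to go through Congress, and the person who introduced that bill was none other than Rubio, three years ago.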

140

9.9K

Poincaré@diffset·2d

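@Balder13946731 Senate-era Rubio pushed a bill requiring a Senate supermajority vote before the US can leave NATO, which blocks State Department Rubio from pulling this off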

956

Balder@Balder13946731·2d

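Trump says that once he is done with Iran he will settle scores with the NATO countries, and might even withdraw from NATO outright. Putin has thrown everything into attacking Ukraine just to slow NATO's eastward expansion; can it really be that a single asset, Trump, is all it takes to dissolve NATO??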

137

41.6K

Poincaré@diffset·2d

@edenvillager @Thomaskong96638 What if he wants to get back together with her? 🤭

牛魔王(互粉互FO)@edenvillager·2d

@Thomaskong96638 I tell my son that in the future, any girlfriend he no longer wants, he can consider passing on to me.

Thomas@Thomaskong96638·2d

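If I had to name my biggest regret before going abroad, it would probably be that wellness-center membership card. I topped it up with 50,000 yuan under a buy-one-get-one deal, so 100,000 in credit. My regular was technician No. 68. No. 68 turned out to be a former provincial shooting champion, graceful and slender, with clear, pure eyes. Whenever No. 68 was on shift, I would put on my sportswear on weekends and pretend to go out for a run, when in fact I was heading straight to her. Then my immigrant visa was approved early, and I left having spent only twenty or thirty thousand of the card. It's not the money I mourn, but that chapter of my life. Now, whenever I feel lonely and think back on it, it still torments me.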

97.5K

Poincaré retweeted

Xiao Tan@tvytlx·3d

x.com/i/article/2038…

695

3.4K

917.9K

Poincaré retweeted

艾略特@elliotchen100·3d

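I took a look at CC's (Claude Code's) Memory mechanism, and it's nothing special. The core of the whole memory system is a single MEMORY.md file of no more than 200 lines, stuffed into the context at the start of every session. What happens when memories pile up? A background subprocess called AutoDream periodically scans, merges, and prunes to make sure it still fits. Put plainly: the model can't remember on its own, so memory is simulated with the file system plus LLM self-management. The approach is solid engineering, but it has several fundamental limits:
1. Storage and retrieval depend entirely on the file system and Markdown, so it cannot scale to cross-project or cross-agent scenarios; memories are isolated islands.
2. There is no real semantic index and no relevance-based dynamic recall; the 200-line index is a hard ceiling.
3. AutoDream's consolidation is rule-driven (scan, merge, prune), not cognition-driven; it can deduplicate and compress, but it cannot distill new insight from experience.
4. There is no forgetting curve and no reinforcement mechanism; a memory is either present or deleted, with no intermediate state.
Work on Memory long enough and you find that the ceiling for this kind of approach is not engineering but architecture. As long as the model's attention mechanism itself does not support efficient retrieval over large-scale historical context, the application layer will forever be patching. That is why we chose a different path at EverMind. MSA (Memory Sparse Attention), which we released a while ago, does content-aware sparse routing directly in the Transformer attention layer, letting the model learn for itself "what to recall and what to ignore" rather than having an external script decide for it. Anthropic's engineering is without question top-tier. But this leak shows precisely that Agent Memory is far from solved.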
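
The MEMORY.md scheme described in the post above (one capped file, periodically merged and pruned) can be illustrated with a minimal sketch. This is not Claude Code's or AutoDream's actual implementation: only the file name and the 200-line cap come from the post, while `consolidate`, `remember`, and the "drop exact duplicates, keep the newest lines" rule are assumptions made for illustration.

```python
from pathlib import Path

MAX_LINES = 200  # cap quoted in the post; everything else here is assumed


def consolidate(lines: list[str], max_lines: int = MAX_LINES) -> list[str]:
    """Drop blank lines and exact duplicates (keeping the latest copy),
    then keep only the newest max_lines entries."""
    seen: set[str] = set()
    kept_newest_first: list[str] = []
    for line in reversed(lines):
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)
            kept_newest_first.append(entry)
    kept = list(reversed(kept_newest_first))
    # Slice from an explicit start index so max_lines == 0 yields [].
    return kept[max(0, len(kept) - max_lines):]


def remember(memory_file: Path, note: str, max_lines: int = MAX_LINES) -> list[str]:
    """Append a note to the memory file and rewrite it under the line cap."""
    existing = []
    if memory_file.exists():
        existing = memory_file.read_text(encoding="utf-8").splitlines()
    lines = consolidate(existing + [note], max_lines)
    memory_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

The sketch makes the post's third and fourth criticisms concrete: a rule like this can deduplicate and truncate, but an entry is either in the file or gone, with nothing in between.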

110

795

118.8K

Poincaré retweeted

WquGuru🦀@wquguru·3d

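The Claude Code source code leak includes six core state diagrams:
- Main query state machine, for understanding the backbone of the main query loop
- Tool Execution state machine, for understanding tool scheduling and concurrency/interruption
- Compaction recovery strategy, for understanding context compaction and recovery
- Agent lifecycle state machine and SDK session state machine, for understanding the subagent lifecycle and the SDK session layer respectively
- Permission policy flowchart, filling in the governance and security control logic
Main query state machine: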