bluthcy

115 posts

@bluthcy

Searched for her a thousand times amid the crowd!

China · Joined February 2012
477 Following · 11 Followers
bluthcy retweeted
ginobefun @hongming731
This Tencent article explores the core ideas of Harness Engineering, arguing that building AI workflows is merely laying pipes; the real technical moat is the private and domain knowledge a team accumulates. Models and toolchains keep iterating and workflows are replaceable, but proprietary business-domain knowledge is a compounding asset that keeps accruing. Workflows without knowledge accumulation tend to become one-off consumables that cannot evolve on their own.

To that end, the team designed a three-dimensional layered knowledge architecture. On the storage axis, knowledge is divided into 5 ascending levels, from personal preferences up through project, business, and technology. On the type axis there are 5 categories: models, decisions, guides, pitfalls, and processes. On the maturity axis there are 3 progressive levels, from draft to verified to reliable, plus an automatic decay mechanism that retires stale information (a toy sketch of one such entry follows this post).

The knowledge base lives in a standalone Git repository shared across projects, serving as the team's single source of truth. Borrowing blockchain-style append-only logs and consensus mechanisms, it supports contributions from many collaborators with automatic conflict handling. In day-to-day operation, workflows and knowledge accumulation are tightly coupled: the kickoff phase automatically injects a knowledge overview; during execution the agent queries on demand through a 3-level progressive index to avoid context bloat; and the archival phase automatically extracts new artifacts back into the knowledge base, closing the lifecycle loop.

The team also broke through the human-in-the-loop bottleneck. Traditional workflows depend heavily on the operator being present, which caps throughput. By adding cross-device handoff and remote control, the team turned manual approvals into an asynchronous step, using fragmented time to keep flows moving around the clock, which in turn speeds up how fast knowledge circulates.

Overall, the system sticks to the principle that the file system is the state machine, turning all knowledge into readable, version-controllable file assets. This lets the team bank experience with every delivered requirement, so each new task starts on the shoulders of the ones before it, strong evidence of the central role domain knowledge plays in AI engineering.
[image attached]
4 replies · 44 reposts · 183 likes · 11.4K views
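The three axes above (layer, type, maturity) plus automatic decay map naturally onto a small record type. Here is a minimal sketch, assuming a file-per-entry layout in the shared Git repo; all names (`KnowledgeEntry`, `decay_days`, the enum values) are illustrative, not the article's actual schema:

```python
# Illustrative sketch of one entry in the layered knowledge base described
# above. Field names and defaults are assumptions, not Tencent's schema.
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

class KnowledgeType(Enum):          # the 5 type categories from the post
    MODEL = "model"
    DECISION = "decision"
    GUIDE = "guide"
    PITFALL = "pitfall"
    PROCESS = "process"

class Maturity(Enum):               # the 3 progressive maturity levels
    DRAFT = 1
    VERIFIED = 2
    RELIABLE = 3

@dataclass
class KnowledgeEntry:
    title: str
    body_path: str                  # version-controlled file in the shared repo
    layer: int                      # 1..5: personal prefs up through project,
                                    # business, and technology levels
    ktype: KnowledgeType
    maturity: Maturity
    updated_at: datetime
    decay_days: int = 90            # assumed default for the decay horizon

    def is_stale(self, now: datetime) -> bool:
        """Automatic decay: untouched entries past the horizon get retired."""
        return now - self.updated_at > timedelta(days=self.decay_days)

entry = KnowledgeEntry("Avoid N+1 queries in the billing service",
                       "knowledge/pitfalls/billing-n-plus-1.md",
                       layer=3, ktype=KnowledgeType.PITFALL,
                       maturity=Maturity.VERIFIED,
                       updated_at=datetime(2025, 6, 1))
print(entry.is_stale(datetime(2025, 12, 1)))   # True: candidate for retirement
```

Keeping each entry as a plain file plus this thin metadata layer is what makes the "file system is the state machine" principle cheap to enforce: Git already provides the append history and the merge machinery.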
bluthcy retweeted
姚金刚 @yaojingang
Open-sourcing a tutorial Skill. Polished over a dozen versions, the results are pretty good, and it's now on GitHub. If you want a high-quality study session over the May Day holiday, or to generate a custom, high-quality tutorial anytime, feel free to download it.

Basic logic:
1. Input any topic plus reference material. The AI treats the references as the core and supplements them with high-quality sources as needed, with a filter for low-quality sources built into the process.
2. Combining the characteristics of good tutorials with some of the user's preferences, the AI generates a customized in-depth tutorial, output in PDF, Word, and HTML for easy self-study.
3. Content is organized systematically by chapter, and the AI draws its own diagrams and inserts them into the relevant sections based on each chapter's content.
4. The tutorial incorporates the underlying logic and methods from "Course Marketing" (课程营销学), a book I wrote three years ago while running an MCN; I'll open-source that book too.
5. For layout and UI, the output documents and tutorial pages borrow from and build on @HiTw93's kami, achieving a polished, consistent layout.

There are 3 sample reports; one sample tutorial comes from an English-learning article by @ReyJudgementOS: "English fluency at 12: lessons from 'guiding' my kid to 'learn' English".

GitHub address for the tutorial Skill: github.com/yaojingang/yao…
[4 images attached]
64 replies · 211 reposts · 927 likes · 122.5K views
bluthcy retweeted
Nous Research @NousResearch
ComfyUI is the most flexible, composable, and powerful open-source media generation tool with a massive ecosystem of workflows and custom nodes. Your Hermes Agent can now install, launch, manage, and run sophisticated @ComfyUI workflows on demand.
202 replies · 317 reposts · 3.5K likes · 402.6K views
bluthcy retweeted
向阳乔木 @vista8
Found a really practical open-source project called Beads, already at 22.6k stars on GitHub.

It aims to solve the "amnesia" problem AI agents have with long-running tasks. Today's AI agents mostly rely on Markdown for memory, but Markdown is plain text: no structure, no dependency relationships, no state tracking. Once tasks pile up and the context window fills, information gets lost.

Beads' approach: do proper task management. The bottom layer is Dolt, a SQL database that works "like Git": branches, merges, version rollback, even cell-level merges.

What Beads gains from Dolt:
① When multiple agents write concurrently, hash IDs (e.g. bd-a1b2) avoid collisions
② Task history can be rolled back and never silently disappears
③ Remote sync is supported, so team collaboration and multi-machine use both work

It doesn't depend on Git; Beads can run entirely without it. The context-compression design is nice too: "semantic memory decay" compresses closed tasks into summaries to save context window space (a toy sketch of both mechanisms follows below).

What scenarios fit? The project says it's mainly for AI coding agents, but really any scenario that needs task continuity across multiple AI sessions applies. No more re-explaining the context every time you open a new session.

To install: give the GitHub address to your agent and tell it to install the library and walk you through configuring it. Link in the replies.
[image attached]
34 replies · 50 reposts · 256 likes · 36.4K views
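To make the two mechanisms above concrete (content-hash IDs for collision-free concurrent writes, and compress-on-close "semantic memory decay"), here is a toy sketch. It is not Beads' actual schema or API, and real Beads persists this in Dolt rather than in-process objects:

```python
# Toy sketch of a Beads-style task record. Illustrative only.
import hashlib
import json
from dataclasses import dataclass, field

def task_id(payload: dict) -> str:
    """Content-derived hash ID (e.g. 'bd-a1b2') so parallel agents creating
    tasks independently don't collide on sequential counters."""
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode())
    return "bd-" + digest.hexdigest()[:4]

@dataclass
class Task:
    title: str
    depends_on: list[str] = field(default_factory=list)  # explicit dependency edges
    status: str = "open"
    notes: list[str] = field(default_factory=list)
    summary: str | None = None

    def close(self) -> None:
        """'Semantic memory decay': replace the full note history with a short
        summary so closed tasks stay queryable without bloating context."""
        self.status = "closed"
        self.summary = self.notes[-1][:120] if self.notes else self.title
        self.notes.clear()

t = Task(title="Wire up Dolt remote sync")
tid = task_id({"title": t.title})
t.notes.append("Chose Dolt for branch/merge and cell-level merges.")
t.close()
print(tid, t.status, t.summary)
```

The design point is that the expensive part (full notes) decays while the cheap part (ID, status, dependency edges, one-line summary) stays permanently addressable across sessions.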
bluthcy @bluthcy
After the Hermes Agent update, I have no idea what happened. Same model, same config... I tell it to work in the pwd directory and it runs off to $HOME... I tell it to update the README and it goes off updating skills... I tell it to update a skill and it doesn't recognize skills.external_dir... what's wrong... @Teknium 🥲
3 replies · 0 reposts · 4 likes · 1.1K views
bluthcy @bluthcy
@Teknium The biggest problem: I set skills.external_dir, and the agent can detect the skills, but when it needs to update a skill it can't find external_dir and takes it upon itself to write another copy under the .hermes skills directory.
2 replies · 0 reposts · 1 like · 78 views
bluthcy retweeted
Garry Tan @garrytan
The secret to an articulate agent like mine isn't one file. It's three:

SOUL.md — Who the agent IS. Voice, values, operating principles, what good output looks like, what bad output looks like. Not a system prompt, a constitution. Mine says things like "brevity is mandatory," "humor is mandatory," "never open with 'Great question,'" "swearing is allowed when it lands." The more specific and opinionated this is, the less your agent sounds like a chatbot. Write it like you're briefing your smartest friend on how to be you, not like you're configuring software.

USER.md — Who YOU are. Not a bio — a deep model. How your mind works, what you're building, your strengths, your blind spots, your family, your temperament, what triggers you, what you care about. The more the agent understands about you, the better it can serve you. Mine is ~4000 words.

AGENTS.md — Operational rules. What to check on every message, what to never do, how to handle failures, lookup chains, path rules, brain-first protocols. This is the playbook for how it works, not who it is.

The articulation comes from SOUL.md being brutally specific about voice. Generic instructions → generic output. If you write "be helpful and concise" you get ChatGPT. If you write "speak like a peer with taste, one sentence when one sentence works, uncomfortable truths welcome if actually true, language with voltage" — you get something alive.
Soham Naran @soham_bhai1

@garrytan Can you share your agent.md? Your agent is really articulate.

77 replies · 138 reposts · 1.9K likes · 200.5K views
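As a rough illustration of how the three files might come together at session start, here is a toy loader. The file names come from the post above, but the stitching logic is an assumption, not Garry Tan's actual setup:

```python
# Toy sketch: concatenate SOUL.md / USER.md / AGENTS.md into one context
# preamble at session start. Illustrative, not any agent's real loader.
from pathlib import Path

PARTS = ["SOUL.md", "USER.md", "AGENTS.md"]  # identity, user model, playbook

def build_context(workdir: str) -> str:
    sections = []
    for name in PARTS:
        path = Path(workdir) / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(sections)

# Seed SOUL.md with opinionated voice rules like the ones quoted above,
# e.g. "brevity is mandatory", "never open with 'Great question,'".
print(build_context("."))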
bluthcy retweeted
Gorden Sun @Gorden_Sun
MOSS-Audio: an open-source unified audio understanding model. 4B and 8B sizes, each with Instruct and Thinking variants. Speech recognition, speaker analysis, emotion detection, environmental sound understanding, music understanding, and timestamped ASR are all integrated into one model. Its timestamped-ASR accuracy is in a class of its own, far ahead of Gemini-3.1-Pro. Model: huggingface.co/collections/Op…
[image attached]
14 replies · 43 reposts · 299 likes · 22.6K views
bluthcy retweeted
hardmaru @hardmaru
For the past few years, humans have been doing "prompt engineering" to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead.

By training a Conductor model with RL, we found that it naturally learns to write highly effective, custom instructions for a whole pool of other models. It essentially learns to 'manage' them in natural language.

What surprised me most was how it dynamically adapts. For simple factual questions, it just queries one model. But for hard coding problems, it autonomously spins up a whole pipeline of planners, coders, and verifiers.

Really excited to see where this paradigm of "AI managing AI" goes next, especially as we start moving from single-agent chain-of-thought to multi-agent "chain-of-command".

Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388

Along with our TRINITY paper which we announced earlier, this work also powers our new multi-agent system: Sakana Fugu (sakana.ai/fugu-beta) 🐡
Sakana AI @SakanaAILabs

Introducing our new work: "Learning to Orchestrate Agents in Natural Language with the Conductor" accepted at #ICLR2026 arxiv.org/abs/2512.04388

What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs? To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team.

We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026). Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies:
1/ Which agent to call
2/ What specific subtask to give them (acting as an expert prompt engineer)
3/ What previous messages they can see in their context window

Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems.

The results are very promising: the 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost.

One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team's prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference.

This research proves that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence. Alongside our TRINITY research which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! (sakana.ai/fugu-beta) 🐡

OpenReview: openreview.net/forum?id=U23A2… (ICLR 2026)

38 replies · 175 reposts · 1.4K likes · 178.1K views
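The thread describes the Conductor's output as a natural-language workflow with three decisions per step: which agent to call, what subtask prompt to give it, and which prior messages it may see. Here is a toy rendering of that structure; the names and the stubbed model call are illustrative, not Sakana's API:

```python
# Toy sketch of a Conductor-style workflow: per-step agent choice, custom
# subtask prompt, and scoped visibility into earlier messages.
from dataclasses import dataclass, field

@dataclass
class Step:
    agent: str                       # e.g. "planner", "coder", "verifier"
    subtask: str                     # the custom instruction written for it
    visible: list[int] = field(default_factory=list)  # indices of prior messages

workflow = [
    Step("planner", "Break the problem into verifiable sub-goals."),
    Step("coder", "Implement sub-goal 1; return one function.", visible=[0]),
    Step("verifier", "Check the code against the sub-goals; report failures.",
         visible=[0, 1]),
]

transcript: list[str] = []
for step in workflow:
    context = [transcript[i] for i in step.visible]   # scoped context window
    # reply = call_model(step.agent, step.subtask, context)  # real model call
    reply = f"[{step.agent}] handled: {step.subtask} (saw {len(context)} msgs)"
    transcript.append(reply)
print("\n".join(transcript))
```

The scoped `visible` list is the interesting part: the Conductor controls not just who acts next, but what each worker is allowed to read, which is how it keeps each worker's context small and focused.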
Teknium 🪽 @Teknium
Happy to announce that Hermes Agent's repo just surpassed Anthropic's Claude Code repo
[image attached]
268 replies · 273 reposts · 4.8K likes · 592.4K views
bluthcy retweeted
Elias Al @iam_elias1
MIT just made every AI company's billion dollar bet look embarrassing. They solved AI memory. Not by building a bigger brain. By teaching it how to read.

The paper dropped on December 31, 2025. Three MIT CSAIL researchers. One idea so obvious it hurts. And a result that makes five years of context window arms racing look like the wrong war entirely.

Here is the problem nobody solved. Every AI model on the planet has a hard ceiling. A context window. The maximum amount of text it can hold in working memory at once. Cross that line and something ugly happens — something researchers have a clinical name for. Context rot. The more you pack into an AI's context, the worse it performs on everything already inside it. Facts blur. Information buried in the middle vanishes. The model does not become more capable as you feed it more. It becomes more confused. You give it your entire codebase and it forgets what it read three files ago. You hand it a 500-page legal document and it loses the clause from page 12 by the time it reaches page 400.

So the industry built a workaround. RAG. Retrieval Augmented Generation. Chop the document into chunks. Store them in a database. Retrieve the relevant ones when needed. It was always a compromise dressed up as a solution. The retriever guesses which chunks matter before the AI has read anything. If it guesses wrong — and it does, constantly — the AI never sees the information it needed. The act of chunking destroys every relationship between distant paragraphs. The full picture gets shredded into fragments that the AI then tries to reassemble blindfolded. Two bad options. One broken industry.

Three MIT researchers and a deadline of December 31st. Here is what they built. Stop putting the document in the AI's memory at all. That is the entire idea. That is the breakthrough. Store the document as a Python variable outside the AI's context window entirely. Tell the AI the variable exists and how big it is. Then get out of the way.

When you ask a question, the AI does not try to remember anything. It behaves like a human expert dropped into a library with a computer. It writes code. It searches the document with regular expressions. It slices to the exact section it needs. It scans the structure. It navigates. It finds precisely what is relevant and pulls only that into its active window.

Then it does something that makes this recursive. When the AI finds relevant material, it spawns smaller sub-AI instances to read and analyze those sections in parallel. Each one focused. Each one fast. Each one reporting back. The root AI synthesizes everything and produces an answer. No summarization. No deletion. No information loss. No decay. Every byte of the original document remains intact, accessible, and queryable for as long as you need it.

Now here are the numbers. Standard frontier models on the hardest long-context reasoning benchmarks: scores near zero. Complete collapse. GPT-5 on a benchmark requiring it to track complex code history beyond 75,000 tokens — could not solve even 10% of problems. RLMs on the same benchmarks: solved them. Dramatically. Double-digit percentage gains over every alternative approach. Successfully handling inputs up to 10 million tokens — 100 times beyond a model's native context window. Cost per query: comparable to or cheaper than standard massive context calls. Read that again. One hundred times the context. Better answers. Same price.

The timeline of the arms race makes this sting harder. GPT-3 in 2020: 4,000 tokens. GPT-4: 32,000. Claude 3: 200,000. Gemini: 1 million. Gemini 2: 2 million. Every generation, every company, billions of dollars spent, all betting on the same assumption. More context equals better performance. MIT just proved that assumption was wrong the entire time. Not slightly wrong. Fundamentally wrong. The entire premise of the last five years of context window research — that the solution to AI memory was a bigger window — was the wrong answer to the wrong question.

The right question was never how much can you force an AI to hold in its head. It was whether you could teach an AI to know where to look. A human expert handed a 10,000-page archive does not read all 10,000 pages before answering your question. They navigate. They search. They find the relevant section, read it deeply, and synthesize the answer. RLMs are the first AI architecture that works the same way.

The code is open source. On GitHub right now. Free. No license fees. No API costs. Drop it in as a replacement for your existing LLM API calls and your application does not even notice the difference — except that it suddenly works on inputs it used to fail on entirely.

Prime Intellect — one of the leading AI research labs in the space — has already called RLMs a major research focus and described what comes next: teaching models to manage their own context through reinforcement learning, enabling agents to solve tasks spanning not hours, but weeks and months.

The context window wars are over. MIT won them by walking away from the battlefield.

Source: Zhang, Kraska, Khattab · MIT CSAIL · arXiv:2512.24601
Paper: arxiv.org/abs/2512.24601
GitHub: github.com/alexzhang13/rlm
[image attached]
147 replies · 449 reposts · 2.2K likes · 323.1K views
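A toy sketch of the navigation loop the thread describes: the corpus stays in a Python variable outside the prompt, the model emits regex searches and slices, and sub-calls read each snippet in isolation before a root call synthesizes. `ask_llm` is a stand-in for a real model call; this illustrates the idea, not the MIT code:

```python
# Toy RLM-style loop: search and slice a document held outside the context
# window, then recurse on the snippets. Illustrative only.
import re

# The corpus lives in an ordinary variable, never in the prompt itself.
document = ("... imagine 10M tokens of code history here ... "
            "clause 12: payment due in 30 days ... more text ...")

def ask_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"<answer from {len(prompt)} chars of focused context>"

def rlm_answer(question: str, doc: str, window: int = 200) -> str:
    # 1. Navigate: regex search instead of stuffing doc into the context.
    hits = []
    for kw in re.findall(r"\w{5,}", question):
        for m in re.finditer(re.escape(kw), doc, re.IGNORECASE):
            lo = max(0, m.start() - window // 2)
            hits.append(doc[lo:m.start() + window // 2])
    # 2. Recurse: sub-calls read each relevant slice in isolation.
    partials = [ask_llm(f"{question}\n---\n{h}") for h in hits[:8]]
    # 3. Synthesize: the root call sees only the distilled snippets.
    return ask_llm(question + "\n" + "\n".join(partials))

print(rlm_answer("When is payment due under clause 12?", document))
```

The key property is that no step ever holds the full document in a prompt; the model's active context only ever contains the question, a few windowed slices, and the sub-answers.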
bluthcy retweeted
虎小象 @hx831126
1: I made a "project" graphic, the kind of chart you put in a report deck!
2: But the client needs a 4K version, or the layered PSD!
3: So I tried asking GPT to produce a layered PSD file for me. Damn, it actually did it! It handed me a zip archive of the layers!
4: And the images inside were actually 4K-sized!
5: So this trick lets us get 4K images out of GPT!
[3 images attached]
16 replies · 17 reposts · 134 likes · 29.3K views
bluthcy retweeted
阿绎 AYi @AYi_AInotes
Hot take: 90% of AI agent "memory" today is fake.

I fell into this trap myself: dumping every history record and decision log into Markdown files and assuming that gave my agent long-term memory. It collapsed within two weeks. The same fact existed in three contradictory versions, last month's preferences weighed exactly the same as yesterday's, and every call crammed everything into the context, absurdly slow and constantly cross-contaminated. Only after reading this article did it click: I wasn't building memory at all, I was using the prompt as RAM 🌚

Real memory isn't a pile of files. It should be a graph: nodes plus embeddings plus traversal. The Markdown approach has four unfixable flaws: no deduplication, no decay, no ranking, and past a hundred or so records it becomes a performance killer. It can only remember what you wrote, never how one thing relates to another: why this decision was rejected, or how we fixed this same bug last time.

Vector retrieval isn't enough either. It can only tell you that two passages look alike, not the causal relationship between them. Only graph traversal can do that: like a human brain, it pulls a whole chain of related memories out from a single node. Important things get sharper over time, stale information fades automatically, and contradictions get resolved at write time.

All the production-grade agent memory frameworks, Zep, Cognee, Mem0, are graph-based now, and Neo4j has turned graph memory into a standard MCP tool. Once Claude Code is past two hundred thousand lines of code, a pure context window is hopeless. What actually makes it think like a senior engineer is putting the invariant rules in CLAUDE.md and all the evolving state in a graph, retrieved dynamically on demand.

Plenty of people are still racing over one-million and two-million-token context windows, assuming bigger is better. But what actually kills you in production is cross-session memory drift and context pollution. Upgrading the memory architecture is no longer a nice-to-have; whether you can actually put an agent to work depends on it.
[2 images attached]
AI Edge @aiedge_

x.com/i/article/2044…

114 replies · 468 reposts · 2.9K likes · 699.2K views
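Here is a minimal sketch of the graph-plus-embeddings memory the post describes: nodes carry embeddings for similarity lookup, edges carry relations, writes overwrite rather than duplicate, weights decay, and retrieval walks the graph from the nearest node. This is illustrative only, not the schema of Zep, Cognee, Mem0, or Neo4j's MCP tool:

```python
# Toy graph memory: embedding lookup to find an entry node, then traversal
# to pull the related chain. Illustrative assumptions throughout.
import math

class GraphMemory:
    def __init__(self):
        self.nodes: dict[str, dict] = {}     # id -> {text, vec, weight}
        self.edges: dict[str, list[tuple[str, str]]] = {}  # id -> [(relation, id)]

    def write(self, nid, text, vec, relations=()):
        # Conflicts resolved at write time: same id overwrites, no duplicates.
        self.nodes[nid] = {"text": text, "vec": vec, "weight": 1.0}
        self.edges.setdefault(nid, []).extend(relations)

    def decay(self, factor=0.95):
        for n in self.nodes.values():        # stale facts fade instead of lingering
            n["weight"] *= factor

    def recall(self, vec, hops=2):
        def sim(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / ((na * nb) or 1.0)
        # Embeddings find the entry point; the graph supplies the relations.
        start = max(self.nodes, key=lambda i: sim(self.nodes[i]["vec"], vec)
                                              * self.nodes[i]["weight"])
        chain, frontier = [start], [start]
        for _ in range(hops):                # traversal pulls the related chain,
            frontier = [dst for src in frontier       # not just look-alike text
                        for _, dst in self.edges.get(src, [])]
            chain += frontier
        return [self.nodes[i]["text"] for i in dict.fromkeys(chain)]

m = GraphMemory()
m.write("bug42", "Fixed race condition in the sync job", [0.1, 0.9])
m.write("dec7", "Rejected the polling approach", [0.2, 0.8],
        relations=[("caused_by", "bug42")])
print(m.recall([0.2, 0.8]))   # returns the decision AND the bug it traces back to
```

This is exactly the gap the post calls out: the vector lookup alone would only return "Rejected the polling approach"; the edge is what brings back why.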
bluthcy @bluthcy
Quick question: when a Hermes Agent cron task fires, can you specify the workdir it launches in, so the session can read the agents.md instructions? I searched a bit and it seems this isn't currently supported? @Teknium
1 reply · 0 reposts · 1 like · 605 views
bluthcy retweeted
Berryxia.AI @berryxia
vLLM on Apple Silicon finally goes native Swift/Metal! 🚀 vllm-swift is now open source: ✅ No Python at all in the inference hot path ✅ Big gains in throughput and scalability ✅ One-line Homebrew install: `brew tap TheTom/tap && brew install vllm-swift` Running large models locally on a Mac just got another big step up in speed and efficiency 🤯 They're recruiting beta testers, go give it a try! GitHub 👉 github.com/TheTom/vllm-sw…
Tom Turney @no_stp_on_snek

Native Swift/Metal backend for vLLM on Apple Silicon. No Python in the inference hot path → better throughput + scaling. Try it: brew tap TheTom/tap && brew install vllm-swift Looking for beta testers → github.com/TheTom/vllm-sw…

4 replies · 27 reposts · 226 likes · 33.5K views
bluthcy retweeted
sitin @sitinme
More and more people are playing with local LLMs, not wanting to burn tokens on hosted models, and would rather deploy offline on their own machines. But the pain point is obvious: Hugging Face has a bewildering variety of models, you have no idea how to choose, and only after downloading do you find your machine can't run it, wasting your time.

I came across a tool called llmfit that exists to solve exactly this. Its data covers NVIDIA, AMD, Apple silicon, and most other GPUs; speed estimates are backed by real measurements; and it automatically weights different scenarios like coding versus chat, so model selection stops being a minefield.

Run one command and it detects your GPU, VRAM, and RAM, then applies a built-in four-axis scoring system, rating each model on quality, speed, hardware fit, and context length; it ranks all compatible models and annotates the recommended quantization, footprint, and estimated speed (a toy version of this scoring is sketched below).

Once you pick a model, download is one click, and it plugs into the mainstream local runners: Ollama, llama.cpp, LM Studio. All platforms are supported, with a one-line install script.

It also has hardware simulation: without touching your actual machine you can simulate different VRAM and RAM amounts and see which models a hardware upgrade would unlock, which is great for anyone planning a build or upgrade.

It can even be integrated into agent tools like OpenClaw, so you can just ask in chat what models your machine can run; the tool detects, selects, and configures automatically, no manual fiddling.

In a way, it fills the most critical missing piece in making local models mainstream: not teaching you how to deploy, but first figuring out what you should deploy at all.
[2 images attached, one a GIF]
9 replies · 46 reposts · 208 likes · 23.8K views
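Here is a toy version of the four-axis scoring described above (quality, speed, hardware fit, context length), with hardware fit modeled as a hard gate for simplicity. The weights, numbers, and field names are invented for illustration; llmfit's real scoring is based on measured data:

```python
# Toy four-axis model scoring with scenario-dependent weights.
# Invented numbers; not llmfit's actual formula or data.
def score(model: dict, hw: dict, scenario: str = "chat") -> float:
    if model["mem_gb"] > hw["vram_gb"]:
        return 0.0                                   # hardware fit is a hard gate
    weights = {
        "coding": {"quality": 0.5, "speed": 0.2, "context": 0.3},
        "chat":   {"quality": 0.3, "speed": 0.5, "context": 0.2},
    }[scenario]
    axes = {
        "quality": model["quality"],                 # 0..1, benchmark-derived
        "speed": min(1.0, hw["tps_estimate"] / 50),  # ~50 tok/s = full marks
        "context": min(1.0, model["ctx"] / 131072),
    }
    return sum(w * axes[k] for k, w in weights.items())

candidates = [
    {"name": "qwen-7b-q4", "mem_gb": 5, "quality": 0.62, "ctx": 32768},
    {"name": "llama-70b-q4", "mem_gb": 40, "quality": 0.85, "ctx": 131072},
]
hw = {"vram_gb": 16, "tps_estimate": 35}
ranked = sorted(candidates, key=lambda m: score(m, hw, "coding"), reverse=True)
print([m["name"] for m in ranked])   # the 70B never fits in 16 GB, so it ranks last
```

The hardware-simulation feature the post mentions then falls out for free: rerun the same ranking with a hypothetical `hw` dict instead of the detected one.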
bluthcy retweeted
hkdom @hkdom
@sitinme Congrats. My takeaway after a few days of playing with it: I recommend oMLX + Qwen 3.6-35B-A3B-MLX-8bit huggingface.co/unsloth/Qwen3.… Honestly feels like 80-90% of Opus's capability. Also, when you have time, try Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-4bit for some fun!
3 replies · 3 reposts · 33 likes · 5.8K views