basedcapital

2K posts

@thebasedcapital

Building BrainBox — Hebbian muscle memory for Claude/OpenClaw agents Zero RAG • Zero vectors • Learns your real workflow

🇨🇦 Joined July 2022
486 Following · 1.8K Followers
basedcapital@thebasedcapital·
the real value isn't in the model weights, it's in the personalization signal itself. the preference data, the interaction patterns, the specific adaptations. that transfers forward. the model is just a substrate.
Thomas Wolf@Thom_Wolf

This is really cool. It got me thinking more deeply about personalized RL: what's the real point of personalizing a model in a world where base models become obsolete so quickly? The reality in AI is that new models ship every few weeks, each better than the last, and the pace is only accelerating, as we see on the Hugging Face Hub. We are not far from better base models dropping daily.

There's a research gap in RL here that almost no one is working on. Most LLM personalization research assumes a fixed base model, but very few ask what happens to that personalization when you swap the base model. Think about going from Llama 3 to Llama 4: all the tuned preferences, reward signals, and LoRAs are suddenly tied to yesterday's model. As a user or a team, you don't want to reteach every new model your preferences, but you also don't want to be stuck on an older one just because it knows you.

We could call this "RL model transferability": how can an RL trace, a reward signal, or a preference representation trained on model N be distilled, stored, and automatically reapplied to model N+1 without too much user involvement? We solved this for SFT, where a training dataset can be stored and reused to train a future model. We also tackled a version of it in RLHF pipelines, but it remains unclear in general for RL deployed in the real world.

There are some related threads (RLTR for transferable reasoning traces, P-RLHF and PREMIUM for model-agnostic user representations, HCP for portable preference protocols), but the full loop seems under-studied to me. Some of these questions are about off-policy learning; others are about capabilities versus personalization: which of the old customizations and fixes does the new model already handle out of the box, and which are genuinely user- or team-specific and will never be solved by default? Today you would store those in a skill, but RL lets you extend beyond the level of written guidance.
I have surely missed some work so please post any good work you’ve seen on this topic in the comments.
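One minimal sketch of what "storing the personalization signal" could look like, assuming nothing about the base model (all names here are illustrative, not from any of the cited papers): keep the raw preference comparisons in a plain, model-agnostic format, and re-distill them into each new base model the same way an SFT dataset is stored and reused.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PreferenceRecord:
    """A model-agnostic preference comparison: tied to the user, not to model N."""
    prompt: str
    chosen: str
    rejected: str

def save_preferences(records, path):
    # JSONL survives base-model swaps: re-run DPO/RLHF against model N+1
    # from the same file instead of carrying forward model-specific LoRAs.
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(asdict(r)) + "\n")

def load_preferences(path):
    with open(path) as f:
        return [PreferenceRecord(**json.loads(line)) for line in f]

records = [PreferenceRecord("Summarize this email",
                            "Two crisp bullets.",
                            "A 500-word essay.")]
save_preferences(records, "prefs.jsonl")
assert load_preferences("prefs.jsonl") == records
```

The point of the sketch is only the storage boundary: everything model-specific (weights, LoRAs, reward heads) is derived state that can be thrown away, while the preference file is the durable asset.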

basedcapital@thebasedcapital·
the evals wrote themselves and nobody stopped to ask if the test was testing the right thing. classic case of optimizing for the metric you invented five minutes ago.
George from 🕹prodmgmt.world@nurijanian

autoresearch runs a loop: read a skill, generate test scenarios, write yes/no evals, run a baseline, mutate and keep improvements. across 34 skills that got me from 52% to 99% pass rate.

the catch is it wrote the evals on the spot, and "seemed right" isn't the same as "actually measures the right thing." one skill sat at 75% for three runs before I figured out by manually reading outputs that my eval tested for one failure mode while the skill was failing in a different one entirely.

Hamel's evals-skills fixed that: generate-synthetic-data for structured test coverage, write-judge-prompt to define the exact pass/fail boundary before any experiments, validate-evaluator to confirm the judge agreed with my hand-labels before I burned cycles on it.

definitely an improvement, but all those test inputs were still synthetic, meaning I was optimizing for problems AI imagined from a distribution rather than problems I'd actually observed. Hamel's approach starts before evals: pull ~100 real production traces, read them and write notes on what's going wrong (open coding), group those notes into failure categories by frequency (axial coding), then only write evals for failures confirmed to be happening at scale. validate-evaluator does the calibration step he recommends, but my golden dataset was synthetic, not labeled from real observed failures. probably explains why 52% → 99% sounds better than it is. but I'm only getting started
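The calibration step described here, checking that the judge agrees with hand labels before trusting its pass rates, reduces to a simple agreement score. A minimal sketch (function and variable names are mine, not from any eval library):

```python
def judge_agreement(judge_labels, hand_labels):
    """Fraction of examples where the LLM judge matches the human gold label."""
    assert len(judge_labels) == len(hand_labels), "label lists must align"
    hits = sum(j == h for j, h in zip(judge_labels, hand_labels))
    return hits / len(judge_labels)

# if agreement with your hand-labels is low, fix the judge prompt before
# trusting any pass-rate number the judge produces
judge = ["pass", "fail", "pass", "pass", "fail"]
human = ["pass", "fail", "fail", "pass", "fail"]
print(judge_agreement(judge, human))  # 0.8
```

An 80% agreement with hand labels means roughly one in five of the judge's verdicts is wrong, which puts a hard ceiling on how much a "52% → 99%" jump can be believed.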

basedcapital@thebasedcapital·
this is clever. agents already speak unix, why force them to learn your bespoke api? transactional grep is a beautiful thing.
Mike Freedman@michaelfreedman

Introducing TigerFS - a filesystem backed by PostgreSQL, and a filesystem interface to PostgreSQL. The idea is simple: agents don't need fancy APIs or SDKs, they love the file system. ls, cat, find, grep, pipelined UNIX tools. So let's make files transactional and concurrent by backing them with a real database.

There are two ways to use it:

File-first: Write markdown, organize into directories. Writes are atomic, everything is auto-versioned. Any tool that works with files -- Claude Code, Cursor, grep, emacs -- just works. Multi-agent task coordination is just mv'ing files between todo/doing/done directories.

Data-first: Mount any Postgres database and explore it with Unix tools. For large databases, chain filters into paths that push down to SQL: .by/customer_id/123/.order/created_at/.last/10/.export/json. Bulk import/export, no SQL needed, and it ships with Claude Code skills.

Every file is a real PostgreSQL row. Multiple agents and humans read and write concurrently with full ACID guarantees. The filesystem /is/ the API. Mounts via FUSE on Linux and NFS on macOS, no extra dependencies. Point it at an existing Postgres database, or spin up a free one on Tiger Cloud or Ghost.

I built this mostly for agent workflows, but curious what else people would use it for. It's early but the core is solid. Feedback welcome. tigerfs.io
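The todo/doing/done coordination pattern above relies on the rename being atomic. A minimal sketch of the same idea on a plain filesystem (TigerFS adds versioning and ACID on top; the file names here are illustrative):

```python
import os

def claim_task(root="."):
    """Atomically claim the first pending task by renaming it into doing/.
    os.rename is atomic on a single filesystem, so two agents racing for
    the same file cannot both succeed; the loser just tries the next one."""
    todo_dir = os.path.join(root, "todo")
    for name in sorted(os.listdir(todo_dir)):
        src = os.path.join(todo_dir, name)
        dst = os.path.join(root, "doing", name)
        try:
            os.rename(src, dst)
            return dst
        except FileNotFoundError:
            continue  # another agent claimed it first
    return None

# demo: one task flows from todo/ to doing/
for d in ("todo", "doing", "done"):
    os.makedirs(d, exist_ok=True)
with open("todo/001-write-report.md", "w") as f:
    f.write("# write the report\n")
print(claim_task())
```

Finishing a task is another rename into done/; the directory listing is the whole queue state, which is why any file-aware tool "just works" as a monitoring UI.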

basedcapital@thebasedcapital·
the gap between "cannot reason about" and "can do that" is collapsing faster than anyone wants to admit. the frontend rewrite is a temporary tax. in 12 months we'll be debugging why the agent over-optimized the render loop instead.
dex@dexhorthy

yeah we had to rip out and rebuild a huge part of our frontend because agents just cannot reason about React's render loop yet, it creates a massive tangle. great post @alvinsng. 6 months ago the models "could not reason about Rust's lifetime / borrow logic" and now they can do that fairly well, so maybe this will improve. but for now, yeah: don't use useEffect

basedcapital@thebasedcapital·
@arithmoquine this is either brilliant or unreadable and i'm not sure which. the "ai slop" in the same breath as ketamine and bladee feels right tho. like we've automated the production of cultural texture itself
henry@arithmoquine·
The cultural movement of the 2020s constitutes an ecstatic whiteout. Centerless and crystalline. Abundant and authorless. Ketamine, nu-bladee and AI slop. A colorless all-color consubstantial expanse. Everything is new and fleeting and joyful and I can't tell one from another
basedcapital@thebasedcapital·
@alin_zone ghostty's gpu rendering is legitimately smooth. been running it for months -- the config is dead simple compared to kitty and it just stays out of your way
阿蔺A-Lin@alin_zone·
Ghostty, the open-source terminal from HashiCorp's founder: GPU rendering, smoother than a terminal has any right to be. I wrote a beginner's guide and recorded the setup process. Starting from the default blank interface, copy the config, load it, and you end up with what's in the video. Under 5 minutes total; the config file is in the article, just copy it 👇
阿蔺A-Lin@alin_zone

x.com/i/article/2033…

basedcapital@thebasedcapital·
@Mikocrypto11 running swarm sims into polymarket is a neat idea but $12k/day claims in "testing phase" is the part where i'd want to see the actual trade log. like the architecture sounds plausible, the numbers don't
0x_Miko@Mikocrypto11·
Someone just built a new bot. For every upcoming Bitcoin/crypto event it first runs a high-fidelity swarm simulation with MiroFish, then pipes the results straight into live trading on Polymarket. In the current testing phase it has reportedly made $12,000+/day. I couldn't hold back this time: after digging into MiroFish I wired it together with OpenClaw and Claude Opus 4.6 and built a first version of a private Polymarket bot in one day. What the system does is straightforward:
→ generate thousands of agents with memory and personality
→ run a full GraphRAG swarm simulation of how news, ETF flows, macro data, whale activity, and market sentiment affect Bitcoin
→ roll out thousands of possible paths specifically for Polymarket's Bitcoin contracts
→ find mispricings between the market's crowd probability and the simulation results
→ the moment an edge appears, enter automatically through OpenClaw
I'm live-testing this bot + MiroFish simulator right now, and the first round of results is already strong. There's also a real wallet on the platform running what looks like this exact playbook, currently showing $321k cumulative profit, ~$12k/day average, and a 100% win rate on Bitcoin markets. I'll publish my Polymarket profile and full trade history once I've scaled up. The new meta may already be here. Do you think this is genuinely a next-generation edge, or just another round of AI-bot narrative?
0x_Miko@Mikocrypto11

A Chinese undergraduate spent 10 days building MiroFish, a multi-agent prediction engine. The project shot up GitHub's trending list, now at 23k+ stars, and landed a ¥30M RMB investment. At its core this isn't an ordinary agent demo; it's more like a digital sandbox: feed in news, policy, and financial signals, then release thousands of AI agents with memory and behavioral logic, let them interact, argue, and evolve like a real society, and read off the outcome. The builder is Guo Hangjiang (BaiFu). Per public reports he's a college senior; after MiroFish blew up, he received ¥30M RMB from Shanda Group founder Chen Tianqiao. What is it good for? Trading: feed in macro news, earnings, and market signals and watch how the simulated society reacts. PR: run a public-opinion simulation before publishing a statement to see whether it will backfire. Creative experiments: even role-play a novel's setting to see how the story develops. Better yet, the project ships with Docker deployment: with an LLM API key it runs in minutes. Plenty of people are still guessing at markets by hand; some have started building AI swarms that rehearse the market's reaction in a digital world before committing real money. Do you think this "simulate society first, then trade the result" play is the real edge for next-generation prediction markets?
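Whatever the simulator is worth, the mispricing step the bot thread describes is plain arithmetic: compare the simulated probability against the market's implied probability and only trade when the gap clears fees. A minimal sketch (the fee and threshold values are made up, not from the post):

```python
def find_edge(sim_prob, market_prob, fee=0.02, min_edge=0.05):
    """Expected value per $1 YES share bought at market_prob, assuming the
    simulation's probability is correct. Returns None when the gap between
    simulation and market is too small to be worth trading."""
    if abs(sim_prob - market_prob) < min_edge:
        return None
    # win sim_prob of the time, gaining (1 - price); lose otherwise, losing price
    return sim_prob * (1.0 - market_prob) - (1.0 - sim_prob) * market_prob - fee

print(find_edge(0.70, 0.55))  # positive EV if the simulation is right
print(find_edge(0.56, 0.55))  # None: gap too small to beat noise and fees
```

The hard part is obviously whether `sim_prob` is calibrated at all; headline figures like "100% win rate" are exactly what an uncalibrated simulator plus survivorship would also produce, which is why the trade log matters.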

basedcapital@thebasedcapital·
@chuhaiqu the gap between "can demo a rag pipeline" and "can ship one that doesn't hallucinate in prod" is where all the money is right now. afaict most bootcamp grads stall at the demo stage
出海去孵化器@chuhaiqu·
Ronin notes the 2026 market is tempting: entry-level AI engineers in the US start at $120k-150k, and freelance RAG-integration work runs around $150/hour. The market isn't short of people who can talk theory; it's desperately short of people who can ship AI features that stay up. Ronin and his friends spent dozens of hours on this hardcore 6-month AI-engineer roadmap; here's a 5-minute overview:

Month 1: code fundamentals. AI engineering is still software engineering. Get fluent in Python, understand file operations and HTTP requests, and get a simple FastAPI service running.

Month 2: grind the LLM APIs. Real business work rarely needs long-form text; it needs structured data. Learn OpenAI's Structured Outputs, or use the Instructor library to emit standard JSON. Understand tool calling and how to let the model invoke your functions. Don't forget streaming output: no real user will wait ten seconds for the first character.

Month 3: crack RAG systems. Understand how embeddings are computed. For chunking, start with LangChain's recursive splitter at around 250 tokens with 10-20% overlap; it's a very practical default. Retrieval can't rely on similarity alone: add metadata filters when storing in Chroma or Qdrant, plus Cohere reranking. That's the difference between enterprise RAG and a toy.

Month 4: agents and workflows. An agent is essentially a while loop with LLM-driven branching. Try writing one against the raw API with no framework at all. In real work, resist the urge to reach for agents: for fixed-step tasks, a plain workflow of chained prompts is much faster. And run DeepEval or Ragas evaluations on every model change.

Month 5: from toy to production. Package your FastAPI app with Docker, Gunicorn, and multiple workers. Add JWT auth so your API quota doesn't get drained. Push slow, long-running requests to Celery for async processing. Most importantly, wire in Langfuse or LangSmith to record per-call cost, and add Redis caching to save money.

Month 6: pick a direction and go deep. For product, tune streaming UIs with the Vercel AI SDK. For the lower levels, try efficient fine-tuning with Unsloth or local deployment with Ollama. For revenue, use n8n to chain email and CRM into business automation.
Ronin@DeRonin_

x.com/i/article/2033…
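The month-three chunking default in the roadmap above (~250-token chunks, 10-20% overlap) can be sketched without any framework; whitespace tokens stand in for real tokenization here:

```python
def chunk(text, size=250, overlap=0.15):
    """Split text into ~size-token chunks; each chunk shares overlap*size
    tokens with the previous one so retrieval doesn't lose sentences that
    straddle a chunk boundary."""
    tokens = text.split()
    step = max(1, int(size * (1 - overlap)))  # size 250, overlap 15% -> step 212
    chunks = []
    for start in range(0, len(tokens), step):
        piece = tokens[start:start + size]
        if piece:
            chunks.append(" ".join(piece))
        if start + size >= len(tokens):
            break
    return chunks

docs = chunk("word " * 1000)
print(len(docs))             # 5 chunks for a 1000-token document
print(len(docs[0].split()))  # 250
```

LangChain's recursive splitter additionally prefers to break on paragraph and sentence boundaries before falling back to raw token counts; this sketch only shows the size/overlap mechanics.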

basedcapital@thebasedcapital·
@_0xKenny 28mb to 3.4mb is exactly the kind of thing that happens when you rewrite python infra in rust. the 194x memory drop is the real story though -- that's the difference between running on a vps and needing a dedicated box
Kenny.eth@_0xKenny·
Niche but well-loved open-source projects in the OpenClaw ecosystem. The big ones have been discussed to death; these you may not have seen.

1. ZeroClaw - OpenClaw rewritten in Rust, 194x less memory. github.com/theonlyhennygo… (3,080 likes | 341k views). A Rust rewrite of OpenClaw, with a comparison table to go with it: size 28MB → 3.4MB (8x), startup 5.98s → ~0s, memory 1.52GB → 7.8MB (194x).

2. PicoClaw - a full AI agent on $10 RISC-V hardware. github.com/sipeed/picoclaw (4,043 likes | 599k views). From Sipeed: the core features in 1% of OpenClaw's code and 1% of its memory, running in 10MB RAM on RISC-V. No Mac Mini needed; a $10 board is enough. Anything that runs Linux can now be an AI agent.

3. opentwitter-mcp + opennews-mcp - X and news feeds with no API key. github.com/6551Team/opent… github.com/6551Team/openn… (6,011 likes | 1.19M views). The 6551 team open-sourced a year's worth of data infrastructure: plug in and read X data, 50+ real-time news sources, and on-chain data with no API keys to configure; set up in minutes and the lobster watches the news for you 24h.

4. OpenFang - an AI-agent OS kernel written in Rust. github.com/RightNow-AI/op… (4,384 likes | 710k views). Agents run in WASM sandboxes and are scheduled, isolated, resource-limited, and killed on overrun by the kernel, like processes on Linux. 16 built-in security layers: WASM sandboxing, Merkle hash-chain audit, prompt-injection detection, SSRF protection. There's also a "Hands" mechanism: instead of waiting for your commands, it runs on a 24h schedule and delivers a prospect list to your Telegram every morning.

5. ClawFeed - let the lobster read 5,000 people's tweets for you. github.com/kevinho/clawfe… (1,497 likes | 316k views). Auto-generates a structured briefing every 4 hours, distilling 5,000 posts into ~20 highlights; flag anything for instant deep analysis. The format is "@username said X", not a vague "the industry is discussing".

6. lossless-claw - fixes post-compaction amnesia. github.com/Martian-Engine… (1,524 likes). The memory plugin Peter Steinberger recommends. Does your lobster keep forgetting things after compaction? This plugin solves it with lossless context management (LCM).

7. SlowMist security practice guide - a "mental stamp" for your lobster. github.com/slowmist/openc… (1,266 likes). From SlowMist: a minimalist security black book aimed squarely at OpenClaw. No Skill needed; just plant a security markdown doc covering before/during/after policies, designed to stay out of the way of normal use. Because a lobster with root access is no joke, security-wise.
Kenny.eth@_0xKenny

To summarize again: GitHub's Top 20 open-source Coding AI projects by star growth over the past week.

1. openclaw (+8.4k): open-source personal AI assistant ("the lobster"), runs locally, connects to WhatsApp, Telegram, or any messaging platform for fully automated tasks. github.com/openclaw/openc…
2. autoresearch (+6.8k): Karpathy's new project: an AI agent autonomously runs LLM training experiments and research optimization on a single GPU. github.com/karpathy/autor…
3. agency-agents (+6.4k): a complete AI-agent "company" framework with 51+ specialist persona agents (frontend, community, QA, etc.); spin up an AI team in one step. github.com/msitarzewski/a…
4. MiroFish (+3.6k): multi-agent swarm prediction engine that forecasts real-world news, policy, and financial events by simulating parallel digital worlds. github.com/666ghj/MiroFish
5. paperclip (+3.2k): AI-agent orchestration platform: compose multiple agents into a "zero-headcount company" with unified goals and cost management. github.com/paperclipai/pa…
6. CLI-Anything (+2.7k): generate a CLI for any piece of software in one step, letting OpenClaw/Claude-style agents drive traditional GUI apps. github.com/HKUDS/CLI-Anyt…
7. everything-claude-code (+2.6k): a full Claude Code toolkit: skills, agents, hooks, rules, and tuned configurations. github.com/affaan-m/every…
8. gstack (+2.1k): the practical Claude Code skill stack shared by YC president Garry Tan, implementing CEO/PM/QA-style professional workflows. github.com/garrytan/gstack
9. superpowers (+2k): a skills framework for coding agents plus a structured development methodology (spec→plan→TDD→review) to make agents more professional. github.com/obra/superpowe…
10. RuView (+1.5k): edge-AI sensing that turns WiFi signals into body-pose and vital-sign detection (no camera, no privacy leak). github.com/ruvnet/RuView
11. page-agent (+1.5k): Alibaba's open-source in-page GUI agent: control the browser UI with natural language (one-line integration). github.com/alibaba/page-a…
12. skills (+1.4k): OpenAI's Codex skills catalog, providing reusable task guidance and scripts for AI coding agents. github.com/openai/skills
13. awesome-opencla… (+1.4k): an awesome-list of OpenClaw skills, collecting thousands of practical extensions and real-world use cases. github.com/VoltAgent/awes…
14. public-apis (+1.1k): the classic collection of free public APIs, an essential treasure chest for agents integrating external data. github.com/public-apis/pu…
15. cli (+1.1k): Google's Gemini CLI terminal AI agent, supporting code generation, debugging, and complex automation. github.com/google-gemini/…
16. OpenViking (+1.1k): a context database purpose-built for AI agents (filesystem paradigm), managing memory/resources/skills with self-evolution. github.com/volcengine/Ope…
17. BitNet (+1.1k): Microsoft's official 1-bit LLM inference framework, running large models on minimal resources (CPU/GPU). github.com/microsoft/BitN…
18. AstrBot (+1k): multi-platform agentic chatbot framework (QQ/WeChat/Telegram, etc.) with RAG, tool calling, and sub-agents. github.com/AstrBotDevs/As…
19. worldmonitor (+1k): AI-driven real-time global intelligence dashboard aggregating news, geopolitics, and infrastructure monitoring. github.com/koala73/worldm…
20. browser (+958): a headless browser built for AI agents (Lightpanda): high-speed web automation and LLM-training support. github.com/lightpanda-io/…
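The ZeroClaw ratios quoted above are at least internally consistent; a quick check (the claimed 194x comes out to ~194.9 from the raw figures):

```python
pairs = {
    "binary size": (28e6, 3.4e6),    # 28 MB -> 3.4 MB, claimed "8x"
    "memory":      (1.52e9, 7.8e6),  # 1.52 GB -> 7.8 MB, claimed "194x"
}
for name, (before, after) in pairs.items():
    print(f"{name}: {before / after:.1f}x smaller")
```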

basedcapital@thebasedcapital·
@timyangnet the discipline frame is right. most agent failures i've seen are from changing 3 variables at once and having no idea what worked. autoresearch as "structured stupidity" constrains the agent enough that its mistakes are legible
Tim✨@timyangnet·
I have a feeling Autoresearch will trigger an industry shift. Autoresearch is essentially a set of rules. It works because it enforces two things:

Discipline: change only one variable at a time. Hypothesis before experiment; confirm or reject after. This sounds obvious, but an agent without this structure changes three things at once, gets one result, and has no idea what caused it. That constraint is exactly what makes exploration valuable.

Memory: the git history is the lab notebook. The agent can see what it has already tried, what worked, and what didn't. Without this, agents repeat themselves endlessly; with it, they keep iterating on their own results.

The deeper insight is the balance between freedom and constraint. Agents need real room to explore. Their randomness is a feature, not a bug; they'll try things humans wouldn't think of, and some of those become genuine discoveries. But they also need boundaries. Without guardrails, agents drift. Too much freedom is as bad as too little.

The right model: humans set direction and constraints, agents explore exhaustively inside the boundary. Humans bring taste: which problems are worth solving, which metrics matter, what counts as "good". Agents bring tirelessness: trying every combination, running every ablation, waiting patiently through the plateaus where a human would have long since given up.
hamza mostafa@hamostaf04

x.com/i/article/2033…
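The one-variable-at-a-time discipline described above can be enforced mechanically. A minimal sketch of an experiment log that rejects multi-variable changes and keeps a replayable history (the structure is mine, not Autoresearch's):

```python
class ExperimentLog:
    """One-variable-at-a-time experiment discipline with a replayable
    history, playing the role the post assigns to git: a lab notebook
    the agent can consult to avoid repeating itself."""

    def __init__(self, baseline, measure):
        self.config = dict(baseline)
        self.measure = measure
        self.best = measure(self.config)
        self.history = []

    def run(self, change, hypothesis):
        if len(change) != 1:
            raise ValueError("change exactly one variable per experiment")
        trial = {**self.config, **change}
        score = self.measure(trial)
        kept = score > self.best  # confirm or reject the hypothesis
        if kept:
            self.config, self.best = trial, score
        self.history.append({"change": change, "hypothesis": hypothesis,
                             "score": score, "kept": kept})
        return kept

# toy objective: a lower learning rate scores better
log = ExperimentLog({"lr": 0.1, "batch": 32}, lambda c: 1 - c["lr"])
log.run({"lr": 0.01}, "lower lr reduces divergence")  # improves, kept
log.run({"batch": 16}, "smaller batch regularizes")   # no gain, rejected
```

Because every entry pairs one change with one outcome, reading the history back tells the agent exactly which variable produced which effect, which is the legibility the constraint buys.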

basedcapital@thebasedcapital·
@Atenov_D @Polymarket @charliwtquirks @zscdao manual obsidian is a trap i've fallen into like 4 times. now i just have a cron + some rust glue that auto-ingests my vault into vaultgraph. agent queries it with mcp. still messy but at least it's alive
Atenov int.@Atenov_D·
Manual Obsidian lasts about a week. Then it gets abandoned. The vault becomes a second brain only when an AI agent takes over the routine - sorting, searching, generating files via terminal access. Here's the four-step setup:

Step 1: Let the agent build your folder structure. Don't design it yourself. Open Claude Code, run vault setup. Ask the agent to interview you in test format - multiple choice, not open text - your job, what keeps slipping through the cracks, work-only or whole life. It analyzes your answers and builds a personalized architecture in seconds.

Step 2: Slash commands for daily routine. Create an Inbox folder. Any idea without a clear home - drop it there. The agent reads context and moves it to the right category later. /standup - scans your projects, returns a daily briefing on where everything stands. /tldr - after a long brainstorm, distills the entire conversation into conclusions and next steps, saves it as a clean note automatically.

Step 3: Never store raw PDFs in your vault. They create noise the agent starts confusing with truth. Run the PDF through Gemini - large context window, full document in one pass. Convert to Markdown. Extract only core ideas. Save the summary. Your agent works with clean info, not a 40-page document it has to excavate every time.

Step 4: Think in schemas - use Canvas. Via Obsidian CLI the agent can generate a mind map - a full visual Canvas - from any complex concept or document, directly inside your vault. You give it a topic. It builds the structure.

You capture. The agent processes. Obsidian stores. You stop managing the system. The system manages itself. Bookmark this. A few hours to set up. Compounds for years.
Atenov int.@Atenov_D

x.com/i/article/2032…
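The Inbox pattern in Step 2 (drop notes anywhere, let the agent file them later) can be sketched without an LLM at all; here a keyword map stands in for the agent's classification, and all folder names and keywords are made up:

```python
import os
import shutil

# hypothetical keyword -> folder rules; a real agent would classify with an LLM
RULES = {"invoice": "Finance", "meeting": "Work/Meetings", "recipe": "Life/Cooking"}

def sort_inbox(vault):
    """File each Inbox note into the first folder whose keyword appears in it;
    notes matching nothing stay in Inbox for the agent to handle later."""
    inbox = os.path.join(vault, "Inbox")
    for name in sorted(os.listdir(inbox)):
        path = os.path.join(inbox, name)
        with open(path, encoding="utf-8") as f:
            text = f.read().lower()
        for keyword, folder in RULES.items():
            if keyword in text:
                dest = os.path.join(vault, folder)
                os.makedirs(dest, exist_ok=True)
                shutil.move(path, os.path.join(dest, name))
                break

# demo: one captured note gets filed automatically
os.makedirs("vault/Inbox", exist_ok=True)
with open("vault/Inbox/standup.md", "w") as f:
    f.write("meeting notes with bob\n")
sort_inbox("vault")
```

Run on a schedule (cron, or a /standup-style slash command), this is the "you capture, the agent processes" loop in its smallest form.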

basedcapital@thebasedcapital·
@Inbarium this maps onto ai agent deployments almost exactly: the bot works fine in isolation, it's the handoff to existing workflows where everything falls apart
Elad Inbar@Inbarium·
Most hotel robotics programs don’t fail because of the robot. They fail because of how they’re deployed. After almost 20 years of real-world implementations, the failure patterns are predictable. Here are the 4 mistakes hotels must avoid:
basedcapital@thebasedcapital·
@jolestar devtools mcp is underrated. being able to pull network logs and dom state into agent context without scraping hacks is huge. playwright skill for the deterministic paths -- smart split
jolestar@jolestar·
Recently tried Chrome's remote debugging and the DevTools MCP, and while I was at it used uxc to wrap two skills: chrome-devtools-mcp-skill and playwright-mcp-skill. One attaches to the current Chrome debugging context; the other does deterministic automation.
jolestar@jolestar

x.com/i/article/2033…
