herrkaefer

1.7K posts

@herr_kaefer

Becoming an AI-oriented maker.

Joined May 2009

476 Following · 84 Followers

herrkaefer@herr_kaefer·10h

Electron? No no no no. Shaking my head a hundred times.

宝玉@dotey

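If you're on a TypeScript stack, pi-mono is the first choice for agent development: it's powerful and easy to call. Vercel's aisdk is a decent second. I no longer recommend the claude agent sdk as much, mainly because it locks you into Claude, though it still has one irreplaceable advantage: it can share a Claude Max subscription, which is convenient during development. How long that will last is unclear. At the application layer, Electron is still the first choice: it's stable and reliable and there is plenty of AI training data for it; the main drawback is the somewhat large app size. But if you're just starting to write agents, I suggest beginning with a CLI rather than building a UI from the start, so you can focus on the agent itself, unless the UI is your core. I recommend an open-source project, craft-agents-oss (TypeScript + pi-mono + Electron + React + claude agent sdk), as a good learning reference. github.com/lukilabs/craft…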

herrkaefer reposted

宝玉@dotey·23h

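The browser-use team has open-sourced a Claude Code skill called video-use: you record your footage on camera, chat with Claude Code for a bit, and get back a finished, edited video. It sounds like a gimmick, but the problem it solves is practical: you've recorded a pile of footage full of "um"s, "uh"s, and retakes, and the traditional workflow is to open an editor and cut it piece by piece. With video-use, you drop the footage into a folder and tell Claude, "Cut these into a launch video." It automatically trims filler words and dead air, color grades, adds subtitles, and can even generate animation overlays with Manim or Remotion, then outputs final.mp4. There's a clever technical detail: the LLM never "watches" the video at any point. It reads word-level timestamped text transcribed by ElevenLabs, with the entire footage compressed into a text file of roughly 12KB. Only at decision points, for example when it's unsure whether a pause should be cut, does it call up a composite timeline image to help decide. By the project author's math, feeding frames directly to the model would burn 45 million tokens, whereas this approach needs just one text file plus a few images. The idea is the same as browser-use's web agents: give the model the structured DOM rather than screenshots. After rendering there is a self-check pass: it regenerates the timeline view at every cut point and checks for visual jumps, audio pops, and obscured subtitles, and only shows you a preview once it passes. It auto-fixes for up to three rounds. The project is fully open source and free; after installing ffmpeg and the Python dependencies, you symlink the repo into Claude Code's skills directory and it's ready to use, though transcription depends on the ElevenLabs API, so you need to configure your own key. For people who often record screencasts, tutorials, or vlogs but find editing software too heavy, it's worth a try. Project: github.com/browser-use/vi…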

Gregor Zunic@gregpr07

Introducing: Video Use. Edit videos with Claude Code. 🫡 I got tired of paying for video editors, so I made a Claude Code skill that does it for me. > Talk to camera, get final.mp4 > Auto cuts fillers, color grades, adds subtitles > Adds Manim and Remotion animations > Self evals the render before you see it 100% open source, 100% free.
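
The transcript-driven cutting described in 宝玉's post above can be sketched roughly as follows. This is a minimal illustration, not video-use's actual code: the `Word` type, the `FILLERS` set, the `max_gap` threshold, and the `keep_segments` function are all hypothetical names, and the input is assumed to be word-level timestamps of the kind a transcription service returns.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the footage
    end: float


# Hypothetical filler list; a real tool would likely use a richer one.
FILLERS = {"um", "uh", "嗯", "呃"}


def keep_segments(words, max_gap=0.6, fillers=FILLERS):
    """Return (start, end) spans to keep, dropping fillers and long pauses.

    Consecutive kept words are merged into one span unless a filler sat
    between them or the silence between them exceeds max_gap seconds.
    """
    segments = []
    force_cut = False  # set when a filler was just skipped
    for w in words:
        if w.text.lower().strip(".,!?，。") in fillers:
            force_cut = True
            continue
        if segments and not force_cut and w.start - segments[-1][1] <= max_gap:
            # Close enough to the previous word: extend the current span.
            segments[-1] = (segments[-1][0], w.end)
        else:
            # A filler or a long pause precedes this word: start a new span.
            segments.append((w.start, w.end))
        force_cut = False
    return segments


if __name__ == "__main__":
    demo = [
        Word("Hello", 0.0, 0.4),
        Word("um", 0.5, 0.8),
        Word("world", 0.9, 1.3),
        Word("again", 3.0, 3.4),
    ]
    print(keep_segments(demo))  # [(0.0, 0.4), (0.9, 1.3), (3.0, 3.4)]
```

The point of the sketch is that the cut list is computed from text alone; the kept spans could then be handed to a tool such as ffmpeg for trimming and concatenation, with no video frames ever sent to the model.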

herrkaefer@herr_kaefer·10h

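As a developer, if something can be built in 2 hours, I think my first instinct now should be to remind myself: don't build it. That was what we played with back at the end of last year. Now it should be a playground for non-technical people. Not an absolute rule, of course.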

刘小排@bourneliu66

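I think we should pay more attention to Builders (people who actually make things) and less to Influencers (people who only do social media). Following this principle, I've opened up a daily digest that answers three questions: - Yesterday, what fun things did indie developers around the world ship? - Yesterday, what news was worth indie developers' attention? - I have 2 hours today; what can I build? news.ycombinator.com/item?id=477898…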

herrkaefer@herr_kaefer·11h

@goldengrape Maybe with numbers, it hesitates between Chinese and English and then misreads them~

goldengrape@goldengrape·11h

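@herr_kaefer The novel passage: 9:47 at night, and the employee parking lot at Coldwater 9 was still giving off heat. The asphalt (柏油), baked all day, hadn't cooled; underfoot it felt like stepping on the rim of a pan through a layer of tough hide. Matteo got down from the old pickup (皮卡) and shut the door a little more gently, afraid of waking the empty plastic pharmacy bag on the back seat. The bag had come from the pharmacy that afternoon, and it held only what his father needed for this week; one more box would have to wait until the insurance cleared.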

goldengrape@goldengrape·16h

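Feels like TTS built by Westerners is still unusable. With gemini-3.1-flash-tts-preview, 柏油 (asphalt) was read as 薄油, 皮卡 (pickup) as 皮ka1, and it even misreads numbers: Coldwater 9 was read as "Coldwater ten".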

herrkaefer@herr_kaefer·11h

Is Qwen3.6-35B-A3B really good?

Simon Willison@simonw

Shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right

herrkaefer@herr_kaefer·11h

@goldengrape This one is too new. I've been using 2.5 all along and never noticed it making this kind of basic mistake.

goldengrape@goldengrape·12h

@herr_kaefer Well, it really did get them wrong. I was planning to make an audiobook, but it looks hopeless.

herrkaefer@herr_kaefer·12h

@dotey Hasn't gpt-image-1.5 been out for a while already? 2?

宝玉@dotey·22h

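Codex gets a major update: from a coding tool to an assistant that can operate your computer. OpenAI has shipped a major upgrade to Codex, expanding this coding assistant, used by more than 3 million developers every week, from something that writes code into a work partner that can operate the whole computer. The core change is background computer use. Codex can now see the screen, click the mouse, and type on the keyboard on its own, running multiple agents in parallel on a Mac without taking over the other windows you're using. This is especially useful for software with no open API: agents used to be stuck when they hit such apps, and now they operate them by hand the way a person would. The desktop app has a built-in browser, and you can annotate directly on web pages to give the agent instructions; for now this is mainly used for frontend development and game debugging. Image generation is wired in too, using OpenAI's new gpt-image-1.5 model, so product concept art, UI mocks, and game assets can be produced in the same workflow as the code. There are also 90+ new plugins, bringing in tools such as JIRA, GitLab, CircleCI, the Microsoft suite, and Databricks-owned Neon. The desktop app itself gains handling of GitHub review comments, multiple terminal tabs, and SSH connections to remote devboxes (development sandboxes), and PDFs, spreadsheets, and slides can be previewed right in the sidebar. Two more features are worth noting. One is memory: Codex remembers your preferences, past corrections, and context it spent time gathering, so it doesn't have to be taught from scratch for similar tasks. The other is "self-scheduling": it can schedule future tasks for itself and automatically wake up days or even weeks later to keep pushing on long-running work; the team already uses it to track unfinished items in Slack, Gmail, and Notion. On availability, the update starts rolling out today to users signed into the desktop app with a ChatGPT account. Computer use arrives first on macOS, with the EU and UK to follow later; memory and context-aware suggestions will take longer for Enterprise, Edu, and EU/UK users. The direction of this update is clear: with competitors like Claude Code and Cursor all moving toward general-purpose agents, OpenAI wants to turn Codex from a coding assistant inside the editor into a digital coworker that keeps working across apps, across time, and across toolchains.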

OpenAI@OpenAI

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.

herrkaefer@herr_kaefer·12h

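Tonight I checked on what my three-year-old has been learning at school and asked her to write the letters she knows. The first was A, the second was I. Whatever came after doesn't matter; her father was already in a daze.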

herrkaefer@herr_kaefer·1d

Many great decisions came from my own thinking during planning; the AI never even mentioned those options. But the suggestions and comparisons the AI makes are so helpful, either for exploring the options quickly or for inspiring ideas beyond them.

herrkaefer@herr_kaefer

Vibe planning / architecture design is one of the few joys left for programmers today, because it still depends on your domain knowledge and experience. Cherish it. Use your brain to hold onto that thread. Don’t let AI run wild on its own. Output comes second. Waiting & coding: a minus. That’s why I prefer models that don’t overthink but give fast feedback. A faster pace, even if slightly off, helps you remain actively engaged in the development loop. That’s why codex 5.4 high feels just right.

herrkaefer@herr_kaefer·2d

herrkaefer@herr_kaefer

My current take: vibe coding still has limits. You need a human in the loop to keep complexity under control and to handle the details efficiently. But vibe planning is real productivity. Every developer has limits, and can't be an expert at everything. Traditionally, we tend to frame problems and look for solutions within the space we’re already familiar with. But AI has seen far more patterns and possibilities, and can explore a much larger solution space. That’s why it often comes back with better or unexpected options. Our job is to be good at asking it for directions and suggestions — and (for now) keep the final decision in our hands. Also we have to learn quickly during the planning stage, with the help of AI of course.

herrkaefer@herr_kaefer·3d

Have programmers already "sunk" to being mentioned in the same breath as xxx? 😂

Eric Xu (e/Mettā)@xleaps

Programmers worried about AI agents should take a page from OnlyFans. OnlyFans was revolutionary. It made creation trivial and removed gatekeepers from idea to income. OnlyFans has been the Claude Code for the people who vibe create their way into the adult industry. So the great flood of supply came, i.e. those who had never been in the industry suddenly found themselves in it. What happened to those who were already in the industry? The reality is that they remained successful. The audience only increases by the number of people who turn 18 each year, so all things considered attention simply spread thinner and became harder to hold. So what are the underlying dynamics? Well, the oversupplied market treats average work the way the internet treats a slow page. The world moves on without ceremony on "average". Skill issue is a harsh synonym for not good enough. So who made money on OnlyFans? Those who are oddly specific. Software is walking into a similar equilibrium: it helps to be known for something oddly specific. And of course if you already have an audience, try to keep it close and give them a reason to stay. When creation is easy, skill stops being scarce. Being wanted and being in others' desire does.

herrkaefer@herr_kaefer·6d

@jesselaunz I tried oracle's CDP approach and it works fine now too. My earlier Python attempt may have failed for some other reason.

Jesse Lau 遁一子@jesselaunz·6d

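@herr_kaefer Once you're logged into ChatGPT in Chrome there isn't much detection, I think. My Claude Cowork calls ChatGPT every day to generate some images, and it basically never asks me to re-enter my credentials.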

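Jesse Lau 遁一子@jesselaunz·Apr 10

I ran analyses of a few PM bots with Pro, and it does offer some fresh perspectives, but having to summarize a chunk of text from cc/codex and paste it into ChatGPT every time is a hassle. I found an old git repo from 龙虾哥 ("Lobster Bro"): oracle github.com/steipete/oracle. Its original purpose was simple: work around codex not being able to call the GPT 5.4-Pro model. You install a skill, then codex calls oracle to launch a ChatGPT browser, automatically pastes in the prompt, and returns the result to codex. And since this is packaged as a skill, Claude Code can call GPT 5.4-Pro directly too.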

Jesse Lau 遁一子@jesselaunz

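Poor first impression of the $100 Pro. codex has no Pro model, so I first had codex export the details and data as 2 text files, then submitted them to Pro in the ChatGPT window for research. After a long wait it said "finished reasoning", but there was no content at all. I followed up asking whether it was done, and it started reasoning all over again. Feels like I got fooled by that Altman guy.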

herrkaefer@herr_kaefer·6d

@xiaokedada

QME

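nazha@xiaokedada·Apr 10

#Sharing Anthropic's Claude Managed Agents. I've been building something similar internally (funny how we arrived at the same place by different routes). This development has been foreseeable ever since Openclaw blew up: once ordinary users can conveniently create Skills, they will need to conveniently and freely create Agents, so the ability to freely create Agents gets handed down to ordinary users. But Claude Managed Agents doesn't seem to go far enough; it isn't simple enough yet. The Agent standard I have in mind should be a packageable, distributable zip with a structure like the one below.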

herrkaefer reposted

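herrkaefer@herr_kaefer·Apr 10

@jesselaunz This is interesting; its anti-detection is well done. The voice input tool I built also automates the ChatGPT web page. I tried CDP at first but it failed: ChatGPT detected it and logged me out immediately, so I had to fall back to AppleScript. I could reimplement it using this as a reference. If CDP is viable, it should be more efficient.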

herrkaefer@herr_kaefer·Apr 10

@jesselaunz @goldengrape So 5.4 Pro actually came out together with 5.4 last month; I've barely used either. x.com/openai/status/…

OpenAI@OpenAI

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.
