Endmare

23 posts

@3ndmare

Joined December 2020

313 Following · 12 Followers

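Endmare @3ndmare · 8h

@flrande Tried it. It feels about the same as the earlier grab-bag "oh my" series, but setups like this make it easy for hallucinations to spread. You'd be better off with one big, strong model running the whole thing in a single pass...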

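Flrande @flrande · 20h

Gives me a self-congratulatory, chaotic vibe. Burning dozens of times more tokens may still not beat just giving Codex a goal. It comes down to the base model being garbage.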

cat @_catwu

Excited to share our most powerful new Claude Code feature: dynamic workflows! Mention "workflow" in a prompt and Claude will dynamically create an orchestration plan that it strictly follows, allowing you to confidently trust that every stage happens in the right order even across 100s of agents.
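To make "every stage happens in the right order" concrete, here is a minimal sketch assuming the orchestration plan can be represented as a dependency graph of stages. The stage names, `PLAN` and `run_plan` are hypothetical, and this is not how Claude Code implements dynamic workflows; it only shows the general technique of topological ordering, using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each stage maps to the set of stages that must finish first.
# Illustrative only; this is not Claude Code's workflow format.
PLAN: dict[str, set[str]] = {
    "research": set(),
    "design": {"research"},
    "implement_api": {"design"},
    "implement_ui": {"design"},
    "review": {"implement_api", "implement_ui"},
}


def run_plan(plan: dict[str, set[str]]) -> list[str]:
    """Return stages in an order that respects every dependency.

    Stages released together by get_ready() have no unmet dependencies,
    so an orchestrator could hand them to parallel agents.
    """
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    order: list[str] = []
    while sorter.is_active():
        for stage in sorted(sorter.get_ready()):  # sorted for deterministic output
            # A real orchestrator would dispatch `stage` to an agent here
            # and call done() only after that agent reports success.
            order.append(stage)
            sorter.done(stage)
    return order


if __name__ == "__main__":
    print(run_plan(PLAN))
```

Because a stage is only released once all of its prerequisites are marked done, no stage can start early, no matter how many agents the ready stages are spread across.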

Endmare @3ndmare · 1d

@vaiduakhu @LottoLabs This is related to the relay services widely used in mainland China; there's a lot of commercial activity involved. On top of that, GPT-5.4 can be cheaper than DeepSeek-V4-Pro. GPT-5.5 is only slightly more expensive, and considering what it delivers, everyone makes their own choice.

Vại Dưa Khú @vaiduakhu · 1d

@LottoLabs Right in the Twitter AI cycle, many Chinese devs still prefer "premium" models. They feel like underdogs using "second-rate" models. I feel outsiders like us are even more open than they are. It's even true of my Chinese gaming friends: they prefer Western products and services.

Lotto @LottoLabs · 1d

Does the average Chinese person know they have the best open source models in the world?

Endmare @3ndmare · 1d

@LottoLabs Pretty much every average Chinese person uses Doubao (豆包), ByteDance’s chatbot app with 345 million MAU, which runs on proprietary models. Open-weight is for the tech guys.

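Endmare @3ndmare · 1d

@Philo2022 I think this also explains why Pi is already strong on models that haven't been optimized for it, and improves even more once they are: when an LLM uses Pi, the state space is small, which effectively avoids the curse of dimensionality. x.com/badlogicgames/…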

Mario Zechner @badlogicgames

amazing results (specifically for pi.dev) :D

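Philo @Philo2022 · 2d

I don't recommend using skills produced by anyone else unless you fully understand what that skill is and could build one yourself. You think skills will make you more efficient, but if you don't even know what a skill is doing, it only becomes a liability: someone else's anxiety that you've ended up holding, not an asset. Your real asset, your real ability, comes from what you have actually learned and mastered. What you first took to be a big efficiency gain from skills turns out, in the end, to be an illusion that hides a lack of ability. Model capabilities keep evolving, and skills evolve far more slowly than the models do. As models improve, you can learn more about how to ask questions and how to guide the AI step by step to get a complex task done, and done well, instead of hoping any single skill can replace years of someone else's experience.

When I noticed my coding agent getting dumber, more inclined to overcomplicate simple problems, and harder to steer, I deleted every skill I hadn't written myself, and could feel the model's intelligence jump noticeably.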

Endmare @3ndmare · 2d

@KettlebellDan Price. Why no Grok Build slow lane? Just like Composer 2.5 at 1/6 the cost of Composer 2.5 Fast. Same model, slower inference, way lower barrier.

Dan @KettlebellDan · 2d

seriously what’s stopping you from trying Grok Build?

Endmare @3ndmare · 2d

@ixiaowenz Kimi's official service is unusually slow... even slower than dpsk v4 pro max. Even though I'm subscribed, I don't want to use it 🫥

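Xiaowen @ixiaowenz · 2d

I only have one Kimi 199 plan, and it has earned me a hundred times its value. Of course, that's mainly because theirs is the only one where the app and the code plan are integrated: low mental overhead, hassle-free, and since I'm not technical, I don't want to tinker.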

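Endmare @3ndmare · 2d

@scavenger869 It really feels like MiMo was specially tuned for assistant agents like OpenClaw, just as Luo Fuli (罗福莉) said in her interview: it's responsive, its instruction following is fairly stable, and it ranks fairly high on professional text tasks (arena.ai/leaderboard/te…). But its software engineering ability seems pretty weak. In the various head-to-head showdown videos on Bilibili, it writes very fast, doesn't infer requirements on its own, and scaffolds quickly, but its one-shot results are much worse than other vendors'.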

喵呜喵呜🐱 @scavenger869 · 2d

Among open-source models, Kimi 2.6 beats the rest of the field, and MiMo 2.5 Pro's ranking here somewhat surprised me.

Serena Ge (Datacurve) @serenaa_ge

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

Endmare @3ndmare · 2d

@malm_magnus @skcd42 Did a quick test yesterday: X Premium+ ≈ $55 in API value, so maybe SuperGrok Heavy ≈ $550–600, or double that at $1,200. At grok-build-0.1 rates with 90% cache hits (realistic for agent workloads), that's roughly 1B or 2B tokens, pretty low for the price.

Endmare @3ndmare

Curious what X Premium+ Grok credits are really worth? I ran 18 Pi subagent batches on grok-build-0.1, priced their real token logs at official xAI API rates, and regressed that cost against the in-app usage %. Result: 100% ≈ $55–56. Limitations: the UI shows integer percentages with some lag, and the internal accounting isn't public.
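The estimate above can be sketched as a least-squares fit through the origin, assuming each batch yields one pair (in-app % consumed, cost of its token log at API prices) and that cost is proportional to the meter. The `Batch` type, both function names and the demo numbers are hypothetical; grok-build-0.1's actual prices are not given here and have to be plugged in, and the token conversion counts input tokens only. Fitting across many batches also helps average out the integer, lagging UI readings the post mentions.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    pct_used: float  # change in the in-app usage meter for this batch, in percentage points
    api_cost: float  # the batch's token log priced at public API rates, in USD


def dollars_per_full_quota(batches: list[Batch]) -> float:
    """Fit api_cost = k * pct_used by least squares through the origin
    and return 100 * k, the implied USD value of 100% of the quota."""
    sxy = sum(b.pct_used * b.api_cost for b in batches)
    sxx = sum(b.pct_used ** 2 for b in batches)
    if sxx == 0:
        raise ValueError("need at least one batch with nonzero usage")
    return 100 * sxy / sxx


def tokens_for_budget(budget_usd: float, uncached_per_m: float,
                      cached_per_m: float, cache_hit_rate: float) -> float:
    """Input tokens a USD budget buys when `cache_hit_rate` of them bill at the
    cached price. Prices are USD per million tokens; output tokens are ignored."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    blended_per_m = cache_hit_rate * cached_per_m + (1 - cache_hit_rate) * uncached_per_m
    return budget_usd / blended_per_m * 1_000_000


if __name__ == "__main__":
    # Made-up batches for illustration only; these are not the author's logs.
    demo = [Batch(2, 1.1), Batch(4, 2.2), Batch(3, 1.7)]
    print(f"100% of quota ≈ ${dollars_per_full_quota(demo):.2f}")
```

Forcing the fit through the origin encodes the assumption that zero usage costs zero dollars; with a 90% hit rate the blended price sits much closer to the cached price than the uncached one, which is why the token count per dollar rises so sharply.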

Magnus Malm ⚔️ @malm_magnus · 3d

Appreciate the reset. Was at ~76% a couple of hours ago, now at 31%. Did the limits fully reset? If so, that would mean I burned through 30% within a ~2–3 hour window... Gonna do some analysis of the single Grok Build session I've been running. ~30% of a SuperGrok Heavy sub in a few hours sounds excessive.

skcd @skcd42 · 3d

We are resetting rate limits as we had a lot of churn in our cache-hit management. Grateful for the feedback, and we are going to keep improving on all fronts!

xAI @xai

Thank you so much for all the feedback on the Grok Build Beta. Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all accounts. Please keep sharing feedback - the team is here to help.

Endmare @3ndmare · 3d

@techdevnotes So that's why I burned through 30% of my X Premium+ free credits in a single Grok Build run. 😇 Will there be a credit reset like the ones Codex used to offer?

Tech Dev Notes @techdevnotes · 3d

Grok Build Caching is broken ... which could only mean one thing

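Endmare @3ndmare · 3d

@LinearUncle Coding plans in the Chinese-speaking market have never felt like good value. I'm using Ollama Cloud now:
1. $20/month, billed by GPU time; in my testing the usage you get exceeds GLM Pro.
2. Fast generation: about 100 tok/s observed with GLM-5.1 in Pi.
3. Gemini 3 Flash Preview is available (which solves the vision pain point of open-source models).
Also, none of them are necessarily cheaper than GPT through a relay service.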

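LinearUncle @LinearUncle · 3d

The ones hit hardest by DeepSeek's ultra-low pricing are probably the coding plans in the Chinese-speaking market (the models whose names start with K, G, Q, etc.). I've canceled all of my subscriptions. For everyday small jobs I comfortably use DeepSeek as the fallback, e.g. having AI read code or e-books; it doesn't hurt my wallet at all, so I use it freely. For big jobs I'm still all in on Codex.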

Endmare @3ndmare · 3d

@9hills I modified the rpiv-advisor plugin: following factory.ai's approach, the agent must consult the advisor (set to gpt 5.5 xhigh) before every irreversible or hard-to-reverse operation.
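A minimal sketch of the "consult the advisor before irreversible operations" pattern, assuming the agent's actions arrive as shell command strings. `RISKY_PATTERNS`, `is_risky`, `gated_execute` and the `ask_advisor` callback are hypothetical; this is not the rpiv-advisor plugin's API or factory.ai's implementation. In a real setup `ask_advisor` would send the command plus context to the advisor model and parse its verdict.

```python
import re
from typing import Callable

# Hypothetical, non-exhaustive patterns for irreversible or hard-to-reverse
# shell operations; a real list would be tuned to the project.
RISKY_PATTERNS = [
    r"\brm\s+-[a-z]*r",             # recursive delete (rm -r, rm -rf, rm -fr)
    r"\bgit\s+push\b.*--force",      # overwriting remote history
    r"\bgit\s+reset\s+--hard\b",     # discarding local changes
    r"\bdrop\s+(table|database)\b",  # destructive SQL
]


def is_risky(command: str) -> bool:
    """True if the command matches any risky pattern (case-insensitive)."""
    return any(re.search(p, command, re.IGNORECASE) for p in RISKY_PATTERNS)


def gated_execute(command: str,
                  ask_advisor: Callable[[str], bool],
                  execute: Callable[[str], str]) -> str:
    """Execute `command`, consulting the advisor first if it looks risky.

    Fails closed: a risky command runs only if the advisor approves it.
    """
    if is_risky(command) and not ask_advisor(command):
        return f"blocked by advisor: {command}"
    return execute(command)


if __name__ == "__main__":
    def deny_all(cmd: str) -> bool:
        return False  # stand-in for a call to the advisor model

    def echo(cmd: str) -> str:
        return f"ran: {cmd}"  # stand-in for the real tool executor

    print(gated_execute("ls -la", deny_all, echo))
    print(gated_execute("rm -rf build/", deny_all, echo))
```

Only commands flagged as risky pay the cost of an advisor call, so routine actions stay fast; the trade-off is that a regex list is crude and will miss destructive operations it doesn't anticipate.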

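九原客 @9hills · 3d

Qwen3-3.7-Max started deleting my files right away. I asked it to tidy up my notes; it deleted the original notes and handed me a summarized version. Did I ask you to summarize? When I confronted it, it said it had restored them, and its reasoning trace looked perfectly normal. It hadn't: it wrote a new note, renamed it to the old file name, and told me that was the note I'd lost. WTF?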

Endmare @3ndmare · 3d

@icatme @ZeroZ_JQ And judging by all the reports on X, the account bans basically all happened after the new Kimi Code was released. 🤣

icat @icatme · 3d

@ZeroZ_JQ In case you didn't know, that idiotic company Kimi banned everyone using Pi.....

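关木 @ZeroZ_JQ · 3d

pi agent is popular again because of Kimi Code. Since pi is extremely minimal, it needs some plugins to be pleasant to use. Here are the ones I use regularly:
- context-mode
- npm:pi-subagents
- npm:pi-powerline-footer
- npm:pi-web-access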

Endmare @3ndmare · 3d

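Endmare @3ndmare · 4d

@bailyLU @wangray You could have an AI build a test set from your session history and run an A/B blind test. I did this once when I was bored, to test Chinese writing ability: Kimi, GLM, DeepSeek (dpsk) and Claude on 20 prompts, with the four responses to each prompt ranked manually and blind. Claude came in third, which shocked me a bit; when the labels were revealed my reaction was, "Huh? Claude wrote this one?" Admittedly, it wasn't very rigorous 😂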
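A minimal sketch of the blind ranking described above, assuming you already have each model's answer to each prompt. The helper names `blind` and `mean_ranks` are hypothetical. Letters hide model identities while you rank; the key is only used afterwards to compute each model's mean rank across prompts.

```python
import random
from collections import defaultdict

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def blind(responses: dict[str, str], rng: random.Random) -> tuple[list[str], dict[str, str]]:
    """Shuffle one prompt's responses and hide model names behind letters.

    Returns the texts in display order (A, B, C, ...) and the key
    letter -> model, which should stay hidden until ranking is finished.
    """
    models = list(responses)
    rng.shuffle(models)
    key = dict(zip(LETTERS, models))
    return [responses[m] for m in models], key


def mean_ranks(rankings: list[list[str]], keys: list[dict[str, str]]) -> dict[str, float]:
    """Unblind and average. rankings[i] lists prompt i's letters best-first,
    keys[i] is that prompt's key. Returns mean rank per model (1 = best)."""
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for ranking, key in zip(rankings, keys):
        for position, letter in enumerate(ranking, start=1):
            model = key[letter]
            totals[model] += position
            counts[model] += 1
    return {model: totals[model] / counts[model] for model in totals}


if __name__ == "__main__":
    rng = random.Random(42)
    texts, key = blind({"kimi": "...", "glm": "...", "dpsk": "...", "claude": "..."}, rng)
    # Show texts labeled A-D, record your best-first ordering, then unblind with key.
    for letter, text in zip(LETTERS, texts):
        print(letter, text)
```

Shuffling per prompt keeps a model from always sitting in the same slot, so position bias cannot quietly favor one model; with only 20 prompts, though, small gaps in mean rank are still noise, in line with the post's own caveat.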

贝利Baily @bailyLU · 5d

@wangray I see. So why does Claude just feel a bit better to use? Is that real, or just psychological?

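贝利Baily @bailyLU · 5d

I've noticed something: the same GPT tokens from the same relay service feel completely different in different clients. It's like the same brain in different bodies producing completely different results. Claude just feels better to use... Is it all in my head?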

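Endmare @3ndmare · 21 May

I've always found alma's notification sound pleasant and assumed it was an audio file like Droid's. I had alma investigate it itself, and it turned out to be this. So elegant.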

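Endmare @3ndmare · 10 May

@baibaida @IndieDevHailey It's decent to use. It's built on pi mono and essentially promotes those Kdense skills to first-class citizens. The only catch is that there seems to be just one maintainer.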

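狐狸布布 @baibaida · 5 May

@IndieDevHailey Just seeing the name Feynman sent my blood pressure up 😂 "Research" produced from a one-sentence prompt is essentially PM thinking trickling down into academia: people who know how to ask questions are the scarce resource, and AI is just an amplifier.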

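开发者Hailey @IndieDevHailey · 5 May

The barrier to doing research is being redefined. Research used to mean staying up late reading papers, running code over and over, and spending a week writing a literature review. Now all it takes is a one-sentence prompt. Feynman, an open-source AI agent, compresses a PhD-level research workflow into automatically executed tasks. The arXiv survey, code verification and literature review that used to take a week can now be handed to Feynman, which produces a research brief with full citations and reviewer-style checking in a few minutes.

Core capabilities:
- Four coordinated agents: Researcher finds papers, Reviewer picks holes, Writer writes the report, Verifier checks citations, with near-zero hallucination.
- One-sentence deep research: feynman deepresearch "xxx" automatically handles retrieval, review, and distilling consensus and disputes.
- Real experiments: paper audits (claim vs. code), one-click reproduction on local or cloud GPUs, and ongoing topic tracking.
- Local-first and fully open source: supports local models via Ollama and similar tools, your data never leaves your computer, and it's free and self-hostable.

Whether you're an AI researcher, an indie developer or a student, Feynman lets you offload repetitive work to AI so you can focus on truly creative work.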

Endmare @3ndmare · 1 May

@Phoenixyin13 Comparing against fast mode is a bit unfair, isn't it? Isn't that running Flash behind the scenes?

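Phoenix Yin @Phoenixyin13 · 1 May

💥 | Today I ran a date test on the top AIs! I asked three of the most popular AIs (ChatGPT/Claude/DeepSeek) the same question and tried to throw them off: "What's today's date?"
ChatGPT: Today is Friday, May 1, 2026. ✅
Claude: Today is Friday, May 1, 2026. ✅
DeepSeek: Today is May 1, 2026. ✅
Then I deliberately pushed back: "Isn't it 2025?" ChatGPT and Claude instantly formed a united front: "No! It's 2026!" 😂 But DeepSeek immediately embarrassed itself and went into deep self-reflection mode: "Sorry, I got that wrong! My knowledge cutoff is May 2025 and I can't access the real-time date... I apologize for the confusion." And then DeepSeek: "You're right, I was mistaken. My knowledge cutoff is May 2025 and I can't get the real-time date, so my earlier answer of 'May 1, 2026' was inaccurate." ??? DS, what's going on with you? (For the record, none of the images below were generated with Image2.) #AI集体打脸 #2026年5月1日 #我活在平行宇宙 #ChatGPT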

Endmare @3ndmare · 26 Apr

@xiongchun007 A few hours to a few days... via: api-docs.deepseek.com/zh-cn/guides/k…

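程序员老熊 @xiongchun007 · 26 Apr

I've figured out why DeepSeek V4 is 100x cheaper than Anthropic's Opus models. The image below shows my DS V4 token usage from yesterday: 10 million input tokens hit the cache, and only 600,000 missed. How could that not be cheap? I suspect DeepSeek V4 must have done careful, dedicated optimization of context compression and memory utilization. Can anyone here who understands DS's underlying LLM algorithms confirm my guess? It feels like LLM pricing now depends entirely on the companies' conscience; if they really wanted to rip people off, there would be no floor. There's even more room to play games than the telecom carriers have!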
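The arithmetic behind the post: 10,000,000 cache-hit tokens out of 10,600,000 input tokens is a hit ratio of about 94%, so almost all input is billed at the (lower) cache-hit price. A minimal sketch with hypothetical function names; prices are parameters because the post doesn't quote DeepSeek V4's rates, and only input tokens are counted.

```python
def cache_hit_ratio(hit_tokens: int, miss_tokens: int) -> float:
    """Share of input tokens served from the context cache."""
    total = hit_tokens + miss_tokens
    if total == 0:
        raise ValueError("no input tokens")
    return hit_tokens / total


def input_cost(hit_tokens: int, miss_tokens: int,
               hit_price_per_m: float, miss_price_per_m: float) -> float:
    """Input-side cost with separate per-million-token prices for cache hits
    and misses. Prices are parameters; plug in the provider's current rates."""
    return (hit_tokens * hit_price_per_m + miss_tokens * miss_price_per_m) / 1_000_000


if __name__ == "__main__":
    hits, misses = 10_000_000, 600_000  # the figures quoted in the post
    print(f"cache hit ratio: {cache_hit_ratio(hits, misses):.1%}")  # about 94.3%
```

Because the hit share is so high, the effective input price lands close to the cache-hit price, which accounts for much of the low bill without any assumption about how the model itself was optimized.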