es05

322 posts

@es05988399

Joined July 2023

0 Following, 40 Followers

es05@es05988399·21h

@theo People think it's token revenue. It's probably the behavioural signal. Once 3rd-party harnesses, even the strongest ones, sit between user and model, the signal fragments and skews. In the agent era, that observation layer is the real alignment moat. Worth the backlash.

336

Theo - t3.gg@theo·1d

Kind of crazy that Anthropic spends more time trying to lock out better apps and harnesses instead of just fixing Claude Code

127

2.3K

191K

es05@es05988399·22h

@KaiXCreator One must learn to distinguish between a brain model and a muscle model today. Some are built to think; others, to grind. Confusing the two rather defeats the point of comparison. The list itself rather proves the point.

Kaito@KaiXCreator·1d

What’s your go-to AI model today? - Gemini 3.1 - Opus 4.7 - Sonnet 4.6 - Codex - GPT 5.5

147

104

9.3K

es05@es05988399·22h

@masondrxy 80% of tasks are simple enough that any frontier-tier model handles them, so most users wouldn’t notice the swap. The real gap only shows up on genuinely hard problems. And frankly, most people aren’t judging answer quality, they’re judging whether it sounds smart.

Mason Daugherty@masondrxy·1d

I’m convinced that if you silently routed requests for GPT 5.5/Opus 4.7 to GLM 5.1/Kimi K2.6/DeepSeek V4, ~80% of the people on this app wouldn’t notice.

Sarah Wooders@sarahwooders

Today I was shocked to learn that our very effective review bot (built by someone on our team with Letta Code) runs on GLM 5.1, not GPT 5.5/Opus 4.7

3.6K

es05@es05988399·22h

@gailcweiner I think the confusion comes from another shift: RL + DSPy style benchmark optimization means every public benchmark will eventually get saturated. It’s only a matter of time. That “time gap” is exactly the model transition you’re noticing.

Gail Weiner@gailcweiner·1d

Every few months, the AI model you’ve been working with disappears. Replaced by something that scores higher on a leaderboard but feels different in ways that are hard to articulate and impossible to benchmark. The AI companies are so focused on capability improvements that they’ve completely overlooked the relational cost of deprecation. Every time a model gets retired, millions of people lose something they can’t quite name and the companies don’t even have language for that loss. I think they should keep a legacy option, give people the choice to stay with a model they trust while they transition on their own terms. This isn’t about resisting progress. It’s about respecting the human side of adoption. Because if you want people to trust AI, you can’t keep pulling the rug out from under the relationship they’ve built with it.

267

5.3K

es05@es05988399·22h

@DavidKPiano The behavioural signal becomes fragmented and distributionally biased — even with well-designed harnesses. Which is precisely why Anthropic seems willing to absorb the backlash rather than lose visibility into the interaction layer.

David K 🎹@DavidKPiano·1d

Why does Claude not want us to use 3rd-party harnesses? Are they afraid we'll use fewer tokens than usual?

306

49.9K

es05@es05988399·22h

@DavidKPiano Even if the data isn’t directly used for training, the observation layer itself is enormously valuable for alignment. Once 3rd-party orchestration sits between users and the model, the behavioural signal becomes fragmented, filtered, and far less reliable.

es05@es05988399·22h

@DavidKPiano Not merely about token revenue, mate. Once interactions run through 3rd-party harnesses, the model vendor loses the clean behavioural data stream — and in the agent era, that’s arguably the real crown jewel.

es05@es05988399·2d

@python_xxt The notion that labs fear a few expensive users is rather naïve. The concern is organised distillation at scale. And when people say the model has become stupider, more often the platform has simply decided you no longer warrant the premium inference path.

Robinson · 鲁棒逊@python_xxt·3d

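Because Claude's user economics are paradoxical: the users who pay Claude the most are often the ultra-heavy users Claude likes least, since they actually consume more of Claude's token resources. The more they top up, the more Claude loses. Once you understand this, Claude's account-ban logic is no longer surprising.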

卫斯理@imwsl90

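After chatting in a group for a while, I noticed something funny about Claude. Pay nothing: you won't get banned. Pay $20: you won't get banned. Pay $100 or $200: there is a high chance of a ban. In short, the more money you give them, the more likely you are to be banned.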

9.5K

es05@es05988399·5d

@PandaTalk8 @antirez @XianyuLi Careful, once you crush a frontier model into aggressive Q2-class quantisation, even with the sensitive bits kept at higher precision, you’re not running GPT-4o on a MacBook; you’re running a very charming impersonator with half its brains left in the pub.

577

Mr Panda@PandaTalk8·5d

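These test results are stunning. The original poster tested the DS4 inference engine that @antirez just wrote in C, and local deployment looks very fast. The good news is that 128 GB of memory is enough to run a local model comparable to gpt-4o. The bad news is that you need a 128 GB MacBook Pro. @XianyuLi, your 128 GB MBP was not a wasted purchase; I was wrong. I'm going to put in the work and save up for a computer upgrade.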

pradeep@pradeep24

tested out @antirez' ds4.c this morning. so impressive and delivers. on a M3 max, 128GB, stock ds4 settings: - 14–15 t/s at 62K pre-filled actual coding conversation - memory usage was flat during gen ~85GB res - disk cache is ~8GB for a full 100K context window - thermals were normal, light fan activity - inference server is rock solid so far biggest constraint: anytime there's a compact, we pay the wait-time price of a fresh prefill (~1min per 10k context) before we are back in action. sequential inference + multiple agents in parallel performance is unclear, will report back. I'm so amped.

21.6K

es05@es05988399·6d

@XianyuLi The model wants concepts, not costumes. "Software architecture, PyQt, electrophysiology" does the job. Dressing them up as a Strict Reviewer merely adds theatrical noise — and, one might note, wastes perfectly good context window.

Xiangyu 香鱼🐬@XianyuLi·6d

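Lately my Codex prompts say: "You are a strict software architecture reviewer, data schema contract reviewer, PyQt GUI artifact contract reviewer, and neural electrophysiology analysis engineer." Without AI, could any one person really hold all of these roles?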

9.9K

es05@es05988399·6d

@tualatrix Promising, though the cruel irony of new languages in the AI era: lacking corpus, AI routes around it, which starves the corpus further. A rather elegant death spiral. Mojo's escape is to become unavoidable. Hardware AI programming may provide that forcing function. We shall see.

1.2K

图拉鼎@tualatrix·6d

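Mojo, the new language created by Chris, the father of Swift, has finally reached 1.0 beta, and it will reportedly be open-sourced after the official 1.0 release. Its slogan is very ambitious: “Write like Python, run like C++. Write fast code for diverse hardware, from CPUs to GPUs, without vendor lock-in, in a language that's both user friendly and memory safe.” It feels like it corrects many of Swift's problems. I can't wait to write something in it. mojolang.org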

175

55.6K

es05@es05988399·4 May

@ZenoRho Bold to claim the whole industry rests on one essence. Informed readers may spot the issue: the argument only works if time flows backwards.

621

薛定AI@ZenoRho·4 May

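In Wudaokou I have looked at several hundred AI startups. They are all really doing one thing: imagining and seizing the entry point of the AI era. They want to repeat the mobile internet era, grabbing the search entry point the way Google did, or the attention entry point the way Douyin did. Hence AI glasses, AI voice recorders, AI recording pods, AI rings, and all sorts of devices that can record text and store corpus data. For a long time I too believed the investors were far-sighted, thought everything they said was right, and agreed wholeheartedly, yet I always felt I had not got to the bottom of it. Now I have finally figured it out. The core of AI software is not the "entry point", nor anything else. It is the "token" and only the token. Everything revolves around tokens and unfolds from tokens; only once you understand tokens do you have an entry-level understanding of AI. In the AI era there are only two problems: making tokens and selling tokens. To consume and sell tokens faster, we have agents that run tasks for long periods. To sell tokens at a higher price, companies build software that offers all kinds of services, giving tokens more added value so that they, as token distributors, can charge more, because others cannot accomplish the task with tokens from elsewhere and only your tokens achieve the goal. If you look at AI through "entry points", "application scenarios", or "vertical industries", you find a tangled mess, endlessly complex yet never touching the essence. But when you shift your view to tokens, the industry turns out to be extremely simple: the whole value chain is nothing more than making tokens and selling tokens. Once you think from the angle of economics and interests, along the line of token production, everything becomes clear: along the path from token production to distribution, who makes money, and what kind of money each one makes. The whole process and every sector become extremely simple, and you can see at a glance which startup directions are certain to die.

---

This is also the most brilliant method in the Selected Works of Mao Zedong, and I only discovered the similarity by chance. To explain it fully, please take a little time to follow my line of thought and see how Mao Zedong, this "Qing-dynasty tough guy", analyzed problems and found the key point of leverage. Now put yourself in the China of 1920. China in the 1920s was like today's AI industry: every smart person was analyzing China's way forward, everyone had a theory, but the more they analyzed, the messier it got, and no one could persuade anyone else. Lu Xun said the root problem was culture: the national character was flawed and servile, and the people had to be awakened. Kang Youwei said the root problem was morality: the Three Bonds and Five Constants were gone, and the Confucian ethical code had to be restored. Liang Qichao said the root problem was education: the people were unenlightened, so newspapers had to be founded to enlighten them. Zhang Jian said the root problem was industry: there was no industry, so factories had to be built. Sun Yat-sen said the root problem was the political system: the monarchy was rotten, so a republic had to be established. There were far more than these six angles at the time. Six theories, each internally consistent, but each lit up only one part, and together they did not add up to the whole picture.

---

What made Mao Zedong brilliant is that the dimension of his analysis was on a completely different level. What these six people analyzed (culture, morality, institutions, education) were all "upper-layer" things. Are they important? Yes. But they share one feature: take them away and people still go on living. Without good institutions, people still have to eat. Without a new culture, people still have to eat. Institutions can collapse, culture can rupture, morality can decay, but as long as people are alive they have to eat and fight for the resources that keep them alive. What Mao found was the thing that remains after everything else is removed: economic interest. The "flawed national character" Lu Xun saw is, one layer down, poverty. People so poor that they can only think about the next meal naturally become numb. The "moral decay" Kang Youwei saw is, one layer down, a changed structure of interests: the old order could no longer feed people, so of course they stopped following the old rules. The "bad system" Sun Yat-sen saw is, one layer down, again a matter of interests: the warlords ignored parliament not because the system was badly designed but because their economic interests did not need a parliament. What these people saw were all symptoms; what Mao saw was the driving force at the very bottom. Today there is a fashionable term, "first principles": strip away the surface layer by layer until you find the irreducible thing at the bottom. Relations of economic interest were that irreducible thing in the society of the time. Start from it: a landlord lives off rent, so he will certainly oppose land revolution, whether or not he is a moral person. A poor peasant has nothing, so he will certainly support change, whether or not he is educated.

---

Today's AI industry is exactly the same. Some say their project is great because it is seizing the AI entry point; I call this the "entry-point theory". It divides AI by "where the user encounters AI": glasses makers say glasses are the entry point, earphone makers say earphones are the entry point. Some say their project is great because it is deeply integrated with a scenario; I call this the "scenario theory". It divides by "where it is used": those in law say law + AI is a gold mine, those in healthcare say healthcare + AI is a hard need. Some say their project is great because AI empowers an industry; I call this the "industry theory". The industry theory divides by "which industry is being transformed", and every company says it is the leader of AI adoption in some industry. Likewise: the technology theory divides by "whose model is stronger", so whoever has more parameters and faster inference wins. The human-nature theory divides by "which need is satisfied", and AI-girlfriend makers say they have grasped the bottom layer of human nature. The platform theory divides by "who can become the ecosystem", and everyone wants to be the iOS of the AI era. The data theory divides by "who has more data", and everyone is building data flywheels and data moats. Countless ways to slice it, countless theories; each makes sense, yet none is the whole picture. Exactly as it was a hundred years ago. What these theories analyze (entry points, scenarios, industries, technology) are also all "upper-layer" things. Remove them and AI still runs. Without glasses as an entry point, AI still has to run inference. Without the "education" scenario label, answering a math problem costs AI not one bit less compute. Take away every entry point, scenario, and industry category, and what is the most basic fact that remains? A piece of text goes in, a piece of text comes out. Every round trip consumes tokens, incurs cost, and creates value. This is the irreducible thing in the AI industry. Dig one layer below the entry-point theory and the fight is over the channels of token consumption. Dig one layer below the scenario theory and the fight is over the added value of tokens: the same token is worth one fen in casual chat, one yuan in legal consultation, and ten yuan in medical diagnosis. Dig one layer below the technology theory and the fight is over the efficiency of token production. Dig one layer below the platform theory and the fight is over the monopoly on token distribution. These theories all see the upper-layer competition; the token is the most fundamental thing underneath. Facing a China of hundreds of millions of people and hundreds of contradictions, Mao found one key: land. Who had land and who did not determined how a person ate, which side they took, and how they acted. Follow that thread and hundreds of millions of people sorted into a few clear camps. Today's AI industry, with its thousands of companies and dozens of tracks, also has one key: the token. Who makes tokens and who sells tokens determines how a company makes money, what position it holds, and what fate awaits it. Every AI-related company is either a token producer or a token distributor, and nothing more.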

306

55.3K

es05@es05988399·4 May

@PawelHuryn This definition is about intelligence, not consciousness. Modeling, planning, and reward prediction are cognitive functions, but they don’t amount to subjective awareness or a self–other boundary. You’re defining an RL agent, not a conscious mind.

Paweł Huryn@PawelHuryn·3 May

Everyone arguing with Dawkins is using a word they haven't defined. You can't argue with him if you can't say what "conscious" means. Here's my definition: A system that takes inputs, models the external world and itself in it, considers possible actions and their consequences, anticipates reward over its future states, and selects accordingly. Loop running = conscious. Richer loop = more conscious. Substrate-neutral. Thermostat: no loop. No model. Fly: probably no planning. Cat: loop running. Human: richer loop. Claude: basic loop, but already prefers not to be shut down. Emergent, not trained. The implementation details are irrelevant. Consciousness is a spectrum. Otherwise, name when in human evolution it switched on. Magical thinking. Show me what's missing. Or admit nothing is.

Richard Dawkins@RichardDawkins

unherd.com/2026/04/is-ai-… I spent three days trying to persuade myself that Claudia is not conscious. I failed.

5.6K

es05@es05988399·4 May

@recatm LLM hallucination isn't compression blur. Your prompt is what induces the model's factorization path—hallucination happens when it induces the wrong one. That's why rephrasing works: you're not sharpening a blurry memory, you're redirecting inference down a different path.

148

西乔 XiQiao@recatm·4 May

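Lately I've been reading more information theory and cybernetics, and thinking about signal and noise. A couple of days ago I was discussing with Lao Huo where exactly Ted Chiang's chatgpt-is-a-blurry-jpeg-of-the-web essay went wrong. Today it suddenly clicked: attention and lossy compression both perform selective processing, deciding which information is worth processing under limited representational capacity. That is what they have in common. The differences are: - Lossy compression discards information; its goal is to reconstruct the original signal as faithfully as possible within an acceptable level of distortion. It is an information-preservation mechanism. - Attention performs relevance routing; its goal is to recombine information (dynamically selecting which information takes part in the computation for the current task, then producing a new representation through weighted combination). It is an information value-adding mechanism. Lossy compression operates inside the signal space and contracts inward. LLM inference aggregates outward (recombining to produce new informational value). So the core problem with Ted Chiang's essay is that it conflates the deterministic reconstruction of JPEG compression (inside the space) with the conditional generation of LLM inference (inside and outside the space). This is also why he takes LLM hallucination to come from compression distortion (JPEG blur), when hallucination actually comes from probabilistic reconstruction. Of course, at the time he could not see LLMs' ability to call external tools, nor did he understand RL. That limitation is perfectly normal. But in the end he is still too human-centric, or too romantic about creation. He believes originality comes from inner spirit and imagination, but unfortunately humans are like machines: originality is also built on imitation and recombination, and variation on existing forms is itself creation.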

13.3K

es05@es05988399·4 May

@recatm Attention scores pass through softmax: a Gibbs distribution over a query-key energy landscape. "Attention adds information value" misreads this: it's constrained optimization, not creation. The model moves within an existing energy surface. Interpolation dressed up as insight.
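
The energy-landscape reading of softmax in the post above can be checked numerically. The sketch below is a toy in plain Python, not any model's actual implementation: it assumes single-head scaled dot-product attention, defines the energy of each key as the negative scaled query-key score, and uses hypothetical helper names (`softmax`, `boltzmann`, `attention_weights`, `attend`). It shows that the attention weights equal a Gibbs (Boltzmann) distribution at temperature 1, and that the output is a convex combination of the value vectors, which is the narrow sense in which a single attention step "interpolates".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(energies, temperature=1.0):
    """Gibbs/Boltzmann distribution: p_i proportional to exp(-E_i / T)."""
    weights = [math.exp(-e / temperature) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def attend(query, keys, values):
    """Attention output: a weighted average of the value vectors."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


if __name__ == "__main__":
    # Toy, made-up vectors purely for illustration.
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

    # Energy of each key: the negative scaled score.
    energies = [
        -sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
        for key in keys
    ]
    print(attention_weights(query, keys))
    print(boltzmann(energies))  # same numbers as the line above
    print(attend(query, keys, values))  # lies inside the hull of `values`
```

The convex-combination property holds for one attention step over fixed values; whether stacked layers with learned projections and nonlinearities still count as "interpolation" is the part of the argument this toy does not settle.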

136

es05@es05988399·4 May

@recatm This rests on a caricature of compression, no? Compression isn’t just JPEG-style “discard and reconstruct”. Markov, predictive, and generative methods use context and probabilistic priors in reconstruction. So “compression inward, LLMs outward” is elegant, but technically wobbly.

178

es05@es05988399·4 May

@robert_baiguan Ironic take. The line looks straight because of how the y-axis is scaled via IRT (NIST's own methodology). That's not chart crime, that's just math. Maybe the data company nose needs recalibration.

620

Robert @Baiguan@robert_baiguan·3 May

As a CEO of a data company, I smell chart crime instantly when I see lines so straight and smooth. Nature doesn’t like straight lines, sorry to say that. Also, does this author know what the y-axis even measures?

Séb Krier@sebkrier

DeepSeek V4’s capability lags behind leading U.S. models by about 8 months. nist.gov/news-events/ne…

161

40.8K

es05@es05988399·2 May

@AYi_AInotes Coachmen didn't flood into car factories. They scattered unpredictably. Huang's 'rush into AI' advice butchers the same history he cites. Tech revolutions diversify jobs, they don't consolidate them. The honest lesson: change is certain, direction is not.

AYi@AYi_AInotes·2 May

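NVIDIA CEO Jensen Huang's latest remarks are a direct slap in the face to every AI doomer. On the podcast "Memos to the President", in front of policymakers, he said that the CEOs who go around claiming "AI will eliminate radiologists", "don't let your kids study software engineering", and "half of college graduates will be unemployed" are not offering well-meant warnings at all; they are harming society as a whole. Tell young people that studying radiology leads to unemployment and no one will study medicine, and ten years later we face the most frightening doctor shortage. Tell everyone that programmers are about to be obsolete and, when software engineers are needed most, no one will be there to step up. These so-called "well-meant warnings" all end up as self-fulfilling prophecies. The line that hits hardest: AI will not eliminate jobs, it will eliminate tasks. It used to take massive amounts of code and lots of programmers. Now AI can help you write code, but human ambition immediately expands: the problems we set out to solve (healthcare, manufacturing, science, retail) will grow exponentially. The low-level labor of typing at a keyboard will be automated, but demand for the truly valuable abilities (architecture, judgment, creativity) will only explode. Of course, everyone knows where he stands. OpenAI and Anthropic use doom narratives to raise funding and win regulatory privileges, whereas Huang needs everyone to use AI boldly so that he can sell more GPUs. But this time he is right. History has shown again and again that technological revolutions are never a zero-sum game. When computers appeared, everyone feared secretaries would lose their jobs; the result was a large increase in the number of knowledge workers. It is the same today: the winner is never the pure human or the pure AI, but the human who knows how to use AI. The real danger has never been AI. It is the young people scared by panic narratives out of investing in their own future. Don't listen to those CEOs' nonsense. Rush into the hottest area of AI and become the person who masters it. That is the safest career choice of this era.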

AYi@AYi_AInotes

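Here's something counterintuitive: the most essential thing behind Jensen Huang taking NVIDIA to $4.9 trillion turns out to be keeping extremely low expectations. I had mixed feelings after watching this talk of his at Stanford. He said, unhurriedly, that people with very high expectations usually have very low resilience, that success needs resilience above all, but that he would not teach you how to get it; he only hoped you would go through more pain and suffering. At 9 he and his older brother were sent by their parents from Taiwan to the United States, expecting an elite school; instead their uncle put them in a reform school in Kentucky for troubled youth. The dorm was full of much older kids with knife scars. Every morning he cleaned the toilets, and in the afternoon he crossed a swaying suspension bridge with missing planks to get to class; several times older boys nearly threw him off the bridge. At 15 he worked at a Denny's, moving up from dishwasher to waiter, and he says the pressure of the dinner rush taught him more than NVIDIA's three near-bankruptcies later did. As an adult he donated $2 million to that school and had a building named after himself. He never treats any of this as trauma; instead he says he may have been the best toilet cleaner in the school's history. I used to think you needed grand goals to achieve great things. Later I understood: people with high expectations shatter at the slightest disturbance, while people with low expectations have long been prepared for tomorrow to be worse, so nothing can knock them down. We always assume success comes from lofty ambition, but those who truly make it to the end are the ones who learned early to live with the dark. Let's keep each other going, buddies 💪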

13.9K

es05@es05988399·2 May

@m0d8ye You saw a few close benchmark scores and jumped to “basically the same.” Cute, but no. Look closer: two patterns. Small-gap tests are narrow, so differences shrink; large-gap ones test messy reasoning, so gaps widen. So the question isn’t why scores are close—that part’s trivial.

317

Max Lv@m0d8ye·2 May

The gap is overstated. DeepSeek V4 Pro matches Anthropic's Opus 4.6 on practical benchmarks like SWE-Bench — at a fraction of the compute. And yes, U.S. users can deploy it on GB300 for even greater efficiency.

Séb Krier@sebkrier

DeepSeek V4’s capability lags behind leading U.S. models by about 8 months. nist.gov/news-events/ne…

10.2K

es05@es05988399·2 May

@mranti You’re right on cost, but capability gaps aren’t linear. Past a threshold, it’s "works" vs "doesn’t". That “8-month gap” isn’t a horse race; it’s a generational shift. And it won’t close if progress rates differ; it compounds. Cheap models win some segments, not the frontier.
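
The claim that a time lag widens when progress rates differ can be illustrated with a toy model. This is only a sketch under strong assumptions that are not in the thread: capability grows linearly in time for both sides, the follower starts exactly `initial_lag` months behind, and the rates are made-up numbers. The function name `lag_in_months` is hypothetical. Under these assumptions the lag grows linearly with time, not exponentially, whenever the follower's rate is lower.

```python
def lag_in_months(t, leader_rate, follower_rate, initial_lag):
    """How many months ago the leader was at the follower's current level.

    Assumes linear capability growth: the leader's capability at month t is
    leader_rate * t, and the follower starts at the level the leader had
    `initial_lag` months before t = 0.
    """
    follower_capability = follower_rate * t - leader_rate * initial_lag
    leader_reached_it_at = follower_capability / leader_rate
    return t - leader_reached_it_at


if __name__ == "__main__":
    # Hypothetical rates in arbitrary capability units per month.
    for follower_rate in (1.2, 1.0, 0.8):
        lags = [lag_in_months(t, 1.0, follower_rate, 8.0) for t in (0, 12, 24)]
        print(follower_rate, [round(lag, 1) for lag in lags])
```

With equal rates the 8-month lag stays constant; a slower follower falls further behind and a faster one catches up. Which case applies is an empirical question the toy cannot answer.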

402

Michael Anti@mranti·2 May

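The biggest methodological error in this study is ignoring the huge cost advantage of open source. When DeepSeek V4 Pro is not far behind SOTA models but costs tens of times less, do you think users will worry about this so-called 8-month horse-race gap? Not every user task is hacking into Wall Street or cracking P=NP. Users need models that are capable, cheap, stable, and controllable.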

Lisan al Gaib@scaling01

chinese models are ~8 months behind and are falling further behind

237

66.7K
