Bryce

97 posts

@brycezhang

🖖

Earth · Joined March 2010

186 Following · 13 Followers

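Bryce@brycezhang·18 Apr

To keep the context consistent on a payment-architecture problem, I kept going back and forth in a single session, which triggered Codex context compaction many times. I just checked, and that one day burned through 1.3 billion tokens. My Pro quota is nearly exhausted too. Never doing that again.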

Bryce@brycezhang·29 Mar

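I never imagined that my biggest motivation for practicing Harness Engineering would turn out to be dry eye.
The biggest pain point of my Symphony-based automated workflow is that it can only execute tasks with clear boundaries. For a new requirement, someone has to split it into subtasks by hand before it can enter the automated pipeline, and the result still needs manual acceptance at the end. The biggest bottleneck in the whole process is the human part.
The AI can work nonstop, but my eyes gave out first ☠️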

Bryce@brycezhang·11 Mar

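The Idea - PRD - Code workflow, long the standard at internet companies, is obsolete. As a startup team, our PRD is just a single prototype mockup (Gemini); once a designer polishes it, we move straight to agent-driven implementation (Codex/CC) and ship as soon as integration testing shows no obvious bugs. What slows things down is usually the part where humans are involved. AI quality is already high at every stage, so the focus is now on building workflows with AI deeply integrated.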

Harrison Chase@hwchase17

x.com/i/article/2031…

112

Bryce@brycezhang·9 Mar

@YanyuRensheng May I ask what tool you used to build it? So far I've only seen OpenAI's own symphony, and it's still in preview.

673

S Li@YanyuRensheng·8 Mar

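Once GPT-5.4 is paired with a suitable "harness", it is one of the very rare "talents" I have come across that combines excellent project management, software engineering, and product management skills. For example, given a system with over a hundred modules, you only need to hand it an entry file that contains the module dependencies, and it can reason sensibly about the system from there and produce a written description resembling a system blueprint. That level of insight genuinely shocked me.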

17.7K

Bryce reposted

Chris the EVMist@dev_at_EVMist·7 Mar

AI should only help facilitate your learning. Never ask it to do something 100% for you that you are already capable of doing yourself. When you don't understand something, then you interrogate the AI as an opportunity to learn. You can be dependent on AI to teach you and improve yourself, you can't be dependent on AI to produce for you and achieve the same result... you will depreciate. I speculated and made this a while ago HOWEVER its being backed up more and more through social observation and peoples first-hand accounts and research studies.

203

32.5K

Bryce@brycezhang·6 Mar

ChatGPT-5.4 Fast really is fast: it burned through nearly 200 million tokens in a single morning 🙃

Bryce reposted

Naval@naval·20 Feb

Careers are dead. Jobs are dying. Opportunities arising.

1.5K

3.3K

37.7K

2.2M

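Bryce@brycezhang·7 Feb

After a full day of heavy Codex 5.3 use: it is as fast as Claude, yet far ahead on instruction following and task completion, finishing the vast majority of tasks in one pass. The earlier 5.2 was so slow that I could only use it for code review. Then I opened the usage page: 106 million tokens consumed...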

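Bryce@brycezhang·7 Feb

My engineering experience is similar: AI can produce very high-quality code, but a human review is always needed in the end. It doesn't have to be a line-by-line code review; an experienced engineer can quickly judge whether there are hidden risks and whether the design intent has been broken.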

Andrej Karpathy@karpathy

I tried to use it this way and basically failed, the models aren't at the level where they can productively iterate on nanochat in an open-ended way. (Though one of the primary motivations for me writing nanochat is that I'd very much love for it to be used this way as a benchmark for agents, and I'd love it if it worked over time). I'm open to this just being skill issue. E.g. here some of the things I'd be suspicious about:
- the zoo of torch compile flags can knowingly be abused to get +1% gains but often at the cost of +30min compile time. This is why modded-nanogpt prohibits torch compile kwarg engineering and why I haven't done any in nanochat either. i wouldn't reliably expect the model to notice, consider, or flag this kind of an issue or seek clarification.
- ns_steps=3 might be a tiny bit of speed, but does the model also volunteer to make sure quality doesn't fall too much?
- same thing for deleting .float() cast - sure you can remove it and get VRAM/speed gains but it's there for a clear reason (extra precision in the loss function). Removing it means you absolutely have to make sure that the lower precision is ok validation loss wise, in a highly controlled experiment.
Overall I'm still struggling with getting the models to do significantly more basic things. For example, Opus keeps incorrectly "cleaning up" my comments when it doesn't understand them even when it's completely unrelated to the task, rude! It keeps violating and ignoring CLAUDE.md instructions on coding style but when I ask, it correctly points out all the violations. I know, I'm supposed to be using some kind of a /cleanup. Yesterday it gave me a table of results and incorrectly reported which experiment worked best (the table showed xyz=20 was best and it incorrectly claimed that xyz=12 was). Basically - much simpler things still fail routinely than something open-ended like "improve nanochat". (I've been doing a lot of YELLING IN UPPER CASE and I think this could actually be a really good metric for A/B testing instead of the inline survey thing.). Still incredibly net useful with oversight and with clear, well-scoped tasks. I definitely haven't given up on automatic closed-loop experiments with the models. It would be so glorious. I had 2 iterations that basically didn't work but I have ideas for the 3rd.

Bryce@brycezhang·29 Jan

@dotey Humans are the ones held accountable, so a professional programmer cannot let a project turn into a black box.

287

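宝玉@dotey·29 Jan

"Talking about toxicity without talking about dose is nonsense." Whether AI-written code needs architecture or review likewise depends on the scenario and the application. Demos, toy apps, small scripts used inside Skills, and unimportant feature modules can all be treated as black boxes: just check the inputs and outputs. But anything used by other people, anything with security or performance requirements, or anything that needs long-term maintenance calls for caution. AI can write it, but a human has to be the gatekeeper.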

马天翼@fkysly

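Some people now worry that AI will turn a project into a black box whose architecture they can no longer understand, making it hard to maintain later, so they still need to read the AI's code and spot problems in its implementation. How to put it... I think that's perfectly normal. I'm honestly no longer sure myself whether one should think about architecture and implementation for AI-written code at all. Right now feels like a watershed, an important turning point from the era of hand-written code to the era of AI programming. Most people on Twitter are probably all-in on AI programming without a second thought, yet a large group still codes by hand. As for how to look at this question, I don't think I have an answer that would convince anyone either.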

25.2K

Bryce reposted

Matt Pocock@mattpocockuk·12 Jan

Frontend is WAY harder for AI than backend. That's because it's flying blind. It can't test the code in the environment where it's running - the browser. Here's how to hook up AI to your browser:

152

189

2.9K

220.6K

Bryce@brycezhang·24 Dec

"Continue". Looks like we got interrupted quite a few times 😂

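Bryce@brycezhang·27 Nov

This matches my recent hands-on experience: the three vendors each have their own personality, so you still need to mix them by scenario. Selection advice by need:
- For "complete, in-depth, production-ready in one pass" → prefer Claude Opus 4.5
- For "strongly defensive, compatible with legacy systems, patches gaps automatically" → prefer GPT-5.1
- For "strictly to spec, no fluff, lowest cost" → prefer Gemini 3.0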

Tibo@thsottiaux

Codex team is working on a few experimental projects that are starting to shape up and I’m excited to share more about soon. But I’m curious, what would you like to see ship or improved by the end of the year other than better models?

197

Bryce@brycezhang·2 Oct

@llliulllll You can try your luck here: soraivideo.com/zh/sora_2_invi…

刘冲｜中级会计师@llliulllll·2 Oct

@brycezhang It stopped working that quickly?

Bryce@brycezhang·2 Oct

Sora2 invite codes. Help yourself if you need one.

505

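Bryce@brycezhang·28 Sep

An interesting MBTI test site. The result matches what I got on my own test: INTJ.
True enough, the CPU in my head is about to burn out while the real-world progress bar is still at 1%. This AI-generated INTJ avatar really gets my inner monologue. 👇🔗 mbti.youmind.com/brycezhang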

137

Bryce@brycezhang·29 Aug

@sofish They've staked out NIO all the way into the battery-swap station 🤦

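Bryce@brycezhang·23 Aug

@vhvms78qvx @shengxj1 @bonniewinds0 Indeed. Harming others without benefiting yourself can only be described as stupid. Stay away from people like this. I feel sorry for their coworkers.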

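deepx@vhvms78qvx·23 Aug

@shengxj1 @bonniewinds0 All you can do for this kind is wish them well. My definition of a fool is simple: someone whose actions hurt themselves and also damage other people's interests, even if it isn't by their own choice 🤣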

128

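花果山大圣@shengxj1·22 Aug

Idiot. I hope someone pings you on DingTalk at 3 a.m. every single night. A scab who takes part in the mutual harm among people at the bottom.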

bonniewinds@bonniewinds0

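I ping people on DingTalk all the time; I've woken my boss at 3 a.m. to approve a campaign workflow. I've pulled 12 product managers into one DingTalk thread to fight it out: one campaign page needs three features, each feature has 3-4 similar products competing horse-race style to be the one used, and 12 product managers can fight for two months without any progress. Running a campaign sometimes involves more than 100 internal systems. When I argued with the head of product, they used DingTalk's message-recall feature to destroy the evidence. Don't blame me for being ruthless: every task handed to me is impossible, and with this much internal friction, getting anything done often feels like starting a fire with my bare hands or building a mirage on a desert island. I have to find my own money, my own people, and my own resources before I can get anything done… Most of the time it's you PTSD-prone coworkers who create the resistance.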

181

34.1K

Bryce@brycezhang·14 Aug

Tried a small project with Kiro Agent, thought it’d be quick. After hours & 20k+ lines, hit the daily limit with only half done—lots of AI overthinking along the way.

Andrej Karpathy@karpathy

I'm noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case. For example in coding, the models now tend to reason for a fairly long time, they have an inclination to start listing and grepping files all across the entire repo, they do repeated web searchers, they over-analyze and over-think little rare edge cases even in code that is knowingly incomplete and under active development, and often come back ~minutes later even for simple queries. This might make sense for long-running tasks but it's less of a good fit for more "in the loop" iterated development that I still do a lot of, or if I'm just looking for a quick spot check before running a script, just in case I got some indexing wrong or made some dumb error. So I find myself quite often stopping the LLMs with variations of "Stop, you're way overthinking this. Look at only this single file. Do not use any tools. Do not over-engineer", etc. Basically as the default starts to slowly creep into the "ultrathink" super agentic mode, I feel a need for the reverse, and more generally good ways to indicate or communicate intent / stakes, from "just have a quick look" all the way to "go off for 30 minutes, come back when absolutely certain".
