Bryce

97 posts

@brycezhang

🖖

Earth · Joined March 2010
186 Following · 13 Followers
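Bryce @brycezhang
To keep the context consistent for a payment architecture problem, I kept going back and forth in a single session, which triggered Codex context compaction many times along the way. I just checked and found that this one day burned through 1.3 billion tokens, and my Pro usage quota is nearly exhausted. Never doing that again.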
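Bryce @brycezhang
I never imagined that my biggest motivation for practicing Harness Engineering would turn out to be dry eye syndrome. The biggest pain point of the automated workflow I built on Symphony is that it can only execute tasks with clear boundaries. For a new requirement, a human must first split it into subtasks before it can enter the automated pipeline, and a human still has to do the final acceptance. The biggest bottleneck in the whole process is the human part. AI can work nonstop, but my eyes gave out first ☠️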
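Bryce @brycezhang
Idea - PRD - Code, the very standard internet-industry workflow, is obsolete. As a startup team, our PRD is just a single prototype mockup (Gemini). Once the designer polishes it, we move quickly to agent coding (Codex/CC), and if integration testing shows no obvious bugs, we ship. What slows things down is usually the parts where humans are involved. AI quality is already high at every stage, so the focus now is on building workflows that are deeply integrated with AI.
Quoting Harrison Chase @hwchase17: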

x.com/i/article/2031…

Bryce @brycezhang
@YanyuRensheng May I ask what tool you used to build it? So far I have only seen OpenAI's own Symphony, but it is still in preview.
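S Li @YanyuRensheng
Once GPT-5.4 is paired with a reasonably suitable "harness", it is one of the very few "talents" I have ever encountered that combines excellent project management, software engineering, and product management skills. For example, for a system with over a hundred modules, you only need to give it an entry file that contains the module dependencies, and it can make sound inferences about the system from that and generate a textual description resembling a system blueprint. That kind of insight truly astonished me.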
Bryce reposted
Chris the EVMist @dev_at_EVMist
AI should only help facilitate your learning. Never ask it to do something 100% for you that you are already capable of doing yourself. When you don't understand something, then you interrogate the AI as an opportunity to learn. You can be dependent on AI to teach you and improve yourself, you can't be dependent on AI to produce for you and achieve the same result... you will depreciate. I speculated and made this a while ago HOWEVER it's being backed up more and more through social observation and people's first-hand accounts and research studies.
Bryce @brycezhang
ChatGPT-5.4 Fast really is fast: in a single morning it burned through nearly 200 million tokens 🙃
Bryce reposted
Naval @naval
Careers are dead. Jobs are dying. Opportunities arising.
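Bryce @brycezhang
I used Codex 5.3 intensively for a day. It is as fast as Claude, yet its instruction following and task completion are far better, and the vast majority of tasks were completed in one pass. The earlier 5.2 was so slow that I could only use it for code review. But when I opened the usage page, it had consumed 106 million tokens...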
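Bryce @brycezhang
My engineering experience is similar: AI can complete the coding at very high quality, but human review is always needed in the end. It does not have to be line-by-line code review; an experienced engineer can quickly judge whether there are hidden risks and whether the design intent has been broken.
Quoting Andrej Karpathy @karpathy: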

I tried to use it this way and basically failed, the models aren't at the level where they can productively iterate on nanochat in an open-ended way. (Though one of the primary motivations for me writing nanochat is that I'd very much love for it to be used this way as a benchmark for agents, and I'd love it if it worked over time). I'm open to this just being skill issue. E.g. here some of the things I'd be suspicious about:
- the zoo of torch compile flags can knowingly be abused to get +1% gains but often at the cost of +30min compile time. This is why modded-nanogpt prohibits torch compile kwarg engineering and why I haven't done any in nanochat either. i wouldn't reliably expect the model to notice, consider, or flag this kind of an issue or seek clarification.
- ns_steps=3 might be a tiny bit of speed, but does the model also volunteer to make sure quality doesn't fall too much?
- same thing for deleting .float() cast - sure you can remove it and get VRAM/speed gains but it's there for a clear reason (extra precision in the loss function). Removing it means you absolutely have to make sure that the lower precision is ok validation loss wise, in a highly controlled experiment.

Overall I'm still struggling with getting the models to do significantly more basic things. For example, Opus keeps incorrectly "cleaning up" my comments when it doesn't understand them even when it's completely unrelated to the task, rude! It keeps violating and ignoring CLAUDE.md instructions on coding style but when I ask, it correctly points out all the violations. I know, I'm supposed to be using some kind of a /cleanup. Yesterday it gave me a table of results and incorrectly reported which experiment worked best (the table showed xyz=20 was best and it incorrectly claimed that xyz=12 was). Basically - much simpler things still fail routinely than something open-ended like "improve nanochat".
(I've been doing a lot of YELLING IN UPPER CASE and I think this could actually be a really good metric for A/B testing instead of the inline survey thing.). Still incredibly net useful with oversight and with clear, well-scoped tasks. I definitely haven't given up on automatic closed-loop experiments with the models. It would be so glorious. I had 2 iterations that basically didn't work but I have ideas for the 3rd.

Bryce @brycezhang
@dotey Humans are the ones who have to take responsibility, so a professional programmer cannot let a project become a black box.
Bryce reposted
Matt Pocock @mattpocockuk
Frontend is WAY harder for AI than backend. That's because it's flying blind. It can't test the code in the environment where it's running - the browser. Here's how to hook up AI to your browser:
Bryce @brycezhang
"Continue": looks like we get interrupted quite a lot 😂
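Bryce @brycezhang
This matches my recent hands-on impression: the three vendors each have a distinct personality, and you still need to mix them by scenario.
Selection suggestions by need:
- For "done completely, deeply, and production-ready in one pass" → prefer Claude Opus 4.5
- For "strongly defensive, compatible with legacy systems, and automatically fills gaps" → prefer GPT-5.1
- For "strictly to spec, no fluff, lowest cost" → prefer Gemini 3.0
Quoting Tibo @thsottiaux: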

Codex team is working on a few experimental projects that are starting to shape up and I’m excited to share more about soon. But I’m curious, what would you like to see ship or improved by the end of the year other than better models?

Bryce @brycezhang
Sora2 invite codes; help yourself if you need one.
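Bryce @brycezhang
An interesting MBTI test site; it gave the same result I got when I tested myself -- INTJ ---- Indeed, my mental CPU is about to burn out while the real-world progress bar is still at 1%. This AI-generated INTJ avatar really nailed my inner monologue. 👇🔗 mbti.youmind.com/brycezhang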
Bryce @brycezhang
@sofish NIO has squatted so low it ended up inside the battery swap station 🤦
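deepx @vhvms78qvx
@shengxj1 @bonniewinds0 For this kind of thing one can only wish them well. My definition of an idiot is simple: someone whose actions both hurt themselves and harm other people's interests, even if it is not their own choice 🤣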
Bryce @brycezhang
Tried a small project with Kiro Agent, thought it’d be quick. After hours & 20k+ lines, hit the daily limit with only half done—lots of AI overthinking along the way.
Quoting Andrej Karpathy @karpathy:

I'm noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case. For example in coding, the models now tend to reason for a fairly long time, they have an inclination to start listing and grepping files all across the entire repo, they do repeated web searches, they over-analyze and over-think little rare edge cases even in code that is knowingly incomplete and under active development, and often come back ~minutes later even for simple queries. This might make sense for long-running tasks but it's less of a good fit for more "in the loop" iterated development that I still do a lot of, or if I'm just looking for a quick spot check before running a script, just in case I got some indexing wrong or made some dumb error. So I find myself quite often stopping the LLMs with variations of "Stop, you're way overthinking this. Look at only this single file. Do not use any tools. Do not over-engineer", etc. Basically as the default starts to slowly creep into the "ultrathink" super agentic mode, I feel a need for the reverse, and more generally good ways to indicate or communicate intent / stakes, from "just have a quick look" all the way to "go off for 30 minutes, come back when absolutely certain".
