eddieran

6 posts

@roshbuilder1

SRE&SDE | Stay curious. Stay humble. Move fast. This era is different. https://t.co/iAbEY1MDJZ

Singapore · Joined April 2026
111 Following · 2 Followers
eddieran @roshbuilder1
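Reading takes focus, and the brain is wired to be lazy. Long Markdown documents are only suitable for AI to consume; information-dense HTML pages are what actually serve humans. At work, nearly all of my key information is already delivered as HTML.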
Thariq @trq212

x.com/i/article/2052…

eddieran @roshbuilder1
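Even with AI, building an industrial-grade product still takes a great deal of effort and time. The real efficiency multiplier is not as high as people imagine.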
eddieran @roshbuilder1
A few things I noticed reading through these:

It really does derive from first principles when told to. If you instruct it not to wave at "well-known" results, it won't: it'll re-derive modular inverses from 2·4 ≡ 1 mod 7 or prove the centroid-orthocenter identity on the fly.

On hard problems, it reframes what's being asked. One olympiad problem I saw: given four unit complex numbers summing to zero, find max |∏ pairwise sums|. Opus recognized the product is *identically zero by antipodal-pair rigidity*, not an optimization to solve. That kind of move is the strongest evidence of actual understanding I saw.

When it's wrong, it's usually one arithmetic slip inside a long, otherwise-correct chain. The judge caught 5 of these across 2,400 samples (0.2%).

It also has a distinct teacher voice that emerges after enough reading: "Here's the whiteboard derivation", "The key move is...", "Setting x = 1+r collapses the problem to...". Less templated than you'd expect, and surprisingly patient.
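For anyone who wants to sanity-check the two math claims above, here is a minimal Python sketch. It is my own illustration, not taken from the traces: the function names are hypothetical, and the numeric check is evidence, not a proof. The idea is that once two unit numbers a and b are fixed (and not antipodal), the remaining two unit numbers with a + b + c + d = 0 are forced to be -a and -b, so one of the pairwise sums vanishes.

```python
import cmath
import itertools
import math


def complete_to_zero_sum(a: complex, b: complex) -> tuple[complex, complex]:
    """Given unit a, b (not antipodal), return unit c, d with a + b + c + d = 0.

    c and d must sum to s = -(a + b), so they sit symmetrically around s/2
    at perpendicular offset h, where (|s|/2)^2 + h^2 = 1.
    """
    s = -(a + b)
    r = abs(s)
    if r == 0:
        raise ValueError("a and b are antipodal; c is then a free unit number")
    u = s / r                                   # unit direction of s
    h = math.sqrt(max(0.0, 1.0 - (r / 2) ** 2))  # perpendicular offset
    return s / 2 + 1j * h * u, s / 2 - 1j * h * u


def pairwise_sum_product(zs: list[complex]) -> float:
    """Absolute value of the product of (z_i + z_j) over all pairs i < j."""
    prod = 1 + 0j
    for x, y in itertools.combinations(zs, 2):
        prod *= x + y
    return abs(prod)


if __name__ == "__main__":
    a, b = cmath.exp(0.3j), cmath.exp(1.7j)
    c, d = complete_to_zero_sum(a, b)
    print("sum:", abs(a + b + c + d))
    print("product of pairwise sums:", pairwise_sum_product([a, b, c, d]))
    # Modular inverse from the post: 2 * 4 = 8 ≡ 1 (mod 7), so 2^-1 ≡ 4.
    print("inverse of 2 mod 7:", pow(2, -1, 7))
```

Both printed magnitudes should come out at floating-point noise level for any non-antipodal choice of the two angles. `pow(2, -1, 7)` needs Python 3.8 or later.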
eddieran @roshbuilder1
Spent a couple days pulling Opus 4.7's chain-of-thought out of hard STEM problems. 2,405 traces now up on #Huggingface.

The Anthropic API only returns *summarized* thinking on Opus 4.7 models. The Claude Code CLI streams the full think blocks inline, but even there, Opus sometimes goes into protective-reasoning mode and just returns the polished solution with no thinking shown. So this is specifically the filtered subset where full reasoning came through and passed an LLM-as-judge quality gate.

Some numbers from the pull:
• 6.7M tokens of Opus 4.7 thinking
• think block: ~1,800 chars
• 1,557 hard + 848 PhD-level problems
• 99.7% judge pass rate
• Sources: TheoremQA, MMLU-hard, GPQA, NuminaMath AIME+, MATH-500 lvl 4+
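The two-stage filter described above (full reasoning present, then a judge gate) could look roughly like the sketch below. Everything here is an assumption for illustration: the `Trace` fields, `keep_full_traces`, and `toy_judge` are hypothetical names, and the toy judge is a stand-in for a real LLM call, not the actual gate used for this dataset.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Trace:
    """One problem attempt. Field names are hypothetical."""
    problem: str
    thinking: str  # raw think-block text; empty if only the polished answer came back
    answer: str


def keep_full_traces(
    traces: Iterable[Trace],
    judge: Callable[[Trace], bool],
    min_thinking_chars: int = 1,
) -> List[Trace]:
    """Keep traces that show their reasoning and pass the judge."""
    kept = []
    for t in traces:
        # Stage 1: drop traces where no thinking was shown.
        if len(t.thinking.strip()) < min_thinking_chars:
            continue
        # Stage 2: quality gate (an LLM-as-judge in the real pipeline).
        if not judge(t):
            continue
        kept.append(t)
    return kept


def toy_judge(trace: Trace) -> bool:
    """Placeholder judge: only checks that an answer exists."""
    return bool(trace.answer.strip())


if __name__ == "__main__":
    sample = [
        Trace("p1", "Let x = 1 + r, then ...", "x = 3"),
        Trace("p2", "", "42"),                 # no thinking shown, dropped
        Trace("p3", "Try induction ...", ""),  # fails the toy judge, dropped
    ]
    print([t.problem for t in keep_full_traces(sample, toy_judge)])
```

Running the thinking-present check first means the judge, which is the expensive step in a real pipeline, only sees traces that could be kept anyway.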