eddieran

@roshbuilder1

SRE&SDE | Stay curious. Stay humble. Move fast. This era is different. https://t.co/iAbEY1MDJZ

Singapore · Joined April 2026

111 Following · 2 Followers

eddieran@roshbuilder1·10 May

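Today I had my @NousResearch tidy up my Obsidian knowledge base. From now on I'm having it prefer generating interactive HTML pages that are integrated into Obsidian, which greatly improves reading efficiency.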

eddieran@roshbuilder1

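Reading takes focus, and the brain is wired to be lazy. Long Markdown documents are only suited to AI; HTML pages, with their higher information density, are what actually serve humans. At work, I already output most key information as HTML.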

eddieran@roshbuilder1·10 May

Thariq@trq212

x.com/i/article/2052…

eddieran@roshbuilder1·6 May

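Even with AI, building an industrial-grade product still takes a great deal of effort and time. The real efficiency multiplier is not as high as people imagine.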

eddieran@roshbuilder1·21 Apr

huggingface.co/datasets/eddie… #HuggingFace #ClaudeCode

QME

eddieran@roshbuilder1·21 Apr

A few things I noticed reading through these:

It really does derive from first principles when told to. If you instruct it not to wave at "well-known" results, it won't; it'll re-derive modular inverses from 2·4 ≡ 1 mod 7 or prove the centroid-orthocenter identity on the fly.

On hard problems, it reframes what's being asked. One olympiad problem I saw: given four unit complex numbers summing to zero, find max |∏ pairwise sums|. Opus recognized that the product is *identically zero by antipodal-pair rigidity*, so there is no optimization to solve. That kind of move is the strongest evidence of actual understanding I saw.

When it's wrong, it's usually one arithmetic slip inside a long, otherwise-correct chain. The judge caught 5 of these across 2,400 samples (0.2%).

It also has a distinct teacher voice that emerges after enough reading: "Here's the whiteboard derivation", "The key move is...", "Setting x = 1+r collapses the problem to...". Less templated than you'd expect, and surprisingly patient.
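A quick numerical sanity check of the two math claims above, as a sketch rather than a proof (and not taken from the dataset). The idea behind the rigidity claim: once two unit numbers a and b are fixed, the other two must be unit numbers summing to -(a + b), and the only such pair is {-a, -b}, so some pairwise sum always vanishes. The function names here are illustrative, and the check assumes a + b ≠ 0.

```python
import cmath
import itertools
import math


def complete_to_zero_sum(a: complex, b: complex) -> list[complex]:
    """Given unit a, b with a + b != 0, return [a, b, c, d] with |c| = |d| = 1
    and a + b + c + d = 0. c and d are the two intersection points of the unit
    circle with the unit circle centred at s = -(a + b)."""
    s = -(a + b)
    m = s / 2                                   # midpoint of c and d
    u = 1j * s / abs(s)                         # unit vector perpendicular to s
    h = math.sqrt(max(0.0, 1.0 - abs(m) ** 2))  # half-distance between c and d
    return [a, b, m + h * u, m - h * u]


def pairwise_sum_product(zs):
    """Product of (z_i + z_j) over all unordered pairs i < j."""
    prod = 1 + 0j
    for x, y in itertools.combinations(zs, 2):
        prod *= x + y
    return prod


# Modular inverse from the post: 2 * 4 = 8 = 7 + 1, so 4 inverts 2 mod 7.
assert (2 * 4) % 7 == 1
assert pow(2, -1, 7) == 4  # three-argument pow with exponent -1, Python 3.8+

a = cmath.exp(0.3j)
b = cmath.exp(1.7j)
zs = complete_to_zero_sum(a, b)
print(abs(sum(zs)))                   # zero up to floating-point rounding
print(abs(pairwise_sum_product(zs)))  # zero up to floating-point rounding
```

Changing the two angles gives a different quadruple, but the product of pairwise sums stays at rounding-error level, which is consistent with there being nothing to maximize.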

eddieran@roshbuilder1·21 Apr

Spent a couple days pulling Opus 4.7's chain-of-thought out of hard STEM problems. 2,405 traces now up on #Huggingface.

The Anthropic API only returns *summarized* thinking on Opus 4.7 models. The Claude Code CLI streams the full think blocks inline, but even there, Opus sometimes goes into protective-reasoning mode and just returns the polished solution with no thinking shown. So this is specifically the filtered subset where full reasoning came through and passed an LLM-as-judge quality gate.

Some numbers from the pull:
• 6.7M tokens of Opus 4.7 thinking
• think block: ~1,800 chars
• 1,557 hard + 848 PhD-level problems
• 99.7% judge pass rate
• Sources: TheoremQA, MMLU-hard, GPQA, NuminaMath AIME+, MATH-500 lvl 4+
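A minimal sketch of the kind of filter described above, assuming each record carries the streamed thinking text and a judge verdict. The `Trace` fields and `filter_traces` are hypothetical names for illustration, not the dataset's actual schema or the pipeline that was used.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    problem: str
    thinking: str     # hypothetical field: empty if no think block came through
    solution: str
    judge_pass: bool  # hypothetical field: verdict from the LLM judge


def filter_traces(traces):
    """Keep traces that have non-empty thinking and passed the judge."""
    return [t for t in traces if t.thinking.strip() and t.judge_pass]


sample = [
    Trace("p1", "step 1 ... step n", "42", True),
    Trace("p2", "", "7", True),              # solution only, no thinking shown
    Trace("p3", "step 1 ...", "13", False),  # thinking present, judge rejected
]
kept = filter_traces(sample)
print([t.problem for t in kept])

# Split reported in the post: 1,557 hard + 848 PhD-level = 2,405 traces.
assert 1557 + 848 == 2405
```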
