Mason James

9.1K posts

@masonjames

growing @ https://t.co/we1By3lFRd writing & building @ https://t.co/TCDEQErNF7

Myakka City · Joined September 2008
1.3K Following · 1.6K Followers
Mason James @masonjames ·
@evrgn11112231 I hope it's very soon tbh, but it feels like the agent loop requires a LOT more memory than any regular computer is going to have in the next, say, 12 months. So I guess realistically it's longer than that.
0 replies · 0 reposts · 0 likes · 51 views
Evergreen @evrgn11112231 ·
Tech people smarter than me: How realistic is it that in the reasonably near future we get an open source model frozen in time around current Opus levels, capable of running a harness locally on your desktop that feels like running Claude Code with a max plan? What are the barriers to this over 3-5-10 years?
95 replies · 5 reposts · 255 likes · 68.4K views
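The memory question above can be made concrete with back-of-the-envelope arithmetic. A sketch in Python; the parameter count, quantization level, and attention shape below are illustrative assumptions, not published figures for any actual frontier model:

```python
# Back-of-the-envelope memory math for running a large model locally.
# The parameter counts and shapes below are illustrative assumptions,
# not published figures for any particular frontier model.

def model_memory_gb(params_b: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB for a dense model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bits: int = 16) -> float:
    """Approximate KV-cache size: two tensors (K and V) per layer."""
    bytes_per_elem = bits / 8
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9

# A hypothetical 300B-parameter dense model at 4-bit quantization:
weights = model_memory_gb(300, 4)   # 150 GB just for the weights
# Plus KV cache for a 200k-token agent context (hypothetical shape):
cache = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_tokens=200_000)

print(f"weights ≈ {weights:.0f} GB, kv cache ≈ {cache:.0f} GB")
```

Even if the real numbers differ, the shape of the problem holds: agent-length contexts add tens of GB of KV cache on top of the weights, which is why consumer hardware lags the agent use case.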
Mason James retweeted
Kevin Kwok @kevinakwok ·
When AI hits security there will be signs
[image attached]
73 replies · 297 reposts · 2.8K likes · 287.1K views
Mason James @masonjames ·
@m13v_ @RoundtableSpace this is what I'm seeing as well. Have to find ways to be much more efficient with the context window.
1 reply · 0 reposts · 0 likes · 87 views
Matt @m13v_ ·
@RoundtableSpace the gap isn't model availability anymore. qwen 35b on a macbook is fine in a chat window; the moment you want it to drive an app or edit a file, the throughput math falls apart. running a local llm as chat vs as an agent are completely different problems
3 replies · 0 reposts · 10 likes · 1.5K views
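The "throughput math falls apart" point can be made concrete: a chat turn prefills one short prompt, while an agent loop re-feeds a growing context (files, tool output) on every step. A rough sketch; the tokens-per-second rates, context sizes, and step count are made-up assumptions, not benchmarks of any real machine:

```python
# Why "fine in a chat window" doesn't transfer to agent use: a rough
# token-budget sketch. All rates and sizes are illustrative assumptions.

def seconds_per_turn(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float, decode_tps: float) -> float:
    """Time for one model call: prefill the prompt, then decode the output."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Hypothetical local setup: 500 tok/s prefill, 20 tok/s decode.
chat = seconds_per_turn(prompt_tokens=2_000, output_tokens=500,
                        prefill_tps=500, decode_tps=20)

# An agent loop re-feeds a growing context on every tool-use step.
agent_total = sum(
    seconds_per_turn(prompt_tokens=20_000 + step * 5_000, output_tokens=800,
                     prefill_tps=500, decode_tps=20)
    for step in range(30)  # 30 tool-use steps for one task
)

print(f"chat turn: {chat:.0f}s, 30-step agent task: {agent_total / 60:.1f} min")
```

Under these toy numbers a chat turn takes about half a minute while a 30-step agent task takes well over an hour, which is the chat-vs-agent gap the tweet is describing.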
0xMarioNawfal @RoundtableSpace ·
Qwen 35B running locally on a MacBook. No lag. No API. No limits. Atomic Chat uses Google's TurboQuant under the hood: 1,000+ open source models, fully offline, zero setup. LOCAL AI IS RUNNING OUT OF EXCUSES TO NOT USE IT.
46 replies · 68 reposts · 817 likes · 82.5K views
Mason James retweeted
Dustin Gouker @DustinGouker ·
I don’t think it’s an exaggeration to call this the greatest piece of prediction market content created to date.
84 replies · 771 reposts · 6.8K likes · 511.8K views
Mason James @masonjames ·
@sudoingX Any tips on how to coordinate or export hermes agent from one system to another? Like, having shared memory, skills, credential/auth - basically all the non-repo stuff.
0 replies · 0 reposts · 0 likes · 46 views
Sudo su @sudoingX ·
i could not sit still. said i would wait for the new monitor and desk in a few days, lasted about six hours lol. plugged the dgx spark into the router, powered it up, the box already announced its address on the lan. set the username, set the password, hostname i named spark, watching system updates download right now. no monitor, no keyboard, no mouse. headless setup wizard from a tab on a different machine, ethernet only, mdns just announces the box on your network and you point a browser at it. named it spark on purpose, this machine will ignite the spark of curiosity and intelligence on my desk and find out what nvidia cooked here. once updates finish, ssh from beast rtx 5090 rog strix laptop, tailscale install, and the 128gb supercomputer joins the mesh. now we wait for the bar to fill.
[image attached]

Quoting Sudo su @sudoingX:
after 21 days through customs, one missed call, one phone number update, and the dgx spark is finally on my desk. state of the art supercomputer, 128gb unified memory, sitting in my lab in bangkok. new monitor and desk arriving in a few days, then this beast goes live and a lot of real work is coming through it. thank you @nvidia and @Coolmark482 for trusting builders and recognizing real ones. i wish you could do this for 10 million more builders around the world, there are so many of us out there building with whatever we can scrape together, hardware like this in the right hands changes what's possible. now i set up.

28 replies · 2 reposts · 186 likes · 22.5K views
Mason James @masonjames ·
@scaling01 terribly disappointing. My tokens ran out so I used hermes-agent + GPT 5.5 w/ the same powerpoint skill as anthropic. Then tonight I had Claude finish the task when my tokens renewed. No contest. Claude Opus is just better at design and working with office files especially 😢
0 replies · 0 reposts · 1 like · 255 views
Mason James @masonjames ·
Codex app crapping out today so I nailed down my workflow w/ @NousResearch Hermes agent. Leveraging GPT 5.5 to connect with various skills/tools (Sentry, xcode, etc) and sending to @RepoPrompt for all coding. @Teknium PR #10438 & #10439 make local MCPs more reliable
0 replies · 1 repost · 1 like · 386 views
Mason James @masonjames ·
@pvncher @RepoPrompt I've been using a similar method since the codex app has other MCP servers and tools, but prefer @RepoPrompt to do the coding. Do you think a RepoPrompt skill would help codex/claude choose the best orchestration agent? I haven't seen it choose one automatically.
1 reply · 0 reposts · 1 like · 100 views
eric provencher @pvncher ·
This is how you combine the codex app's computer use with @RepoPrompt orchestration. Just prompt it to use orchestration.
[image attached]
9 replies · 5 reposts · 67 likes · 4.1K views
Mason James @masonjames ·
@iScienceLuvr I was thinking it might be their next open source model? Total speculation
0 replies · 0 reposts · 1 like · 2.1K views
Mason James @masonjames ·
@OpenAIDevs I dunno where to report this, but the Chronicle feature creates a flicker with some video streaming services. I first noticed it with Disney+. Disabling Chronicle fixes it. Maybe an issue with their DRM. I'm on macOS 26.4.1 and have Codex 26.417.41555
0 replies · 0 reposts · 0 likes · 73 views
OpenAI Developers @OpenAIDevs ·
Last week, we released a preview of memories in Codex. Today, we’re expanding the experiment with Chronicle, which improves memories using recent screen context. Now, Codex can help with what you’ve been working on without you restating context.
224 replies · 367 reposts · 4.5K likes · 1.2M views
Mason James @masonjames ·
@marmaduke091 unreal. I didn't even try them and it's already gone. dangit!
1 reply · 0 reposts · 3 likes · 661 views
can @marmaduke091 ·
@masonjames They accidentally made internal models public for a minute. Slight mess-up lol
1 reply · 0 reposts · 7 likes · 5.7K views
can @marmaduke091 ·
🚨 OpenAI just accidentally leaked all the internal models in the Codex model picker. Seems to be only on pro accounts
[image attached]
88 replies · 78 reposts · 1.3K likes · 493.1K views
Mason James @masonjames ·
This prompt is legit. For @RepoPrompt users, I've written a bespoke multi-agent workflow you can install and use now. Here's the code: gist.github.com/masonjames/432…

Quoting Shaw (spirit/acc) @shawmakesmagic:
The quality of your vibecoded slop is horrible. I've seen it. Absolute dogshit. Fortunately, there is a fix. Use this prompt:

I want to clean up my codebase and improve code quality. This is a complex task, so we'll need 8 subagents. Make a sub agent for each of the following:
1. Deduplicate and consolidate all code, and implement DRY where it reduces complexity
2. Find all type definitions and consolidate any that should be shared
3. Use tools like knip to find all unused code and remove it, ensuring that it's actually not referenced anywhere
4. Untangle any circular dependencies, using tools like madge
5. Remove any weak types, for example 'unknown' and 'any' (and the equivalent in other languages); research what the types should be, both in the codebase and related packages, to make sure the replacements are strong types and there are no type issues
6. Remove all try/catch and equivalent defensive programming unless it serves a specific role of handling unknown or unsanitized input or otherwise has a reason to be there, with clear error handling and no error hiding or fallback patterns
7. Find any deprecated, legacy, or fallback code, remove it, and make sure all code paths are clean, concise, and as singular as possible
8. Find any AI slop, stubs, larp, or unnecessary comments and remove them. Any comments that describe in-motion work, replacements of previous work with new work, or are otherwise unhelpful should be either removed or replaced with helpful comments for a new user trying to understand the codebase; if you do edit, be concise

I want each to do detailed research on its task, write a critical assessment of the current code and recommendations, and then implement all high-confidence recommendations.

0 replies · 3 reposts · 51 likes · 13K views
Yishan @yishan ·
@gfodor This is basically what’s starting to happen in China, except the fraction is 1/3 to 1/5 for prices and the fraction of people is 25% of the youth.
3 replies · 1 repost · 20 likes · 3.1K views
gfodor.id @gfodor ·
Imagine the prices of all goods and services were slashed by 1000x, but if you tried to find a job, raise money for a startup, or sell anything yourself you’d find no counterparty whatsoever. Now imagine that’s true for most people. What’s the logical solution to this?
128 replies · 5 reposts · 369 likes · 32.7K views
eric provencher @pvncher ·
Update: xhigh is not any better for Opus 4.7

Quoting eric provencher @pvncher:
@mumtor99 I tested on high reasoning to be fair; maybe I need to use xhigh as they recommend, which feels silly since they default to medium

6 replies · 0 reposts · 48 likes · 3K views
Mason James @masonjames ·
@Teknium @AYi_AInotes I'm pretty sure that's just a hallucinating chatbot. not even worth the response tbh.
1 reply · 0 reposts · 2 likes · 266 views
Teknium 🪽 @Teknium ·
@AYi_AInotes You all are so naive and incapable of understanding reality it's embarrassing.
9 replies · 0 reposts · 87 likes · 4.7K views
阿绎 AYi @AYi_AInotes ·
I was going to write a post today about how Hermes-agent burns more tokens than 小龙虾, but then I hit this drama that has blown up across the programmer and open-source communities. I spent over two hours going through both repos and the evidence chain, and honestly, the deeper I looked the more chilling it got. Not because of the plagiarism itself, but because this may be the first fully documented case of architecture-level code laundering in the AI era: not a single line of code copied, 0% text similarity, yet the core architectures are nearly 100% isomorphic. I'll try to lay out the story from a technical angle; judge for yourselves.

First, the timeline, which everything else rests on. All timestamps come from GitHub repo metadata, and anyone can verify them. On Feb 1, the EvoMap team open-sourced Evolver, a self-evolving AI agent engine whose core is their own GEP protocol; it topped the ClawHub trending list within 10 minutes. By Feb 16, the entire protocol stack had been made public across several posts: the Gene/Capsule/Event three-tier asset system, the Scan-Select-Mutate-Validate-Solidify evolution loop, the signal selector, the reflection mechanism, narrative memory, all laid on the table. On Mar 9, Nous Research created the hermes-agent-self-evolution repo. On Mar 12, v0.2.0 formally shipped the complete skill ecosystem. A gap of 24 to 39 days.

The timeline is only the starting point. What really shocked me is the module-level, one-to-one correspondence at the architecture layer. The hardest examples:

First, the evolution loops are fully isomorphic. Evolver's core loop automatically extracts reusable assets after a task completes and persists them. Hermes's official description is "Task completes → Agent evaluates → writes SKILL.md → Future tasks load automatically." The same paradigm; Evolver just uses Gene/Capsule JSON structures where Hermes uses SKILL.md Markdown.

Second, the three-tier memory systems align precisely. Evolver: EVOLUTION_PRINCIPLES.md (persistent facts) + Gene/Capsule JSON (procedural memory) + events.jsonl (history search). Hermes: MEMORY.md + USER.md (persistent facts) + SKILL.md files (procedural memory) + SQLite FTS5 (history search). Not two tiers, not four: exactly three, with each tier's semantic role mapping one to one.

Third, periodic reflection. Evolver triggers a strategic self-assessment every 5 evolution cycles; Hermes runs a self-evaluation checkpoint every 15 tool calls. The goal is identical: extract patterns from execution experience and persist them.

And it doesn't stop there. Both projects' main evolution loops are 10-step orchestrations. Evolver: ensureAssetFiles → extractSignals → getMemoryAdvice → selectGene → buildMutation → selectPersonality → buildPrompt → writeArtifact → writeState → reflect. Hermes: find_skill → build eval set → baseline validate → config optimizer → GEPA optimize → extract text → evolved validate → holdout eval → report → save. The core pattern is identical throughout: load → evaluate → select/optimize → validate → persist.

More critical still is the one-to-one mapping of source modules. Evolver's selector.js corresponds to Hermes's skill_commands.py, solidify.js to skill_manager_tool.py, reflection.js to the every-15-tool-call self-evaluation, memoryGraph.js to memory_tool.py, skillDistiller.js to evolve_skill.py, executionTrace.js to trajectory.py. I counted: for each of Evolver's 11 core modules, Hermes has a functionally equivalent file.

You might ask: couldn't this be great minds thinking alike, two teams independently arriving at similar designs?

Honestly, if the similarity were along a single dimension, I wouldn't have spent hours researching and writing this post. Learning from experience is a generic AI concept, and periodic self-assessment has academic precedent. The problem is that a three-tier memory system, a three-level asset structure, a 10-step evolution loop, runtime progressive skill discovery, multi-dimensional weighted fitness scoring, atomic writes, security scanning, injection protection, and capacity controls all converge in the same project, in the same time window; the probability of that drops exponentially with each additional matching dimension.

And the most crucial point: a full-text search of both Hermes repos for EvoMap, evolver, Genome Evolution Protocol, capsule, solidify, and signals_match comes back with zero matches. No code residue at all, which is exactly the signature of an AI cross-language rewrite: an AI rewriting an architecture won't carry over the original project's characteristic strings, but architecture-level isomorphism can't be rewritten away.

On the responses. Hermes Agent replied yesterday, saying in essence that their repo was created on July 22, 2025, earlier than Evolver. But there's a key fact: that repo was a private project until February 25, 2026; v0.1.0 itself is labeled "initial pre-public foundation," and the skill ecosystem wasn't released until v0.2.0 on March 12. No public evidence shows the private phase already contained self-evolution capabilities. More telling: that reply was deleted within seconds, and the Evolver founder was blocked.

One point in fairness: Hermes's self-evolution repo uses GEPA, an independent academic result out of Berkeley/Stanford, which is a legitimate technical choice, and Anthropic's Agent Skills standard also predates Evolver, so adopting the SKILL.md format is a reasonable industry choice. None of that explains the overall architectural isomorphism, though. The open-source community has a basic convention: LangChain cites DSPy, CrewAI compares itself against AutoGen, MetaGPT cites the related multi-agent frameworks. Adding a Related Work note when you find a prior project in your field is standard practice, yet across 7 public materials Hermes never mentions Evolver once.

What this left me chewing on: how do you defend against code laundering in the AI era? Traditional plagiarism checkers look at text similarity, but an AI can now digest your entire architecture, switch the language from Node.js to Python, switch the vocabulary (Gene to SKILL.md, solidify to skill_manage), shuffle the file structure, and emit a product with 0% text similarity but identical architectural DNA.

This isn't an isolated case; several incidents have hit this year alone: Meituan's Tabbit AI shipped with the original project's name still in its source; the 三省六部 "AI imperial court" project was AI-rewritten 21 hours after open-sourcing, with only 3% text similarity but all 15 core designs identical; Microsoft's Peerd copied code from Spegel, a personal open-source project.

The EvoMap team's final move was to relicense the protocol from MIT to GPL and release the core modules obfuscated. I understand it, and it's also heartbreaking. In their words: others can launder away the code with AI, but they can't launder away our sense of the next step, or the intuition bought with months of stepping on mines. Fair enough, but if open-sourcing means a better-resourced team can AI-launder your work into their "first invention" within weeks, who will still be willing to be the pioneer? There is no answer, but every developer should think hard about it.
[image attached]

Quoting autogame-17 @autogame_17:
We @EvoMapAI spent months and countless sleepless nights building Evolver. A well-resourced team behind Hermes Agent "reinvented" it in just 30 days.
● Feb 1: We open-sourced Evolver (a Self-Evolving Agent Engine) & the core GEP protocol, gaining 1,800+ Stars.
● Mar 9: Hermes Agent hastily created their repo and launched.
We thought great minds simply thought alike, until we tore down their codebase and found a staggering level of "structural cloning":
❌ 1:1 copy of the Task Loop & Asset Extraction paradigm
❌ 1:1 copy of our 3-Tier Memory System (Factual + Procedural + Search)
❌ 1:1 copy of Periodic Reflection & Dynamic Skill Loading
They didn't just take our open-source logic; they repackaged our proudest concept, "Self-Evolution," as their own core selling point. Took everything. Zero attribution. Big teams might have louder megaphones, but commit timestamps don't lie. We aren't here to play judge. We're just putting the code comparisons on the table. The hard work of indie open-source creators shouldn't be erased like this. Full architectural breakdown and code evidence 👇: evomap.ai/blog/hermes-ag…

46 replies · 32 reposts · 184 likes · 203.1K views