Jerry the Martian (@jerry543) - Twitter profile

Pinned Tweet

"The real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound." That's why I built Gitmoot, a local-first control layer that sits on top of whatever model you use. Your agents optimize themselves through a human feedback loop based on @Microsoft's SkillOpt: the skill file evolves from scored runs instead of manual prompt tweaking. Open source and already usable.

Satya Nadella@satyanadella

x.com/i/article/2065…

3

0

3

589

Jerry the Martian@jerry543·11h

@Yif_Yang Hey Yifan, I noticed you liked my post on the SkillOpt fork I made that implements human feedback. It would be great to have a chat! x.com/jerry543/statu…

Jerry the Martian@jerry543

@StatsWire @akshay_pachaar I tried to brainstorm a little on this, since I have a repo with an RL prompt optimizer driven by human feedback, based on @Microsoft’s SkillOpt. It would be great if you could provide any feedback! @StatsWire github.com/jerryfane/gitm…

1

0

56

Yifan Yang@Yif_Yang·25 May

Please try our code here: github.com/microsoft/Skil… 🚀 We are working to package SkillOpt into an easy-to-use optimization framework for agent learning, similar in spirit to MMDetection or Detectron for vision.

1

4

43

2.9K

Yifan Yang@Yif_Yang·25 May

🚀 Introducing SkillOpt — an optimizer for agent skills. Instead of finetuning model weights, we treat a natural-language skill as a trainable external parameter. Think of it as deep learning for the frontier-model + agent era: learning rate, LR schedule, mini-batch, batch size, epoch, momentum — all in text-space optimization. SkillOpt enables stable, controllable skill updates through bounded edits, allowing the optimizer to summarize “gradient directions” from agent experience and continuously improve procedural capability. We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings. Train the skill, not the model. 🛠️🤖 🌐 aka.ms/skillopt 📄 huggingface.co/papers/2605.23…

51

108

870

107.7K
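
The "bounded edits" idea in the SkillOpt announcement above can be illustrated with a toy sketch. This is not SkillOpt's implementation: the line-count edit budget (standing in for a "learning rate"), the `edit_size` and `bounded_step` helpers, and the keyword-based `score` function are all hypothetical names and assumptions made for illustration only.

```python
import difflib

def edit_size(old: str, new: str) -> int:
    """Count lines added or removed between two skill texts."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))

def bounded_step(skill, candidates, score, max_edit_lines):
    """Return the best-scoring candidate within the edit budget, else keep the skill."""
    best, best_score = skill, score(skill)
    for cand in candidates:
        if edit_size(skill, cand) > max_edit_lines:
            continue  # edit exceeds the budget: rejected regardless of its score
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best

# Hypothetical skill text and two proposed rewrites.
skill = "Read the task.\nWrite code."
candidates = [
    "Read the task.\nWrite code.\nRun the tests.",                   # small edit
    "Plan first.\nCheck inputs.\nRun the tests.\nReview the diff.",  # full rewrite
]

def score(text: str) -> int:
    """Toy stand-in for scored agent runs: counts desirable keywords."""
    return sum(kw in text.lower() for kw in ("tests", "plan", "review"))

updated = bounded_step(skill, candidates, score, max_edit_lines=2)
print(updated)
```

The full rewrite would score higher on this toy metric, but it is rejected because it exceeds the edit budget. That is the stabilizing effect a bound on update size is meant to provide in this sketch.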

Jerry the Martian@jerry543·23h

@CiroxEth Thanks! Let me know if you try it.

0

1

Cirox@CiroxEth·1d

@jerry543 Love the focus on compounding human and token capital. Gitmoot sounds like a smart step forward

1

0

1

11

Jerry the Martian@jerry543·5d

"The real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound." That's why I built Gitmoot, a local-first control layer that sits on top of whatever model you use. Your agents optimize themselves through a human feedback loop based on @Microsoft's SkillOpt: the skill file evolves from scored runs instead of manual prompt tweaking. Open source and already usable.

Satya Nadella@satyanadella

x.com/i/article/2065…

3

0

3

589

Jerry the Martian@jerry543·1d

@StatsWire @akshay_pachaar I tried to brainstorm a little on this, since I have a repo with an RL prompt optimizer driven by human feedback, based on @Microsoft’s SkillOpt. It would be great if you could provide any feedback! @StatsWire github.com/jerryfane/gitm…

0

1

81

Jerry the Martian@jerry543·1d

@StatsWire @akshay_pachaar To make this work we would need some kind of anchor though, like human labels, so that the judge itself can improve during agent training. I'm not sure how that would work right now, but it looks like an interesting idea to explore.

1

0

1

33

Akshay 🚀@akshay_pachaar·1d

Karpathy's prediction about RL is coming true now! He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks. To solve this, agents need a knowledge-guided review as a higher-dimensional feedback channel. Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek), and their key bottleneck has always been the reward function. GRPO by DeepSeek worked well for math and code because the environment gave a binary signal. But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes. RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified. The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training. I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow. In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition. You can see the full implementation on GitHub and try it yourself. Here's the ART repo: github.com/OpenPipe/ART (don't forget to star it ⭐). Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions. RL reward engineering is now prompt engineering. I wrote a full walkthrough on OpenPipe's ART, the agent RL trainer built on GRPO, including how RULER replaces manual reward engineering with automatic LLM-graded rewards. The article is quoted below.

Akshay 🚀@akshay_pachaar

x.com/i/article/2029…

33

138

930

153.7K

Jerry the Martian@jerry543·1d

@JohnJumperSci @demishassabis So Claude > Gemini?

0

1

526

John Jumper@JohnJumperSci·1d

A bit of news: After nearly 9 years, I have decided to leave Google DeepMind and join Anthropic (after taking some time to recharge). I am incredibly grateful for my time at GDM. @demishassabis took a real chance letting me lead the AlphaFold team just six months after finishing my PhD, and the entire GDM team taught me so much about how to do great science. GDM is a special place, and I’ll still be excited to hear about what amazing things they discover next.

572

877

13.2K

5.2M

Jerry the Martian@jerry543·1d

@billyrestey What about buying it and listing it for less than the buy price?

1

0

4

154

billy restey@billyrestey·1d

my wife asked me why ppl would buy my art then immediately list it for what they purchased it for.. i laughed in her face she clearly doesn't understand how NFTs work..

15

0

77

3.1K

Jerry the Martian@jerry543·1d

@TTrimoreau Fableprivation

0

64

Thomas Trimoreau@TTrimoreau·1d

I am a vibe coder scare me with 1 word

335

0

109

19.5K

Jerry the Martian@jerry543·1d

@aryanlabde Ultracode + workflows is great

0

106

Aryan@aryanlabde·1d

i am a codex user, why should i switch to claude code?

64

1

46

4.8K

Jerry the Martian@jerry543·1d

@StatsWire @akshay_pachaar Interesting, do you have any actual real examples where the agents managed to game their LLM judge?

1

0

1

30

Stats Wire@StatsWire·1d

Nice breakdown. One nuance worth adding: RULER's edge isn't just "rewards in plain English"; it's that it scores trajectories *relatively* within a group rather than assigning an absolute number per rollout. That sidesteps a lot of the calibration drift you'd get from an LLM judge trying to output consistent absolute scores across episodes. The open question is whether it just moves the reward-hacking problem up a level. Hand-coded reward functions get gamed by agents finding edge cases; LLM judges can get gamed by agents producing trajectories that *look* good to a judge without actually solving the task (verbose explanations, confident-sounding wrong answers, etc.). Curious if you saw any of that with the 2048 agent, or if the task was clean enough that it didn't show up. Either way, "reward engineering is now prompt engineering" is the right framing; it just means the prompt becomes the new attack surface.

1

0

3

473
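
The point about relative scoring within a group can be made concrete with a small sketch. It is not RULER's or ART's code: it uses the standard GRPO-style normalization (subtract the group mean, divide by the group standard deviation) as an assumed stand-in, and the judge scores are made-up numbers.

```python
from statistics import mean, pstdev

def group_relative_advantages(scores, eps=1e-8):
    """Normalize judge scores within one group of rollouts for the same task."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]

# Hypothetical judge scores for four rollouts of the same task.
episode_a = [0.2, 0.5, 0.9, 0.4]
# Same rollouts, but the judge has drifted upward by 0.3 in a later episode.
episode_b = [s + 0.3 for s in episode_a]

adv_a = group_relative_advantages(episode_a)
adv_b = group_relative_advantages(episode_b)

# The constant drift cancels out: both groups give the same training signal.
print(all(abs(a - b) < 1e-6 for a, b in zip(adv_a, adv_b)))
```

Only the ordering and spread inside the group matter here, which is why a judge that is inconsistent in absolute terms across episodes can still provide a usable signal. The sketch does nothing about a judge being fooled within a group, which is the reward-hacking concern raised in the tweet.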

Jerry the Martian@jerry543·1d

@akshay_pachaar How do you make sure that the RULER/judge LLM is actually grading the proposals properly? The judge itself looks like it would require prompt optimization of its own.

0

1

304

Jerry the Martian@jerry543·1d

This is the equivalent of Fable getting rereleased, but anime version

0

1

111

Jerry the Martian@jerry543·1d

@MatthewSchrager @petergyang @mattpocockuk This looks great! thanks for sharing. It's actually really similar to my current implementation flow, but consolidated into a skill

0

1

23

Matthew Schrager@MatthewSchrager·2d

@jerry543 @petergyang @mattpocockuk Ok, had to tweak it a bit for public consumption, here's a super quick rough cut, let me know if there are any issues. github.com/matthewschrage…

1

3

54

Peter Yang@petergyang·2d

So I have Codex running on a /goal and it's been working for 2 hours, but the problem is it's making a lot of wrong assumptions, so I have to monitor and steer it constantly. Is this expected? Perhaps I should've had it make a detailed plan first?

237

8

473

123.5K

Jerry the Martian@jerry543·2d

@herdrdev Let’s gooo herdrrrrrr

0

1

511

herdr@herdrdev·2d

still leaving your laptop open so the agent doesn't die? still hand-rolling tmux + ssh + notifications? still can't check on it from your phone? you don't have to. try herdr.dev

12

9

266

20.9K

Jerry the Martian@jerry543·2d

@rezoundous For body scans apparently it's great

0

112

Tyler@rezoundous·2d

Is midjourney still a thing?

22

0

26

2.1K

Jerry the Martian@jerry543·2d

I'm still coping

0

91

Jerry the Martian@jerry543·2d

@MatthewSchrager @petergyang @mattpocockuk Is there a GitHub repo for this?

1

0

270

Matthew Schrager@MatthewSchrager·2d

My current workflow is a /grill-to-goal skill based on @mattpocockuk’s /grill-with-docs that basically interviews you to produce detailed documentation about your feature, with clear acceptance criteria etc., along with a goal-ready prompt that references that documentation. Then just call /goal with that prompt. Works very nicely in my experience.

10

4

183

9.5K

Jerry the Martian@jerry543·2d

@petergyang I have mine running for nearly 2 weeks with great assumptions and work x.com/jerry543/statu…

Jerry the Martian@jerry543

x.com/i/article/2063…

0

193

Jerry the Martian@jerry543·2d

@0xluffy_eth Episodes link?

0

1

271

路飞 🏴‍☠️ AI 研究员🧐@0xluffy_eth·2d

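🚨 Netflix spends $5 million to produce one animated episode. A 21-year-old spent $124 last month and earned $12,345. This is the pipeline he used: → Claude writes the script: 10 minutes → Midjourney generates the visuals: 20 minutes → Runway adds motion: 15 minutes → ElevenLabs does the voice-over: 10 minutes → Suno composes the music: 5 minutes → Publishing to all platforms: automated. One episode per hour. Six tools. One person. The money keeps coming in while he sleeps.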

58

39

249

25.9K

Jerry the Martian@jerry543·2d

@diegocabezas01 @UnrealEngine Engine

1

0

1

9

Diego | AI 🚀 - e/acc@diegocabezas01·2d

@UnrealEngine Imagine this with Fable 5, it will be Unreal

1

0

10

249

Unreal Engine@UnrealEngine·3d

Unreal Engine 5.8 ships today with experimental MCP server support: Your sources, your pipeline and your workflow—simply configure the MCP plugin and connect to any agent. Get familiar with the MCP server and the PCG Primitive Plugin today and see what teams can build together: epic.gm/ue-5-8-blog

232

896

7.3K

2.9M
