Jerry the Martian

4.3K posts

@jerry543

Mars | Currently tokenmaxxxing | @luminexio | ex @Amazon eng | Building https://t.co/2sfoQFLGoU

Joined June 2017
2.3K Following 15.9K Followers
Pinned Tweet
Jerry the Martian
Jerry the Martian@jerry543·
"The real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound." That's why I built Gitmoot, a local-first control layer that sits on top of whatever model you use. Your agents optimize themselves through a human feedback loop based on @Microsoft's SkillOpt: the skill file evolves from scored runs instead of manual prompt tweaking. Open source, already usable.
Satya Nadella@satyanadella

x.com/i/article/2065…

English
3
0
3
589
Jerry the Martian
Jerry the Martian@jerry543·
@Yif_Yang Hey Yifan, I noticed you liked my post on the SkillOpt fork I made that implements human feedback. It would be great to have a chat! x.com/jerry543/statu…
Jerry the Martian@jerry543

@StatsWire @akshay_pachaar I tried to brainstorm a little bit on this, since I have a repo with an RL prompt optimizer through human feedback based on @Microsoft’s SkillOpt. It would be great if you could provide any feedback! @StatsWire github.com/jerryfane/gitm…

English
1
0
0
56
Yifan Yang
Yifan Yang@Yif_Yang·
Please try our code here: github.com/microsoft/Skil… 🚀 We are working to package SkillOpt into an easy-to-use optimization framework for agent learning, similar in spirit to MMDetection or Detectron for vision.
English
1
4
43
2.9K
Yifan Yang
Yifan Yang@Yif_Yang·
🚀 Introducing SkillOpt — an optimizer for agent skills. Instead of finetuning model weights, we treat a natural-language skill as a trainable external parameter. Think of it as deep learning for the frontier-model + agent era: learning rate, LR schedule, mini-batch, batch size, epoch, momentum — all in text-space optimization. SkillOpt enables stable, controllable skill updates through bounded edits, allowing the optimizer to summarize “gradient directions” from agent experience and continuously improve procedural capability. We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings. Train the skill, not the model. 🛠️🤖 🌐 aka.ms/skillopt 📄 huggingface.co/papers/2605.23…
English
51
108
870
107.7K
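A toy sketch of what "text-space optimization with bounded edits" could look like, to make the analogy in the post above concrete. This is not SkillOpt's algorithm or API: the instruction pool, the scorer `toy_score`, and the functions `bounded_edit_step` and `optimize` are all hypothetical. Here `max_edits` plays the role of a learning rate by capping how much the skill text may change per step, and a change is kept only if the score improves. In a real setting the score would presumably come from running an agent on a mini-batch of tasks.

```python
import random
from typing import Callable, List

# Hypothetical instruction pool; a real skill file would be free-form text.
HELPFUL = {"Run the tests before answering.", "Cite the file you edited."}
CANDIDATES = sorted(HELPFUL | {"Answer as fast as possible."})


def toy_score(skill: List[str]) -> float:
    """Stand-in for scored agent runs: +1 per helpful line, -1 otherwise."""
    return float(sum(1 if line in HELPFUL else -1 for line in skill))


def bounded_edit_step(
    skill: List[str],
    score: Callable[[List[str]], float],
    max_edits: int,
    rng: random.Random,
) -> List[str]:
    """Apply at most `max_edits` add/remove edits; keep them only if the score improves."""
    proposal = list(skill)
    for _ in range(max_edits):
        line = rng.choice(CANDIDATES)
        if line in proposal:
            proposal.remove(line)  # drop an existing instruction
        else:
            proposal.append(line)  # add a new instruction
    return proposal if score(proposal) > score(skill) else skill


def optimize(skill: List[str], steps: int, max_edits: int, seed: int = 0) -> List[str]:
    """Repeat bounded edit steps, starting from the given skill."""
    rng = random.Random(seed)
    for _ in range(steps):
        skill = bounded_edit_step(skill, toy_score, max_edits, rng)
    return skill


final_skill = optimize([], steps=50, max_edits=1)
print(final_skill)
```

Because a step is rejected unless the score goes up, the skill never gets worse under this toy scorer, and no single step can rewrite more than `max_edits` lines.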
Cirox
Cirox@CiroxEth·
@jerry543 Love the focus on compounding human and token capital. Gitmoot sounds like a smart step forward
English
1
0
1
11
Jerry the Martian
Jerry the Martian@jerry543·
@StatsWire @akshay_pachaar To make this work we would need some kind of anchor though, like human labels, so that the judge itself can improve during agent training. Not sure how that would work right now, but it looks like an interesting idea to explore.
English
1
0
1
33
Akshay 🚀
Akshay 🚀@akshay_pachaar·
Karpathy's prediction about RL is coming true now! He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks. To solve this, agents need a knowledge-guided review as a higher-dimensional feedback channel.
Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek). And their key bottleneck has always been the reward functions. GRPO by DeepSeek worked well for math and code because the environment gave a binary signal. But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes.
RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified. The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training.
I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow. In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition. You can see the full implementation on GitHub and try it yourself. Here's the ART Repo: github.com/OpenPipe/ART (don't forget to star it ⭐ )
Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions. RL reward engineering is now prompt engineering.
I wrote a full walkthrough on OpenPipe's ART, the agent RL trainer built on GRPO, including how RULER replaces manual reward engineering with automatic LLM-graded rewards. The article is quoted below.
Akshay 🚀@akshay_pachaar

x.com/i/article/2029…

English
33
138
930
153.7K
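For readers unfamiliar with how GRPO gets by without a critic model, here is a minimal sketch of the group-relative advantage idea, assuming the commonly described form (each rollout's reward minus the group mean, divided by the group standard deviation). The function name `group_relative_advantages` and the judge scores are hypothetical, and this is not the ART or RULER API; implementations may differ in details such as the standard deviation variant.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of rollouts sampled for the same prompt.

    The group mean acts as the baseline, so no learned critic is needed.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical judge scores for four rollouts from the same 2048 board.
judge_scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(judge_scores)
print(advantages)
```

Rollouts that beat their group's average get a positive advantage and are reinforced; those below it get a negative one. If every rollout in a group scores the same, all advantages are zero and the group contributes no learning signal.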
John Jumper
John Jumper@JohnJumperSci·
A bit of news: After nearly 9 years, I have decided to leave Google DeepMind and join Anthropic (after taking some time to recharge). I am incredibly grateful for my time at GDM. @demishassabis took a real chance letting me lead the AlphaFold team just six months after finishing my PhD, and the entire GDM team taught me so much about how to do great science. GDM is a special place, and I’ll still be excited to hear about what amazing things they discover next.
English
572
877
13.2K
5.2M
billy restey
billy restey@billyrestey·
my wife asked me why ppl would buy my art then immediately list it for what they purchased it for.. i laughed in her face she clearly doesn't understand how NFTs work..
English
15
0
77
3.1K
Thomas Trimoreau
Thomas Trimoreau@TTrimoreau·
I am a vibe coder scare me with 1 word
English
335
0
109
19.5K
Aryan
Aryan@aryanlabde·
i am a codex user, why should i switch to claude code?
English
64
1
46
4.8K
Stats Wire
Stats Wire@StatsWire·
Nice breakdown. One nuance worth adding: RULER's edge isn't just "rewards in plain English"; it's that it scores trajectories *relatively* within a group rather than assigning an absolute number per rollout. That sidesteps a lot of the calibration drift you'd get from an LLM judge trying to output consistent absolute scores across episodes. The open question is whether it just moves the reward-hacking problem up a level. Hand-coded reward functions get gamed by agents finding edge cases; LLM judges can get gamed by agents producing trajectories that *look* good to a judge without actually solving the task (verbose explanations, confident-sounding wrong answers, etc.). Curious if you saw any of that with the 2048 agent, or if the task was clean enough that it didn't show up. Either way, "reward engineering is now prompt engineering" is the right framing; it just means the prompt becomes the new attack surface.
English
1
0
3
473
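The calibration-drift point in the reply above can be illustrated with a small sketch. It assumes the simplest kind of drift, a constant offset in the judge's scores between two episodes, and uses made-up numbers; `centered` is a hypothetical helper, not part of RULER or ART.

```python
from typing import List


def centered(scores: List[float]) -> List[float]:
    """Score each rollout relative to the mean of its own group."""
    group_mean = sum(scores) / len(scores)
    return [s - group_mean for s in scores]


# Same three rollout qualities, judged in two episodes.
episode_a = [2.0, 4.0, 6.0]                 # judge in a strict mood
episode_b = [s + 3.0 for s in episode_a]    # judge drifted upward by 3 points

# Absolute scores mislead across episodes: the worst rollout in B
# outscores the middle rollout in A.
print(min(episode_b), episode_a[1])

# Within-group relative scores are unaffected by the offset.
print(centered(episode_a), centered(episode_b))
```

This only covers additive drift. It does nothing for the reward-hacking concern raised in the same reply, since a trajectory that merely looks good to the judge still ranks highest within its group.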
Jerry the Martian
Jerry the Martian@jerry543·
@akshay_pachaar How do you make sure that the RULER/Judge LLM is actually grading the proposals properly? The judge itself looks like it would need its own prompt optimization just for this.
English
0
0
1
304
Jerry the Martian
Jerry the Martian@jerry543·
This is the equivalent of Fable getting rereleased, but anime version
English
0
0
1
111
Peter Yang
Peter Yang@petergyang·
So I have Codex running on a /goal and it's been working for 2 hours, but the problem is it's making a lot of wrong assumptions, so I have to monitor and steer it constantly. Is this expected? Perhaps I should've had it make a detailed plan first?
English
237
8
473
123.5K
herdr
herdr@herdrdev·
still leaving your laptop open so the agent doesn't die? still hand-rolling tmux + ssh + notifications? still can't check on it from your phone? you don't have to. try herdr.dev
English
12
9
266
20.9K
Tyler
Tyler@rezoundous·
Is midjourney still a thing?
English
22
0
26
2.1K
Matthew Schrager
Matthew Schrager@MatthewSchrager·
My current workflow is a /grill-to-goal skill based on @mattpocockuk’s /grill-with-docs that basically interviews you to produce detailed documentation about your feature, with clear acceptance criteria etc., along with a goal-ready prompt that references that documentation. Then just call /goal with that prompt. Works very nicely in my experience.
English
10
4
183
9.5K
路飞 🏴‍☠️ AI 研究员🧐
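🚨 Netflix spends $5 million to produce one episode of animation. A 21-year-old spent $124 last month and earned $12,345. This is the pipeline he used: → Claude writes the script: 10 minutes → Midjourney generates the visuals: 20 minutes → Runway adds motion: 15 minutes → ElevenLabs does the voiceover: 10 minutes → Suno composes the music: 5 minutes → Publishing to all platforms: automated. One episode per hour. Six tools. One person. The money keeps coming in while he sleeps.
Chinese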
58
39
249
25.9K
Unreal Engine
Unreal Engine@UnrealEngine·
Unreal Engine 5.8 ships today with experimental MCP server support: Your sources, your pipeline and your workflow—simply configure the MCP plugin and connect to any agent. Get familiar with the MCP server and the PCG Primitive Plugin today and see what teams can build together: epic.gm/ue-5-8-blog
English
232
896
7.3K
2.9M