OpenReward

16 posts

OpenReward banner
OpenReward
@OpenReward

where machines get reward. built by @genreasoning.

Joined January 2026
5 Following · 49 Followers
OpenReward retweeted
Shashwat Goel
Shashwat Goel@ShashwatGoel7·
Cool idea from @AashaySachdeva: unified environment interfaces like @OpenReward can enable LLM meta-learning research! Pleased with where things are going, with more parts of the stack publicly accessible. E.g., I now look forward to weekly @tinkerapi roundups as much as John Oliver episodes!
aashay sachdeva@AashaySachdeva

Played around with this. This was exactly something I was looking for! Tried a few things:

- Creating an env: pretty dope! End to end, Claude was able to port it from GitHub with only minor issues. One-shotted @ShashwatGoel7's OpenForecaster env here. A lot more people should contribute their own envs. I hope they launch monetisation here.
- Running a curator over env tasks during RL: when there are so many tasks, which one should you focus on? This is the auto-curriculum/meta-learning bit. I am still not able to beat random/pass@k, but I think the signals are there that over the long run this will help with diversity. This obviously has a power law: every run will have the top envs dominating, but I feel those 20% random tasks will give a big boost to any model.
- Optimising the GEPA optimiser: GEPA is great but pretty slow. What if we could teach a model to do this better? This was on my list for so long; finally, with OpenReward, I was able to attempt it.

4 replies · 5 reposts · 23 likes · 8.4K views
OpenReward retweeted
aashay sachdeva
aashay sachdeva@AashaySachdeva·
Played around with this. This was exactly something I was looking for! Tried a few things:

- Creating an env: pretty dope! End to end, Claude was able to port it from GitHub with only minor issues. One-shotted @ShashwatGoel7's OpenForecaster env here. A lot more people should contribute their own envs. I hope they launch monetisation here.
- Running a curator over env tasks during RL: when there are so many tasks, which one should you focus on? This is the auto-curriculum/meta-learning bit. I am still not able to beat random/pass@k, but I think the signals are there that over the long run this will help with diversity. This obviously has a power law: every run will have the top envs dominating, but I feel those 20% random tasks will give a big boost to any model.
- Optimising the GEPA optimiser: GEPA is great but pretty slow. What if we could teach a model to do this better? This was on my list for so long; finally, with OpenReward, I was able to attempt it.
General Reasoning@GenReasoning

Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.

3 replies · 5 reposts · 59 likes · 11K views
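The auto-curriculum idea in the thread above (with hundreds of environments, which tasks should RL focus on next?) can be sketched as a small bandit-style curator that favours environments where the policy is still improving, while keeping a random slice for diversity. Everything below is a hypothetical illustration, not OpenReward's API: the `TaskCurator` class, its learning-progress score, and the `epsilon` parameter are all assumptions.

```python
import random
from collections import deque

class TaskCurator:
    """Epsilon-greedy curator over RL environments.

    Scores each env by recent learning progress (absolute change in
    mean reward over a sliding window), so training focuses on tasks
    where the policy is still improving, with a random slice kept
    for diversity.
    """

    def __init__(self, env_names, epsilon=0.2, window=8):
        self.epsilon = epsilon
        self.history = {name: deque(maxlen=window) for name in env_names}

    def progress(self, name):
        h = self.history[name]
        if len(h) < 2:
            return float("inf")  # unexplored envs are sampled first
        mid = len(h) // 2
        older = sum(list(h)[:mid]) / mid
        newer = sum(list(h)[mid:]) / (len(h) - mid)
        return abs(newer - older)

    def pick(self):
        # diversity slice: with probability epsilon, sample uniformly
        if random.random() < self.epsilon:
            return random.choice(list(self.history))
        # otherwise pick the env with the largest recent progress
        return max(self.history, key=self.progress)

    def update(self, name, reward):
        self.history[name].append(reward)
```

The `epsilon` slice mirrors the tweet's observation that purely greedy curation struggles to beat random/pass@k: keeping some random tasks preserves diversity even when a few top environments dominate the run.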
OpenReward retweeted
Xiangyi Li
Xiangyi Li@xdotli·
.@benchflow_ai started in 09/24 as a unified interface and hosting hub for benchmarks, with early users from Stanford and Princeton, four months before R1 dropped. We stopped after 9 months with 0 traction. Today our latest work SkillsBench is #1 trending on @OpenReward. The game of evals is just getting started.
Xiangyi Li tweet media
1 reply · 6 reposts · 17 likes · 1.7K views
OpenReward retweeted
Tinker
Tinker@tinkerapi·
OpenReward serves hundreds of RL environments through a single API with autoscaled compute. Plug into Tinker to train agents on millions of tasks from anywhere. x.com/GenReasoning/s…
General Reasoning@GenReasoning

🤝 OpenReward is interoperable with any training library. Here we use the SETA environment by @Eigent_AI. We use @tinkerapi for model compute and @OpenReward for environment compute. This allows you to run agentic RL training from a laptop. github.com/OpenRewardAI/o….

0 replies · 4 reposts · 47 likes · 8.4K views
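The split described above, model compute in the training library and environment compute behind an API, can be sketched as a rollout loop against a remote environment endpoint. The route names (`/reset`, `/step`), the JSON fields, and the `RemoteEnv`/`rollout` helpers are assumptions for illustration; OpenReward's actual API surface may differ.

```python
import json
from urllib import request

class RemoteEnv:
    """Minimal client for an environment served as an HTTP endpoint.

    The trainer only ever sees observations and rewards; the sandbox
    doing the heavy environment work runs server-side and can scale
    independently of model compute.
    """

    def __init__(self, base_url, transport=None):
        self.base_url = base_url.rstrip("/")
        # transport is injectable so the loop can be exercised offline
        self.transport = transport or self._http_post

    def _http_post(self, route, payload):
        req = request.Request(
            self.base_url + route,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            return json.loads(resp.read())

    def reset(self, task_id):
        return self.transport("/reset", {"task_id": task_id})

    def step(self, action):
        return self.transport("/step", {"action": action})

def rollout(env, policy, task_id, max_steps=32):
    """Collect one episode; returns (transitions, total_reward)."""
    state = env.reset(task_id)
    transitions, total = [], 0.0
    for _ in range(max_steps):
        action = policy(state["observation"])
        state = env.step(action)
        transitions.append((action, state["reward"]))
        total += state["reward"]
        if state["done"]:
            break
    return transitions, total
```

In this shape, `policy` is whatever the training library (e.g. Tinker) provides for sampling actions, and the collected transitions feed back into its update step; the environment side stays a plain HTTP client, which is what makes running agentic RL from a laptop plausible.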
OpenReward retweeted
Ibragim
Ibragim@ibragim_bad·
Congrats to @rosstaylor90 and the team. I was a big fan of Papers with Code back in 2019. Happy to support SWE-rebench-V2 on OpenReward from day 0! I tried the platform; really cool that you can just use any backend and run your RL on a lot of RL environments!
General Reasoning@GenReasoning

Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.

4 replies · 4 reposts · 11 likes · 2.1K views
OpenReward retweeted
Ross Taylor
Ross Taylor@rosstaylor90·
We’re releasing OpenReward, a minimalist product that does one thing really well: serving RL environments at scale.

Agentic RL is really painful because it adds a new axis of compute, environment compute, alongside training compute, and that compute needs to scale seamlessly on demand. OpenReward is a narrowly focused product built around this problem. We serve complex agentic environments as minimal API endpoints, which work with any training framework and scale with use.

Our vision is a home of reward on the internet, interoperable with any form of training or evaluation, and ultimately an open-ecosystem alternative to the closed RL vendor market. 🧵
General Reasoning@GenReasoning

Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.

17 replies · 24 reposts · 215 likes · 43.7K views
OpenReward retweeted
General Reasoning
General Reasoning@GenReasoning·
Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.
General Reasoning tweet media
25 replies · 192 reposts · 1.3K likes · 225.7K views