General Reasoning

80 posts

General Reasoning

@GenReasoning

A long-horizon reinforcement learning company.

London, England · Joined May 2024

0 Following · 5.4K Followers

Pinned Tweet

General Reasoning@GenReasoning·26 May

🎡📜 We are recruiting for our London chapter. The next era of reinforcement learning is going to be unlike the last, and requires new algorithms, environments and everything in between. 🏛️ Shape the next civilisation with us. Link below 👇


14K

General Reasoning@GenReasoning·26 Jun

OpenReward integration for FutureSim by @ShashwatGoel7 and team! 👇 Try it out

Shashwat Goel@ShashwatGoel7

Major FutureSim updates! 🥽 Visualized trajectories: extremely long horizon, 1000s of actions per run and dozens of compactions, with summaries! Goldmine of interesting agent behaviors across Fable 5, GPT 5.5, GLM 5.2 etc! 📈 @OpenReward integration! You can now use FutureSim, incl. on your own data, with our favourite standard! The @GenReasoning team was super supportive in making this happen! Some cool observations in 🧵


2.7K

General Reasoning@GenReasoning·25 Jun

👇 Come join Ross and Chengxi at @aiDotEngineer World’s Fair next week in SF!

Ross Taylor@rosstaylor90

Excited to be speaking at @aiDotEngineer World’s Fair alongside partner in crime @ChengxiTaylor. It seems we have a lot to talk about: 🤼‍♂️ Actor-critic is hot again? How are current RL algorithms changing with long-horizon rollouts? 🌍 What does a good long-horizon environment look like? How do they differ from terminal-based tasks? 🤖 What does good RL infra look like in 2026? Why is long horizon breaking things? As a bonus, I’ll give a rundown of our own RL journey, starting from Galactica (2021/2022), early SoTA reasoning efforts at Meta (2023), and more. Super fun, high signal-to-noise guaranteed. Come join us!


2.5K

General Reasoning@GenReasoning·18 Jun

Because of backtest variance, we also record a process-based rubric measure called "sophistication", which we track over time. This uses a human expert rubric to judge the sophistication of the strategies employed. GLM-5.2 shows impressive sophistication compared to other open models, although it does not surpass the closed SoTA on this metric at any point this year. You can see the full leaderboard and more analysis here: gr.inc/releases/intro…


2.2K

General Reasoning@GenReasoning·18 Jun

We evaluated recent open models on KellyBench. Here is what we found: 🏆 GLM 5.2 is the new open-source SoTA, but still loses 30% on average over 5 runs. 📈 We estimate GLM 5.2 is 6+ months behind the frontier based on KellyBench and internal quant evaluations. (Note: we have not evaluated Fable.) 🌗 Kimi K2.6 slightly improves on Kimi K2.5 but still struggles at -60% average RoI. 🐈 Recent Mistral models struggle, obtaining mean RoIs of -78% and -99% respectively. Leaderboard link and more graphs below.


21.1K

General Reasoning retweeted

OpenReward@OpenReward·17 Jun

Train on OpenReward environments with TRL! 👇

Adithya S K@adithya_s_k

You can now train on 350+ RL Environments from OpenReward with TRL with just a few lines of code


422

General Reasoning@GenReasoning·26 May

gr.inc/careers


1.2K


General Reasoning@GenReasoning·7 May

We've updated the leaderboard with GPT-5.5 results: gr.inc/releases/intro… TLDR: 30% strategy sophistication (new SoTA), much more efficient than GPT-5.4, but still in the red and systematically underperforming human approaches.


536

General Reasoning@GenReasoning·6 May

We’re also thankful that KellyBench was featured on the front page of the FT last month! (The British AI neuron has a superposition with football…)


935

General Reasoning@GenReasoning·6 May

New models on KellyBench, the benchmark for long-horizon sequential decision making. 🎆 Claude Opus 4.7 - new state-of-the-art, with the highest strategy sophistication recorded on the benchmark (28.5%), but it still loses 3.7% on average over five seeds. 🐋 DeepSeek V4 Pro - shows strong feature development capabilities, but scores poorly on overfitting rubrics, which harms it in deployment (-47%). Link and more graphs below.


11.5K

General Reasoning@GenReasoning·5 May

Thanks to @ibragim_bad, Jeff Smith and Giovanni R for judging. Thanks to event partners @join_ef @airstreet. You can view the environments contributed at the event below: openreward.ai/environments


516

General Reasoning@GenReasoning·5 May

🌍 Last month we hosted the Complex Worlds Hackathon in London. Participants built an impressive range of environments, spanning synthetic game pipelines, arable farm management, dynamic vehicle routing, hospital triage, robotics, cybersecurity, and more. Congrats to our winners, Julie Huang and Khalid A!


5.8K

General Reasoning@GenReasoning·22 Apr

🎉 We're now supporting the Agent Data Protocol as a default agentic trajectory format. Any trajectories you log to @OpenReward can be exported in the ADP format. Thanks to @gneubig @yueqi_song for the collaboration!


17.6K

General Reasoning@GenReasoning·22 Apr

@ruffy0369 @OpenReward @gandhikanishk Hey @ruffy0369 - the SDK you are using is outdated; please upgrade to 1.101. pypi.org/project/openre…


ruffy369@ruffy0369·22 Apr

@GenReasoning @OpenReward @gandhikanishk Hi @GenReasoning, is it just me or is the API currently having some routing issues? I'm trying to run the EndlessTerminals environment but the SDK (0.1.33) is hitting a 404 fault filter abort on matrix.openreward.ai. Curious if this is a known issue or if I'm missing something.


General Reasoning@GenReasoning·31 Mar

🌍 Environments of the Week It's been a week since we launched @OpenReward. Here are some of our favourite environments this week - some newly added, some heavily used, and some hidden gems. First, the most used environment of the week is EndlessTerminals by @gandhikanishk with 830k+ tool calls. openreward.ai/kanishk/Endles… 🧵


9.7K

General Reasoning@GenReasoning·21 Apr

Release: gr.inc/releases/deplo…


563

General Reasoning@GenReasoning·21 Apr

🎉 Native Harbor support on OpenReward! 🐋 Connect your GitHub repository. We'll build the Docker images for each harbor task and deploy the environment as an API endpoint. 🚂 Train on the deployed tasks with any RL framework. ⚖️ Evaluate on the deployed tasks with any harness. Drop the anchor here and get started below: docs.openreward.ai/environments/d…


4.7K

General Reasoning@GenReasoning·21 Apr

Release: gr.inc/releases/intro… GitHub: github.com/GeneralReasoni…


690

General Reasoning@GenReasoning·21 Apr

🔥🐴 Firehorse. Run any model with any harness on any @OpenReward environment. ⚖️ Evaluate the latest models on environment endpoints. 🗂️ Collect agentic data for midtraining and SFT from open models. 🧪 Early experimental library. More support soon. Link below.


5.1K
