alphaXiv

2.5K posts

@askalphaxiv

High fidelity research

Joined November 2023

58 Following · 49.5K Followers

Pinned Tweet

alphaXiv@askalphaxiv·12 May

Reinforcing Recursive Language Models: Can a 4B model learn to recursively call itself to answer hard long-context questions? We RL fine-tuned a small model to behave as a native recursive language model (RLM). On evidence selection across scientific papers, our 4B RLM matches Sonnet 4.6 in quality while running significantly faster and cheaper.
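
The post does not describe how the RLM decides when or how to call itself. As a rough illustration of the general idea only, here is a minimal sketch that assumes the recursion splits a long list of passages in half until each piece fits a small window, runs a selector on each piece, and concatenates the selected evidence. `keyword_selector`, `recursive_select`, and `max_passages` are hypothetical names, and the keyword matcher is a toy stand-in for a real model call, not the trained 4B model.

```python
from typing import Callable, List


def _words(text: str) -> set:
    """Lowercase the text and strip simple punctuation from each word."""
    return {w.lower().strip("?.,") for w in text.split()}


def keyword_selector(question: str, passages: List[str]) -> List[str]:
    """Hypothetical stand-in for a model call: keep passages that share
    a word of more than three characters with the question."""
    terms = {w for w in _words(question) if len(w) > 3}
    return [p for p in passages if terms & _words(p)]


def recursive_select(
    question: str,
    passages: List[str],
    select: Callable[[str, List[str]], List[str]],
    max_passages: int = 2,
) -> List[str]:
    """Select evidence from a long passage list by recursing on halves."""
    # Base case: the input fits in one call, so ask the selector directly.
    if len(passages) <= max_passages:
        return select(question, passages)
    # Recursive case: split in half, solve each half, merge the evidence.
    mid = len(passages) // 2
    left = recursive_select(question, passages[:mid], select, max_passages)
    right = recursive_select(question, passages[mid:], select, max_passages)
    return left + right


passages = [
    "Transformers use attention layers.",
    "The dataset has 10k images.",
    "Attention cost grows quadratically.",
    "We thank the reviewers.",
    "Training used 8 GPUs.",
]
evidence = recursive_select("How does attention scale?", passages, keyword_selector)
print(evidence)
```

In a real system the selector would be the language model itself, and the splitting policy would presumably be learned rather than fixed; the fixed halving here is only there to make the recursion visible.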

alphaXiv retweeted

Lossfunk@lossfunk·7h

🚨 With the review process for CAISc 2026 complete, we are excited for the upcoming online sessions as part of the conference program, starting this Friday, 17th July and running until Sunday, 26th July. We look forward to discussing how the way we do, share, evaluate and organize science is changing with practitioners working at the very frontier of this change. Themes covered include the development of AI-led research infrastructure, the evolution of the scientific method with AI, the future of science communication and scientific artefacts, and building sovereign AI for Science. All sessions will be online. Full list with links to register on Luma in the next tweet. 👇

alphaXiv@askalphaxiv·1h

read more: alphaxiv.org/abs/2607.11505

alphaXiv@askalphaxiv·1h

“Proxy Exploration and Reusable Guidance” Post-training usually forces the large model to do expensive RL exploration itself. This paper, PUST, moves that exploration to a smaller proxy model, then transfers only the learned update direction, not the proxy’s final distribution. This lets small models search once, cache the signal, and reuse it to improve bigger models across math and code, with Qwen3 4B signals giving strong gains on Qwen3 8B.
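
The post does not say how PUST represents or applies the "update direction". One possible reading, sketched below purely as an assumption, is that the direction is the shift in the proxy's next-token logits before and after RL, which is then added to the larger model's logits. The function names, the `scale` parameter, and the toy numbers are all hypothetical, and this is not confirmed to be the paper's method.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transfer_update_direction(
    large_logits: List[float],
    proxy_base_logits: List[float],
    proxy_tuned_logits: List[float],
    scale: float = 1.0,
) -> List[float]:
    """Add the proxy's logit shift (tuned minus base) to the large model.

    Only the difference is transferred, so the proxy's own final
    distribution never replaces the large model's prior.
    """
    return [
        big + scale * (tuned - base)
        for big, base, tuned in zip(large_logits, proxy_base_logits, proxy_tuned_logits)
    ]


# Toy 3-token vocabulary (all values are made up for illustration).
large = [2.0, 1.0, 0.0]        # large model before any transfer
proxy_base = [1.0, 1.0, 1.0]   # small proxy before RL
proxy_tuned = [1.0, 2.0, 1.0]  # small proxy after RL favors token 1

steered = transfer_update_direction(large, proxy_base, proxy_tuned)
probs_before = softmax(large)
probs_after = softmax(steered)
print(steered, probs_before[1], probs_after[1])
```

Because the shift depends only on the proxy, it could in principle be computed once and reused, which matches the "search once, cache the signal" framing in the post; how the paper actually caches or applies it is not stated here.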

alphaXiv@askalphaxiv·2h

3/3 Check out the artifacts for this run and try the workflow out yourself with your own agents! alphaxiv.org/replicate/2607…

alphaXiv@askalphaxiv·2h

GPT 5.6 Sol can one-shot convert an arXiv paper into an interactive Marimo notebook! Great for papers best understood hands-on (lots of fun examples in interpretability, inference engineering, agent harnesses, benchmarking, and more). Play around with the notebook, inspect the code, or try the same workflow with your own agents below.

alphaXiv@askalphaxiv·2h

2/3 We’re still trying to get a sense of exactly which papers this can be helpful for and which ones will be slop. Besides model quality, a big factor is that papers requiring fewer computational resources lend themselves better to being “reproduced” in general. We’ve put out a lot of these demos over the last few weeks, but this certainly requires a more thorough study. If you’re interested, please reach out!

alphaXiv@askalphaxiv·2h

1/3 In the workflow we use, the agent first does autoresearch to reproduce an illustrative experiment from the paper. The agent then assembles a notebook with interactive visualizations based on artifacts from those experiments and, when possible, adds a GPU cell that runs a lightweight example from scratch.

alphaXiv@askalphaxiv·1d

Reading a paper on the go? Introducing alphaXiv Mobile 🚀
Trending Feed
AI Assistant
Paper Reader + Annotations
Visual Overviews
Download on the App Store today

alphaXiv@askalphaxiv·23h

Introducing autoresearch with GPT 5.6. We had GPT 5.6 Sol reproduce the key findings from “Towards Mechanistically Understanding Why Memorized Knowledge Fails to Generalize in LLM Finetuning”. Compared to GPT-5.5 and even Fable 5, GPT-5.6 stayed more focused on a few critical experiments and spent less time on peripheral details. It also asked fewer “clarification” questions and resolved ambiguities independently instead of pushing them back to us. @OpenAI is pushing the boundaries of the automated research loop with models that don’t have handcuffs.

alphaXiv@askalphaxiv·17h

read more: alphaxiv.org/abs/2606.03979

alphaXiv@askalphaxiv·17h

“Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories” LLMs can learn inside the context window, but that knowledge disappears when the session ends. This paper from Google adds a sleep phase in which the model consolidates short-term memories into long-term weights using Knowledge Seeding. It then “dreams” by generating RL-trained synthetic data to rehearse new knowledge and reduce forgetting. The result is a continual-learning LLM that can self-modify instead of staying frozen after pretraining.

alphaXiv@askalphaxiv·19h

@mrstrijker Sorry, should be fixed now

Mr Strijker@mrstrijker·19h

@askalphaxiv Link is broken for me

alphaXiv@askalphaxiv·23h

Try autoresearch with GPT 5.6 for yourself:
curl -LsSf openresearch.sh/install.sh | sh
orx up
Link to the repo GPT 5.6 generated: github.com/alphaXiv/knowl…

alphaXiv@askalphaxiv·23h

“Replicate the GLM 5.2 paper” doesn’t mean anything, but there is a class of papers (interpretability, inference engineering, benchmarks, agent harnesses) where autoresearch experiments like this are interesting to run and don’t require the agent to manage a huge amount of computational resources.

alphaXiv retweeted

Ali Behrouz@behrouz_ali·1d

Moving from conventional ML to continual learning requires revisiting even fundamental concepts such as “test” and “train” time. LLMs Need Sleep and Dreaming! We introduce a phase in which the model consolidates its fragile short-term memories into stable long-term memories, and then dreams to recursively self-improve over time. For memory consolidation, we introduce a new form of distillation, called Knowledge Seeding (KS), where one or more small models distill their knowledge into a larger model. Our experiments on continual learning and reasoning tasks show that this new phase helps the model perform better and better mitigates catastrophic forgetting.

alphaXiv@askalphaxiv·1d

@HarshaN18987860 Android app coming soon!

hnp@HarshaN18987860·1d

@askalphaxiv No plans for an Android app?
