Tinker
@tinkerapi
I tink, therefore I am. Post-training API by @thinkymachines

RL Systems Mind the Gap: Matching Trainer and Generator Throughput. Covers RL training infrastructure, GRPO, PipelineRL, async RL, policy staleness, RL sandbox infra, CPU requirements, TCO analysis, and Thinking Machines' Tinker. newsletter.semianalysis.com/p/rl-systems-m…

Today we're shipping Nemotron 3 Ultra, a 550B MoE open model with frontier-level intelligence, built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% compared with other open frontier models.

We have been exploring new algorithmic frontiers and are excited to share our contributions to Self Distillation Policy Optimization (SDPO) for agentic continual learning. Check out our blog post: trajectory.ai/field-notes/sc…

Today, @MichaelElabd, @QuantumArjun, and I are excited to announce Trajectory. We are a research lab and product company building the platform for Continual Learning. Our platform unlocks the signal already sitting in product usage, so companies can continuously post-train large-scale agentic models that outperform the frontier. @trajectorylabs

We've raised $15M from @Conviction, @BessemerVP, @radicalvcfund, @jeffdean, @drfeifei, and more. We're partnering with some of the best AI-native companies, including @ClayRunHQ, @Harvey, @DecagonAI, @mercor_ai, and @RogoAI, to power their agentic systems, and we are already in production with some of them. We've brought together a world-class research team from DeepMind, OpenAI, Apple, Meta Superintelligence, Amazon AGI, and Scale AI, along with an elite product team from Stripe and Figma.

AI will never again start on day one. Every correction, every retry, every edit will make products smarter. This is Continual Learning.

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

New preprint from @lightningrodai! We trained AI to predict clinical events (ICU transfers, new diagnoses, complications, procedures, ventilation, mortality) directly from raw clinical notes.

No labeled data required: Foresight Learning infers outcomes from what happens later in patient records.

Using Tinker from @thinkymachines, we trained a lightweight adapter on GPT-OSS-120B, resulting in a specialized predictor that runs on a single GPU.

Results:
🎯 ~70% lower calibration error
📈 Brier skill score: ~0% → 27%
🧠 84% win rate vs. the base model in blind reasoning review
🥇 Slightly better Brier score than GPT-5, despite being a fraction of the size

Hospitals and specialty clinics often treat unique patient populations that out-of-the-box models have no training data for. This makes it possible to build frontier-quality predictors for highly specific patient groups using nothing but raw clinical records.

Congrats to the team: @indiequant @KSkotheim64001 🙌
Full paper 👇 arxiv.org/abs/2605.12817
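For readers unfamiliar with the metric: a Brier skill score compares a forecaster's Brier score (mean squared error between predicted probabilities and 0/1 outcomes) against a reference forecast, as BSS = 1 − BS / BS_ref, so ~0% means no better than the reference. The post does not say which reference the paper uses; the minimal sketch below assumes a climatology baseline (always predicting the observed base rate), and the function names are illustrative, not from the paper.

```python
from typing import Optional, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and binary (0/1) outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(
    probs: Sequence[float],
    outcomes: Sequence[int],
    ref_probs: Optional[Sequence[float]] = None,
) -> float:
    """BSS = 1 - BS / BS_ref.

    Assumption: if no reference forecast is given, use climatology,
    i.e. predict the observed base rate for every case.
    """
    if ref_probs is None:
        base_rate = sum(outcomes) / len(outcomes)
        ref_probs = [base_rate] * len(outcomes)
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


if __name__ == "__main__":
    # Hypothetical toy data: 4 patients, 2 of whom had the event.
    outcomes = [1, 0, 0, 1]
    print(brier_skill_score([0.5] * 4, outcomes))             # base-rate forecast: no skill
    print(brier_skill_score([0.8, 0.2, 0.3, 0.6], outcomes))  # sharper, calibrated forecast
```

With this toy data the base-rate forecast scores 0.0 and the sharper forecast scores about 0.67, which is the sense in which moving from ~0% to 27% means the model went from matching the reference to clearly beating it.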

In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2: here's how we define and fix such silent failure modes in Agent RL. 🧵
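To make the distinction concrete: output entropy H(Y) can be high while the mutual information I(X;Y) = H(X) + H(Y) − H(X,Y) between prompt and output is near zero. The minimal sketch below computes plug-in estimates of both from (prompt, response-template) pairs. It is a generic illustration under the assumption that responses can be bucketed into discrete templates; it is not RAGEN-v2's actual estimator or fix, and the data and names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def entropy_and_mi(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[float, float]:
    """Return (H(Y), I(X;Y)) from empirical (prompt, response) pairs.

    Uses the identity I(X;Y) = H(X) + H(Y) - H(X,Y).
    """
    pairs = list(pairs)
    h_x = entropy(Counter(x for x, _ in pairs))
    h_y = entropy(Counter(y for _, y in pairs))
    h_xy = entropy(Counter(pairs))
    return h_y, h_x + h_y - h_xy


if __name__ == "__main__":
    # Hypothetical "collapsed" policy: every prompt gets the same spread of templates.
    collapsed = [(p, r) for p in ("p1", "p2") for r in "abcd"]
    # Hypothetical "grounded" policy: each prompt gets its own templates.
    grounded = [("p1", "a"), ("p1", "b"), ("p2", "c"), ("p2", "d")]
    print(entropy_and_mi(collapsed))  # high entropy, zero mutual information
    print(entropy_and_mi(grounded))   # same entropy, positive mutual information
```

Both toy policies have 2 bits of output entropy, but only the grounded one carries information about the prompt (1 bit versus 0), which is why entropy alone can hide this failure mode.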

Introducing FrontierSWE, an ultra-long-horizon coding benchmark. We test agents on some of the hardest technical tasks, like optimizing a video rendering library or training a model to predict the quantum properties of molecules. Despite having 20 hours, they rarely succeed.

Another task tests AI research capabilities: using @tinkerapi from @thinkymachines, agents are asked to post-train an agent to play logic games. This involves writing an entire training pipeline and running experiments with different recipes before submitting the best model.

FrontierSWE was built with collaborators from industry and academia to ensure that tasks are diverse and reflect the real work engineers and researchers encounter. We specifically thank our partners @Modular, @PrimeIntellect, and @thoughtfullab for their contributions.
