OpenPipe

113 posts


@OpenPipeAI

OpenPipe: Fine-tuning for production apps. Train higher quality, faster models. (YC S23)

San Francisco · Joined August 2023
7 Following · 4.1K Followers
OpenPipe reposted
Shane Henke @realshanehenke ·
Today was the first time I used my fine-tuned model. I fine-tuned Llama 3.1 using @OpenPipeAI, and the results are encouraging. There are like 5 more to build, so wish me luck 😂
Charles 🎉 Frye @charles_irl ·
Start in Python, then Rewrite It In Rust™ once you understand the problem and the requirements. Start with off-the-shelf large models, then migrate to custom small models once you've collected enough data and explored failure modes.
OpenPipe reposted
Hina @soleil_colza_ ·
We as a team are learning RL for the first time this weekend. We're totally new to this field, but @OpenPipeAI's docs really helped us kickstart! @wandb
OpenPipe reposted
Viv @Vtrivedy10 ·
Some thoughts on why/how the standard paradigm for optimizing task-specific agents will be harness engineering + rubric-based, task-specific RL. This write-up codifies a lot of my thoughts on where harness engineering is going, plus inspiration from @corbtt on @latentspacepod.

Steps:
1. Obsessively hand/auto-tune the agent harness until you reach a baseline threshold of task performance. Goal: make sure the agent has roughly what it needs to succeed.
2. Do task-specific RL to make the model better at operating in the harness; touching the model weights pushes us beyond what harness engineering alone can do.

FAQs:
1. Why in this order? If your agent rarely succeeds on the task, it can't get enough reward, and that makes RL difficult. Optimize the harness first → "You can't succeed if you don't have the right tools."
2. Why not just keep optimizing your harness? Isn't prompt engineering the way? Harness engineering in the latter stages is incredibly hard. It's also combinatorially complex from the start. You have to jointly optimize every component (system prompt, tools/skills/MCP, subagent definitions, additional context), but you have:
- A selection problem: how do you intelligently select relevant tools from a sea of possibilities? Context is a precious resource, and selecting too many tools is confusing and degrades performance.
- Codependency: no component is optimized in isolation; it's one big system (e.g. changing the system prompt may change how/if a tool is called).
3. So how should I start? First, painstakingly test everything in your harness:
- try multiple models
- hand-tune system prompts, try GEPA
- more/fewer tools, tool descriptions, compound tools
- handing off tasks to subagents
- preloading useful notes, docs, and instructions as references
Eventually, when you hit a wall (on performance or human resources), you move on. You can also move on much earlier once you hit some performance threshold.
4. What does RL get you? Agents from the labs (Claude Code, Codex) are so reliable at using their tools (WebSearch, Multi-Edit, Grep) because they're directly post-trained with them. We want the same for our tasks: make the model more comfortable using its harness while training to increase task performance.
5. How should I get started with task-specific RL? A fantastic first place to start is RULER from @OpenPipeAI, which relies on rubrics created by you + an LLM as a judge across multiple generations. For your task, you probably already have a set of ideas on what's good; codifying that in a rubric is all you need to get started.

Working on writing up a walkthrough blog of this with code. Really excited about building products that treat agent building as a harness optimization problem that you measure deeply + push further with RL.
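The rubric + LLM-as-judge-over-multiple-generations idea behind RULER can be sketched roughly as below. This is a hypothetical illustration, not OpenPipe's actual API: `build_judge_prompt`, `parse_scores`, and `relative_rewards` are made-up helper names, and the judge itself (a real LLM call) is left out.

```python
# Sketch of rubric-based, judge-scored rewards (RULER-style idea).
# NOTE: hypothetical helpers, not OpenPipe's real API; the judge reply
# would come from an actual LLM call in practice.

RUBRIC = """Score each candidate 0-10 on:
- Did it use its tools sensibly?
- Is the final answer correct and complete?"""

def build_judge_prompt(task: str, candidates: list[str], rubric: str = RUBRIC) -> str:
    """Show the judge ALL candidate trajectories at once so scores are comparative."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    return (f"Task: {task}\n{rubric}\n"
            f"Candidates:\n{numbered}\n"
            "Reply with one 'index: score' line per candidate.")

def parse_scores(judge_reply: str, n: int) -> list[float]:
    """Pull 'index: score' lines out of the judge's reply; ignore anything else."""
    scores = [0.0] * n
    for line in judge_reply.splitlines():
        idx, sep, val = line.partition(":")
        if sep and idx.strip().isdigit() and 0 <= int(idx) < n:
            try:
                scores[int(idx)] = float(val)
            except ValueError:
                pass  # malformed score, keep the default
    return scores

def relative_rewards(scores: list[float]) -> list[float]:
    """Center the group at zero: better-than-average candidates get positive reward."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]
```

A training loop built on this sketch would sample several trajectories per task, send the judge prompt to a grading model, and feed the centered scores into the RL update; judging candidates jointly is what makes the rubric usable without hand-labeled gold answers.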
OpenPipe reposted
Weights & Biases @wandb ·
LIVE: Kyle Corbitt, Head of the OpenPipe team at CoreWeave, joins ThursdAI to talk about launching the first Serverless Reinforcement Learning capability. x.com/i/broadcasts/1…
OpenPipe reposted
Santiago Pombo @SantiagoPombo ·
The custom SLMs era is upon us 🙌
- nanochat by @karpathy
- Tinker (PEFTaaS) by @thinkymachines
- Tunix (post-training in JAX) by @GoogleAI
- ART (agent RL) by @OpenPipeAI
- Environments Hub by @PrimeIntellect
- NeMo Microservices by @nvidia
Andrej Karpathy @karpathy

Excited to release new repo: nanochat! (It's among the most unhinged I've written.) Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple-choice questions, tool use
- SFT, then evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- Optionally RL the model on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with KV cache, simple prefill/decode, and tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or a ChatGPT-like web UI
- Write a single markdown report card, summarizing and gamifying the whole thing

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours surpasses the GPT-2 CORE metric. As you scale up further towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple-choice tests. E.g. a depth-30 model trained for 24 hours (roughly equal to the FLOPs of GPT-3 Small 125M, and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned, or optimized (actually, I think there's likely quite a bit of low-hanging fruit), but I think the overall skeleton is ok enough that it can go up on GitHub, where all the parts of it can be improved. Link to the repo and a detailed walkthrough of the nanochat speedrun is in the reply.
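The dollar figures in the quoted thread imply a roughly consistent node price; a quick sanity check of that arithmetic (the rate is inferred purely from the tweet's own numbers, not from any official price list):

```python
# Back-of-the-envelope check of the nanochat cost figures quoted above.
# All rates are implied by the tweet's (cost, duration) pairs.

def node_rate(total_cost: float, hours: float) -> float:
    """Implied $/hour for the 8xH100 node from one (cost, duration) pair."""
    return total_cost / hours

speedrun = node_rate(100, 4)      # $100 / 4 h  -> $25.00 per node-hour
big_run = node_rate(1000, 41.6)   # $1000 / 41.6 h -> ~$24.04 per node-hour

per_gpu = speedrun / 8            # ~$3.13 per H100-hour
```

Both runs land near $24-25 per node-hour, so the $100 and $1000 tiers are the same implied rate at different training durations.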
OpenPipe reposted
Lukas Biewald @l2k ·
We've started a great tradition at CoreWeave of shipping an integrated new product weeks after acquisition - congrats @OpenPipeAI on the serverless RL launch!
Yacine Mahdid @yacinelearning ·
I'm in the unfortunate position to let you know that I've fallen for the RL-LLMs propaganda 100% with these results from OpenPipe. I am now fully RL-pilled and there is no turning back. Very sorry, folks.

Prime Intellect @PrimeIntellect

Introducing the Environments Hub. RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down. We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI.
OpenPipe reposted
Kyle Corbitt @corbtt ·
🚀 Big launch from @OpenPipeAI: We just launched Serverless RL, so you can train agents faster and cheaper with zero infra headaches. Compared to running your own GPUs, Serverless RL is:
- 40% cheaper
- 28% faster wall-clock
- instantly deployed to prod via @wandb Inference
OpenPipe reposted
sohail @Sohailmo ·
@wandb @CoreWeave the CoreWeave acquisition is about to facilitate the ultimate end-to-end platform
Weights & Biases @wandb ·
RL X-mas came early. 🎄 For too long, building powerful AI agents with Reinforcement Learning has been blocked by GPU scarcity and complex infrastructure. That ends today. Introducing Serverless RL from wandb, powered by @CoreWeave! We're making RL accessible to all.