

Sijun He
@SijunHe
Working on #ERNIE at @Baidu, used to struggle at @TwitterEng, @Stanford

4/ For Cogito v2.1, we fork off the open-licensed DeepSeek base model from November 2024. This is an obvious choice for a pretrained base model, since the DeepSeek architecture already has an ecosystem of cheap inference built around it. Even as an early-stage startup, we have been able to build a frontier training stack by standing on the shoulders of open-source champions like @huggingface, @togethercompute, @runpod and @nebiusai, as well as stellar contributions from @Microsoft, @Meta, @nvidia and many other folks in open source. Over the last several months, we have iterated on and refined our post-training strategy of self-play + RL, called Iterated Distillation and Amplification (IDA), with Cogito v1 and v2. Cogito v2.1 produces high-quality responses, but behaves a bit differently from the usual models: we increase the model's intelligence prior and teach it how to think via process supervision, which yields significantly shorter reasoning chains. We also use less markdown and less verbosity. In short, we want the model to be great for API usage: faster, fewer tokens, with super high quality.
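The amplify-then-distill loop described above can be sketched conceptually. This is a toy numeric stand-in, not Deep Cogito's actual implementation: `amplify`, `distill`, and every parameter here are hypothetical placeholders chosen only to show the shape of the iteration.

```python
# Toy sketch of an Iterated Distillation and Amplification (IDA) loop.
# "Skill" is modeled as a single float purely for illustration; the real
# method operates on model policies, not scalars.

def amplify(model_skill: float, extra_compute: float = 2.0) -> float:
    """Amplification: run the model with extra test-time compute
    (e.g. search / self-play) to obtain a stronger target policy."""
    return model_skill * extra_compute

def distill(model_skill: float, target_skill: float, lr: float = 0.5) -> float:
    """Distillation: train the base model toward the amplified target,
    internalizing the capability into the weights (process supervision
    would supply the training signal in the real pipeline)."""
    return model_skill + lr * (target_skill - model_skill)

def ida(model_skill: float, iterations: int = 5) -> float:
    """Alternate amplification and distillation for a few rounds."""
    for _ in range(iterations):
        target = amplify(model_skill)               # stronger, slower policy
        model_skill = distill(model_skill, target)  # fold it back into the model
    return model_skill

print(ida(1.0))  # 7.59375 -- each round multiplies skill by 1.5 in this toy
```

The point of the sketch is the structure, not the numbers: each round, the amplified policy sets a target the base model is then trained to match, so capability compounds across iterations while inference cost stays that of the base model.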

I prefer to operate in "GPU-Poor" mode, and I don't agree with the take from the SemiAnalysis piece. Creative breakthroughs often occur under constraints: new systems, models, and methods that can then take better advantage of even larger-scale compute. semianalysis.com/p/google-gemin…

Today is a huge milestone for one of our latest libraries, Text Generation Inference: we released v1.0 under a new license, HFOIL 1.0 github.com/huggingface/te… This 🧵 explains what this new license means, and why the change!


New Post: Sharing some thoughts on the emerging tech stack of generative AI. LLMs != AI System != AI Product Thin wrappers are being built every day, but are they sufficient for vertical SaaS applications? Let's unpack 👉🚀 @cresta #GenerativeAI #GPT4 cresta.com/ai-innovation/…

