Rani



🚀 MASSIVE upgrades for the UltraData data stack! The tiered data management (L0-L4) framework has now been fully battle-tested on MiniCPM5-1B and is ready for your models! No gatekeeping, just pure data power. What's NEW in our latest release: 👇

✅ Ultra-FineWeb-L3: 600B tokens (200B+ Chinese, 400B+ English) of high-density synthetic pre-training data, expanded from Ultra-FineWeb via multi-style rewriting & QA generation and used in MiniCPM5-1B's decay stage. 🤗 huggingface.co/datasets/openb…

✅ UltraData-SFT-2605: 15M+ post-training samples across math, code, knowledge & instruction following, in both deep-thinking and non-thinking training styles, used in MiniCPM5-1B's SFT stage. 🤗 huggingface.co/datasets/openb…

Impossible to walk past a street gig like this. 🎸 Pure joy. Immaculate vibes. And 100% generated by AI. 🤯 Come and sing along! 🎤✨ #AI #hixai #GenerativeAI





















