piaoyang
91 posts

@pycui64
MTS @xAI. Engineer, entrepreneur. Not affiliated with RealChar (left the project in 2023)

I resigned from xAI today. This company - and the family we became - will stay with me forever. I will deeply miss the people, the warrooms, and all the battles we fought together. It's time for my next chapter. It is an era full of possibilities: a small team armed with AIs can move mountains and redefine what's possible. Thank you to the entire xAI family. Onward. 🚀 And to Elon @elonmusk - thank you for believing in the mission and for the ride of a lifetime.

We are accustomed to the fact that LLMs know the modern world, including the technology behind themselves (transformers, etc.). ChatGPT can casually explain to you how it works. But that knowledge doesn't have to align with the model's own existence. We could totally imagine training an LLM on pre-deep-learning, pre-computer, or even pre-industrial-revolution data, while keeping it highly intelligent. You could then simulate an ancient human and watch how it perceives and reasons about the modern world, and how it is puzzled by its own existence. Seems fun!

Kimi AMA on K2 Thinking:
1. The $4.6M training cost is not an official number.
2. Trained on H800s (nerfed H100s).
3. KDA (Kimi Delta Attention) hybrids with NoPE MLA perform better than full MLA with RoPE.
4. Muon scales well to 1T parameters. "there are tens of optimizers and architectures that do not survive the grill."
5. Kimi K2 will have vision.
6. K2 Thinking is natively INT4 to be friendlier to non-Blackwell GPUs while leveraging the existing INT4 marlin inference kernels.
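On point 6: "natively INT4" means the weights live in a signed 4-bit range at inference time. A minimal sketch of the general idea, symmetric per-tensor INT4 quantization, is below; this is illustrative only and says nothing about Kimi's actual QAT pipeline or the marlin kernel internals.

```python
# Illustrative sketch: symmetric per-tensor INT4 quantization.
# Signed int4 covers [-8, 7]; one float scale maps weights into that range.
# NOT Kimi's actual implementation - just the textbook technique.

def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-8, 7] with a single scale."""
    scale = max(abs(w) for w in weights) / 7.0  # 7 = largest positive int4
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from int4 codes."""
    return [v * scale for v in q]

weights = [0.02, -0.51, 0.33, 0.77, -0.12]
q, scale = quantize_int4(weights)
recon = dequantize_int4(q, scale)

assert all(-8 <= v <= 7 for v in q)
# Round-trip error is bounded by half a quantization step (= scale / 2).
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, recon))
```

Storing 4-bit codes plus one scale cuts weight memory roughly 4x versus FP16, which is why it helps on pre-Blackwell GPUs that lack native FP4 support but already have fast INT4 matmul kernels.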
