xjdr
@_xjdr
building AI that won't embarrass me in front of my own standards

SMCI confesses to being Nvidia's partner-in-crime in Singapore sales. Amazing.



It's been known for a while that Canon from @ZeyuanAllenZhu is a monstrously powerful augmentation to the Transformer recipe, and I'm lowkey seething that it's not had industry adoption yet. GLM switched to DSA in months. If you're doing new pretraining, why not test Canon too?


The idea of rotating attention by 90° is sooooooo cool (credits to @Jianlin_S's insights), and it surprisingly works. We (w/ the amazing @nathan) are so excited about this. We've been working on the paper for months and couldn't stop. Go give it a try. It's a drop-in replacement for standard residuals, which were born in 2015. really like the figs btw :-)


just kidding, I'm pretty confident they'll start a new era with V4
















