enty
3.4K posts

@chronurgist
no themes. only such model pretraining and gpu kernels

Introducing ><former

Most transformers are rectangles◻️: every layer has the same width. But is that optimal?🤔 We propose variable-width transformers that have different widths across layers, improving loss while cutting compute & KV cache size 🧵
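To make the compute and KV-cache part of that claim concrete, here is a back-of-the-envelope sketch comparing a uniform stack with a variable-width one. It is not the ><former method: the width schedule, the per-layer cost model (about 12·w² weights per layer, K and V of size w cached per token, no grouped-query attention) and the helper names are all hypothetical assumptions, and it says nothing about the loss claim.

```python
# Hypothetical cost model, NOT taken from the ><former paper:
# - each layer caches one key and one value vector of size `width` per token
#   (no grouped-query attention, no separate head-dim choice);
# - each layer holds ~12 * width**2 matmul weights (4*w^2 attention projections
#   + 8*w^2 for an MLP with 4x expansion), each used once per token;
# - attention-score FLOPs and any projections between layers of different
#   widths are ignored.

def kv_cache_bytes(widths, seq_len, bytes_per_elem=2):
    """KV cache size for one sequence: K and V tensors of seq_len x width per layer."""
    return sum(2 * seq_len * w * bytes_per_elem for w in widths)


def matmul_flops_per_token(widths):
    """Approximate matmul FLOPs per token (2 FLOPs per multiply-add)."""
    return sum(2 * 12 * w * w for w in widths)


# Hypothetical width schedules for an 8-layer model.
uniform = [1024] * 8
variable = [512, 768, 1024, 1280, 1280, 1024, 768, 512]

if __name__ == "__main__":
    seq_len = 4096
    for name, widths in [("uniform", uniform), ("variable", variable)]:
        kv_mib = kv_cache_bytes(widths, seq_len) / 2**20
        mflops = matmul_flops_per_token(widths) / 1e6
        print(f"{name:8s} KV cache: {kv_mib:.0f} MiB, matmul: {mflops:.1f} MFLOPs/token")
```

Under this model the KV cache scales with the sum of the widths and compute with the sum of their squares, so the savings in this toy schedule come from its narrower layers lowering the total; at an equal total width, a non-uniform schedule would actually cost more matmul compute than a uniform one.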

Git graph with timeline. 📈

Today marks the beginning of our launch calendar, and to celebrate, i am making ncode and our flagship kimi k2.7 model free to use for the next week (or until the traffic knocks us out). all you need to do is:
1) sign up for a noumena account at code.noumena.com
2) go to github.com/Noumena-Networ… and clone and build noumena code (ncode)
3) log in to the platform with `ncode auth login` (or /login once you are in the app)
4) enjoy blazing fast tokens on the noumena platform with ncode

What Makes Good Synthetic Pretraining Data, with Joël Niklaus from Hugging Face x.com/i/broadcasts/1…

this is the "jailbreak" that got Fable shut down. you’ve been prepared for the singularity for years. for AGI to change everything. for nation-states to go to war over GPUs. but were you prepared for it to be this retarded?