
New open weights LLM from @MistralAI

params.json:
- hidden_dim / dim = 14336/4096 => 3.5X MLP expand
- n_heads / n_kv_heads = 32/8 => 4X multiquery
- "moe" => mixture of experts 8X top 2 👀

Likely related code: github.com/mistralai/mega…

Oddly absent: an over-rehearsed professional release video talking about a revolution in AI.

If people are wondering why there is so much AI activity right around now, it's because the biggest deep learning conference (NeurIPS) is next week.
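The ratios read off params.json above can be re-derived in a few lines. This is only a rough sketch: the numbers are the ones quoted in the post, but the key names inside "moe" (`num_experts`, `num_experts_per_tok`) and the helper `summarize` are assumptions made for illustration, not confirmed contents of the released file.

```python
import json

# Values quoted in the post. The key names inside "moe" are assumed
# for illustration; the post only says "8X top 2".
PARAMS_JSON = """
{
  "dim": 4096,
  "hidden_dim": 14336,
  "n_heads": 32,
  "n_kv_heads": 8,
  "moe": {"num_experts": 8, "num_experts_per_tok": 2}
}
"""


def summarize(params: dict) -> dict:
    """Derive the headline ratios from a params dict (hypothetical helper)."""
    return {
        # MLP hidden width relative to the model width
        "mlp_expansion": params["hidden_dim"] / params["dim"],
        # How many query heads share each key/value head
        "query_heads_per_kv_head": params["n_heads"] // params["n_kv_heads"],
        # Total experts and how many are active per token
        "experts": params["moe"]["num_experts"],
        "experts_per_token": params["moe"]["num_experts_per_tok"],
    }


summary = summarize(json.loads(PARAMS_JSON))
print(summary)
```

With 32 query heads sharing 8 key/value heads, each KV head serves 4 query heads, which is where the "4X" comes from; likewise 14336/4096 gives the 3.5X MLP expansion.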





