

Richard Kuzma

@rskuzma
Helping AI agents talk to data @google, ex-LLMs https://t.co/U0wNJ4SFex, AI for national security @USSOCOM, Tech for Public Good @DIU_x and Harvard @Kennedy_School




Apologies for the delay in getting the last episode of MoE 101 out! Originally I planned to cover only inference arithmetic, but we turned it into MoE Inference 101! I know you enjoyed the MoE training performance modeling; this one is for inference, on both GPUs and Cerebras. cerebras.ai/blog/moe-guide… 🧵1/n

🧮 Calling all Mathletes, this one is for you. We’ve been asked to show the math behind our MoE claims, so we did. Our analysis confirms that on GPUs, expert parallelism creates severe communication overheads that dwarf computation and make MoE training painfully slow. At Cerebras, we avoid model parallelism entirely, but sparsity subdivides batches and leaves experts I/O bound. With BTA, we fix this: by decoupling the batch-size requirements of the expert and attention layers, we remove the bottleneck. @dmsobol breaks it down.
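The claim that sparsity "subdivides batches and leaves experts I/O bound" can be made concrete with back-of-the-envelope arithmetic. The following is a minimal Python sketch under two stated simplifications: routing is uniform top-k, and only weight reads count as memory traffic. The function names and layer sizes are hypothetical and are not taken from the blog post. Under these assumptions, each expert sees only batch × k / E tokens. Its weights must be streamed in regardless of how many tokens it receives, so its FLOPs per byte fall in proportion. The sketch does not model BTA itself.

```python
# Hypothetical back-of-the-envelope model; not Cerebras' or the blog's actual analysis.
# Assumptions: uniform top-k routing, bf16 weights, only weight reads counted as I/O.

def tokens_per_expert(batch_tokens: int, num_experts: int, top_k: int) -> float:
    """Average number of tokens each expert receives when routing is uniform."""
    return batch_tokens * top_k / num_experts


def matmul_arithmetic_intensity(tokens: int, d_model: int, d_ff: int,
                                bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one [d_model x d_ff] projection.

    A (tokens x d_model) @ (d_model x d_ff) matmul costs 2 * tokens * d_model * d_ff FLOPs,
    while the weights occupy d_model * d_ff * bytes_per_param bytes.
    """
    flops = 2 * tokens * d_model * d_ff
    weight_bytes = d_model * d_ff * bytes_per_param
    return flops / weight_bytes


if __name__ == "__main__":
    batch, experts, k = 1024, 64, 2   # hypothetical batch and routing config
    d_model, d_ff = 4096, 14336       # hypothetical layer sizes
    per_expert = tokens_per_expert(batch, experts, k)
    dense_ai = matmul_arithmetic_intensity(batch, d_model, d_ff)
    expert_ai = matmul_arithmetic_intensity(int(per_expert), d_model, d_ff)
    print(f"tokens per expert: {per_expert}")
    print(f"dense layer intensity:  {dense_ai} FLOPs/byte")
    print(f"single expert intensity: {expert_ai} FLOPs/byte")
```

Consider a hypothetical 1,024-token batch with 64 experts and top-2 routing. Each expert processes about 32 tokens, so its arithmetic intensity is 32× lower than that of a dense layer seeing the full batch. That drop is what pushes experts toward the memory-bound regime.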





Today, we're excited to introduce Strella and announce $4M in seed funding led by Decibel Ventures with participation from Unusual Ventures to transform the future of customer research! 🚀 Read more about our story and the journey ahead here: hubs.la/Q02TqVlb0

We were honored to welcome Vice Admiral Frank Whitworth to the stage at AIPCon to discuss Maven Smart System, and the role of Palantir in @NGA_GEOINT's critical mission. Watch his full demonstration.

Verified by @ArtificialAnlys, Cerebras Inference achieves 1,850 tokens/sec on Llama 3.1 8B and 450 tokens/sec on Llama 3.1 70B! By dramatically reducing processing time, we're enabling more complex AI workflows and enhancing real-time LLM intelligence. This includes a new class of intelligent agents that can “think faster” than ever before. Cerebras Inference will power a new era of Instant AI.
👉 Try it today: inference.cerebras.ai
👉 Read our blog: cerebras.ai/blog/introduci…
👉 Check out Artificial Analysis for more data: artificialanalysis.ai/providers/cere…


The DJ just played Peanut Butter Jelly Time and handed out Uncrustables, man




Cerebras BTLM-3B-8K model crosses 1M downloads 🤯 It's the #1 ranked 3B language model on @huggingface! A big thanks to all the devs out there building on top of open source models 🙌

