
We just raised $400M and launched two new products. But the more interesting story is why inference infrastructure is the hardest problem in AI right now – and how we're approaching it.
Marshall Choy
369 posts

@MarshallChoy
NorCal original | My opinions

We are excited to welcome Marshall Choy as CBO and Jennifer Glore as EVP of Product Management at Rebellions! With their leadership, we are ready to accelerate global growth and push AI forward in a more efficient and accessible way. 🌎

I've been playing with @SambaNovaAI's API serving fast Llama 3.1 405B tokens. Really cool to see a leading model running at speed. Congrats to SambaNova for hitting a 114 tokens/sec speed record (and also thanks @KunleOlukotun for getting me an API key!) sambanova.ai/blog/speed-rec…

🚀 World record performance: SambaNova is running Llama 3.1 405B at 114 t/s with full precision accuracy, in only one rack. Verified by @ArtificialAnlys! 🦙 This speed unlocks so many use cases for enterprises and developers that we cannot wait to see them built on our platform. Apply for early access today: sambanova.ai/fast-api

📣 Typhoon, a Thai LLM by SCBX & @SCB10X_OFFICIAL, is now a part of Samba-1. Previewed during Typhoon Hackathon 2024, this collaboration enables AI developers worldwide to experience enhanced power and scalability while advancing Thailand’s AI capabilities. #AI #ThaiLLM #ThaiNLP

Artificial Analysis has independently benchmarked @SambaNovaAI's custom AI chips at 1,084 tokens/s on Llama 3 Instruct (8B)! 🏁 This is the fastest output speed we have benchmarked to date and >8 times faster than the median output speed across the API providers of @Meta's Llama 3 Instruct (8B) that we benchmark. SambaNova does not yet publicly offer a serverless API, but you can try out their system via their chat interface (see the tweet below). SambaNova is not yet listed on the Artificial Analysis leaderboard, but we understand API services using SambaNova chips will be available in the near future, and we look forward to initiating full coverage. SambaNova’s custom SN40L RDU chips are their fourth-generation design and are built on TSMC’s 5nm process. They are reported as having the potential to scale to serve much larger models than Llama 3 Instruct (8B) - Llama 3 400B+ 👀. Artificial Analysis has also verified, by testing an MMLU-based benchmark, that Llama 3 Instruct (8B) on Samba-1 Turbo achieves quality scores in line with full FP16 precision.

🚀🌟🚀 Excited to announce Samba-CoE v0.2, which outperforms DBRX by @DbrxMosaicAI and @databricks, Mixtral-8x7B from @MistralAI, and Grok-1 by @grok at a breakneck speed of 330 tokens/s. These breakthrough speeds were achieved without sacrificing precision and on only 8 sockets, showcasing the true capabilities of dataflow! Why would you buy 576 sockets and go to 8 bits when you can run using 16 bits and just 8 sockets? Try out the model and check out the speed here - coe-1.cloud.snova.ai. We are also providing a sneak peek of our next model, Samba-CoE v0.3, available soon with our partners at @LeptonAI. Read more about this announcement at sambanova.ai/blog/accurate-…


