Reza

14 posts

@Reza_LOD

Never stop fighting

Vancouver, British Columbia · Joined May 2017

84 Following · 66 Followers

Reza@Reza_LOD·Apr 25

medium.com/snowflake/snow…

Reza@Reza_LOD·Apr 25

4/4 Want to learn more about our optimization techniques? For in-depth insights, dive into the Snowflake Arctic Cookbook Series on building an efficient training system for Arctic.

Reza@Reza_LOD·Apr 25

3/4 Last but not least, communication optimization! Leveraging smart parallelization topology and overlapping techniques, we minimize communication overhead for Arctic’s MoE architecture. That means faster training and smoother performance.

Reza@Reza_LOD·Apr 25

2/4 And that's not all! Let's delve into selective activation-checkpointing. By strategically reusing parts of the computation graph and quantizing activations, we find the sweet spot between speed and memory usage. It's all about maximizing efficiency!

Reza@Reza_LOD·Apr 25

1/4 Have you wondered how to optimize sys-perf for training Arctic-like models (MoE arch)? Let’s dive in! Our first technique: custom fused kernels. By crafting these kernels, we streamline irregular and sparse operators, boosting efficiency. #SnowflakeArctic #SystemOptimization

Reza@Reza_LOD·Jan 22

@iliasmiraoui @MSFTDeepSpeed Do you mean the logits after each token-generation step?

Ilias Miraoui@iliasmiraoui·Jan 20

@MSFTDeepSpeed Can we get log probs from the inference server?

DeepSpeed@DeepSpeedAI·Jan 20

Introducing Mixtral, Phi2, Falcon, and Qwen support in #DeepSpeed-FastGen! - Up to 2.5x faster LLM inference - Optimized SplitFuse and token sampling - Exciting new features like RESTful API and more! For more details: github.com/microsoft/Deep… #DeepSpeeed #AI

Reza@Reza_LOD·Jan 20

More updates on DeepSpeed inference support. The performance improvement for the MoE model (Mixtral) is quite substantial. Kudos to all the folks on the DeepSpeed team :)

DeepSpeed@DeepSpeedAI

Reza@Reza_LOD·Sep 2

@StasBekman @ContextualAI Congratulations, Stas :)

Stas Bekman@StasBekman·Sep 1

I'm super excited to start working at @contextualai where I will be training LLMs w/ Retrieval to help businesses deploy AI that overcomes hallucination, keeps data up-to-date and runs much faster inference. If you're new to Contextual.AI, see: contextual.ai/announcing-nex… Applied ML here I come!

Reza@Reza_LOD·Aug 31

@MSFTDeepSpeed Just love working on this team! You start adding a new module, and all the other team members join in and make it so strong that you no longer recognize it as originally designed. You can now use DeepSpeed-Chat with all the new features and with high efficiency!

DeepSpeed@DeepSpeedAI·Aug 31

🚀 Exciting Updates for #DeepSpeedChat! 🤖 - Llama-2 Support: Enjoy 7.1x faster generation with DeepSpeed Hybrid Engine! - Improved efficiency and accessibility through MixZ++ and ZeRO-Offload. - Improved stability and software enhancements. Blog: github.com/microsoft/Deep…

Reza@Reza_LOD·Jul 19

@teknium Does it remember the previous context? It seems it is not storing the cache!

Teknium (e/λ)@Teknium·Jul 18

Here are some ways to test Llama 2: replicate.com/a16z-infra/lla…

Reza@Reza_LOD·Jul 17

@tri_dao Correcting myself, I am actually seeing a 14% e2e speedup! Thanks a lot @tri_dao for this amazing work.

Reza@Reza_LOD·Jul 17

@tri_dao Sorry, I labeled the experiments incorrectly: the first one is with flash-attn 2.0 and the second one is with the previous version.

Tri Dao@tri_dao·Jul 17

Announcing FlashAttention-2! We released FlashAttention a year ago, making attn 2-4x faster, and it is now widely used in most LLM libraries. Recently I’ve been working on the next version: 2x faster than v1, 5-9x vs standard attn, reaching 225 TFLOPs/s training speed on A100. 1/

Reza@Reza_LOD·Jun 27

@ClementDelangue The open-source community should be honored to have such a transparent, open, and dedicated representative. Thanks @ClementDelangue

clem 🤗@ClementDelangue·Jun 26

This is my 5-minute testimony before the US Congress! Open science and open source AI distribute economic gains by enabling hundreds of thousands of small companies and startups to build with AI. It fosters innovation and fair competition between all. Thanks to ethical openness, it creates a safer path for development of artificial intelligence by giving civil society, non-profits, academia, and policy makers the capabilities they need to counterbalance the power of big private companies. Open science and open source AI prevent blackbox systems, make companies more accountable, and help solve today’s challenges like mitigating biases, reducing misinformation, promoting copyright, & rewarding all stakeholders including artists & content creators in the value creation process. Let's go!
