Gopi Kumar

525 posts

@zenlytix

AI Infra @ Microsoft Research. https://t.co/yXMFohcHB1. Amateur musician @ https://t.co/jbymO74ZpX. Opinions my own.

Redmond, WA · Joined March 2016

74 Following · 234 Followers

Gopi Kumar @zenlytix · Feb 13

@dylan522p Humungous Benchmarking Operations

Dylan Patel @dylan522p · Feb 13

$1,000 for whoever comes up with the best name replacement for InferenceMAX. InferenceMAX 2.0 is dropping soon, but we have to rename it because HBO MAX sent us a cease and desist. We have all NVIDIA GPUs from H100 to GB300 on large MoEs with SOTA optimizations like Disagg PD tested.

Gopi Kumar @zenlytix · Oct 7

Is it just me, or does the ASCII art on openai.com/devday/ represent OpenAI (the frog) gobbling up the startups (flies)? :) #devday2025 #OpenAIDevDay #openai

Gopi Kumar @zenlytix · Aug 20

In part two of the AI coding practices series, we look at AI-assisted coding for enterprise production code. linkedin.com/pulse/coding-e… Let me know what you think and whether it resonates with you.

Gopi Kumar @zenlytix · Aug 19

Putting together a series of short posts with some thoughts and opinions on the practice of vibe coding. This is the introductory post to set the stage for the conversation. Appreciate your feedback and suggestions. linkedin.com/pulse/practice…

Gopi Kumar @zenlytix · Jun 19

Informative and entertaining #gpucomic from @Modular that I can relate to (a lot :)).

Modular @Modular

🎉 To celebrate the launch, we're giving away free, limited edition 🔥 Kernel Problems T-shirts 👕! To claim one, just reshare comic.modular.com on X, mention @Modular, and tag the post with #gpucomic. We'll DM ya for sizing and shipping info.

Gopi Kumar @zenlytix · Feb 6

@miguelgfierro Agree with learning by practice. One thing that helps when starting with RAG, or even zero-shot LLM prompting, is understanding how to evaluate results. You don't need to know at first how to build a model or what gradient descent is, but the basics of how to measure are key to grok from the start.
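
As one illustration of "how to measure", here is a minimal sketch of two common answer-quality metrics for question answering, exact match and token-level F1. It is not from the thread: the function names and the normalization rule (lowercase, drop punctuation) are assumptions chosen for the example, and real evaluations usually add more metrics and a larger test set.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when prediction and reference are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match is strict and easy to read; token F1 gives partial credit when a model's answer contains the reference plus extra words, which is common with LLM output.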

Miguel Fierro @miguelgfierro · Feb 4

Unpopular opinion: if you want to get into AI, start from a RAG system like this instead of linear regression. I call this reverse learning.
Traditional way: Linear Regression -> Logistic Regression -> SVMs -> Decision Trees -> Random Forests -> Gradient Boosted Trees -> CNNs -> RNN -> LSTMs -> LLMs -> RAG.
Reverse learning: RAG -> LLMs -> Gradient Boosted Trees -> Random Forests -> Logistic Regression.
Traditional way: First study the theory, then practice with projects.
Reverse learning: First practice with projects, then learn the theory underlying the models.
Traditional way: Six months to learn the AI that companies require to get an AI position.
Reverse learning: One month to learn the AI that companies require to get an AI position.
The key idea is to start from the AI that is useful to get an AI position instead of the AI that was used 30 years ago.
Question: But it's impossible to do RAG+LLMs if you don't know linear and logistic regression!
Answer: Are you sure? In the picture below you have a RAG solution with DeepSeek LLM in less than 30 lines of code.
Challenge the status quo! Vamos!!!🦾🦾🦾

Gopi Kumar @zenlytix · Oct 9

@reach_vb github.com/microsoft/Dire… And github.com/microsoft/onnx…

Vaibhav (VB) Srivastav @reach_vb · Oct 8

On-device AI framework ecosystem is blooming these days:
1. llama.cpp - All things Whisper, LLMs & VLMs - run across Metal, CUDA and other backends (AMD/NPU etc) github.com/ggerganov/llam…
2. MLC - Deploy LLMs across platforms especially WebGPU (fastest WebGPU LLM implementation out there) github.com/mlc-ai/web-llm
3. MLX - Arguably the fastest general purpose framework (Mac only) - Supports all major Image Generation (Flux, SDXL, etc), Transcription (Whisper), LLMs github.com/ml-explore/mlx…
4. Candle - Cross-platform general purpose framework written in Rust - wide coverage across model categories github.com/huggingface/ca…
Honorable mentions:
1. Transformers.js - JavaScript (WebGPU) implementation built on top of ONNX Runtime Web github.com/xenova/transfo…
2. Mistral rs - Rust implementation for LLMs & VLMs, built on top of Candle github.com/EricLBuehler/m…
3. Ratchet - Cross-platform, Rust-based WebGPU framework built for battle-tested deployments github.com/huggingface/ra…
4. Zml - Cross-platform, Zig-based ML framework github.com/zml/zml
Looking forward to how the ecosystem would look 1 year from now - Quite bullish on the top 4 atm - but open source ecosystem changes quite a bit! 🤗 Also, which frameworks did I miss?

Gopi Kumar @zenlytix · May 22

Here is a short walkthrough of running the Phi-3 Mini model on a Windows 365 Cloud GPU desktop with an #nvidia A10, showing the e2e steps: setup, downloading the model, and running model inference in two different ways (all in under 5 mins). youtu.be/xNT8aRJeC3k

Gopi Kumar @zenlytix · May 21

Introducing the new Azure AI infrastructure VM series ND MI300X v5 - Microsoft Community Hub techcommunity.microsoft.com/t5/azure-high-…

Gopi Kumar reposted

Jeff Boudier 🤗 @jeffboudier · May 21

Today at @Microsoft Build, @satyanadella announced a deepened partnership with @huggingface, with new experiences across cloud, hardware, open source and developers. Here's a round up of all the joint work we announced this week! huggingface.co/blog/microsoft…
- Azure AI Studio Model Catalog
- Azure AMD GPU MI300X VMs
- Phi-3 open models
- WebGPU with transformers.js and optimum
- Spaces Dev Mode

Gopi Kumar @zenlytix · Apr 30

4. Download the chat sample and run:
Reference: github.com/microsoft/onnx…
curl raw.githubusercontent.com/microsoft/onnx… -o model-qa.py
python model-qa.py -m directml/directml-int4-awq-block-128 -l 2048 -g

Gopi Kumar @zenlytix · Apr 30

3. Download the Phi-3 ONNX DirectML model:
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include directml/* --local-dir . --local-dir-use-symlinks False
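
For readers who prefer to script this step, the same download can be sketched in Python with the `snapshot_download` function from the `huggingface_hub` package installed in step 2. This is not part of the original thread: the two helper names are hypothetical, and the symlink flag from the CLI command is left out on the assumption that the installed version's default behavior is acceptable.

```python
def build_download_args(local_dir: str = ".") -> dict:
    """Keyword arguments mirroring the huggingface-cli command in step 3."""
    return {
        "repo_id": "microsoft/Phi-3-mini-4k-instruct-onnx",
        "allow_patterns": ["directml/*"],  # same filter as --include directml/*
        "local_dir": local_dir,            # same target as --local-dir .
    }


def download_phi3_directml(local_dir: str = ".") -> str:
    """Download the DirectML model files and return the local folder path."""
    # Imported lazily so this module loads even before huggingface-hub
    # has been installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_args(local_dir))
```

Calling `download_phi3_directml()` from the `phiamdv620` folder should leave a `directml/` subfolder in place, which is the path that the `-m` argument in step 4 points at.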

Gopi Kumar @zenlytix · Apr 30

Four easy steps:
1. Create Conda environment and activate:
conda create --name phiamdv620 python==3.10 -y
conda activate phiamdv620
mkdir phiamdv620
cd phiamdv620
2. Install the following packages:
pip install numpy huggingface-hub
pip install --pre onnxruntime-genai-directml

Gopi Kumar @zenlytix · Apr 30

Prerequisites:
* Install the AMD GPU driver as indicated in the Azure doc page: learn.microsoft.com/en-us/azure/vi… Download the EXE package, execute it, and follow the wizard (the driver alone is adequate). The VM will need a reboot.
* Install Miniconda: docs.anaconda.com/free/miniconda…

Gopi Kumar @zenlytix · Apr 30

Environment used: Azure AMD GPU V620 VM, Standard NG32ads V620 v1 (32 vCPUs, 64 GiB memory, 1x V620 GPU), running the Windows 11 Pro VM image.

Gopi Kumar @zenlytix · Apr 30

Quick walkthrough to set up and run the @MicrosoftAI Phi-3 model on a Windows machine with an AMD V620 GPU using #microsoft @Windows DirectML and the @onnxruntime GenAI library in under 5 minutes. These instructions will work across Nvidia, AMD and Intel GPUs. 🧵 youtube.com/watch?v=hghCoi…

Gopi Kumar reposted

Sebastien Bubeck @SebastienBubeck · Apr 23

phi-3 is here, and it's ... good :-). I made a quick short demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)

Gopi Kumar @zenlytix · Mar 14

Research report showing productivity gains from Microsoft Security #copilot. Full report: go.microsoft.com/fwlink/?linkid…

Gopi Kumar @zenlytix · Feb 8

@bindureddy I guess if Alice is playing online chess, it's hard to say what the other sister is doing? :)

Bindu Reddy @bindureddy · Feb 8

Bard is Now Gemini! My initial thoughts
- Still continues to be somewhat nerfed and refuses to answer questions
- Refused to generate a simple illustration of George Clooney, ChatGPT is better
- Missing PDF upload
- Answers do seem better than the previous version
- Seems to have a "reasoning vibe"
- However, it does NOT answer some hard questions that GPT-4 does. For example, it didn't get "In a room I have only 3 sisters. Anna is reading a book. Alice is playing a match of chess. What the third sister, Amanda, is, doing ?" The answer is the 3rd sister is playing Chess. GPT-4 nails it.
Overall, we plan to do a lot more analysis, but first impressions are good but not great.
TLDR; I don't think it will make a material difference to how Bard was doing before, especially if their plan is to charge for this. However, it's always good to have more players in the market. 🤷‍♀️

Gopi Kumar @zenlytix · Jan 10

LoRA SFT took about an hour on the MedAlpaca dataset (huggingface.co/datasets/medal…) on a single node with 8x AMD MI250X (dual vGPU). I used LLaMA-Factory (github.com/hiyouga/LLaMA-…) for the fine-tuning, which made things super easy. The standard instructions worked just fine on AMD/ROCm too.
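
As background on why LoRA runs like this tend to be quick, the sketch below counts trainable parameters for one weight matrix. LoRA freezes a d x k weight and trains two low-rank factors of shapes d x r and r x k instead. The 4096 x 4096 size and rank 8 in the usage note are illustrative assumptions, not the settings used in this run, and the function names are made up for the example.

```python
def full_params(d: int, k: int) -> int:
    """Parameters updated when fine-tuning a full d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in the two LoRA factors: B (d x r) and A (r x k)."""
    return r * (d + k)


def lora_fraction(d: int, k: int, r: int) -> float:
    """Share of the full matrix's parameter count that LoRA trains."""
    return lora_params(d, k, r) / full_params(d, k)
```

For a hypothetical 4096 x 4096 layer at rank 8, `lora_params(4096, 4096, 8)` is 65,536 against 16,777,216 for the full matrix, so under 0.4% of that layer's weights are trained.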

Gopi Kumar @zenlytix · Jan 10

This weekend I was able to smoothly run inference and supervised fine-tuning (LoRA) on the @Microsoft #phi2 model (huggingface.co/microsoft/phi-2) on an @amd MI250X GPU / #rocm on @Azure. Nice work @SebastienBubeck and team.
