@qnguyen3 @arcee_ai That’s awesome - I’ve been trying to get Brian and Jacob together to talk @tensorwave and see if we can collaborate. Maybe we can meet once you get there.
Llama 3.1 405B could be the catalyst for much greater #AMD adoption for AI inference 📈
@AMD's MI300X may be uniquely suited to cost-effective Llama 3.1 405B inference. With 192GB of memory per GPU, a single 8xMI300X node (1,536GB total) can serve Llama 3.1 405B in its native FP16 precision - whereas two 8xH100 nodes are required on @nvidia.
As we have previously covered, a single NVIDIA 8xH100 node has only 640GB of memory - not enough to hold Llama 3.1 405B’s full 810GB of FP16 weights at once. Providers are therefore forced to deploy two interconnected 8xH100 nodes to serve 405B in FP16, accepting a significant cost and complexity penalty.
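The arithmetic is simple enough to check in a few lines (a sketch; the parameter count and per-GPU memory figures are as stated above, and KV cache and activation overhead are ignored, which only makes the single-node H100 picture worse):

```python
# Back-of-the-envelope memory math for serving Llama 3.1 405B in FP16.
# Ignores KV cache, activations, and framework overhead.

PARAMS = 405e9            # Llama 3.1 405B parameter count
BYTES_FP16 = 2            # FP16 stores one parameter in 2 bytes

weights_gb = PARAMS * BYTES_FP16 / 1e9   # ~810 GB of weights

mi300x_node_gb = 8 * 192  # 8x MI300X -> 1536 GB of HBM
h100_node_gb = 8 * 80     # 8x H100 (80GB SXM) -> 640 GB of HBM

print(f"FP16 weights: {weights_gb:.0f} GB")
print(f"Fits in one 8xMI300X node ({mi300x_node_gb} GB)? {weights_gb <= mi300x_node_gb}")
print(f"Fits in one 8xH100 node ({h100_node_gb} GB)?   {weights_gb <= h100_node_gb}")
```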
Nvidia’s future H200 and B100 come with 141GB and 192GB of high-bandwidth memory respectively - but unlike those, the AMD MI300X is available now. @LisaSu noted on AMD’s Q2 earnings call that AMD was demand-limited on MI300X for the remainder of 2024. Will Llama 3.1 405B alone flip that narrative?
We are starting to see adoption and support increase. Both @FireworksAI_HQ and @LeptonAI are hosting Llama 3.1 405B on AMD MI300X chips, and they stand out as the lowest-cost providers of Llama 3.1 405B. However, it is important to note that they are serving the model at FP8 and INT8 precision, respectively.
Furthermore, projects like GPU.cpp from @answerdotai (@jeremyphoward, @austinvhuang) are making it easier than ever to write and run portable code across different chip (hardware & software) architectures - reducing CUDA lock-in.
What is your view? Long #AMD?
Researchers have introduced a new method to speed up long-context inference in LLMs. Their adaptive structured sparse attention mechanism reduces Time-to-First-Token latency without affecting accuracy or needing extra training. Learn more: buff.ly/4bR7NYI #LLMs #research
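For readers new to the idea, here is a minimal, generic sketch of structured (block-)sparse attention in Python - purely illustrative, not the paper’s actual adaptive mechanism; the block size, the mean-pooled block scoring, and the keep_ratio heuristic are all assumptions of mine:

```python
import numpy as np

def block_sparse_attention(q, k, v, block=64, keep_ratio=0.25):
    """Generic block-sparse attention sketch: cheaply score each block of
    the attention matrix, keep only the highest-scoring key blocks per
    query block, and compute full attention inside kept blocks only."""
    n, d = q.shape
    assert n % block == 0, "sketch assumes sequence length divisible by block"
    nb = n // block
    # Cheap proxy score per (query-block, key-block) pair via mean pooling
    qb = q.reshape(nb, block, d).mean(axis=1)          # (nb, d)
    kb = k.reshape(nb, block, d).mean(axis=1)          # (nb, d)
    block_scores = qb @ kb.T                           # (nb, nb)
    # Keep the top key blocks per query block (the structured sparsity pattern)
    k_keep = max(1, int(keep_ratio * nb))
    keep = np.argsort(block_scores, axis=1)[:, -k_keep:]
    out = np.zeros_like(q)
    for i in range(nb):
        qs = q[i*block:(i+1)*block]
        ks = np.concatenate([k[j*block:(j+1)*block] for j in keep[i]])
        vs = np.concatenate([v[j*block:(j+1)*block] for j in keep[i]])
        attn = qs @ ks.T / np.sqrt(d)
        attn = np.exp(attn - attn.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)
        out[i*block:(i+1)*block] = attn @ vs
    return out
```

The point is that only a fraction of the n×n score matrix is ever computed during prefill - which is where Time-to-First-Token is won or lost at long context.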
Milestone Unlocked 🚀 We have achieved FP8 on @AMD's MI300X.
Discover the implications for AI workloads in our latest blog post.
Click to learn more! 👉 buff.ly/3zNX7gp
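For context on what the FP8 milestone buys, here’s a minimal PyTorch sketch of the storage side of FP8 quantization - an illustration only, not TensorWave’s MI300X kernel work; it assumes a PyTorch build (≥2.1) that exposes the float8_e4m3fn dtype:

```python
import torch

# FP8 (e4m3) stores a value in 1 byte vs 2 for FP16: half the memory
# and bandwidth per weight, at the cost of reduced precision/range.
w_fp16 = torch.randn(4096, 4096, dtype=torch.float16)

# Scale into FP8's representable range before casting (e4m3 max ~448)
scale = w_fp16.abs().max().float() / 448.0
w_fp8 = (w_fp16.float() / scale).to(torch.float8_e4m3fn)

print(w_fp16.element_size())   # 2 bytes per weight
print(w_fp8.element_size())    # 1 byte per weight

# Dequantize to inspect error (real FP8 inference uses scaled FP8 matmuls)
w_back = w_fp8.float() * scale
print((w_back - w_fp16.float()).abs().max())  # quantization error
```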
Running out of #compute credits? If you are a startup, let’s talk. We are building something special and welcome your feedback! @tensorwave #compute #ai #GPU
New achievement unlocked! 🔐
Thanks to our friends over at @mkoneai & @Gradient_AI_, we’ve cracked the code on real-time chat with a 1M-token context window using Llama 70B!
Which is cool by itself - but it’s our MI300Xs, with their massive 192GB of memory per card, that make it possible. These accelerators are pivotal for running long-context models efficiently, allowing model parameters and large context caches to be stored on fewer cards. 😎
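Rough KV-cache math shows why per-card capacity matters so much at 1M tokens (a sketch using commonly cited Llama 70B architecture figures - 80 layers, 8 grouped-query KV heads, head dim 128 - real serving adds overhead on top):

```python
# Rough KV-cache size for Llama 70B at a 1M-token context, FP16 cache.
layers = 80        # transformer layers in Llama 70B
kv_heads = 8       # grouped-query attention: 8 KV heads
head_dim = 128     # dimension per head
bytes_fp16 = 2
ctx = 1_000_000    # 1M-token context window

# 2x for the K and V tensors
per_token = 2 * layers * kv_heads * head_dim * bytes_fp16   # bytes per token
cache_gb = per_token * ctx / 1e9

print(f"KV cache per token: {per_token/1024:.0f} KiB")      # ~320 KiB
print(f"KV cache at 1M tokens: {cache_gb:.0f} GB")          # ~328 GB
print(f"192GB cards needed just for the cache: {cache_gb/192:.1f}")
```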
Also, I’m pretty sure the only other player tackling this is Google with Gemini 1.5 Pro.
However, Google's impressive models come with significant limitations:
❌ Feature Gaps: No real-time chat with cached context
❌ Limited Customization: Minimal fine-tuning capabilities
❌ Scalability Constraints: API rate limits restrict large-scale deployments
❌ Cost Inefficiency: High expenses for long context token usage
🔗 Read the full report here: lnkd.in/gZSiyiB9
Just dropped my first blog post (out of three) on getting started with AMD ROCm for AI! 🚀
Thanks to @tensorwave and @cognitivecompai for hooking me up with some sweet MI300X GPUs. If you're curious about AMD's answer to CUDA, check it out.
open.substack.com/pub/qnguyen3/p…
Alex Rodriguez asked a question. Reggie Jackson answered it.
(Shouts to the producer and rest of the desk for staying out of Reggie’s way and just letting him talk. I doubt they expected this answer. But it’s a great few minutes of television.)
@Yuhu_ai_ @elonmusk Before you go build on H100s, let’s talk about MI300X and our capacity. Better performance and better cost. Let’s grab coffee in SF next week - we are hosting an event on the future of compute. Worth a quick chat? Hope so.
A TensorWave Report: AMD’s MI300X Outperforms NVIDIA’s H100 for LLM Inference
There has been much anticipation around AMD’s flagship MI300X accelerator. Its raw specs are unmatched, but the pressing question remains: can it outperform NVIDIA’s Hopper architecture in real-world AI workloads? We have some exciting early results to share.
Read the full article here: blog.tensorwave.com/amds-mi300x-ou…
Can someone explain what @realGeorgeHotz did here?
MLPerf is a benchmark suite used to evaluate the training and inference performance of on-premises and cloud platforms - usually run on Nvidia GPUs, right?
So he got an AMD GPU running a benchmark that has been Nvidia territory, meaning he ported the Nvidia-oriented code to an AMD GPU - and if he keeps going, that’s Nvidia’s dominance over, right?