Brad Bokal
@bbokal · 2.9K posts
Be a good human. BD @TensorWaveCloud
Colorado, USA · Joined October 2010
0 Following · 4.6K Followers
Brad Bokal retweeted
TechCrunch @TechCrunch
TensorWave thinks it can break Nvidia’s grip on AI compute with an AMD-powered cloud tcrn.ch/3NnQ80Z
3 replies · 6 reposts · 21 likes · 22.4K views
Brad Bokal @bbokal
@qnguyen3 @arcee_ai That’s awesome - I’ve been trying to get Brian and Jacob to meet to talk about @tensorwave and see if we can collaborate. Maybe we can meet once you get there.
0 replies · 0 reposts · 2 likes · 35 views
Brad Bokal retweeted
Artificial Analysis @ArtificialAnlys
Llama 3.1 405B could be the catalyst for much greater #AMD adoption for AI inference 📈

@AMD's MI300X may be uniquely suited to cost-effective Llama 3.1 405B inference. Its 192GB of memory allows a single 8xMI300X node to serve Llama 3.1 405B in its native FP16 precision, whereas two 8xH100 nodes are required on @nvidia.

As we have previously covered, a single NVIDIA 8xH100 node has only 640GB of memory - not enough to hold Llama 3.1 405B’s full 810GB of FP16 weights at once. Providers are therefore forced to deploy two 8xH100 nodes with interconnect to serve 405B in FP16 precision, accepting a significant cost and complexity penalty. Nvidia’s upcoming H200 and B100 come with 141GB and 192GB of high-bandwidth memory respectively, but unlike those, AMD's MI300X is available now.

@LisaSu noted on AMD’s Q2 earnings call that AMD was demand-limited on MI300X for the remainder of 2024. Will Llama 3.1 405B alone flip that narrative? We are starting to see adoption and support increase. Both @FireworksAI_HQ and @LeptonAI are hosting Llama 3.1 405B on AMD MI300X chips, and they stand out as the lowest-cost providers of Llama 3.1 405B - though it is important to note they are serving the model at FP8 and INT8 precision respectively.

Furthermore, projects like GPU.cpp from @answerdotai (@jeremyphoward, @austinvhuang) are making it easier than ever to write and run portable code across different chip (hardware and software) architectures, decreasing CUDA lock-in.

What is your view? Long #AMD?
3 replies · 40 reposts · 179 likes · 37.8K views
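The memory arithmetic in the tweet above is easy to verify. A minimal sketch using only the figures the tweet itself quotes (405B parameters at 2 bytes each in FP16, 192GB per MI300X, 80GB per H100); this is plain arithmetic, not a serving recipe:

```python
# Sketch of the memory arithmetic above, using only the tweet's own public
# figures (405B params, FP16 = 2 bytes/param, 192GB/MI300X, 80GB/H100).

def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """GB needed just to hold the model weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

weights_gb = weight_memory_gb(405, 2)   # ~810 GB in native FP16

mi300x_node_gb = 8 * 192   # 1536 GB of HBM per 8xMI300X node
h100_node_gb = 8 * 80      # 640 GB of HBM per 8xH100 node

print(f"FP16 weights: {weights_gb:.0f} GB")
print(f"One 8xMI300X node suffices: {weights_gb <= mi300x_node_gb}")  # True
print(f"One 8xH100 node suffices:   {weights_gb <= h100_node_gb}")    # False
```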
Brad Bokal retweeted
TensorWave @tensorwave
Researchers have introduced a new method to speed up long context windows in LLMs. Their adaptive structured sparse attention mechanism reduces time-to-first-token latency without affecting accuracy or requiring extra training. Learn more: buff.ly/4bR7NYI #LLMs #research
0 replies · 3 reposts · 5 likes · 549 views
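The linked paper describes a specific adaptive mechanism; the snippet below is not that method, only a toy illustration of the general idea behind structured sparse attention: each query scores a structured subset of keys (here, a static attention-sink block plus a recent-token window) instead of the full sequence, which is what cuts prefill cost.

```python
# NOT the paper's method: a toy, static block-sparse attention baseline.
# Each query attends to an "attention sink" block plus the most recent
# blocks rather than all keys; the paper's pattern is adaptive.
import numpy as np

def block_sparse_attention(q, k, v, block=64, keep_last_blocks=4):
    n = k.shape[0]
    sink = np.arange(0, min(block, n))                      # first block
    local = np.arange(max(block, n - keep_last_blocks * block), n)
    idx = np.unique(np.concatenate([sink, local]))          # kept keys
    scores = q @ k[idx].T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                      # softmax
    return w @ v[idx]

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 128))
k = rng.standard_normal((4096, 128))
v = rng.standard_normal((4096, 128))
out = block_sparse_attention(q, k, v)  # touches ~320 of 4096 keys
```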
Brad Bokal retweeted
ollama @ollama
ollama run llama3.1:405b Tested on @tensorwave with @AMD MI300X 🤯
31 replies · 126 reposts · 1.1K likes · 98.1K views
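For anyone who would rather drive the same model from code than from the CLI, a minimal sketch using the official ollama Python client (pip install ollama); it assumes a running Ollama server with the llama3.1:405b tag already pulled and the hardware to hold it.

```python
# Minimal sketch with the official `ollama` Python client (pip install
# ollama). Assumes a local Ollama server with llama3.1:405b pulled; any
# other pulled model tag works the same way.
import ollama

response = ollama.chat(
    model="llama3.1:405b",
    messages=[{"role": "user", "content": "One-line summary of FP8 vs FP16?"}],
)
print(response["message"]["content"])
```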
Brad Bokal retweeted
TensorWave @tensorwave
Milestone Unlocked 🚀 We have achieved FP8 on @AMD's MI300X. Discover the implications for AI workloads in our latest blog post. Click to learn more! 👉 buff.ly/3zNX7gp
4 replies · 5 reposts · 19 likes · 3.2K views
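The blog post covers TensorWave's MI300X work specifically; as a generic, hedged illustration of what FP8 trades away, this PyTorch sketch (torch >= 2.1, any backend that supports the cast, not TensorWave's stack) shows the 2x storage saving and the round-trip quantization error of the float8_e4m3fn format:

```python
# Generic FP8 illustration: cast FP16 activations to float8_e4m3fn,
# halving storage, then measure the round-trip quantization error.
# Requires PyTorch >= 2.1 for the float8 dtypes.
import torch

x = torch.randn(1024, 1024, dtype=torch.float16)
x_fp8 = x.to(torch.float8_e4m3fn)        # 1 byte/element vs 2 for FP16
roundtrip = x_fp8.to(torch.float16)

print(f"bytes: fp16={x.nelement() * 2:,} fp8={x_fp8.nelement():,}")
print(f"mean abs round-trip error: {(x - roundtrip).abs().mean().item():.4f}")
```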
Brad Bokal retweeted
J͓T͓ @jt___xyz
New achievement unlocked! 🔐 Thanks to our friends over at @mkoneai & @Gradient_AI_, we've cracked the code on real-time chat with a 1M context window using Llama 70B!

That is cool by itself, but our MI300Xs and their massive 192GB of memory per card are pivotal here: these accelerators run long-context models efficiently, allowing model parameters and large context caches to be stored on fewer cards. 😎

Also, I'm pretty sure the only other offering tackling this is Google’s Gemini 1.5 Pro. However, Google's impressive models come with significant limitations:
❌ Feature gaps: no real-time chat with cached context
❌ Limited customization: minimal fine-tuning capabilities
❌ Scalability constraints: API rate limits restrict large-scale deployments
❌ Cost inefficiency: high expenses for long-context token usage

🔗 Read the full report here: lnkd.in/gZSiyiB9
0 replies · 5 reposts · 24 likes · 1.7K views
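The "fewer cards" claim comes down to KV-cache arithmetic. A back-of-envelope sketch, assuming the published Llama 70B architecture (80 layers, 8 KV heads under grouped-query attention, head dimension 128) and an FP16 cache; the exact serving configuration in the report may differ:

```python
# Back-of-envelope KV-cache sizing for a 1M-token context on Llama 70B.
# Architecture figures (80 layers, 8 KV heads via GQA, head_dim 128) are
# the published ones; the FP16-cache assumption is ours.

def kv_cache_gb(tokens, layers=80, kv_heads=8, head_dim=128, bytes_per=2):
    """2 tensors (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per / 1e9

cache_gb = kv_cache_gb(1_000_000)    # ~328 GB for one 1M-token sequence
weights_gb = 70e9 * 2 / 1e9          # ~140 GB of FP16 weights
total = cache_gb + weights_gb

print(f"KV cache {cache_gb:.0f} GB + weights {weights_gb:.0f} GB "
      f"= {total:.0f} GB")
print(f"MI300X cards at 192 GB each: {-(-total // 192):.0f}")  # ceil division
```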
Brad Bokal retweeted
Gary Parrish @GaryParrishCBS
Alex Rodriguez asked a question. Reggie Jackson answered it. (Shouts to the producer and the rest of the desk for staying out of Reggie’s way and just letting him talk. I doubt they expected this answer, but it’s a great few minutes of television.)
2.2K replies · 41.8K reposts · 163K likes · 19.2M views
Brad Bokal @bbokal
@Yuhu_ai_ @elonmusk Before you go build on H100s, let’s talk about MI300X and our capacity: better performance and better cost. Let’s grab coffee in SF next week; we are hosting an event on the future of compute. Worth a quick chat? Hope so.
0 replies · 0 reposts · 3 likes · 127 views
Brad Bokal @bbokal
@LiquidMetalAI Thanks for sharing. Ping me if you are interested in checking them out!
0 replies · 0 reposts · 0 likes · 20 views
Brad Bokal retweeted
TensorWave @tensorwave
A TensorWave Report: AMD’s MI300X Outperforms NVIDIA’s H100 for LLM Inference

There has been much anticipation around AMD’s flagship MI300X accelerator. With unmatched raw specs, the pressing question remains: can it outperform NVIDIA’s Hopper architecture in real-world AI workloads? We have some exciting early results to share.

Read the full article here: blog.tensorwave.com/amds-mi300x-ou…
18 replies · 22 reposts · 92 likes · 154K views
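The report's numbers come from TensorWave's own benchmarking. As a hedged sketch of how such a comparison is typically measured, the harness below times decode throughput against any OpenAI-compatible inference endpoint (vLLM, TGI, and similar servers expose this API); the base URL and model name are placeholders, not TensorWave's setup.

```python
# Generic throughput harness, not TensorWave's methodology: time a single
# completion against any OpenAI-compatible endpoint, report tokens/sec.
# The base URL and model name below are placeholders.
import time
import requests

def decode_throughput(base_url: str, model: str, prompt: str,
                      max_tokens: int = 256) -> float:
    """Generated tokens per second for one request (includes TTFT)."""
    t0 = time.perf_counter()
    r = requests.post(
        f"{base_url}/v1/completions",
        json={"model": model, "prompt": prompt, "max_tokens": max_tokens},
        timeout=300,
    )
    r.raise_for_status()
    elapsed = time.perf_counter() - t0
    return r.json()["usage"]["completion_tokens"] / elapsed

# tps = decode_throughput("http://localhost:8000",
#                         "meta-llama/Meta-Llama-3.1-70B-Instruct",
#                         "Explain FP8 quantization in two sentences.")
```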
Brad Bokal retweeted
@levelsio @levelsio
Can someone explain what @realGeorgeHotz did here? MLPerf is a benchmark suite used to evaluate training and inference performance of on-premises and cloud platforms, usually run on Nvidia GPUs, right? So he got an AMD GPU running a benchmark that Nvidia dominates, meaning he ported the Nvidia workflow to an AMD GPU - meaning if he keeps going, Nvidia's dominance is over, right?
31 replies · 49 reposts · 1.1K likes · 961.3K views