Signal65

281 posts


@Signal_65

We are here to ensure our partners become the signal of innovation in the noise of the technology markets.

Joined December 2023
77 Following · 951 Followers
Signal65 retweeted
Ryan Shrout @ryanshrout
I continue to be amazed by the different bottlenecks that AI workloads are finding in the data center. This work the team @Signal_65 did with @HPE and @KamiwazaAI around storage performance is another one!
Signal65@Signal_65

As we get into @NVIDIAGTC week, one topic I expect to get some attention is the storage front. GPU utilization in AI inference is as much a storage problem as a compute problem. @Signal_65 worked with @HPE and @KamiwazaAI to test KV-Cache offloading to the HPE Alletra Storage MP X10000, and the results were significant. Full report: signal65.com/research/maxim…

Key findings from our testing:
➡️ Output token generation rates increased up to 19.4x compared to systems with no KV-Cache
➡️ Time to first token improved up to 21.5x vs. no KV-Cache
➡️ Even vs. memory-only offload, adding the X10000 delivered a 5.9x token rate increase and 5.6x TTFT reduction
➡️ RDMA for S3 storage delivered up to 2x the throughput of traditional S3 over HTTP, with 80% lower latency and dramatically reduced CPU overhead

That last point matters. The table below shows why GPU Direct Storage via RDMA changes the equation: 20+ GB/s throughput, a 5.1x latency reduction, and consistent P99 performance where traditional S3 showed high jitter.
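For a concrete sense of what multipliers like these mean in practice, here is a back-of-the-envelope sketch in Python. The baseline TTFT and token rate below are hypothetical placeholders; only the 19.4x and 21.5x factors come from the findings above.

```python
# Illustrative arithmetic only: baselines are invented, multipliers are from the report.

def apply_speedup(baseline, factor):
    """Return the improved value after a multiplicative speedup."""
    return baseline / factor

baseline_ttft_s = 4.3          # hypothetical TTFT without KV-Cache, seconds
baseline_tokens_per_s = 12.0   # hypothetical output token rate without KV-Cache

ttft_with_offload = apply_speedup(baseline_ttft_s, 21.5)   # 21.5x TTFT improvement
rate_with_offload = baseline_tokens_per_s * 19.4           # 19.4x token rate

print(f"TTFT: {baseline_ttft_s:.1f}s -> {ttft_with_offload:.2f}s")
print(f"Token rate: {baseline_tokens_per_s:.0f} -> {rate_with_offload:.0f} tok/s")
```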

Signal65 retweeted
Vadim Kai @vaddykai
CPU performance plays a decisive role in accelerator-based #AI platforms. The faster the CPU retires host-side workload components, the higher its overall impact on throughput, performance per watt, and overall TCO. @Signal_65 found that @AMD #EPYC delivers notable performance improvements relative to the competition. signal65.com/research/ai/im…
Signal65 @Signal_65
What role does the host CPU play in AI inference performance? @Signal_65's latest research answers that question. signal65.com/research/ai/im… We benchmarked @AMD EPYC host nodes against competitive options across 7 distinct AI models, including GPT-OSS-120B, Llama-4-Scout, and DeepSeek-R1. The data shows clear advantages for AMD EPYC in throughput, time to first token, and inter-token latency. As AI workloads scale, the efficiency of the host node becomes a critical factor in overall system performance and TCO. Read the full report and watch our expert interview on Signal65.com!
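The three serving metrics named above (throughput, time to first token, inter-token latency) can all be computed from per-token arrival timestamps. A minimal sketch with hypothetical arrival times, not data from the report:

```python
# Toy illustration of the three metrics; the timestamps below are invented.

def serving_metrics(request_start, token_times):
    """Compute TTFT, mean inter-token latency, and throughput for one request."""
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps)
    throughput = len(token_times) / (token_times[-1] - request_start)
    return ttft, mean_itl, throughput

# Hypothetical arrival times (seconds) for a 5-token response
ttft, mean_itl, tput = serving_metrics(0.0, [0.5, 0.6, 0.7, 0.8, 0.9])
print(f"TTFT={ttft:.2f}s  mean ITL={mean_itl * 1000:.0f}ms  {tput:.1f} tok/s")
```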
Signal65 retweeted
Patrick Moorhead @PatrickMoorhead
Lots of big data center AI news today. @intel + @SambaNovaAI announce a multi-year collaboration for Xeon-based AI inference, plus Intel Capital investing in SambaNova’s $350M+ Series E.

Makes sense to me. Intel is trying to stay attached to inference spend in the rack while its GPU roadmap ramps: a CPU-anchored inference stack plus go-to-market leverage through Intel’s channels. SambaNova is driving a non-GPU path for agentic inference with SN50 and a vertically integrated cloud. Big claims: up to 5x max speed and 3x lower TCO vs. GPUs. SoftBank (the telco) is the first named SN50 deployment. Would love to have @Signal_65 test all the claims.

What I’m watching: time to first token, tokens per second per watt, model coverage, software portability, and how fast this turns into real deployments. Net-net, inference economics via API is where the puck is headed, and Intel wants a seat even before its new GPUs are ready. $INTC
Signal65 retweeted
Signal65 @Signal_65
While public pricing from Oracle Cloud shows that MI355X is cheaper per GPU-hour, the NVIDIA GB200 NVL72 per-GPU throughput advantage more than makes up for that. We see ~5.85x at 25 tokens/sec/user, and up to 28x at 75 tokens/sec/user, translating to as much as ~1/15th relative cost per token on DeepSeek-R1. Read the research: signal65.com/research/ai/fr…
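The per-token economics reduce to price per GPU-hour divided by tokens per GPU-hour. A sketch with placeholder numbers: the dollar figures and baseline throughput below are invented (not Oracle Cloud's actual pricing), and only the 28x throughput ratio is taken from the summary above.

```python
# Illustrative only: prices and baseline throughput are placeholders.

def cost_per_mtok(price_per_gpu_hour, tokens_per_sec_per_gpu):
    """Dollar cost to generate one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return price_per_gpu_hour / tokens_per_hour * 1e6

# A cheaper GPU at baseline throughput vs. a GPU that costs 2x per hour
# but delivers the 28x per-GPU throughput cited above.
cheap_gpu = cost_per_mtok(price_per_gpu_hour=2.0, tokens_per_sec_per_gpu=100)
fast_gpu = cost_per_mtok(price_per_gpu_hour=4.0, tokens_per_sec_per_gpu=100 * 28)

print(f"Relative cost per token: {cheap_gpu / fast_gpu:.1f}x")  # 2x price / 28x speed -> 14x cheaper
```

With these placeholder inputs the pricier GPU still ends up roughly an order of magnitude cheaper per token, which is the mechanism behind the ~1/15th figure above.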
Signal65 @Signal_65
The next AI chip war is not focused on training. It is inference. And it continues to get hyperscaler-specific. ⚡ @Signal_65 just published an editorial on Microsoft Azure Maia 200. Maia 200 is built as a token-generation economics engine, not a general-purpose “do everything” accelerator. Report: signal65.com/research/azure…

The attention-grabber is FP4: 10,145 TFLOPS FP4 (dense), positioning that FP4 bet directly against other hyperscaler silicon (TPU v7, Trainium3). But another important story is the system design choices that map to real serving pain.
⚪️ Memory hierarchy: 216 GB HBM3E (7 TB/s) paired with 272 MB on-die SRAM (80 TB/s) to keep high-value data local and avoid memory-bound stalls.
⚪️ Scale-up networking philosophy: an integrated NIC and a two-tier scale-up design using commodity Ethernet building blocks, aiming for predictable collectives and better utilization.
⚪️ Software as a co-equal product: PyTorch bring-up paths, a Triton-based compiler pipeline, kernel libraries, and tooling built with pre-silicon readiness in mind.

This is Microsoft leaning into a reality the market is finally admitting. Inference is an “efficient frontier” (latency, throughput, accuracy, cost, energy), and no single chip wins every point on that curve. That is why Maia 200 is positioned as part of a heterogeneous Azure fleet alongside NVIDIA and AMD options.

The real scoreboard is still to come with hands-on testing. Specs are directional. The market is going to demand sustained tokens/sec on representative models, quantization guidance (FP4/FP8), scaling efficiency at realistic sizes, and clarity on how customers actually consume Maia in Azure.
Signal65 retweeted
NVIDIA Data Center @NVIDIADC
AI reasoning models — especially mixture-of-experts (MoE) architectures — are reshaping the economics of intelligence. They generate exponentially more tokens and require constant, high-speed coordination across GPUs, turning cost per token into the defining measure of AI performance. The real differentiator? Extreme co-design across compute, networking, software — and our partner ecosystem. Hear from us, @Signal_65, @CoreWeave, and @Microsoft as we describe how extreme co-design is essential to scaling AI. 📹 Watch on YouTube: nvda.ws/4rO6z99 🔗 Learn more: nvda.ws/4ajHdtW
Signal65 @Signal_65
Efficiency is the new benchmark. 🔋 Signal65 lab testing shows the @MediaTek Kompanio Ultra 910 delivers nearly 5x the efficiency of the latest x86 Chromebooks and uses up to 85% less power under load. All-day battery + flagship performance = no more compromise. Full report: signal65.com/research/the-k…
Signal65 retweeted
MediaTek @MediaTek
@Signal_65 Thanks for showcasing the capabilities of MediaTek Kompanio Ultra 910 and helping to redefine expectations for Chromebook Plus performance.
Signal65 @Signal_65
Signal65 recently completed an architectural comparison between the @MediaTek Kompanio Ultra 910 and the latest x86-based alternatives. Using Geekbench 6.5.0, we found that the Kompanio Ultra 910 consistently outperforms leading x86 Chromebook competitors in both single-thread and multi-thread workloads.

Key Takeaways:
🚀 Up to 2x faster CPU performance.
🎮 Up to 9x faster graphics.
🔋 Superior architectural efficiency for ChromeOS.

Check out the full data-driven report: 🔗 signal65.com/research/the-k…
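Efficiency comparisons like this reduce to a performance-per-watt ratio. A toy sketch with invented scores and wattages (the actual measurements are in the linked report):

```python
# Invented numbers for illustration; see the full report for real data.

def perf_per_watt(score, avg_power_w):
    """Benchmark score divided by average power draw."""
    return score / avg_power_w

# Two chips at a similar score but very different power draw.
arm_chip = perf_per_watt(score=2400, avg_power_w=6.0)
x86_chip = perf_per_watt(score=2400, avg_power_w=30.0)

print(f"Efficiency advantage: {arm_chip / x86_chip:.1f}x")  # 5.0x with these numbers
```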
Signal65 retweeted
Signal65 @Signal_65
Earlier this summer we looked at the performance of the latest @AIatAMD MI355X accelerator in a report on @Signal_65. We recently sat down with a couple of @AMD leaders to talk about the implications of MI355X on the market. youtube.com/watch?v=3MJkOT…
Signal65 @Signal_65
The Agentic AI Era Needs Different Benchmarks. As enterprise AI moves from simple chatbots to complex "agentic" workflows, traditional benchmarks are hitting a wall. They often suffer from data contamination (memorization) and fail to reflect real-world business tasks. The latest report from @Signal_65 uses the @KamiwazaAI KAMI v0.1, a benchmark that uses dynamic randomization to ensure models are actually reasoning, not just repeating training data. After testing across 170,000 items and 5.5 billion tokens, the results are incredibly interesting.

Key Highlights from the Report:
✅ GPT-5 Leadership: GPT-5 (Medium Reasoning) takes the #1 spot with a staggering 95.7% mean accuracy score, showcasing leading agentic capability.
✅ The Rise of Competitors: GLM-4.6 (#2) and DeepSeek-v3.1 (#3) are showing incredible strength, outperforming many established proprietary models.
✅ Preventing "Gaming": Unlike static benchmarks, KAMI’s randomization makes it nearly impossible for models to "memorize" the test, providing a true measure of enterprise readiness.
✅ Reasoning Matters: The data shows that "Medium Reasoning" modes significantly boost task success rates across the board.

For anyone evaluating model deployment for agentic tasks, this report is a must-read for understanding the trade-offs between open and proprietary leadership. Read the full analysis here: signal65.com/research/ai/be…
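The dynamic-randomization idea can be shown with a toy item generator: each run draws fresh surface details for the same underlying task, so a memorized answer key stops working. This is our own minimal sketch, not KAMI's actual generator.

```python
# Toy illustration of dynamically randomized benchmark items.
import random

def make_item(rng):
    """One arithmetic word problem with a randomized surface form."""
    name = rng.choice(["Ada", "Bo", "Cleo"])
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    question = f"{name} ships {a} units, then {b} more. Total units?"
    return question, a + b   # (prompt, ground-truth answer)

rng = random.Random(0)       # seeded so a given run is reproducible
items = [make_item(rng) for _ in range(3)]
for question, answer in items:
    print(question, "->", answer)
```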
Signal65 @Signal_65
Many enterprise Gen AI projects do not fail because the model is “not smart enough.” They fail because the data is not compute-ready. On the @Signal_65 AI Lab page (signal65.com/ai-lab/) we just published Data Preparation for Enterprise AI: Compute-Ready Data with Dell AI Infrastructure (signal65.com/wp-content/upl…), and the message is that if you treat documents as undifferentiated text, you get undifferentiated outcomes.

Here’s the practical shift we think matters: stop making the LLM re-interpret raw documents on every query. Traditional RAG often forces the model to re-read and re-reason in real time, which creates inconsistency, higher latency, and higher cost. Instead, move the heavy work to ingestion with “compute-ready” prep:
✅ Topic-based decomposition (split by topic boundaries, not blind 500-word chunks)
✅ Entity indexing (people, places, and events become structured objects you can query)
✅ Enrichment at ingestion (summaries, sentiment, synthetic Q&A), so “intelligence” lives in metadata, not in last-second token roulette 🎲
✅ Governed retrieval, where permissions and citation lineage inherit from the source document, reducing leakage risk and improving chain of custody

One concept we keep coming back to is deterministic reliability vs. probabilistic risk. When the system retrieves a pre-computed “golden record” answer from a semantic representation of the content, you get repeatability and control. That is what enterprise teams actually buy.

There’s also a clear infrastructure angle: the report highlights on-prem document processing benefits like predictable ingestion latency (no network round-trips or queueing delays), parallelized reasoning during ingestion, and deterministic scaling under load.

If you are building RAG or agentic workflows, the first question is not “Which model?” It is: how are you turning raw documents into governed, queryable, reusable assets?

Read the full report here: signal65.com/wp-content/upl… And see the infographic on these results here: signal65.com/wp-content/upl… @DellTech
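The ingestion-time shift described above can be sketched in a few lines: split on topic boundaries rather than fixed-size windows, and attach precomputed metadata at ingest so queries hit structured records instead of raw text. The heading convention, record fields, and naive entity pass below are our own illustration, not Dell's or Signal65's actual pipeline.

```python
# Toy "compute-ready" ingestion: topic-based decomposition + metadata at ingest.

def ingest(doc: str) -> list[dict]:
    """Split on '## ' topic headings and precompute metadata per topic."""
    records = []
    for section in doc.split("\n## "):
        title, _, body = section.partition("\n")
        records.append({
            "topic": title.strip().lstrip("# "),
            "summary": body.strip()[:60],   # stand-in for an LLM-written summary
            "entities": [w for w in body.split() if w.istitle()],  # naive entity pass
        })
    return records

doc = ("## Refunds\nAcme refunds orders within 30 days.\n"
       "## Shipping\nAcme ships via Northwind freight.")
for rec in ingest(doc):
    print(rec["topic"], "->", rec["entities"])
```

The point of the sketch: after ingestion, "who handles shipping?" becomes a lookup against the `entities` and `topic` fields rather than a fresh LLM pass over the raw document.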
Signal65 retweeted
Ryan Shrout @ryanshrout
Intel Core Ultra Series 3 reviews just landed, and the early vibe is…unsurprisingly bullish. The headline is not “another laptop CPU.” It is Intel shipping a flagship client platform on @intel 18A, with a real attempt to win back performance-per-watt.

Initial reviews are strongly focused on two positives: battery life and integrated graphics muscle. Looking over the first rounds of reviews, I mostly see Panther Lake positioned as a major win, citing big graphics gains (maybe even RTX 4050-adjacent for entry gaming) plus “up to 22 hours” or more battery life in testing, even if single-core still trails and thermals matter. CPU multi-thread performance is better vs. the previous-generation Lunar Lake, but about flat with last year's H-series.

Intel marketing still leans into the AI PC framing: up to 50 NPU TOPS and the improved integrated graphics lead the story. But the real story is around hybrid AI - more on that later.

There are of course plenty of question marks we need to see addressed: performance on battery across a wide range of designs, real-world usable battery life across a range of workloads, and how the new parts from Snapdragon and AMD will change the landscape later this quarter.

The real test starts now: the first laptops are slated to go on sale tomorrow. Intel has the "first out of the gate" advantage.
Signal65 @Signal_65
Edge compute is not getting harder because servers are weaker. It’s getting harder because operations are now distributed across datacenters and edge sites, where oversight, patching, and security consistency get messy fast. The @Signal_65 report is a good reminder that “edge ready” is a platform story, not just a box story.

On the hardware side, the @HPE ProLiant DL145 Gen11 (AMD EPYC 8124P) held up under heat stress with minimal latency impact (<2%) from 75°F to 105°F, while staying below 50 dB during an AI inference workload. But the more important angle is the unified ops stack: iLO 7 (security and lifecycle control) plus HPE Compute Ops Management (cloud-native fleet management). HPE positions this combo as an advantage in automation, policy consistency, and predictive insights, especially versus competitor tooling.

One stat jumped out: a @TheFuturumGroup survey cited in the report says remote management is a top requirement for over 60% of enterprises deploying hybrid and edge workloads. That’s the market telling us what matters.

Read the report: signal65.com/research/the-u…
Signal65 retweeted
Ryan Shrout @ryanshrout
I feel like I'm running in circles sometimes! 🤪 Mixture-of-Experts inference performance keeps moving and improving quickly, and the latest NVIDIA update is another good reminder that the stack is still in heavy motion.

New TensorRT-LLM improvements (up to 2x+ token throughput on NVIDIA GB200 NVL72) plus Blackwell platform tuning are showing sizeable gains on MoE workloads, which is good news for the industry, but also a reminder that any point-in-time performance comparison needs constant re-validation. NVIDIA HGX B200 gains come from combining multi-token prediction with the NVFP4 precision type, which pushes more throughput while keeping the focus on real serving behavior. (NVFP4 is how NVIDIA makes FP4 usable for real models: it stores FP4 E2M1 values plus scaling information.)

Net: progress like this is great, and it reinforces the idea that inference efficiency and cost per token are an "always-on" adventure. Keeping up means watching hardware, software, precision modes, and serving strategies evolve in near real time.
NVIDIA Data Center @NVIDIADC

🚀 Raising the bar for AI inference. The NVIDIA Blackwell architecture delivers world-class token throughput across every scale:
📈 New NVIDIA TensorRT-LLM upgrades boost MoE performance, delivering 2x+ token throughput on NVIDIA GB200 NVL72.
📈 NVIDIA HGX B200 delivers massive gains with multi-token prediction and NVFP4 precision.
Learn how #NVIDIABlackwell continues to push token throughput higher across AI models ➡️ nvda.ws/45BVWxx
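Block-scaled FP4 along the lines described above can be sketched simply: divide a block of values by a shared scale, then round each to the nearest FP4 E2M1 magnitude (±{0, 0.5, 1, 1.5, 2, 3, 4, 6}). This is a simplified illustration of the idea, not NVIDIA's actual implementation.

```python
# Toy block-scaled FP4 quantization; real NVFP4 kernels differ in detail.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]   # non-negative FP4 E2M1 magnitudes

def quantize_block(xs):
    """Pick a scale so the block max maps to 6.0, then round to the nearest code."""
    scale = max(abs(x) for x in xs) / 6.0 or 1.0   # avoid /0 for an all-zero block
    q = [min(E2M1, key=lambda m: abs(abs(x) / scale - m)) * (1 if x >= 0 else -1)
         for x in xs]
    return scale, q

def dequantize(scale, q):
    return [scale * v for v in q]

scale, q = quantize_block([0.1, -0.4, 0.75, 1.2])
print("codes:", q)                      # [0.5, -2.0, 4.0, 6.0]
print("reconstructed:", dequantize(scale, q))
```

The per-block scale is why 4-bit codes can still cover values far outside ±6: the scale carries the dynamic range, the codes carry the shape.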

Signal65 retweeted
Ryan Shrout @ryanshrout
The new @DellTech XPS is definitely one of the nicest PCs in recent years. They brought back the stuff that matters (function keys, trackpad markers!) and added small refinements to a sleek, solid design. And they worked to hyper-optimize battery life. Can’t wait to test!