Underfox

17.6K posts

@Underfox3

Physicist, Telecom Engineering lover, HPC Enthusiast. Prog Rock/Metal fan.

Joined December 2017
129 Following · 9.6K Followers
Pinned Tweet
Underfox@Underfox3·
Researchers have developed a new simulator that predicts the throughput of basic blocks on all Intel Core μarchs released in the last decade. Its predictions are more accurate than those of state-of-the-art tools by more than an order of magnitude. arxiv.org/pdf/2107.14210…
Underfox@Underfox3·
@TDevilfish Nvidia would never allow a GROMACS benchmark on this before securing its sales... In the end, it's not surprising at all.
Underfox@Underfox3·
Beyond the impressive gen-to-gen performance improvement, which closes the gap between Nvidia CPUs and their direct competitors, we also need to analyze what the final price of the product will be. We will certainly soon see both AMD and Intel begin to move in response to this.
Phoronix@phoronix

NVIDIA @nvidia Vera CPU Benchmarks: Olympus Cores Delivering The Best Performance Ever Seen On ARM Exclusive first public benchmarks of NVIDIA's new Vera CPU. phoronix.com/review/nvidia-…

Underfox@Underfox3·
Furthermore, TritonMoE maintains cross-platform portability, validated on both NVIDIA A100 and AMD MI300X.
Underfox@Underfox3·
The results show that, on an NVIDIA A100, TritonMoE achieves 89–131% of the throughput of the CUDA-optimized Megablocks at inference batch sizes (≤512 tokens) across Mixtral-8x7B, DeepSeek-V3, and Qwen2-MoE configurations.
Underfox@Underfox3·
This paper presents TritonMoE, a fused MoE dispatch kernel written entirely in OpenAI Triton that performs the complete forward pass using only portable Triton primitives. arxiv.org/pdf/2605.23911
Underfox retweeted
Luca Benini@LucaBeniniZhFe·
It's not easy to outperform Moore using 3D logic folding (or 3D-IC): you need to align many planets. CMOS2.0 is the program initiated by @imec_int with top research partners to address the key challenges. See CMOS2.0 position paper with solid data here: arxiv.org/abs/2510.04535
Underfox@Underfox3

Nothing that Huawei has presented was groundbreaking to those truly familiar with semiconductors; even the LogicFolding strategy is not really big news. In fact, DARPA has been testing this strategy since 2017 in the FRANC program. top500.org/news/darpa-pic…

Underfox@Underfox3·
These findings highlight the importance of holistic, system-level power management for sustainable AI infrastructure. We hope these insights will guide future efforts in designing efficient, scalable AI datacenters.
Underfox@Underfox3·
This work presents detailed power measurements for a 150 MW datacenter hosting a cluster of 83K GB200 GPUs connected through an RDMA back-end network.
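As a rough back-of-the-envelope check on those two figures, dividing the datacenter power by the GPU count gives an average power budget per GPU. This is only a sketch: it assumes "83K" means exactly 83,000 GPUs and that the 150 MW figure covers the whole facility (GPUs, hosts, network, cooling), neither of which is stated above. The function name is made up for illustration.

```python
# Rough per-GPU power budget from the two figures quoted in the tweet.
# Assumptions (not from the source): "83K" is taken as exactly 83,000 GPUs,
# and 150 MW is treated as the whole-facility figure.
FACILITY_POWER_W = 150e6
GPU_COUNT = 83_000


def power_per_gpu_kw(facility_power_w: float, gpu_count: int) -> float:
    """Average facility power available per GPU, in kilowatts."""
    return facility_power_w / gpu_count / 1e3


budget_kw = power_per_gpu_kw(FACILITY_POWER_W, GPU_COUNT)
print(f"{budget_kw:.2f} kW per GPU")
```

Under these assumptions the result is an average that includes everything around the GPU, so it should not be read as the power draw of the GPU itself.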
Underfox@Underfox3·
Meta researchers described the end-to-end power management process for a hyperscale AI datacenter, from early power planning to tuning power settings after large-scale deployment, and finally to dynamic, runtime power management for evolving workloads. arxiv.org/pdf/2605.24461
Underfox@Underfox3·
Compared with a 2x H100 GPU baseline under identical hyperparameters, TPU training completes 1.61x faster at 2.12x lower cost. Inference throughput is within 3% across platforms, while TPU achieves 2x lower time-to-first-token.
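To see what the two training ratios imply together, one can assume that training cost is simply an hourly rate multiplied by wall-clock time. That pricing model is an assumption, not something the tweet states, and the function name below is hypothetical. Under it, the effective hourly rate of the TPU setup relative to the 2x H100 baseline follows from the two quoted numbers:

```python
# Implied relative hourly rate, assuming cost = hourly_rate * wall_clock_time.
# The pricing model is an assumption; only the two ratios come from the tweet.
SPEEDUP = 1.61         # TPU training completes 1.61x faster
COST_REDUCTION = 2.12  # at 2.12x lower total cost


def relative_hourly_rate(speedup: float, cost_reduction: float) -> float:
    """TPU hourly rate divided by GPU-baseline hourly rate."""
    time_ratio = 1.0 / speedup          # TPU time / GPU time
    cost_ratio = 1.0 / cost_reduction   # TPU cost / GPU cost
    # cost = rate * time, so rate_ratio = cost_ratio / time_ratio
    return cost_ratio / time_ratio


rate = relative_hourly_rate(SPEEDUP, COST_REDUCTION)
print(f"TPU hourly rate is about {rate:.2f}x the baseline rate")
```

In other words, if the assumption holds, roughly three quarters of the baseline hourly rate combined with the shorter run time would account for the reported cost gap.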
Underfox@Underfox3·
This paper presents the first end-to-end demonstration of fine-tuning and serving Google’s Gemma 4 31B model on TPU hardware, providing an empirical comparison of TPU and GPU platforms for LLM adaptation. arxiv.org/pdf/2605.25645
Underfox@Underfox3·
Nothing that Huawei has presented was groundbreaking to those truly familiar with semiconductors; even the LogicFolding strategy is not really big news. In fact, DARPA has been testing this strategy since 2017 in the FRANC program. top500.org/news/darpa-pic…
Underfox@Underfox3·
Even when the pulse repetition frequencies of all terminals are the same, the proposed scheme can utilize the slight random drift between terminals to recover high-fidelity information.
Underfox@Underfox3·
The results show that the proposed scheme has wide frequency adaptability, which allows it to separate mixed signals with modulation-rate differences ranging from several million hertz to a few hertz.
Underfox@Underfox3·
Researchers have experimentally demonstrated a single-photon Fourier transform scheme that exploits the implicit correlation shared in the photon stream to separate mixed weak signals with high fidelity, even in extreme environments. #optics arxiv.org/pdf/2605.23611
Underfox@Underfox3·
"On pure performance when normalized on the SIMD/Vector length MCv3 on its peak efficiency point (16 cores) achieves 46% performance of Intel Sapphire Rapids server and 91% performance of NVIDIA Grace CPU superchip."
Underfox@Underfox3·
The evaluation results show that the SG2044 more than doubles single-core performance and improves scalability compared to SG2042 (MCv2).
Underfox@Underfox3·
This brief paper presents Monte Cimone v3, the third iteration of the Monte Cimone RISC-V HPC cluster, showing that commercially available RISC-V compute nodes are closing the gap with their competitors in the HPC segment. #HPC arxiv.org/pdf/2605.22831