HGPU group
10.6K posts

@hgpu
High performance computing on graphics processing units (GPU): AMD/ATI, nVidia, Intel Xeon Phi, CUDA, OpenCL, OpenGL, GPGPU, HPC
Joined May 2011
118 Following · 3.8K Followers

True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity
#Precision #CNN #Package
hgpu.org/?p=30680

KernelFoundry: Hardware-aware evolutionary GPU kernel optimization
#CUDA #SYCL #LLM
hgpu.org/?p=30679

An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
#Triton #NVIDIA #AMD #LLM
hgpu.org/?p=30678

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
#CUDA #LLM #Performance #Package
hgpu.org/?p=30665

AgentServe: Algorithm-System Co-Design for Efficient Agentic AI Serving on a Consumer-Grade GPU
#CUDA #LLM
hgpu.org/?p=30663

EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery
#LLM #AI #Package
hgpu.org/?p=30662

Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4
#LLM #FP4 #NVFP4 #MXFP4 #Precision #AMD #NVIDIA
hgpu.org/?p=30661

CONCUR: Benchmarking LLMs for Concurrent Code Generation
#CodeGeneration #LLM #Package
hgpu.org/?p=30644

RepoLaunch: Automating Build & Test Pipeline of Code Repositories on ANY Language and ANY Platform
#LLM #Package
hgpu.org/?p=30643

Catalyst-Agent: Autonomous heterogeneous catalyst screening and optimization with an LLM Agent
#Chemistry #LLM #Catalyst
hgpu.org/?p=30641

Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
#CUDA #LLM #Hopper #FP4 #Precision #Package
hgpu.org/?p=30640

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation
#CUDA #LLM #Benchmarking #Package
hgpu.org/?p=30630

StitchCUDA: An Automated Multi-Agents End-to-End GPU Programing Framework with Rubric-based Agentic Reinforcement Learning
#CUDA #CodeGeneration #LLM
hgpu.org/?p=30629

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
#CUDA #CodeGeneration #LLM #Package
hgpu.org/?p=30628

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models
#CodeGeneration #LLM #Package
hgpu.org/?p=30620

CL4SE: A Context Learning Benchmark For Software Engineering Tasks
#CodeGeneration #LLM #Package
hgpu.org/?p=30619

From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation
#OpenMP #LLM #CodeGeneration
hgpu.org/?p=30618