Standard Kernel Co.

15 posts

Standard Kernel Co.
@Standard_Kernel
Building AI Infrastructure with AI; fast kernels go brrr

Palo Alto, CA · Joined September 2025
4 Following · 1.6K Followers
Standard Kernel Co. reposted
Jump Capital @jumpcapital
AI progress increasingly depends on how efficiently workloads run on hardware. @Standard_Kernel is tackling this challenge at the kernel level, unlocking more performance from modern GPUs. We're proud to lead their seed with @generalcatalyst, @CoreWeave, @felicis, & @ericsson
Standard Kernel Co. reposted
Saaya Nath Pal @saayanath
It’s rare to find founders so perfectly and uniquely suited to solve a problem, let alone a problem of this magnitude and importance. Proud to lead @Standard_Kernel’s seed round.
Anne Ouyang @anneouyang

Excited to share @Standard_Kernel's seed round and some reflections on what we’ve learned about kernel generation and what we believe is next. Grateful to our amazing team, supporters, and the broader community pushing this space forward.

Standard Kernel Co. reposted
Dylan Patel @dylan522p
Anne is killing it. Here's my quote from the press release: “Kernel generation is key for improving the performance and efficiency of AI hardware. As fleet sizes for users of AI hardware get larger, and more hardware diversity is introduced, Standard Kernel becomes key to deployment.”
Anne Ouyang @anneouyang

Excited to share @Standard_Kernel's seed round and some reflections on what we’ve learned about kernel generation and what we believe is next. Grateful to our amazing team, supporters, and the broader community pushing this space forward.

Standard Kernel Co. reposted
Anne Ouyang @anneouyang
Excited to share @Standard_Kernel's seed round and some reflections on what we’ve learned about kernel generation and what we believe is next. Grateful to our amazing team, supporters, and the broader community pushing this space forward.
Standard Kernel Co. @Standard_Kernel
The choice of timing method (CUDA events, CUDA graphs, Nsight Compute, PyTorch profiler, etc.) can result in different measured GPU performance, and the effect depends on the workload. For microsecond-scale kernels, true execution time is often indistinguishable from measurement overhead and system variance. (7/9)
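A minimal sketch of one way to tame this (not the thread's actual harness; kernel and sizes are illustrative): for microsecond-scale kernels, a single timed launch is dominated by launch and event overhead, so warm up first and average over many back-to-back launches with CUDA events around the whole batch.

```cuda
// Sketch: amortizing measurement overhead for a microsecond-scale kernel.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void tiny_kernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20, iters = 1000;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launches pay one-time costs (module load, clock ramp-up).
    for (int i = 0; i < 10; ++i)
        tiny_kernel<<<(n + 255) / 256, 256>>>(d_x, n);

    // Time many launches inside one event pair, then divide.
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        tiny_kernel<<<(n + 255) / 256, 256>>>(d_x, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("avg per-launch time: %.3f us\n", 1000.0f * ms / iters);

    cudaFree(d_x);
    return 0;
}
```

Note the trade-off: batching hides per-launch overhead but also keeps caches warm across iterations, so this style of measurement answers a different question than timing isolated launches.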
Standard Kernel Co. @Standard_Kernel
Warm caches can make GPU kernels appear much faster, and sensitivity to cold vs. warm cache varies by workload. Be explicit about which condition you measure, as the appropriate choice depends on the behavior you want to evaluate. (6/9)
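One common way to force the cold-cache condition (a sketch with illustrative names and sizes, not the thread's harness): stream over a scratch buffer larger than L2 between timed launches so the kernel's working set is evicted each time.

```cuda
// Sketch: measuring under an explicitly cold L2 by thrashing the cache
// between timed launches.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void flush_l2(int *scratch, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) scratch[i] += 1;   // streaming access evicts prior cache lines
}

__global__ void kernel_under_test(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    // 256 MiB scratch: comfortably larger than e.g. A100's 40 MiB L2.
    const size_t flush_n = (size_t)(256u << 20) / sizeof(int);
    float *d_x; int *d_scratch;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_scratch, flush_n * sizeof(int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);

    float total_ms = 0.0f;
    const int reps = 100;
    for (int rep = 0; rep < reps; ++rep) {
        flush_l2<<<(int)((flush_n + 255) / 256), 256>>>(d_scratch, flush_n);
        cudaEventRecord(start);
        kernel_under_test<<<(n + 255) / 256, 256>>>(d_x, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms; cudaEventElapsedTime(&ms, start, stop);
        total_ms += ms;
    }
    printf("cold-cache avg: %.3f us\n", 1000.0f * total_ms / reps);
    return 0;
}
```

Dropping the `flush_l2` launch gives the warm-cache number for the same kernel; reporting both makes the measurement condition explicit.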
Standard Kernel Co. @Standard_Kernel
GPU timing is deceptively hard. Power limits, thermal state, clock behavior, caching, and measurement method all affect results in subtle ways. We explored sources of timing variation to obtain more reliable results for kernel benchmarking. (1/9)
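One standard mitigation for the clock and thermal effects mentioned here is to pin GPU clocks for the duration of the benchmark. A sketch using NVML (clock values are illustrative; this is the programmatic equivalent of `nvidia-smi --lock-gpu-clocks`, needs admin privileges, and links with `-lnvidia-ml`):

```cuda
// Sketch: locking SM clocks via NVML so boost/thermal behavior does not
// skew benchmark results.
#include <cstdio>
#include <nvml.h>

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    // Pin min and max SM clocks to one frequency (1410 MHz is an A100
    // figure; pick a value your GPU can sustain without throttling).
    if (nvmlDeviceSetGpuLockedClocks(dev, 1410, 1410) != NVML_SUCCESS)
        printf("could not lock clocks (insufficient privileges?)\n");

    // ... run the benchmark here ...

    nvmlDeviceResetGpuLockedClocks(dev);
    nvmlShutdown();
    return 0;
}
```

Locking below the maximum boost clock trades a little absolute performance for run-to-run stability, which is usually the right trade for comparisons.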
Standard Kernel Co. @Standard_Kernel
Getting identical hardware from cloud providers is not guaranteed. Across three cloud providers all listing “A100 80GB,” we received different variants with differing clock limits, power caps, and driver environments. When benchmarking an identical GEMM, the runtime distributions formed distinct clusters for each provider. (9/9)
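A practical consequence: log the actual device configuration next to every benchmark result, so variant differences are visible after the fact. A minimal sketch using the CUDA runtime API (not the thread's tooling):

```cuda
// Sketch: fingerprinting the device a benchmark actually ran on, since two
// instances both listed as "A100 80GB" can differ in clocks and drivers.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);
    cudaRuntimeGetVersion(&runtime);

    printf("device:     %s\n", prop.name);
    printf("SM clock:   %d kHz\n", prop.clockRate);
    printf("mem clock:  %d kHz\n", prop.memoryClockRate);
    printf("SM count:   %d\n", prop.multiProcessorCount);
    printf("global mem: %zu MiB\n", prop.totalGlobalMem >> 20);
    printf("driver/runtime version: %d / %d\n", driver, runtime);
    return 0;
}
```

Power caps and current (as opposed to rated) clocks are not exposed through `cudaDeviceProp`; querying those requires NVML or `nvidia-smi -q`.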
Standard Kernel Co. @Standard_Kernel
Compilation flags can change GPU performance on the same kernel + hardware. Sensitivity varies by workload, so uncontrolled flags can look like real algorithmic gains when they're just compiler effects. (8/9)
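A small illustrative example of the effect (hypothetical kernel, not from the thread): the same source compiles to different code under different flags, so the build line belongs in the benchmark record.

```cuda
// Sketch: identical source, flag-dependent codegen. Under -use_fast_math,
// nvcc lowers expf() to the fast __expf() intrinsic and enables aggressive
// FMA contraction; the kernel speeds up with no algorithmic change.
__global__ void softmax_denom(const float *x, float *out, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += expf(x[i]);   // codegen differs under -use_fast_math / --fmad
    *out = acc;
}

// Record the exact build line alongside results, e.g.:
//   nvcc -O3 -arch=sm_80 kernel.cu                  (baseline)
//   nvcc -O3 -arch=sm_80 -use_fast_math kernel.cu   (can masquerade as a "speedup")
```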
Standard Kernel Co. reposted
Anne Ouyang @anneouyang
Excited to share what friends and I have been working on at @Standard_Kernel. We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring:
- Matmul: 102%–105% the performance of cuBLAS in 100 lines of code
- Attention: 104% the performance of FlashAttention3 in 500 lines
- Fused Llama3 FFN: 120% the performance of PyTorch (gpt-fast)
Reach out if you want to work on AI kernel gen with us!