Jsson

168 posts

@jssonxia

Building scalable performance tools | Rice CS & Cornell ECE | Performance engineering + ML systems

Houston, TX · Joined June 2018

110 Following · 439 Followers

Jsson retweeted

the tiny corp @__tinygrad__ · Apr 4

VIZ=1 now includes a memory profiler. No code changes required, just VIZ=1 on the environment (only a ~10% perf hit). Our goal is to make the easiest framework to debug. Have you tried running with DEBUG=2?

Jsson retweeted

the tiny corp @__tinygrad__ · Apr 3

@antlionai Bullish on AMD. Not bullish on Intel. geohot.github.io/blog/jekyll/up…

Jsson retweeted

John Carmack @ID_AA_Carmack · Jan 13

When I was at Meta, I asked the hardware team what I should read to learn the details of modern consumer HW development so I could work more effectively with them. The consensus was that there really isn’t good literature; you just have to “be there”. Has anything emerged since?

Jsson retweeted

Paul Graham @paulg · Jan 7

@roshanpateI Jessica and I decided to start YC on a date.

Jsson retweeted

Andrej Karpathy @karpathy · Nov 15

Probably not what you want to hear but docs 😅. Actual real life examples. Better and more comprehensive kwarg docs. More helpful links to actual code not just wrapper of wrapper of wrapper code. Example code of larger apps showing best practices (style of torch titan, nanoGPT or etc). Helpful historical context if any, possibly links to useful issues. In the process of making my zero to hero videos I think I've come across ~10 examples of bad, incomplete, unhelpful or misleading docs where you just kinda have to know somehow.

Jsson retweeted

Andrej Karpathy @karpathy · Nov 14

I'm not sure that enough people subscribe to the @Smol_AI newsletter. It's 1 very comprehensive email per day summarizing AI/LLM chatter across X, Reddit, Discord. There are probably others (feel free to reply), but I like this one quite a bit, ty again to @swyx and team.

Jsson @jssonxia · Oct 19

@DmitrySoshnikov Are there any plans to offer MLIR courses?

Dmitry Soshnikov @DmitrySoshnikov · Oct 18

Explore Compiler Engineering path dmitrysoshnikov.com/courses/compil… + new promos

Jsson retweeted

Ross Tate @rossetate · Oct 18

As the author of this PDF, it's been interesting seeing people guess at the rationale behind its design. However, the rationale had nothing to do with theory vs practice, and everything to do with pragmatically coping with an unaccommodated disability in academia. (1/16)

Deedy @deedydas

Compilers was known to be the hardest CS class at Cornell, which was hard as it is. We were handed an 8-page PDF at the start of sem for a language spec we'd be implementing by the end of sem, split into 6 parts. On part 5, the median was a 0/100 and most of the class failed.

Jsson retweeted

Elon Musk @elonmusk · Oct 10

And all transport will be fully autonomous within 50 years

Tim Urban @waitbutwhy

How people got around in 190 BC: horseback, horse-drawn carriage, sailboat
2,000 years later...
How people got around in 1810: horseback, horse-drawn carriage, sailboat
160 years later...
How people got around in 1970: bike, train, subway, car, bus, airplane, spaceship

Jsson @jssonxia · Mar 11

research.colfax-intl.com/adding-fp8-to-…

Jsson retweeted

Zihao Ye @ye_combinator · Feb 5

(1/4) Announcing FlashInfer, a kernel library that provides state-of-the-art kernel implementations for LLM Inference/Serving. FlashInfer's unique features include:

- Comprehensive Attention Kernels: covering prefill/decode/append attention for various KV-Cache formats (Page Table, Ragged Tensor, etc.) for both single-request and batch-serving scenarios.
- Optimized Shared-Prefix Batch Decoding: 31x faster than vLLM's Page Attention implementation for long prompt large batch decoding.
- Efficient Attention for Compressed KV-Cache: optimized grouped-query attention with Tensor Cores (3x faster than vLLM's GQA), fused-RoPE attention, and high-performance quantized attention.

Check our blog and code at:
1. flashinfer.ai/2024/02/02/int…
2. github.com/flashinfer-ai/…

Jsson retweeted

Horace He @cHHillee · Mar 15

Everybody wants their models to run faster. However, researchers often cargo cult performance without a solid understanding of the underlying principles. To address that, I wrote a post called "Making Deep Learning Go Brrrr From First Principles". (1/3) horace.io/brrr_intro.html

Jsson retweeted

Delip Rao e/σ @deliprao · Jan 26

Crazy AF. Paper studies @_akhaliq and @arankomatsuzaki paper tweets and finds those papers get 2-3x higher citation counts than control. They are now influencers 😄 Whether you like it or not, the TikTokification of academia is here! arxiv.org/abs/2401.13782

Jsson retweeted

Indranil Gupta @indygupta · May 1

Types of Distributed Systems Papers. Joke modeled after @xkcd 's xkcd.com/2456/ #distributedsystems #distributedsystemsjokes

Jsson retweeted

Elaine Shi @ElaineRShi · Aug 18

Thanks to the amazing @AndrewCMyers, video of our PL-crypto workshop is out! Check out youtube.com/watch?v=LnR-lw… The list of talks is available here: andrewcmyers.github.io/plcrypt/

Jsson retweeted

freeyao(Last Fucking Generation) @TimeOfSand · Aug 1

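How much does the Apache Software Foundation spend in a year? In fiscal year 2020/2021 it spent $1.6M, while its revenue was $3.0M. Even excluding the largest source, the sponsorship program, conference revenue and public donations alone brought in $1.1M. Notably, revenue from public donations grew 12-fold compared with the previous fiscal year. Would anyone like to research the development of open-source software foundations and explore the reasons behind this? 原语里弄 can set it up as a research topic. apache.org/foundation/doc…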
