Lorenzo Garcia

51 posts

@_fmla_

Manchester, England · Joined March 2021
328 Following · 3 Followers
Jiayi Yuan
Jiayi Yuan@jiayiy·
🚀 BLASST just won Best Paper at #MLSys26! In this paper, we introduce a simple, training-free dynamic sparse attention mechanism that uses a single scalar threshold on online softmax statistics to skip negligible attention blocks. Unfortunately I won’t be there in person, but please say hi to my awesome coauthors! 🙌 Paper: arxiv.org/abs/2512.12087
SemiAnalysis@SemiAnalysis_

Sparse attention mechanisms are finally moving beyond academic benchmarks into production systems, including DeepSeek Sparse Attention and, recently, @NousResearch's Lighthouse Attention. BLASST by NVIDIA, from the paper Dynamic Blocked Attention Sparsity via Softmax Thresholding, attempts to sparsify attention in a different way, leveraging a rescale-factor threshold idea similar to the one in Flash Attention 4. We expect to see more interesting sparse attention techniques in the future. arxiv.org/abs/2512.12087 (2/4)

20
52
359
40.3K
Lorenzo Garcia
Lorenzo Garcia@_fmla_·
@always_ff_rohan > FPGAs can hit 400 GFLOPs, basically replacing a 16-core server. 400 GFLOPs is nothing in the grand scheme of things. A single Graviton 4 core can do around 360 bf16 GFLOPs, so I don’t know which 16 cores you are planning to replace with this FPGA.
0
0
2
87
Rohan makes ASICs 🛠️
Rohan makes ASICs 🛠️@always_ff_rohan·
Baidu and a bunch of other companies have already been using FPGAs for AI. Meanwhile, Indian AI/DL startups are just burning money at GPUs. For instance on matmul, FPGAs can hit 400 GFLOPs, basically replacing a 16-core server. For DL workloads with small batch sizes, FPGAs can actually beat both GPUs and CPUs in terms of throughput.
20
13
145
20.2K
Edward Z. Yang
Edward Z. Yang@ezyang·
Give me your O(num operators) PyTorch improvement ideas that you are interested in. Historical examples: making every kernel deterministic / support zero size. Not done: every kernel in your favorite DSL / batch invariant / masked / padded / device side size
7
2
58
11.3K
kache
kache@yacineMTB·
@difficultyang i would simply just not use torch at all and write cuda directly
7
0
21
3.7K
difficultyang
difficultyang@difficultyang·
Let's say we wanted to rewrite PyTorch from scratch, because such a thing is the topic du jour in the age of LLMs. What would the goals of such a rewrite be? What problems could a rewrite solve that incremental evolution from where the code is today could not? 🧵
24
44
620
72.2K
Lorenzo Garcia
Lorenzo Garcia@_fmla_·
@bubbleboi taylor series approximation of exp is not an innovation. people have been doing this for ages
0
0
0
28
Benji Hyam
Benji Hyam@benjihyam·
ChatGPT is the next clubhouse. In 12 months, people will forget it existed.
1.3K
284
4.8K
1.5M
Jenny
Jenny@JennyTheDev·
@AnthropicAI 24,000 fake accounts and 16 million exchanges just to avoid training their own models. The "build vs steal" decision tree is getting wild.
6
0
21
11.2K
Anthropic
Anthropic@AnthropicAI·
We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.
7.2K
6.2K
54.6K
33.8M
GopiNath
GopiNath@gopinath9629·
@mkurman88 llama.cpp is doing well, I guess, but it still lags behind when it comes to multiple GPUs, and vLLM doesn't support CPU. I am currently exploring exo and then distributed llama for my setup.
2
0
3
2.3K
Mariusz Kurman
Mariusz Kurman@mkurman88·
Is vllm broken? How can two identical requests produce different outputs? Temperature 0.0, seed 42 - how the hell can you ensure determinism?
35
0
97
28.3K
Lorenzo Garcia
Lorenzo Garcia@_fmla_·
@istoica05 17x speedup means that the baseline was either not written by experts or the op was not important enough to be optimised
0
0
1
40
Ion Stoica
Ion Stoica@istoica05·
Optimizing kernels for emerging hardware architectures remains a critical bottleneck today. This new blog post describes Autocomp, a new ADRS (AI-Driven Research for Systems) framework that leverages LLM-driven search to automate this process and achieves up to a 17x speedup over expert-written code on AWS Trainium.
AI-Driven Research for Systems@ai4research_ucb

🚀 AI optimizes tensor kernels to run 17x faster than human expert designs! [ADRS Blog] Programming hardware accelerators is notoriously hard. We describe Autocomp, the first LLM-driven optimizer for tensor accelerators, which outperforms hand-tuned expert kernels on AWS Trainium by up to 17x! ✍️ Read the blog: adrs-ucb.notion.site/autocomp 📖 ADRS Blog Series: ucbskyadrs.github.io 📃 Autocomp Paper: arxiv.org/pdf/2505.18574 👩‍💻 Code: github.com/ucb-bar/autoco…

3
30
261
29.9K
Susan Zhang
Susan Zhang@suchenzang·
@_fmla_ ilya would never need linkedin recruiters
1
0
42
2.4K
Susan Zhang
Susan Zhang@suchenzang·
be bold take risks aim to build super-intelligence before even reaching baseline intelligence
17
3
256
43K
SO?
SO?@noor_supernova7·
For those who do not know, the occupation of Gaza has begun. If you’re scrolling, PLEASE leave a dot . it's just a dot.
34.3K
34.5K
244K
5.7M
Hasan alrabay
Hasan alrabay@HasanEssam29636·
I’m very hungry. My body is slowly falling apart from malnutrition, dizziness, and weight loss. If you’re scrolling, PLEASE leave a dot . it's just a dot
52K
40.8K
346.4K
8.3M
Chris
Chris@Chrisgpt·
can’t sleep, who wants to be added to a gc?
29
0
72
6.9K
Gavin M. Wax
Gavin M. Wax@GavinWax·
Silicon Valley is a strategic national asset whose jobs should be legally reserved for U.S. citizens. It’s a matter of national security. This shouldn’t be controversial.
852
464
3.4K
5.1M
Lorenzo Garcia
Lorenzo Garcia@_fmla_·
@SupaDupaChop That’s not what they’re saying in the video. They’re encouraging julani to strike tel aviv
0
0
0
102
Jason Wei
Jason Wei@_jasonwei·
There are traditionally two types of research: problem-driven research and method-driven research. As we’ve seen with large language models and now AlphaEvolve, it should be very clear now that total method-driven research is a huge opportunity.
Problem-driven research is nice because you have a consistent and specific goal. The goal is usually virtuous, so it feels good to have a mission and identity. However, it just doesn’t work due to The Bitter Lesson. Basically everything in classical NLP (machine translation, summarization, chatbots) lost to simple scaling. ChatGPT is a prime example: it used nothing from chatbot research and certainly wasn’t the intended end goal of OpenAI’s 2022 research program, but was a huge hit because someone (John Schulman et al) figured out the right way to package large language models as a product.
Method-driven research feels less stable because you’re constantly searching for problems and you have to be opportunistic. But I believe AI will allow method-driven research to dominate progress in most fields of science, one-by-one. The latest method (or “hammer”), as we’ve seen in AlphaEvolve, is ruthless search and optimization against a reward function (whether this requires RL or not is a separate discussion). Things that problem-driven researchers have been trying to solve for a long time like the kissing number problem will become nails hit by the hammer. Eventually the hammer will become bigger, stronger, and more general and will hit more and more nails.
So a very important meta-skill for the next decade will be knowing how to create the right environments to use The Hammer. Ironically, the problem-driven researchers, who by definition are experts in a specific problem, are well-positioned to create these environments. If, that is, they can put down their egos and pick up the hammer.
21
91
712
78.2K