Faisal Mumtaz

3.9K posts

@digitacy

Innovating and building crazy things @HUMAIN

Saudi Arabia · Joined February 2019
709 Following · 10K Followers
Pinned Tweet
Faisal Mumtaz
Faisal Mumtaz@digitacy·
I wrote an LLM inference engine from scratch in Rust. No PyTorch. No ONNX. No Python. Just one binary that downloads a model and runs GPU-native inference on Apple Silicon + NVIDIA. It beats llama.cpp at Q4 on Apple Silicon. Here’s how it works 🧵
10
24
385
983.6K
Faisal Mumtaz
Faisal Mumtaz@digitacy·
OpenAI- and Anthropic-compatible server in one command. SSE streaming, tool calling, native tokenizer, and native format in a single Rust codebase.
1
0
1
599
CryptoGoos
CryptoGoos@cryptogoos·
CRAZY: 🇺🇸 SpaceX is going public at ~$1.8 trillion. Revenue last year: $18.7 billion. Net loss: ~$4.9 billion. That's a price-to-sales multiple near 96x. On a company currently growing revenue around 33% a year, and still losing money. To grow into that price, it has to sustain 40%+ revenue growth for 10 years straight. Almost impossible.
201
263
1.3K
147.2K
Faisal Mumtaz
Faisal Mumtaz@digitacy·
@helloitsaustin People in the higher echelons, such as distinguished fellows, typically don't get interviewed in traditional ways. They decide whether the company and its mission are worth their time.
0
0
2
2.9K
Andrej Karpathy
Andrej Karpathy@karpathy·
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
8K
11.2K
150.2K
27.6M
Faisal Mumtaz retweeted
Crémieux
Crémieux@cremieuxrecueil·
Last year, China added as much energy to its grid as Germany has in total.
689
1.6K
5.6K
1.1M
Faisal Mumtaz retweeted
Balaji
Balaji@balajis·
The US federal budget is $7 trillion. There are 535 in the Senate and Congress. They collectively allocate that money. Specifically: 7000/535 = 13 billion per official. Which means AOC is a political billionaire. She allocates far more than any market billionaire. Indeed she allocates ten billion liquid, per year. A market billionaire has one billion, illiquid, per life. So: AOC spends 10-100X what a market billionaire has. It's not even close. She's right about one thing, though. AOC didn't earn the tens of billions she spends. The state taxed that money, by force.
Breitbart News@BreitbartNews

Alexandria Ocasio-Cortez: You can't earn a billion dollars. Ilana Glazer: That's right. AOC: You just can't earn that. Glazer: That's exactly correct. AOC: You can get market power. You can break rules. You can do all sorts of things. You can abuse labor laws. Glazer: Yup. AOC: You can pay people less than what they're worth. Glazer: Yup. AOC: But you can't earn that, right? Glazer: That's right. AOC: And so you have to create a myth that -- since you didn't earn that, you have to create a myth of earning it.

214
622
5.5K
387K
Faisal Mumtaz
Faisal Mumtaz@digitacy·
@alex_whedon If you're going to make such claims, start with empirical proof, not marketing
0
0
0
58
Alexander Whedon
Alexander Whedon@alex_whedon·
Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.
1.5K
2.9K
23K
12.8M
urata ☄️
urata ☄️@UrataSyndicate·
@damianplayer What else would you even do? What else would the expectation even be? This note feels like it is saying nothing
21
0
201
33.2K
Damian Player
Damian Player@damianplayer·
this is why Elon’s companies move 10x faster than most. every founder should run their team like this: push back, ask, or execute.
181
1.1K
18.4K
964.1K
Zain Shah
Zain Shah@zan2434·
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
1.1K
3.7K
28.8K
6M
Hubert Thieblot
Hubert Thieblot@hthieblot·
pitch me your company in 1 word.
2.4K
15
1.1K
288.9K
World of Statistics
World of Statistics@stats_feed·
🚀 Successful orbital rocket launches in 2026
🇺🇸 United States: 56
Rest of the world: 26

Rest of the world:
🇨🇳 China: 20
🇷🇺 Russia: 5
🇫🇷 France: 1
45
103
2.4K
116.4K
Faisal Mumtaz
Faisal Mumtaz@digitacy·
I will give this a shot on Lumen, the inference engine I published recently, but I am highly skeptical. HBM bandwidth is 3-4 TB/s, while fast PCIe 5.0 NVMe SSDs are 5-10 GB/s. There's a reason why Apple doesn't market SSD performance. Notebookcheck’s 1TB M4 Max review unit measured about 5.06 GB/s read and 6.24 GB/s write in Blackmagic Disk Speed Test. The Verge’s 4TB M4 Max review unit measured about 7.34 GB/s sustained read and 7.97 GB/s sustained write. Good work though, interesting study. github.com/faisalmumtaz89…
0
0
0
49
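The bandwidth gap in the post above can be put into rough numbers. The sketch below is only a back-of-the-envelope illustration, not how Lumen or any other engine actually works: it assumes decoding is purely bandwidth-bound and that every generated token has to read all model weights once. The 40 GB model size and the `tokens_per_second` helper are hypothetical; the bandwidth figures are the ones quoted in the post.

```python
def tokens_per_second(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode rate if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


# Hypothetical model size in GB; this number is NOT from the post.
WEIGHTS_GB = 40.0

# Bandwidth figures quoted in the post, in GB/s (1 TB/s taken as 1000 GB/s).
sources = {
    "HBM, low end (3 TB/s)": 3000.0,
    "HBM, high end (4 TB/s)": 4000.0,
    "M4 Max SSD read (Notebookcheck, 1TB)": 5.06,
    "M4 Max SSD read (The Verge, 4TB)": 7.34,
}

for name, bandwidth in sources.items():
    rate = tokens_per_second(WEIGHTS_GB, bandwidth)
    print(f"{name}: at most {rate:.3f} tokens/s")

# Ratio between HBM and SSD read bandwidth, independent of model size.
ratio_low = sources["HBM, low end (3 TB/s)"] / sources["M4 Max SSD read (The Verge, 4TB)"]
ratio_high = sources["HBM, high end (4 TB/s)"] / sources["M4 Max SSD read (Notebookcheck, 1TB)"]
print(f"HBM is roughly {ratio_low:.0f}x to {ratio_high:.0f}x faster than the measured SSD reads")
```

Under these assumptions the ratio does not depend on the model size, so the gap of several hundred times holds for any model that must be streamed from SSD on every token. Caching, sparsity, or partial reads would change the picture, which is presumably what the study being replied to relies on.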