sshkhr

1.3K posts

sshkhr
@sshkhr16

research eng @GoogleDeepMind prev: founder @DiceHealth, researcher @AIatMeta @VectorInst

Toronto, Canada · Joined April 2018
1.6K Following · 1.9K Followers
Pinned Tweet
sshkhr @sshkhr16
Our work on improving neural scaling beyond power law won an Outstanding Paper award at @NeurIPSConf 2022!! Come check it out on Wed, Nov 30, at Poster Session 3 in New Orleans.
Surya Ganguli @SuryaGanguli

Our "Beyond Neural Scaling laws" paper got a #NeurIPS22 outstanding paper award! Congrats Ben Sorscher, Robert Geirhos, @sshkhr16 & @arimorcos awards: blog.neurips.cc/2022/11/21/ann… paper: arxiv.org/abs/2206.14486 🧵 twitter.com/SuryaGanguli/s…

9 replies · 9 reposts · 107 likes · 0 views
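A minimal numerical sketch of the claim in the pinned paper: that test error need not keep falling as a power law in dataset size, and that data pruning targets a faster decay. The functional forms and constants below are arbitrary placeholders chosen only to show the two shapes, not values from arxiv.org/abs/2206.14486.

```python
import math

def power_law_error(n: float, a: float = 5.0, alpha: float = 0.3) -> float:
    """Standard scaling: test error decays as a power law in dataset size n."""
    return a * n ** (-alpha)

def pruned_error(n: float, a: float = 5.0, c: float = 5e-6) -> float:
    """A faster-than-power-law (here exponential) decay, the regime pruning aims for."""
    return a * math.exp(-c * n)

# Illustrative constants only: the exponential curve starts higher but
# overtakes the power law as the (pruned) dataset grows.
for n in (1e5, 1e6, 1e7):
    print(f"n={n:.0e}  power-law={power_law_error(n):.4f}  pruned={pruned_error(n):.4f}")
```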
The Lunduke Journal @LundukeJournal
Remember the security firm that Ubuntu hired to audit the (ill-advised, highly buggy) Rust-rewrites of all of the GNU Coreutils? Turns out that security firm is run by @gf_256, who: - Appears to be a man who thinks he's a woman ("trans"). - Uses an anime cartoon of a girl as his avatar. - Appears to have an OnlyFans page. I repeat: Ubuntu hired a "Trans" man, with an anime girl avatar and an OnlyFans page... to audit Rust code. It's hard to get more on-the-nose than that.
[4 images attached]
412 replies · 89 reposts · 1.2K likes · 586K views
Sam Altman @sama
you know what, all of these "which is better" polls are silly. use codex or claude code, whatever works best for you. i am grateful we live in a time with such amazing tools, and grateful there is a choice
2.2K replies · 1.1K reposts · 23K likes · 1.6M views
himanshu @himanshustwts
dude i love how dwarkesh keeps scaling the podcasting experience and every time it boils down to first principles of learning
[image attached]
Dwarkesh Patel @dwarkesh_sp
[quoted tweet: the blackboard-lecture episode with @reinerpope; full text in the post below]

13 replies · 71 reposts · 2K likes · 113.9K views
sshkhr @sshkhr16
@cHHillee naah, i have it from reliable sources you dropped out of school to start thinky 😅
0 replies · 0 reposts · 3 likes · 4.3K views
Horace He @cHHillee
While I'm happy that many folks seemed to enjoy this talk, there are a lot of inaccuracies in this tweet 😆
"Jane Street hired" - I've never worked at Jane Street
"This junior" - at this point I'm 5 years out of undergrad, so I think arguably I'm not a junior anymore although perhaps some would disagree :)
"uses AI to analyze ... data" - I would not describe my role like this haha
Probably also good to mention that it's from the Jane Street Tech Talk series: youtu.be/139UPjoq7Kw?si… and not from this reposter
bodila @51bodila

Jane Street hired this junior at $220k-$600k /year because he uses AI to analyse TRILLIONS of data in this 1-hour lecture - he show how to research trillion of data points thanks to his machine Bookmark & watch it, instead of Netflix to learn how to do the same!

34 replies · 93 reposts · 2.1K likes · 357.2K views
Dwarkesh Patel @dwarkesh_sp
Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It’s a bit technical, but I encourage you to hang in there - it’s really worth it. There are less than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, “As we now know, pipelining is not wise.”
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
146 replies · 595 reposts · 6.5K likes · 1.2M views
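A rough back-of-envelope sketch of two of the topics timestamped above: the Chinchilla heuristic of roughly 20 training tokens per parameter (and what 100x over-training implies), and how KV-cache memory grows with context length. The model dimensions here (70B parameters, 80 layers, 8 KV heads, head dim 128, 2-byte values) are hypothetical placeholders, not numbers from the episode.

```python
def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate Chinchilla-optimal training tokens: ~20 tokens per parameter."""
    return n_params * tokens_per_param

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Per-token KV cache: keys + values across all layers, 2 bytes each for fp16/bf16."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

params = 70e9  # hypothetical 70B-parameter model
print(f"Chinchilla-optimal tokens: {chinchilla_tokens(params):.2e}")
print(f"100x over-trained: {100 * chinchilla_tokens(params):.2e}")

per_token = kv_cache_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)
context = 128_000
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache for a {context:,}-token context: {per_token * context / 2**30:.1f} GiB")
```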
sshkhr @sshkhr16
ball is in CS 153's court
[image attached]
Dwarkesh Patel @dwarkesh_sp
[quoted tweet: the blackboard-lecture episode with @reinerpope; full text in the post above]

1 reply · 2 reposts · 29 likes · 6.1K views
maharshi @maharshii
“triton? kernels? cuda? what are you talking about bro let’s go eat some ants”
[image attached]
8 replies · 14 reposts · 385 likes · 6.6K views
Aran Komatsuzaki @arankomatsuzaki
This feels like confusing a serving-runtime problem for a chip-startup opportunity. Agents do change inference patterns: loops, tool calls, branching, long context, KV reuse, burstiness. But most of that is an inference systems problem: scheduling, routing, KV-cache management, etc. Think Dynamo. By the time a new chip co tapes out + builds a compiler stack + wins cloud distribution, NVIDIA/AMD will likely have baked the obvious hardware-level optimizations into existing platforms.
Y Combinator @ycombinator

Inference Chips for Agent Workflows @sdianahu Most AI chips are designed for "prompt in, response out." Agents don't work that way. They loop, branch, and hold context across dozens of steps, and current GPUs hit 30–40% utilization as a result. That gap is where purpose-built silicon wins.

15 replies · 10 reposts · 99 likes · 25.3K views
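A toy sketch of the serving-side point in the reply above: an agent loop keeps resending a largely shared prompt prefix, so a scheduler that caches KV state per prefix avoids recomputing it on every step. Everything here (the PrefixKVCache class, the token lists) is a made-up illustration of the scheduling idea, not the API of Dynamo or any real inference runtime.

```python
import hashlib

class PrefixKVCache:
    """Toy prefix cache: maps a prompt prefix to a stand-in for its KV state."""

    def __init__(self):
        self.store: dict[str, list[str]] = {}

    def _key(self, tokens: list[str]) -> str:
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def lookup(self, tokens: list[str]) -> int:
        """Return how many leading tokens already have cached KV state."""
        for cut in range(len(tokens), 0, -1):  # longest cached prefix wins
            if self._key(tokens[:cut]) in self.store:
                return cut
        return 0

    def insert(self, tokens: list[str]) -> None:
        # Store every prefix so later requests can reuse the longest shared one.
        for cut in range(1, len(tokens) + 1):
            self.store[self._key(tokens[:cut])] = tokens[:cut]

cache = PrefixKVCache()
system = ["<system>", "you", "are", "an", "agent"]
for step, tool_output in enumerate(["search_results", "file_contents", "final_answer"]):
    prompt = system + [f"<turn{step}>", tool_output]
    reused = cache.lookup(prompt)
    print(f"step {step}: {reused}/{len(prompt)} prompt tokens served from cache")
    cache.insert(prompt)
```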
MegaApple @MegaApple18
@asha_shar ma'am please normalize regional pricing for India 🇮🇳 Currently extremely high (almost DOUBLE) compared to recommended pricing. Thank you, wish you the best.
[image attached]
1 reply · 3 reposts · 22 likes · 201 views
sshkhr @sshkhr16
@github Copilot after they change to usage-based pricing
[GIF attached]
0 replies · 0 reposts · 2 likes · 2.1K views
GitHub @github
Starting June 1st, GitHub Copilot will move to a usage-based billing model as GitHub Copilot supports more agentic and advanced workflows. In early May, you'll see a preview bill experience, giving visibility into projected costs before the transition. 👉 Read more about the upcoming change: github.blog/news-insights/…
519 replies · 934 reposts · 2.9K likes · 3.7M views
sshkhr @sshkhr16
@yacineMTB @basedjensen I met one of the Pause AI guys at a NeurIPS party, he asked me how old I was and then proceeded to turn around and walk away. Maybe he just did not like the number 26. Similar physiognomy. I could probably overhead press his body weight too
0 replies · 0 reposts · 1 like · 281 views
kache @yacineMTB
@basedjensen Actually, for me personally, it all started when I visited their office in 2023 for a shrimp sushi party when I met one particular snobby guy who pissed me off. He was a manlet. Very short man
6 replies · 1 repost · 235 likes · 9.5K views
sshkhr reposted
Ambition @ambitionlabsinc
We're looking for our founding design engineer. Someone who builds with care and intention, who is willing to bet on a vision for AI that makes people believe they can be ambitious in ways they couldn't be before. More details at design[dot]ambition[dot]inc
10 replies · 13 reposts · 156 likes · 17.4K views