Sergey Serebryakov

1.6K posts

@megaserg

ML Engineer, AI Infra expert. Ex-@weHRTyou, ex-@Cruise, ex-@Tesla, ex-@Twitter, ex-RocketFuel, ex-@JetBrains, ex-@Facebook, ex-@Google. mostly puns

London, United Kingdom · Joined April 2010
853 Following · 753 Followers
Sergey Serebryakov
Sergey Serebryakov@megaserg·
I saw the best minds of my generation developing AI to make compulsive short-form videos
English
0
0
4
153
Sergey Serebryakov
Sergey Serebryakov@megaserg·
sucks to be limited by the speed of light tbh
English
1
0
4
362
Andrew Yeung
Andrew Yeung@andruyeung·
How the heck are we almost halfway through 2024?
English
13
2
45
6.6K
Sergey Serebryakov reposted
Isomorphic Labs
Isomorphic Labs@IsomorphicLabs·
How could #AlphaFold 3 transform drug discovery? Most drugs are small molecules known as ligands that bind to proteins to change how they interact in human health and disease. AlphaFold 3 can predict these interactions to atomic accuracy.
English
7
117
444
75.2K
Sergey Serebryakov
Sergey Serebryakov@megaserg·
@levwalkin "A thought came into my head, but it left without finding me in."
Russian
0
0
0
37
Lev Walkin
Lev Walkin@levwalkin·
— That thought has visited me. But we parted ways over a difference of opinion.
Russian
1
0
2
1.1K
Sergey Serebryakov
Sergey Serebryakov@megaserg·
There are vast and obvious inefficiencies wherever there was no dedicated optimization effort, and possibly even where there was. Also true for organizations.
Andrej Karpathy@karpathy

This post became popular; a few more thoughts / pointers on the topic for the interested reader.

Example of the complexity involved: @cHHillee has a great post "Making Deep Learning Go Brrrr From First Principles" horace.io/brrr_intro.html I was always struck by this diagram from this post. Left to right is time. Look at all these functions stacked up vertically that are dispatched until, 30 layers deep, you get the actual computation (addition in this example). All of this stuff is PyTorch function overhead. In practical settings this overhead becomes small in comparison to the actual computation because the arrays we're adding are so large, but still. What is all this stuff? We're just trying to add numbers.

Second: startup latency. Open up a Python interpreter and try to import the PyTorch library (`import torch`). On my computer this takes about 1.3 seconds. This is just the library import, before you even do anything. In a typical training run you'll end up importing a lot more libraries, so even just starting your training script can often add up to tens of seconds of you just waiting around. A production-grade distributed training run can even add up to minutes. I always found this very frustrating. Computers are *fast*: even a single CPU core (of up to ~dozens on your computer) does billions of operations in one second. What is happening? In llm.c, all this startup latency is ~gone. Right after allocating memory your computer just directly dives into useful computation. I love the feeling of hitting Enter to launch your program, and it just goes. Direct to useful computation on your problem. No waiting.

Third thought: LLM as a compiler. It feels likely to me that as LLMs get much better at coding, a lot more code might be written by them, targeting whatever narrow application and deployment environment you care about. In a world where very custom programs are "free", LLMs might end up being a kind of compiler that translates your high-level program into an extremely optimized, direct, low-level implementation. Hence my LLM Agent challenge earlier of "take the GPT-2 PyTorch training script, and output llm.c", as one concrete example.

Lastly, I also wanted to mention that I don't mean to attack PyTorch at all; I love the library and I have used it for many years. And I've worked in Python for much longer. These are much more general problems and tradeoffs that are really fun to think through: between flexibility, generality, hackability, security, abstraction overhead, code complexity, speed (latency / throughput), etc. The fun and magic of Pareto-optimal infrastructure, and of programming computers.

English
0
0
1
616
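The import-latency point in the quoted thread is easy to measure yourself. A minimal sketch using only the standard library (the stdlib `json` module stands in for `torch` here, since PyTorch may not be installed; swap the name to reproduce the ~1.3 s figure):

```python
import importlib
import sys
import time

def timed_import(name: str) -> float:
    """Import a module by name and return the wall-clock seconds it took.

    A cached module imports almost instantly, so drop it from sys.modules
    first to measure a cold(ish) import.
    """
    sys.modules.pop(name, None)
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

if __name__ == "__main__":
    # Replace "json" with "torch" to measure the latency from the tweet.
    elapsed = timed_import("json")
    print(f"import json took {elapsed * 1000:.2f} ms")
```

Note this understates a truly cold start: the OS page cache and compiled `.pyc` files make repeat imports faster than the first import after boot.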
Sergey Serebryakov
Sergey Serebryakov@megaserg·
Do not use NotaryCam. Find a local notary public instead.
English
1
0
0
358
Sergey Serebryakov
Sergey Serebryakov@megaserg·
You want to build and install an iOS app on your device? Your iOS is too new, an Xcode update is required. To update Xcode, a macOS update is required. This development toolchain is so broken.
English
1
0
1
386
Sergey Serebryakov reposted
vik
vik@vikhyatk·
spent all this time studying ML and all i got was an addiction to training large models on expensive GPU clusters
English
5
2
58
8.7K
Sergey Serebryakov
Sergey Serebryakov@megaserg·
@jmmv Nice post! I was bitten by EBS I/O performance as well when debugging bottlenecks of deep learning training nodes. Workload visualization is also super useful there :)
English
0
0
1
50
Julio Merino
Julio Merino@jmmv·
Remember that cool graph I posted a few weeks ago about visualizing the behavior of our Bazel build farm and using it to root-cause problems? We ❄️ now have a post! Read on for the cool stuff we are doing at Snowflake in dev tools. medium.com/snowflake/buil…
English
1
1
10
2K
Sergey Serebryakov
Sergey Serebryakov@megaserg·
shrinkflation is when your therapist charges you the same price for a 55-minute session
English
1
0
5
539
Sergey Serebryakov reposted
Стартапы и бизнес
The prosecutor's office has charged Alexander Kim, founder of the «Спутник» service, with organizing the underground-tunnel excursion in which eight people died. Kim had previously been involved in the case as a witness. His service does not organize excursions vc.ru/legal/806543
Russian
1
4
10
7.9K
Sergey Serebryakov
Sergey Serebryakov@megaserg·
Convinced that if everyone personally filed their taxes by hand, there would be no socialists.
English
1
1
3
318
Sergey Serebryakov
Sergey Serebryakov@megaserg·
And California, when taxing nonresident income, applies the tax bracket computed from your total income. Amazing.
English
1
0
0
406
Sedrak
Sedrak@sedrak_·
@megaserg Basically one needs to avoid making between $2,155,350 and $25,000,000?
English
1
0
0
52
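The mechanism the two tweets above describe, prorating a tax computed on total income by the in-state share, can be sketched as follows. The bracket schedule here is hypothetical (5% and 10% with a $100k breakpoint), NOT real California rates; it only illustrates how the same in-state income is taxed more heavily when total income is higher:

```python
def tax_from_brackets(income: float, brackets: list[tuple[float, float]]) -> float:
    """Progressive tax: brackets is a list of (lower_bound, marginal_rate)."""
    tax = 0.0
    for i, (lower, rate) in enumerate(brackets):
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if income > lower:
            tax += (min(income, upper) - lower) * rate
    return tax

def nonresident_tax(total_income: float, in_state_income: float, brackets) -> float:
    """Tax on TOTAL income, prorated by the in-state share of that income."""
    if total_income <= 0:
        return 0.0
    return tax_from_brackets(total_income, brackets) * (in_state_income / total_income)

# Hypothetical two-bracket schedule: 5% up to $100k, 10% above.
BRACKETS = [(0.0, 0.05), (100_000.0, 0.10)]

# Same $50k of in-state income, very different total income:
low = nonresident_tax(total_income=50_000, in_state_income=50_000, brackets=BRACKETS)
high = nonresident_tax(total_income=1_000_000, in_state_income=50_000, brackets=BRACKETS)
print(low)   # 2500.0  (5.0% effective rate on the $50k)
print(high)  # 4750.0  (9.5% effective rate, set by the $1M total)
```

The design point: the marginal rate is chosen as if all income were in-state, then only the in-state fraction of the resulting tax is owed, so higher out-of-state income raises the effective rate on in-state income.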