Xiao Lin

223 posts

@much_science

Joined June 2020
31 Following · 15 Followers
Tyler Shaw
Tyler Shaw@boowiebear·
@VideoCardz 0.16% of the market is almost zero. I wish ARC were more successful, but it is not taking off due to poor performance, price, and availability. This is on Intel.
12
0
15
4.5K
Xiao Lin
Xiao Lin@much_science·
@LingelKun @absentprototype @wccftech If the artist still expresses intent by building a full-blown character model, and DLSS5 is just rendering it efficiently, they should also post the full rendering and show that DLSS5 is close, like DLSS 4K versus native 4K.
1
0
0
21
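A minimal sketch of the side-by-side check the reply above proposes: take a frame rendered natively and the same frame produced by the upscaler, then quantify how "close" they are. PSNR is just one common choice of metric; the frames below are synthetic stand-ins, not real captures.

```python
# Hypothetical "how close is the upscaled frame?" check. The frames
# are synthetic stand-ins; in practice you'd load captured screenshots.
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # images are identical
    return 10.0 * np.log10(peak**2 / mse)

rng = np.random.default_rng(0)
native = rng.integers(0, 256, size=(256, 256, 3))                    # stand-in "native" render
upscaled = np.clip(native + rng.normal(0, 2, native.shape), 0, 255)  # stand-in "DLSS" frame

print(f"PSNR: {psnr(native, upscaled):.1f} dB")
```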
Wccftech
Wccftech@wccftech·
"If DLSS 5 was shown as a next-gen hardware reveal and not AI, you guys would be going nuts." Veteran developer JP Kellams and analyst Ryan Shrout push back against DLSS 5's social media backlash for being a mere "AI filter". 🔗 wccf.tech/1k1ea
69
30
396
15K
Xiao Lin
Xiao Lin@much_science·
@FelixCLC_ Will you be doing pointwise ops near memory?
0
0
1
118
@fclc
@fclc@FelixCLC_·
A truly good tensor ISA has never been tried
4
5
41
6.1K
Xiao Lin
Xiao Lin@much_science·
@hyhieu226 Papers and peer review were so successful that startups are using them as a central GitHub.
0
0
0
65
Hieu Pham
Hieu Pham@hyhieu226·
AI paper authors should move from "we propose X" to "we implement X." If a reviewer complains "lack of novelty" or asks "did you compare to Y et al.?" (woe to them), the best response is "run our code. it gives better results." But maybe just ditch papers...
14
4
118
14.8K
Young Engineer
Young Engineer@YoungEngnr·
@dcominottim Come on, that is a bit much. He has had a huge role in Apple's and AMD's architecture successes. Maybe Tesla's as well, but it's hard to tell without independent testing. Unclear why they are doing this at Tenstorrent, though - a bug? Early failures?
1
0
0
292
Simon Willison
Simon Willison@simonw·
"This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year."
Andrej Karpathy@karpathy

nanochat can now train a GPT-2 grade LLM for <<$100 (~$73, 3 hours on a single 8XH100 node). GPT-2 is just my favorite LLM because it's the first time the LLM stack comes together in a recognizably modern form. So it has become a bit of a weird & lasting obsession of mine to train a model to GPT-2 capability but for much cheaper, with the benefit of ~7 years of progress. In particular, I suspected it should be possible today to train one for <<$100.

Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), at $8/hour per TPUv3 back then, for a total cost of approx. $43K. It achieves a CORE score of 0.256525, an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc.

As of the last few improvements merged into nanochat (many of them originating in the modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year. I think this is likely an underestimate because I am still finding more improvements relatively regularly and I have a backlog of more ideas to try.

A longer post with a lot of the detail of the optimizations involved and pointers on how to reproduce is here: github.com/karpathy/nanoc… Inspired by modded-nanogpt, I also created a leaderboard for "time to GPT-2", where this first "Jan29" model is entry #1 at 3.04 hours. It will be fun to iterate on this further and I welcome help! My hope is that nanochat can grow to become a very nice/clean and tuned experimental LLM harness for prototyping ideas, for having fun, and ofc for learning.

The biggest improvements among things that worked out of the box and simply produced gains right away were Flash Attention 3 kernels (faster, and the window_size kwarg allows alternating attention patterns), the Muon optimizer (I tried for ~1 day to delete it and only use AdamW, and I couldn't), residual pathways and skip connections gated by learnable scalars, and value embeddings. There were many other smaller things that stack up.

Image: semi-related eye candy of deriving the scaling laws for the current nanochat model miniseries, pretty and satisfying!

18
58
993
97.1K
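The headline numbers in the quoted thread are easy to verify; a quick back-of-the-envelope check using only figures stated in the tweet itself:

```python
# All figures are from the quoted tweet itself.
cost_2019 = 32 * 168 * 8   # 32 TPUv3 chips x 168 hours x $8/hour = $43,008 ("approx. $43K")
cost_now = 73              # ~$73 for 3.04 hours on one 8XH100 node

reduction = cost_2019 / cost_now   # ~589x, i.e. roughly the "600X" headline
per_year = reduction ** (1 / 7)    # geometric annual rate over 7 years

print(f"total reduction:  {reduction:.0f}x")   # ~589x
print(f"annual reduction: {per_year:.2f}x")    # ~2.49x, the "~2.5X every year"
```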
Xiao Lin
Xiao Lin@much_science·
@rajammanabrolu The Transformer architecture also "evolved" out of a ton of GPU hours, benchmarking data, and many years of distributed research, if we count evolution in. Also: our ancestors wouldn't be able to explore as well as we do. The genes are the same, but the context is different.
0
0
0
20
Prithviraj (Raj) Ammanabrolu
Prithviraj (Raj) Ammanabrolu@rajammanabrolu·
Human learning is def more sample efficient than machines rn (tho not as much as some claim, given that evolutionary + societal knowledge accumulation counts as compute). Again, the "pre training" for humans vs models is v different. Models currently encode a large amount of factual information verbatim, whereas what it seems humans have done is learn how to learn better at exploration (~RL) time. Anyways, moral of the story is that trajectories are different so far and the path to AGI might have nothing in common with the human path to intelligence. We shouldn't over-index on it
Harrison Kinsley@Sentdex

A 17 year old doesn't learn to drive in 10-20 hrs. It's more like 10-20 hrs + 17 years of RL in a very robust physics env on top of an insane amount of pre-training via evolution for millions of years.

2
2
14
2.1K
Xiao Lin
Xiao Lin@much_science·
@karlbykarlsmith @rajammanabrolu And there are only a few billion bits that came out of this evolution process. We also know quite accurately what those bits are.
0
0
0
12
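For scale on "a few billion bits": the human genome is roughly 3.1 billion base pairs at 2 bits per base, which is presumably the figure the reply is gesturing at. A quick check (the genome size is the only input; the rest is arithmetic):

```python
# The human genome is ~3.1e9 base pairs; each base (A/C/G/T) carries 2 bits.
base_pairs = 3.1e9
bits = base_pairs * 2        # ~6.2e9 bits: "a few billion bits"
megabytes = bits / 8 / 1e6   # ~775 MB uncompressed

print(f"{bits:.1e} bits (~{megabytes:.0f} MB)")
```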
Pseudo Doctor Subtilis
Pseudo Doctor Subtilis@thesubtledoctor·
@rajammanabrolu Have you crunched the numbers on evolution? Gut says evolutionary information accumulation blows current pre-training out of the water.
1
0
0
76
Xiao Lin
Xiao Lin@much_science·
@bubbleboi I sure can still afford a 6502 processor, right?
0
0
0
79
bubble boi
bubble boi@bubbleboi·
I've been thinking a lot about Clawd Bot & the race for Mac minis over the past few days, and I think I've come to a very scary realization that explains this crazy phenomenon.

Put simply, building a gaming PC will be nearly impossible in the next 5 years… in fact, it already is for the vast majority of consumers. But I will go one step further—in the next 10 years, having any type of personal computing device will be unattainable. Fab capacity will be allocated to its most productive and profitable use, which is cloud & AI data centers.

Even today, most of the software you run already won't work without an internet connection. But now, with the opportunity cost being so high, consumers will be shafted and will have only one option, which is moving to the cloud. It's looking increasingly likely that the only hardware you will have is some terminal that connects to the cloud, with no workloads running directly on your own hardware. Your device will just have the most basic single-core processor and 4 GB of RAM at most.

This is what most people are missing with the fire escape race to acquire Mac minis. The costs for these AI services aren't just going up; they will scale and capture the profitability of the services they provide, the same way consulting and financial services extract rent from larger, more productive corporations.

The only way to protect yourself from the inevitable is to acquire as much hardware that can run inference as fast as fucking possible… Welcome to the computeless class.
bubble boi tweet media
135
59
1.3K
210.4K
@fclc
@fclc@FelixCLC_·
Lesson of the last 5 years: If you can't describe your programming model to a person, how in the world do you expect to explain it to a compiler?
5
1
21
1.3K
Xiao Lin
Xiao Lin@much_science·
@deredleritt3r What if the algorithm discovered something fake? Will OpenAI cover any part of the losses?
0
0
0
23
prinz
prinz@deredleritt3r·
Let's cut through the untrue sensationalist reporting.

- OpenAI is NOT going to come after random people and demand that they share profits from discoveries made with the aid of ChatGPT.
- OpenAI is NOT going to "force" anyone to hand over their profits.

Here's what's actually going on: OpenAI will approach enterprise customers and offer them a deal that includes a revenue share with OpenAI from products made with the aid of OpenAI's models. It's up to the enterprise customer - a sophisticated party, represented by competent outside counsel - to agree or decline to pursue this arrangement.

We have a recent quote from Sarah Friar on this: "I like [the idea of] licensing models to really align [the customer's and OpenAI's interests]. Let's say in drug discovery, if we licensed our technology, you have a breakthrough. That drug takes off and we get a licensed portion of all its sales."

See also the screenshot below from OpenAI's recent blog, which talks about "licensing, IP-based agreements and outcome-based pricing" that would let OpenAI share in the value created by its models.
prinz tweet media
*Walter Bloomberg@DeItaone

OPENAI PLANS TO TAKE A CUT OF CUSTOMERS’ AI-AIDED DISCOVERIES

65
26
418
86.6K
Xiao Lin
Xiao Lin@much_science·
@SquashBionic Was literally waiting for ROG phone 10 to replace my Razer phone 2
0
0
0
114
Xiao Lin
Xiao Lin@much_science·
@jimkxa Hodl on, memory stonk going up
0
0
1
411
Jim Keller
Jim Keller@jimkxa·
Quad TT-Galaxies. 4TB of memory. Bigger soon
Jim Keller tweet media
15
18
310
22.3K
Xiao Lin
Xiao Lin@much_science·
@pmddomingos What if physics changes over time in an unpredictable manner?
0
0
0
23
Pedro Domingos
Pedro Domingos@pmddomingos·
The universe is the maximum entropy distribution given its symmetries.
11
2
41
5.6K
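For context, the textbook result behind "the maximum entropy distribution given its symmetries": maximizing entropy subject to expectation constraints (for instance, ones a symmetry induces) yields an exponential family. A standard sketch of that result, not anything derived in the tweet itself:

```latex
% Maximize H(p) = -\int p(x)\log p(x)\,dx
% subject to \int p(x)\,dx = 1 and \mathbb{E}_p[f_i(x)] = c_i.
% Lagrange multipliers give the exponential-family form:
p(x) = \frac{1}{Z(\lambda)} \exp\!\Big(-\sum_i \lambda_i f_i(x)\Big),
\qquad
Z(\lambda) = \int \exp\!\Big(-\sum_i \lambda_i f_i(x)\Big)\,dx
```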
Xiao Lin
Xiao Lin@much_science·
@TrueAIHound The next physics breakthrough might not be as impactful as the previous one. There are diminishing returns. But new species often do make some others go extinct. And revolutions can go right or wrong.
0
0
1
54
AGIHound
AGIHound@TrueAIHound·
In my opinion, regardless of the nonstop hype and lies, the AI community has never done any AI. We are still in the computer automation era. It really began in the 1970s and accelerated in the 1990s with the advent of fast computers and the internet. It has changed the world as we knew it. I believe that, when the true AI revolution comes, it will be Biblical, as they say in the movies. 😬 Note: I also have reasons to believe that the coming revolution will not be about AI alone. I anticipate the arrival of huge breakthroughs in physics as well.
Pedro Domingos@pmddomingos

This AI revolution isn’t the big one yet.

9
3
33
2.3K
Xiao Lin
Xiao Lin@much_science·
@real_deep_ml Why? Aren't they the state of the art on common sense benchmarks?
1
0
1
33
Deep-ML
Deep-ML@real_deep_ml·
LLMs are amazing, they just lack common sense
2
0
8
928
Xiao Lin
Xiao Lin@much_science·
@pmddomingos Don't need to learn them if you can hardcode them. Turns out machine learning should be called neural hardcoding
0
0
0
15
Pedro Domingos
Pedro Domingos@pmddomingos·
The #1 problem in AI is poor generalization, and the #1 solution is exploiting symmetries.
Pedro Domingos tweet media
21
39
396
24.9K
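One concrete, minimal way to "exploit a symmetry" for better generalization is to average a model's predictions over a transformation group the task should be invariant to. A sketch assuming an image classifier and horizontal-flip invariance; the model here is a toy placeholder, not any specific architecture:

```python
# Minimal "exploit a symmetry" sketch: if labels should be invariant to
# horizontal flips, average predictions over the flip group {identity, flip}.
import numpy as np

def predict_flip_invariant(model, image: np.ndarray) -> np.ndarray:
    """Group-averaged prediction; `model` maps an HxWxC image to probabilities."""
    flipped = image[:, ::-1, :]  # horizontal flip (reverse the width axis)
    return 0.5 * (model(image) + model(flipped))

# Toy placeholder classifier, just so the sketch runs end to end.
def toy_model(img: np.ndarray) -> np.ndarray:
    logits = np.array([img.mean(), img.std()])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

img = np.random.default_rng(1).random((32, 32, 3))
print(predict_flip_invariant(toy_model, img))  # probabilities summing to 1
```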
Xiao Lin
Xiao Lin@much_science·
@EERandomness @burkov At least they got their OSes working. Our GUIs have stayed 2D for too long.
0
0
0
13
Engineering Randomness
Engineering Randomness@EERandomness·
@burkov Also, nobody is buying them. I am stunned that even after Apple failed to sell VR, companies are still thinking they are the future. They will have niche uses, but VR just isn't exciting at all to 80% of people.
4
0
9
860
BURKOV
BURKOV@burkov·
When Google presented Glass 10 years ago, everyone freaked out that now someone could secretly film you, despite the fact that Glass had an easily recognizable design. Now, just 10 years later, everyone makes XR glasses that look like regular glasses, and no one cares. This is the Overton window effect: if you are rich enough or you are a government, you can gradually change the population's opinion from strictly negative to acceptable in a matter of a decade. This includes the most "difficult" topics like lowering the legal age of consent or assisted suicide.
BURKOV tweet media
39
25
282
25.6K
Xiao Lin
Xiao Lin@much_science·
@realmemes6 Finding the next curve of the tech stack
0
0
0
50
Leo Fun Facts
Leo Fun Facts@Leofunfacts·
@Rainmaker1973 Since I was a little kid, I have always read stories about child geniuses who finished college as teenagers. Where are they now, 30 years later, and what have they achieved?
GIF
12
2
49
19.4K
Massimo
Massimo@Rainmaker1973·
A 15-year-old has just earned a PhD in quantum physics.

Laurent Simons, a Belgian child prodigy, has blazed an academic trail unmatched by almost anyone else on the planet, accelerating through education at a velocity that defies norms. He began primary school at age four and wrapped it up by six. At twelve, he already held a master's degree in quantum physics, delving into bosons, black holes, and the intricate mathematics unraveling the universe's deepest enigmas.

This week, the boy hailed as Belgium's "Little Einstein" defended his doctoral thesis at the University of Antwerp, cementing his status as one of the youngest physics PhDs in recorded history. His research tackled advanced concepts—like Bose polarons in superfluids and supersolids—that most scholars wouldn't touch until their twenties or thirties.

Yet for Laurent, this path has been profoundly intimate: the loss of his grandparents at eleven ignited a fire in him to unravel the secrets of longevity—not for personal gain, but to grant others extended, vibrant years.

Experts marvel at his prodigious memory and IQ of 145, a rarity shared by just 0.1 percent of the population. Tech giants from the U.S. and China have dangled lucrative offers to his family, but his parents have rebuffed them, championing his right to evolve on his own terms.

Laurent doesn't claim the absolute youngest PhD title—that belongs to Karl Witte, who graduated at thirteen in 1814—but in contemporary physics, his feat stands virtually unparalleled. Now fifteen, he's poised to pivot from quantum realms to medical frontiers, eyeing a second doctorate in medical AI to pioneer breakthroughs in aging. His audacious vision? Crafting "superhumans" through innovations that conquer mortality's puzzles—a domain exploding with promise yet riddled with enigmas. Quantum trailblazer or medical revolutionary, Laurent Simons is poised on the cusp: his extraordinary odyssey has only just ignited.
Massimo tweet media
645
2.6K
16K
1.2M