David Thomas

7

David Thomas@davidthomas426·4d

@RickT9900022 @SebAaltonen And my points in bringing up who @SebAaltonen are: - He is saying that (2) above is true, and if he is saying that of all people, believe it. - He is NOT saying that (1) isn’t true. He (and I) would agree with you there, to a point. If you look at his posts, he pushes for better!

English

0

10

Sebastian Aaltonen@SebAaltonen·5d

I am vocal about 8GB RAM laptops because I am a game dev. We were also vocal about cut down 10GB RAM in Xbox Series S. Steam Deck is a 4 year old $399 gaming device with 16GB RAM. It's the modern min spec for gaming. I would love to see more games on Macs, but 8GB alienates devs.

English

252

29

571

158.9K

David Thomas@davidthomas426·4d

@RickT9900022 @SebAaltonen Omg I just saw YOUR bio, and of course lol. 🤦‍♂️😂

English

0

12

David Thomas@davidthomas426·4d

@RickT9900022 @SebAaltonen Yep, go ahead, you can voice whatever opinion you want, even if you have no idea what you are talking about. You could yell that game devs should just make amazing games run in 256KB of RAM at 900 FPS. Why not? Obviously they are just lazy, right? lol

English

0

14

David Thomas@davidthomas426·4d

@RickT9900022 @SebAaltonen Dude, you have no idea who you are talking to lol. He is the most outspoken game dev I can think of about optimizing games to run on low spec devices. 8GB (shared between CPU and GPU on Macs) is not enough for the resolution and level of detail of modern AAA games.

English

0

16

Kcir@RickT9900022·5d

@SebAaltonen how about you damn game devs optimize the games to run on 8 GB? why does the game need to be needing more than that?

English

0

1

306

David Thomas@davidthomas426·5d

@danoboltup @ChrisO_wiki @HumanistQuaker Ok, “redditor energy”, “acting like a jerk online”, whatever you want to call your behavior is fine with me

English

16

dano@danoboltup·5d

@davidthomas426 @ChrisO_wiki @HumanistQuaker redditor energy

English

Trump asks for Britain's help to save the Strait of Hormuz as he calls for Starmer and other foreign leaders to send ships and admits Iran has closed key oil route for the first time trib.al/Op1lNRR

0

1

107

ChrisO_wiki@ChrisO_wiki·5d

Life comes at you fast

Daily Mail@DailyMail

English

237

4K

25.6K

1.8M

David Thomas@davidthomas426·5d

@danoboltup @ChrisO_wiki @HumanistQuaker Their they’re, its all rite, if there so worried im shure their going to come help, your mispelling all you’re words acting like a jerk online for nuthin

English

0

1

160

dano@danoboltup·5d

@ChrisO_wiki @HumanistQuaker he didn’t ask for “help” he called them out while their hysterical about the oil shipping lanes. so he said “if your so worried come help” they won’t, which proves their pussies that are full of shit.

English

21

0

6

1.8K

David Thomas@davidthomas426·6d

@YouJiacheng @SkyLi0n I assumed they meant the linear algebra library: developer.nvidia.com/magma

English

16

You Jiacheng@YouJiacheng·12 Mar

@SkyLi0n which Magma? Magma optimizer?

Filipino

A Streaming Power Iteration Approach to Muon kexue.fm/archives/11654

0

4

599

Aaron Gokaslan@SkyLi0n·12 Mar

All those optimizations we made to the linear algebra solves by ditching Magma already bearing fruit!

jianlin.su@Jianlin_S

English

0

16

2.4K

David Thomas@davidthomas426·12 Mar

@FPupusas @ThePrimeagen And yes, the same very well may happen with agents. i also said it’s complicated and to some degree I agree with you. Still, while beginners can immediately see the benefits, what people like Prime immediately feel are the pain points, drop in productivity, and loss of control.

English

15

FortyTwoPupusas@FPupusas·12 Mar

@davidthomas426 @ThePrimeagen supermaven has not changed in the last 6 moths because it was sunset 🤷‍♂️ it didn't get better, prime just had a few months to get over his initial "i want to write my own code" bias and now he can actually like the tool same will happen with agents

English

0

1

81

ThePrimeagen@ThePrimeagen·12 Mar

i am using supermaven again and i have something to say about this whole AI thing. I think as a group (swe) we rushed so fast into Agents when inline autocomplete + actual skills is crazy. A good autocomplete that is fast like supermaven actually makes marked proficiency gains, while saving me from cognitive debt that comes from agents. With agents you reach a point where you must fully rely on their output and your grip on the codebase slips. Its insane how good cursor Tab is. Seriously, I think we had something that genuinely makes improvement to ones code ability (if you have it). Truly acts as a multiplier, and we left it in the dust because it is not sexy. hurts me on the inside.

English

219

134

3.7K

181.9K

David Thomas@davidthomas426·12 Mar

@FPupusas @ThePrimeagen His complaint of inline autocomplete was primarily based on his use of Copilot well over 6 months ago, as he made clear in his responses to you in this thread.

English

0

36

David Thomas@davidthomas426·12 Mar

@FPupusas @ThePrimeagen I’m oversimplifying. It depends on what you are doing, how much effort you put into setting everything up, and how much you care about being hands-on vs. hands-off, and many other things.

English

9

David Thomas@davidthomas426·12 Mar

@FPupusas @ThePrimeagen For people who didn’t know what they were doing, inline autocomplete was amazing, but for someone like Prime it was not. Then things got better. Now autocomplete is great for Prime, agents are amazing for folks who don’t know what they are doing but not for people like Prime.

English

0

1

78

David Thomas retweetledi

Aryaman Arora@aryaman2020·27 Şub

I'm pretty annoyed that Hypersteer (a work by some of my friends applying hypernetworks to produce very effective steering vectors from text descriptions) has not received the appropriate amount of credit in later work pursuing basically the same idea arxiv.org/abs/2506.03292

Sakana AI@SakanaAILabs

We’re excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/ By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks. Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts. To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing tasks or document specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document. In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates. This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore. Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t… Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

English

9

25

384

49.1K

David Thomas retweetledi

Sakana AI@SakanaAILabs·27 Şub

We’re excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/ By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks. Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts. To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing tasks or document specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document. In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates. This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore. Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t… Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

GIF

English

74

354

2.2K

595.4K

David Thomas@davidthomas426·26 Şub

@_arohan_ How did they invent terminology? “expert parallelism” parallelizes along the expert dimension. By the way, IMO it’s not just sharding, but also includes how to reorganize the computation to minimize communication. TP combines sharding different tensors along two different dims.

English

69

rohan anil@_arohan_·25 Şub

A bit polarizing comment: its too late but I kind of think whoever named TP DP EP combinations probably slowed down progress by inventing terminology thats borderline absurd to describe basic sharding

English

9

11

156

38.1K

David Thomas@davidthomas426·25 Oca

@turbofish_pk @TheGingerBill @aramh @effectfully lol nice typo

English

17

turbofish@turbofish_pk·24 Oca

@TheGingerBill @aramh @effectfully No one better than errectfully to comment on this and he could also help you improve Odin if you wanted.

English

0

1

521

Aram Hăvărneanu@aramh·24 Oca

Sad (!) litmus test for telling whether someone has anything interesting to crcriticize about PLT (and there is plenty to criticize). They think that Haskell is some sort of epitome of PLT. It's like when discussing GPUs someone brings up the 386.

English

8

2

63

8.8K

David Thomas@davidthomas426·19 Oca

@marksaroufim @a1zhang this is perfect 😂. My mind could not figure out whether to crack up or be infuriated at it.

English