David Thomas
@davidthomas426 · @[email protected]
1.8K posts
Joined October 2011
358 Following · 65 Followers
David Thomas@davidthomas426·
@RickT9900022 @SebAaltonen And my points in bringing up who @SebAaltonen is are: - He is saying that (2) above is true, and if he of all people is saying that, believe it. - He is NOT saying that (1) isn't true. He (and I) would agree with you there, to a point. If you look at his posts, he pushes for better!
Sebastian Aaltonen@SebAaltonen·
I am vocal about 8GB RAM laptops because I am a game dev. We were also vocal about cut down 10GB RAM in Xbox Series S. Steam Deck is a 4 year old $399 gaming device with 16GB RAM. It's the modern min spec for gaming. I would love to see more games on Macs, but 8GB alienates devs.
[image]
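As a rough illustration of the memory pressure Aaltonen is describing, here is a back-of-the-envelope budget in Python. Every figure below (resolution, render-target count, pool sizes) is an assumption for illustration, not a measurement from any real game.

```python
# Rough, illustrative memory budget for a AAA game at 1440p.
# All numbers are assumptions, chosen only to show the arithmetic.

def mib(nbytes):
    return nbytes / 2**20

# One 2560x1440 RGBA16F render target (8 bytes per pixel):
w, h, bytes_per_pixel = 2560, 1440, 8
rt = w * h * bytes_per_pixel
print(f"one HDR render target: {mib(rt):.0f} MiB")

# A deferred G-buffer with ~5 such targets plus depth:
gbuffer = 6 * rt
print(f"G-buffer: {mib(gbuffer):.0f} MiB")

# Assumed streaming texture pool (4 GiB), geometry (2 GiB),
# and CPU-side game state (1.5 GiB), all sharing one pool on Macs:
budget_gib = (gbuffer + 4 * 2**30 + 2 * 2**30 + 1.5 * 2**30) / 2**30
print(f"total: {budget_gib:.1f} GiB before the OS takes its share")
```

With unified memory, that hypothetical total already exceeds an 8 GiB machine before the OS and other apps claim anything, which is the shape of the complaint.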
David Thomas@davidthomas426·
@RickT9900022 @SebAaltonen Yep, go ahead, you can voice whatever opinion you want, even if you have no idea what you are talking about. You could yell that game devs should just make amazing games run in 256KB of RAM at 900 FPS. Why not? Obviously they are just lazy, right? lol
David Thomas@davidthomas426·
@RickT9900022 @SebAaltonen Dude, you have no idea who you are talking to lol. He is the most outspoken game dev I can think of about optimizing games to run on low spec devices. 8GB (shared between CPU and GPU on Macs) is not enough for the resolution and level of detail of modern AAA games.
Kcir@RickT9900022·
@SebAaltonen how about you damn game devs optimize the games to run on 8 GB? why do games need more than that?
dano@danoboltup·
@ChrisO_wiki @HumanistQuaker he didn't ask for "help", he called them out while they're being hysterical about the oil shipping lanes. so he said "if you're so worried, come help". they won't, which proves they're pussies that are full of shit.
David Thomas@davidthomas426·
@FPupusas @ThePrimeagen And yes, the same very well may happen with agents. I also said it's complicated, and to some degree I agree with you. Still, while beginners can immediately see the benefits, what people like Prime immediately feel are the pain points, the drop in productivity, and the loss of control.
FortyTwoPupusas@FPupusas·
@davidthomas426 @ThePrimeagen supermaven has not changed in the last 6 months because it was sunset 🤷‍♂️ it didn't get better; prime just had a few months to get over his initial "i want to write my own code" bias, and now he can actually like the tool. same will happen with agents
ThePrimeagen@ThePrimeagen·
I am using supermaven again and I have something to say about this whole AI thing. I think as a group (SWEs) we rushed so fast into agents when inline autocomplete + actual skill is crazy good. A good, fast autocomplete like supermaven actually makes marked proficiency gains, while saving me from the cognitive debt that comes from agents. With agents you reach a point where you must fully rely on their output, and your grip on the codebase slips. It's insane how good Cursor Tab is. Seriously, I think we had something that genuinely improves one's coding ability (if you have it). It truly acts as a multiplier, and we left it in the dust because it is not sexy. hurts me on the inside.
David Thomas@davidthomas426·
@FPupusas @ThePrimeagen His complaint about inline autocomplete was primarily based on his use of Copilot well over 6 months ago, as he made clear in his replies to you in this thread.
David Thomas@davidthomas426·
@FPupusas @ThePrimeagen I’m oversimplifying. It depends on what you are doing, how much effort you put into setting everything up, how much you care about being hands-on vs. hands-off, and many other things.
David Thomas@davidthomas426·
@FPupusas @ThePrimeagen For people who didn’t know what they were doing, inline autocomplete was amazing, but for someone like Prime it was not. Then things got better. Now autocomplete is great for Prime, agents are amazing for folks who don’t know what they are doing but not for people like Prime.
David Thomas reposted
Aryaman Arora@aryaman2020·
I'm pretty annoyed that Hypersteer (a work by some of my friends applying hypernetworks to produce very effective steering vectors from text descriptions) has not received the appropriate amount of credit in later work pursuing basically the same idea arxiv.org/abs/2506.03292
Sakana AI@SakanaAILabs

We’re excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/ By training a hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.

Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.

To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.

In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural-language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates.

This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore.

Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t…
Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

David Thomas reposted
Sakana AI@SakanaAILabs·
(Doc-to-LoRA / Text-to-LoRA announcement; identical text quoted in full above.)
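The core idea shared by Hypersteer, Text-to-LoRA, and Doc-to-LoRA, a hypernetwork that maps a task embedding to adapter weights in one forward pass, can be sketched in a few lines of NumPy. Everything here (the shapes, the random stand-in "task embedding", the single-linear hypernetwork) is a toy illustration, not the papers' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_task, rank = 64, 16, 4

# Frozen base weight of one linear layer in the "LLM".
W = rng.normal(size=(d_model, d_model))

# Hypernetwork: a single linear map from a task embedding to the
# flattened LoRA factors A (rank x d_model) and B (d_model x rank).
H = rng.normal(scale=0.01, size=(d_task, rank * d_model + d_model * rank))

def generate_lora(task_embedding):
    flat = task_embedding @ H                  # one cheap forward pass
    A = flat[: rank * d_model].reshape(rank, d_model)
    B = flat[rank * d_model:].reshape(d_model, rank)
    return A, B

def adapted_forward(x, task_embedding):
    A, B = generate_lora(task_embedding)
    return x @ (W + B @ A).T                   # LoRA update: W' = W + B A

task = rng.normal(size=d_task)   # stands in for an encoded task description
x = rng.normal(size=d_model)
y_base = x @ W.T
y_task = adapted_forward(x, task)
print("adapter changed the output:", not np.allclose(y_base, y_task))
```

In the real systems the hypernetwork is itself trained (meta-learned) so that the generated A, B actually specialize the model; the sketch only shows why inference-time customization costs a single forward pass instead of a fine-tuning run.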
David Thomas@davidthomas426·
@_arohan_ How did they invent terminology? “expert parallelism” parallelizes along the expert dimension. By the way, IMO it’s not just sharding; it also includes how to reorganize the computation to minimize communication. TP combines sharding of different tensors along two different dims.
rohan anil@_arohan_·
A bit polarizing comment: it's too late, but I kind of think whoever named TP DP EP combinations probably slowed down progress by inventing terminology that's borderline absurd to describe basic sharding
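The exchange above can be made concrete with a minimal NumPy sketch of what "TP" names: Megatron-style tensor parallelism shards the first MLP weight along its output (column) dim and the second along its input (row) dim, so each shard computes a full partial result and a single all-reduce finishes the job. The shapes and shard count below are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(1)
d, hidden, shards = 8, 32, 4

x = rng.normal(size=(3, d))          # batch of activations
W1 = rng.normal(size=(d, hidden))    # first MLP layer
W2 = rng.normal(size=(hidden, d))    # second MLP layer

ref = np.maximum(x @ W1, 0) @ W2     # unsharded reference (ReLU MLP)

# Tensor parallelism: shard W1 along columns, W2 along rows.
# The ReLU is elementwise, so it commutes with the column split.
W1_shards = np.split(W1, shards, axis=1)
W2_shards = np.split(W2, shards, axis=0)

# Each "device" computes a full-shaped partial output with no
# communication; one all-reduce (the sum) combines them.
partials = [np.maximum(x @ a, 0) @ b for a, b in zip(W1_shards, W2_shards)]
out = np.sum(partials, axis=0)

print("matches unsharded result:", np.allclose(ref, out))
```

This is the "reorganize the computation to minimize communication" point: pairing the column split with the row split means the whole two-layer block needs only one collective, instead of one per layer.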
Aram Hăvărneanu@aramh·
Sad (!) litmus test for telling whether someone has anything interesting to criticize about PLT (and there is plenty to criticize): they think that Haskell is some sort of epitome of PLT. It's like when discussing GPUs someone brings up the 386.
David Thomas@davidthomas426·
@marksaroufim @a1zhang this is perfect 😂. I couldn't decide whether to crack up or be infuriated.
Mark Saroufim@marksaroufim·
New paper dropped by Anthropic: "Fractal Language Models". It DESTROYS the context window narrative. The LLM doesn't just respond, it splits into self-similar copies. No tokens, but models arguing and compressing, until the prompt is not read but self-reconstructed. /satire @a1zhang
[image]
Dmitrii Kovanikov@ChShersh·
SWE interview question at an HFT firm. Make the following logger as fast as possible. Great candidates suggest at least 10 improvements. Go.
[image: the logger code in question]
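The logger in the screenshot isn't reproduced here, so as a hedged sketch of one classic class of improvement candidates propose in such interviews, moving formatting and I/O off the hot path, here is a toy Python comparison. The `NaiveLogger` and `DeferredLogger` classes and the log format are made up for illustration, not taken from the actual question.

```python
import io

class NaiveLogger:
    """Formats and writes on every call: the cost lands on the hot path."""
    def __init__(self, stream):
        self.stream = stream
    def log(self, fmt, *args):
        self.stream.write(fmt % args + "\n")

class DeferredLogger:
    """Records (fmt, args) tuples on the hot path; formatting and I/O
    happen later in flush(). Real low-latency loggers go much further:
    preallocated ring buffers, a dedicated writer thread, binary
    encoding, timestamping via rdtsc, etc."""
    def __init__(self, stream):
        self.stream = stream
        self.buffer = []
    def log(self, fmt, *args):
        self.buffer.append((fmt, args))   # cheap: no formatting, no I/O
    def flush(self):
        out = "".join(fmt % args + "\n" for fmt, args in self.buffer)
        self.stream.write(out)            # one batched write
        self.buffer.clear()

s1, s2 = io.StringIO(), io.StringIO()
NaiveLogger(s1).log("fill %s @ %d", "AAPL", 187)
d = DeferredLogger(s2)
d.log("fill %s @ %d", "AAPL", 187)
d.flush()
print("same output:", s1.getvalue() == s2.getvalue())
```

Both produce identical output; the point of the exercise is that the deferred version does almost no work at the call site, which is the kind of answer (among many) the question is fishing for.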
Dmitrii Kovanikov@ChShersh·
@codewithimanshu Himanshu, writing your replies manually is crucial in building trust, and your reply provides some valuable insights.