Ryan Kanno

23.6K posts

@ryankanno

Sometimes I play a little chess - https://t.co/jZ7N1nX4Yt

Honolulu, HI · Joined November 2006
2.1K Following · 2.2K Followers
Ryan Kanno retweeted
NASA@NASA·
We're going around the Moon. Come watch with us. Artemis II's four-astronaut crew is lifting off from @NASAKennedy on an approximately 10-day mission that will bring us closer to living on the Moon and Mars. The launch window opens at 6:24pm ET (2224 UTC). twitter.com/i/broadcasts/1…
5.8K replies · 29.5K retweets · 109K likes · 16.1M views
Ryan Kanno retweeted
Illinois Men's Basketball@IlliniMBB·
ARE YOU WITH US? For the first time since 2005, we're heading to the FINAL FOUR.
262 replies · 3.7K retweets · 13.3K likes · 624.1K views
Ryan Kanno retweeted
Sam Krapf@sam_gzstrength·
Oh look, a toddler's 1-day supply of berries
291 replies · 653 retweets · 18K likes · 835.1K views
Ryan Kanno@ryankanno·
@rubyjedi Sure, but those were classified hurricanes - if an ordinary storm can do this, I can't imagine what an actual hurricane would do.
0 replies · 0 retweets · 0 likes · 21 views
Lalee in CyberSpace@rubyjedi·
@ryankanno 'Was about like this for Iwa and Iniki, as I recall. Pretty rare to get sustained winds to the point where trees come down.
1 reply · 0 retweets · 1 like · 23 views
Ryan Kanno@ryankanno·
Just wild how many people lost power in Hawaii from the storm. Growing up, I don't ever remember our infrastructure being so fragile. Can't imagine how much food spoilage there was. Tokyo gets hit with typhoons, and it's barely a blip. =\.
1 reply · 0 retweets · 0 likes · 59 views
Ryan Kanno retweeted
Kevin Roose@kevinroose·
We made a blind taste test to see whether NYT readers prefer human writing or AI writing. 86,000 people have taken it so far, and the results are fascinating. Overall, 54% of quiz-takers prefer AI. A real moment! nytimes.com/interactive/20…
436 replies · 426 retweets · 3.1K likes · 3.5M views
Ryan Kanno retweeted
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
972 replies · 2.1K retweets · 19.5K likes · 3.6M views
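The QK-norm finding in the thread above is easy to picture in code. Below is a minimal PyTorch sketch, not nanochat's actual implementation: the class name, the single scalar multiplier, and the starting value are all assumptions. It shows why a parameterless QK-norm leaves attention diffuse and how a scale multiplier sharpens it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Single-head attention with unit-normalized queries and keys (sketch)."""

    def __init__(self, dim: int, qk_scale: float = 12.0):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        # The scale multiplier the autoresearch run found missing (value is hypothetical).
        self.qk_scale = nn.Parameter(torch.tensor(qk_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = F.normalize(self.q_proj(x), dim=-1)  # parameterless QK-norm: unit-length q, k
        k = F.normalize(self.k_proj(x), dim=-1)
        v = self.v_proj(x)
        # Without qk_scale the logits live in [-1, 1], so the softmax stays nearly flat
        # (attention "too diffuse"); the multiplier widens the logit range and sharpens it.
        logits = (q @ k.transpose(-2, -1)) * self.qk_scale
        return F.softmax(logits, dim=-1) @ v
```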
Ryan Kanno retweeted
Paul Novosad@paulnovosad·
From Ezra Klein, more true than ever. You would not believe how many shortcuts everyone else is taking. In many areas, you can get way ahead of everyone just by doing the work. More true than ever now, when more people are shirking and AI lets you do 10x if you try. 1/
34 replies · 451 retweets · 5.1K likes · 954.9K views
Ryan Kanno retweeted
Obsidian@obsdmd·
Obsidian Sync now has a headless client, so you can sync vaults to a server without using the desktop app. Try the open beta:
113 replies · 319 retweets · 4.1K likes · 865.7K views
Ryan Kanno retweeted
hardmaru@hardmaru·
Instead of forcing models to hold everything in an active context window, we can use hypernetworks to instantly compile documents and tasks directly into the model's weights. A step towards giving language models durable memory and fast adaptation. Blog: pub.sakana.ai/doc-to-lora/
Sakana AI@SakanaAILabs

We're excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/ By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.

Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.

To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.

In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates.

This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore.

Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t…
Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

66 replies · 231 retweets · 2.5K likes · 303.5K views
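The core idea in the announcement above is that one hypernetwork forward pass replaces per-task fine-tuning. Here is a minimal sketch of that shape in PyTorch; the class name, MLP sizes, single square target layer, and the helper apply_lora are assumptions for illustration, not Sakana AI's released code.

```python
import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    """Map a task/document embedding to the A/B matrices of one LoRA adapter."""

    def __init__(self, embed_dim: int, target_dim: int, rank: int = 8):
        super().__init__()
        self.rank, self.target_dim = rank, target_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 512),
            nn.GELU(),
            nn.Linear(512, 2 * rank * target_dim),
        )

    def forward(self, task_embedding: torch.Tensor):
        flat = self.mlp(task_embedding)                       # one cheap forward pass
        a, b = flat.split(self.rank * self.target_dim, dim=-1)
        A = a.view(self.rank, self.target_dim)                # down-projection
        B = b.view(self.target_dim, self.rank)                # up-projection
        return A, B

def apply_lora(frozen_weight: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               scale: float = 1.0) -> torch.Tensor:
    """Patch the frozen layer as W' = W + scale * (B @ A); no gradient steps needed."""
    return frozen_weight + scale * (B @ A)

# Usage sketch: embedding would come from a frozen text/document encoder.
hyper = LoRAHyperNetwork(embed_dim=768, target_dim=1024, rank=8)
doc_embedding = torch.randn(768)
A, B = hyper(doc_embedding)
patched = apply_lora(torch.randn(1024, 1024), A, B)
```

The meta-training cost of learning the MLP is paid once; after that, specializing the base model to a new document is just the forward pass shown in the usage lines.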
Ryan Kanno retweeted
dr. jack morris@jxmnop·
actually wtf, somebody wrote a paper about the 491-parameter transformer they trained for 10-digit addition. turns out Codex can one-shot the task: 100% with only 343 parameters. the solution is a single function 'hand_set_weights_magic' and it looks like this:
N8 Programs@N8Programs

Beat it by having Codex hand-craft weights: gist.github.com/N8python/02e41… 100% accuracy on 10 million random test cases w/ only 343 parameters. As a bonus, it uses the vanilla Qwen3 architecture, just with the right weights.

65 replies · 110 retweets · 2.3K likes · 638.9K views
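The gist's hand_set_weights_magic is only shown as an image above, but the underlying trick, writing the arithmetic directly into weights instead of training them, can be sketched with a much smaller toy. The example below is a hypothetical illustration in plain Python/NumPy with a hand-set carry detector; it is not the 343-parameter Qwen3 construction from the gist.

```python
import numpy as np

def hand_set_weights_adder(a_digits, b_digits):
    """Add two equal-length digit lists (least-significant digit first)
    using only hand-constructed weights and a step nonlinearity, zero training."""
    # "Weights" of a one-unit carry detector: carry = step(a + b + carry_in - 10)
    w_carry = np.array([1.0, 1.0, 1.0])
    b_carry = -10.0

    out, carry = [], 0.0
    for a, b in zip(a_digits, b_digits):
        pre = w_carry @ np.array([a, b, carry]) + b_carry
        new_carry = float(pre >= 0.0)                 # step nonlinearity
        out.append(int(a + b + carry - 10.0 * new_carry))
        carry = new_carry
    out.append(int(carry))                            # final carry digit
    return out

# 4821 + 1999 = 6820; digits are least-significant first
print(hand_set_weights_adder([1, 2, 8, 4], [9, 9, 9, 1]))  # -> [0, 2, 8, 6, 0]
```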
Ryan Kanno retweeted
tetsuo@tetsuoai·
I can't believe someone would just steal from Anthropic like this. The millions of man-hours Anthropic spent hand-writing code, text, art, books, etc. to generate enough data for training must be taken into consideration here. Where is the respect for IP?
Anthropic@AnthropicAI

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.

366 replies · 1K retweets · 13.5K likes · 1.2M views
Ryan Kanno retweeted
⛩ Ryo Saeba | Japon XYZ ⛩@Ryo_Saeba_3·
This Japanese guy tried the experiment of entering the subway at exit C8 of Shinjuku Sanchome Station and coming back out through exit E8 of Nishi-Shinjuku Station. Without doing any research beforehand, it took him 27 minutes and 20 seconds of walking 😄
87 replies · 333 retweets · 5.9K likes · 748.6K views
Ryan Kanno retweeted
Nadieh Bremer@NadiehBremer·
📣 NEW! I’ve just released the BIGGEST and perhaps most creative project I’ve ever worked on! “Searching for Birds” searchingforbirds.visualcinnamon.com 🐤 A #dataviz article & exploration that dives into the data that connects humans with birds, by looking at how we search for birds.
150 replies · 1.3K retweets · 6.6K likes · 415.3K views
Ryan Kanno retweeted
Jeff Clune@jeffclune·
Can AI agents design better memory mechanisms for themselves? Introducing Learning to Continually Learn via Meta-learning Memory Designs. A meta agent automatically designs memory mechanisms, including what info to store, how to retrieve it, and how to update it, enabling agentic systems to continually learn across diverse domains. Led by @yimingxiong_ with @shengranhu 🧵👇 1/
79 replies · 188 retweets · 1.3K likes · 230.6K views
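The tweet above frames a memory mechanism as three design choices: what to store, how to retrieve it, and how to update it. A hypothetical Python sketch of that interface follows; the class and method names, the naive keyword retrieval, and the FIFO update are placeholders, not the paper's code, and are exactly the pieces a meta agent would presumably redesign.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryMechanism:
    """Toy memory interface with the three design axes named in the tweet."""
    entries: list = field(default_factory=list)

    def store(self, observation: str, outcome: str) -> None:
        # What to store: here, raw (observation, outcome) pairs.
        self.entries.append((observation, outcome))

    def retrieve(self, query: str, k: int = 3) -> list:
        # How to retrieve: here, naive keyword-overlap scoring.
        scored = sorted(
            self.entries,
            key=lambda e: len(set(query.split()) & set(e[0].split())),
            reverse=True,
        )
        return scored[:k]

    def update(self, max_size: int = 100) -> None:
        # How to update: here, drop the oldest entries past a fixed budget.
        self.entries = self.entries[-max_size:]
```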
Ryan Kanno retweeted
Andrej Karpathy@karpathy·
New art project. Train and inference GPT in 243 lines of pure, dependency-free Python. This is the *full* algorithmic content of what is needed. Everything else is just for efficiency. I cannot simplify this any further. gist.github.com/karpathy/8627f…
653 replies · 3.1K retweets · 25.2K likes · 5.2M views
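To give a flavor of what "pure, dependency-free Python" means here, below is a tiny sketch of the attention step written on plain lists, with no NumPy or PyTorch. It is not taken from the gist; the function names and shapes are assumptions. It only illustrates that the core computation needs nothing beyond basic arithmetic.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a plain Python list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

print(attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [0.0, 0.0]]))
```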