Ryan Kanno

23.6K posts

@ryankanno

Sometimes I play a little chess - https://t.co/jZ7N1nX4Yt

Honolulu, HI · Joined November 2006

2.1K Following · 2.2K Followers

Pinned Tweet

Ryan Kanno @ryankanno · Nov 2

Haha, so much truth to this. twitter.com/MalwareTechBlo…

Ryan Kanno retweeted

NASA @NASA · 3d

We're going around the Moon. Come watch with us. Artemis II's four-astronaut crew is lifting off from @NASAKennedy on an approximately 10-day mission that will bring us closer to living on the Moon and Mars. The launch window opens at 6:24pm ET (2224 UTC). twitter.com/i/broadcasts/1…

Ryan Kanno retweeted

Illinois Men's Basketball @IlliniMBB · Mar 29

ARE YOU WITH US? For the first time since 2005, we're heading to the FINAL FOUR.

Ryan Kanno retweeted

Sam Krapf @sam_gzstrength · Mar 18

Oh look, a toddler's 1-day supply of berries

Ryan Kanno @ryankanno · Mar 16

@rubyjedi Sure, but those were classified hurricanes - I can't imagine what a hurricane would do if a storm can do this.

Lalee in CyberSpace @rubyjedi · Mar 16

@ryankanno 'Was about like this for Iwa and Iniki, as I recall. Pretty rare to get sustained winds to the point where trees come down.

Ryan Kanno @ryankanno · Mar 15

Just wild how many people lost power in Hawaii from the storm. Growing up, I don't ever remember our infrastructure being so fragile. Can't imagine how much food spoilage there was. Tokyo gets hit with typhoons, and it's barely a blip. =\.

Ryan Kanno retweeted

Sebastian Raschka @rasbt · Mar 15

I (finally) put together a new LLM Architecture Gallery that collects the architecture figures all in one place! sebastianraschka.com/llm-architectu…

Ryan Kanno retweeted

Kevin Roose @kevinroose · Mar 10

We made a blind taste test to see whether NYT readers prefer human writing or AI writing. 86,000 people have taken it so far, and the results are fascinating. Overall, 54% of quiz-takers prefer AI. A real moment! nytimes.com/interactive/20…

Ryan Kanno retweeted

Andrej Karpathy @karpathy · Mar 10

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

Ryan Kanno retweeted

Paul Novosad @paulnovosad · Mar 6

From Ezra Klein, more true than ever. You would not believe how many shortcuts everyone else is taking. In many areas, you can get way ahead of everyone just by doing the work. More true than ever now, when more people are shirking and AI lets you do 10x if you try. 1/

Ryan Kanno retweeted

Obsidian @obsdmd · Feb 27

Obsidian Sync now has a headless client, so you can sync vaults to a server without using the desktop app. Try the open beta:

Ryan Kanno retweeted

hardmaru @hardmaru · Feb 27

Instead of forcing models to hold everything in an active context window, we can use hypernetworks to instantly compile documents and tasks directly into the model's weights. A step towards giving language models durable memory and fast adaptation. Blog: pub.sakana.ai/doc-to-lora/

Sakana AI @SakanaAILabs

We’re excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/

By training a hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.

Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.

To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.

In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates.

This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore.

Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t…
Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

Ryan Kanno retweeted

dr. jack morris @jxmnop · Feb 25

actually wtf

somebody wrote a paper about the 491-parameter transformer they trained for 10-digit addition

turns out Codex can one-shot the task. 100% with only 343 parameters.

the solution is a single function 'hand_set_weights_magic' and it looks like this:

N8 Programs @N8Programs

Beat it by having Codex hand-craft weights: gist.github.com/N8python/02e41… 100% accuracy on 10 million random test cases w/ only 343 parameters. As a bonus, it uses the vanilla Qwen3 architecture, just with the right weights.

Ryan Kanno retweeted

tetsuo @tetsuoai · Feb 23

I can't believe someone would just steal from Anthropic like this. The millions of man-hours Anthropic spent hand-writing code, text, art, books, etc. to generate enough data for training must be taken into consideration here. Where is the respect for IP?

Anthropic @AnthropicAI

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.

Ryan Kanno retweeted

Amy Tam @amytam01 · Feb 17

x.com/i/article/2023…

Ryan Kanno retweeted

⛩ Ryo Saeba | Japon XYZ ⛩ @Ryo_Saeba_3 · Feb 17

This Japanese man tried the experiment of entering the subway at exit C8 of Shinjuku Sanchome station and coming back out at exit E8 of Nishi-Shinjuku station. Without doing any research beforehand, it took him 27 minutes and 20 seconds of walking 😄

Ryan Kanno retweeted

Vinod Khosla @vkhosla · Feb 15

People with curiosity, agency, taste and risk-taking will take the best advantage...

Derya Unutmaz, MD @DeryaTR_

I’ve said this before & feel compelled to say it again: people with curiosity & agency who fully use AI will rapidly gain expertise & accomplish things faster than others with greater intelligence, knowledge, or experience, because those advantages are becoming cheap commodities!

Ryan Kanno @ryankanno · Feb 14

Crazy ending to a wild 4th quarter.

Brian McInnis @Brian_McInnis

BEDLAM in the Stan Sheriff Center as Kahuku storms back and wins the @HHSAAsports Div I title on this backdoor play with 4 seconds left, stunning defending champ Punahou 40-38.

Ryan Kanno retweeted

Nadieh Bremer @NadiehBremer · Feb 12

📣 NEW! I’ve just released the BIGGEST and perhaps most creative project I’ve ever worked on! “Searching for Birds” searchingforbirds.visualcinnamon.com 🐤 A #dataviz article & exploration that dives into the data that connects humans with birds, by looking at how we search for birds.

Ryan Kanno retweeted

Jeff Clune @jeffclune · Feb 10

Can AI agents design better memory mechanisms for themselves? Introducing Learning to Continually Learn via Meta-learning Memory Designs. A meta agent automatically designs memory mechanisms, including what info to store, how to retrieve it, and how to update it, enabling agentic systems to continually learn across diverse domains. Led by @yimingxiong_ with @shengranhu 🧵👇 1/

Ryan Kanno retweeted

Andrej Karpathy @karpathy · Feb 12

New art project. Train and inference GPT in 243 lines of pure, dependency-free Python. This is the *full* algorithmic content of what is needed. Everything else is just for efficiency. I cannot simplify this any further. gist.github.com/karpathy/8627f…
