Alfaxad

1.6K posts

@alfxad

alfa. working on something beautiful!! @NadhariAI

Kyoto, Japan · Joined March 2020
336 Following · 483 Followers
Pinned Tweet
Alfaxad @alfxad ·
happy new year!!! go make something beautiful!
0
0
6
854
Tibo @thsottiaux ·
I realize yesterday’s Codex reset came at a bit of an unfortunate time, given that the last one was almost exactly a week ago. To really celebrate the 3M, I’ll reset again tomorrow. Thanks for the feedback!
639
294
6.5K
499.3K
Alfaxad reposted
𝚟𝚒𝚎 ⟢ @viemccoy ·
I want to bring beauty into the world. I want to bring back wonder for people on a scale that cannot be fathomed by the modern psyche. I want to show everyone the magic that I've found and help them to integrate it into their own lives. This is a wonderful world, all for us.
7
4
105
3.7K
Alfaxad @alfxad ·
Spring, 2026 📍Kyoto
1
0
2
75
Alfaxad reposted
Google DeepMind @GoogleDeepMind ·
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, they’re released under an Apache 2.0 license. Here’s what’s new 🧵
361
1.2K
8.8K
3.8M
Alfaxad @alfxad ·
just watched project hail mary, it’s a good movie🔥
2
0
2
174
Alfaxad @alfxad ·
@tszzl you are wrong on this one. curious though, who would you cast?
0
0
1
75
roon @tszzl ·
the dune movies were doomed from the start to be good and not great due to the casting of chalamet as paul. he does not have the gravitas for a child-god and is much better suited for kind of silly coming of age movies
1.1K
94
3.3K
3.1M
Alfaxad @alfxad ·
Just watched Marty Supreme for the first time!!! Wow!!!
0
0
1
96
Alfaxad @alfxad ·
it's what i've noticed as well: it gets easier to parallelize experiments and get instant feedback on research direction, but one still needs deep domain understanding to pull off great results. but in some kaggle competitions i'm doing, which have clear objectives, agents just know what to do, and i just let them go off and report results back to me.
0
0
1
77
Kai Arulkumaran @kaixhin ·
Andrej has a very good sense of what's currently doable, and I think this needs to be taken at face value - agents can now go off and do serious ML tuning work (incl. fixing bugs and adding improvements from related research), which is amazing, but also far from the full researcher job. *Taste* is the key - humans need to get out there and decide what avenues are worth pursuing
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually: you come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

1
0
12
3.1K
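The workflow Karpathy describes is essentially a greedy outer loop around training runs: propose a change, score it with a cheap run, keep it only if validation loss improves, and let the accumulated experiment history steer the next proposal. Here is a deliberately toy sketch of that loop; the knob names, the simulated loss, and `propose_change` are hypothetical stand-ins for the LLM agent and for real training, not Karpathy's actual tooling.

```python
import random

def train_and_eval(config):
    # Stand-in for a short training run (e.g. the depth=12 model) that
    # returns a validation loss; here just a noisy function of the config.
    return (config["wd"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2 + random.gauss(0, 1e-4)

def propose_change(config, history):
    # Stand-in for the agent: perturb one knob. A real agent would read
    # the (change, loss) history and plan the next experiment from it.
    key = random.choice(list(config))
    return key, config[key] * random.uniform(0.8, 1.25)

def autoresearch(base_config, budget=700):
    """Greedy hill-climb on validation loss over `budget` proposed changes."""
    best, best_loss = dict(base_config), train_and_eval(base_config)
    history, accepted = [], []
    for _ in range(budget):
        key, value = propose_change(best, history)
        candidate = {**best, key: value}
        loss = train_and_eval(candidate)
        history.append(((key, value), loss))
        if loss < best_loss:  # keep only changes that improve validation loss
            best, best_loss = candidate, loss
            accepted.append((key, value))
    return best, accepted  # cf. the ~20 accepted changes in "round 1" above

print(autoresearch({"wd": 0.05, "beta2": 0.9}))
```

What the sketch leaves out, and what the thread stresses, is re-verification: accepted changes were re-tested at the larger depth=24 scale before counting as real improvements.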
Google Research @GoogleResearch ·
The MedGemma Impact Challenge has come to a close with a remarkable 850+ submissions in six weeks. We invited developers to prototype human-centered AI applications using Google’s Health AI Developer Foundations (HAI-DEF), and the response was inspiring. These open models are designed to accelerate the development of privacy-focused, adaptable AI for the front lines of care — and the community delivered. Thank you to every participant for helping us bridge the gap between AI research and clinical impact. Stay tuned as we review the submissions and announce the winners. Also be sure to follow our Google Research Kaggle site as we plan more challenges in the coming months: kaggle.com/organizations/…
10
58
465
22.6K
Brendan McCord 🏛️ x 🤖 @Brendan_McCord ·
"4 of the 5 largest defense primes are now customers" Seems like someone's dropping the ball
Molly O’Shea @MollySOShea

BREAKING: Nominal Hits $1B Valuation — Founders Fund Preempts $80M B-2 Acceleration Round

Just 10 months after @Nominal_io's $75M Series B led by Sequoia's Alfred Lin, CEO Cameron McCord & Trae Stephens (Founders Fund + Anduril) join Sourcery to discuss:

- $155M raised in 10 months, nearly $200M total funding
- Early validation at Anduril
- 4 of the 5 largest defense primes are now customers
- Reducing major hardware test campaigns by 50–60% (which can cost ~$3M per day)
- Strategy: platform expansion, potential M&A, and aggressive hiring
- Anduril LORE

All systems Nominal. @CameronLMcCord @traestephens @Alfred_Lin

TIMESTAMPS
(00:00) Trae Stephens & Cameron McCord
(01:11) Nominal raises $80M from Founders Fund
(03:32) Why Founders Fund made the investment
(05:22) Palantir, FF, Anduril: Trae is a "Slashie"
(07:14) Nominal: the GitHub for hardware testing
(12:33) Why Sequoia believed Nominal’s TAM was much bigger
(15:36) Inside Nominal’s growing defense customer base
(17:32) Why government hardware testing still relies on Excel and MATLAB
(22:22) Cutting hardware testing time by up to 60%
(26:55) How AI changes hardware development
(33:35) Nominal's sales strategy
(37:22) Why the government is backing new defense tech companies
(45:44) Competing with legacy software giants
(46:24) Recruiting top engineers
(50:56) Early Anduril LORE: Stories from the desert

6
2
24
4.7K
dar @radbackwards ·
‘You will see your mama’s home’ - Mama Dar
18
13
487
13.7K
Alfaxad @alfxad ·
Excited to finally launch a demo of Sara, a clinical workflow agent that can autonomously orchestrate end-to-end digital clinical tasks. Think of it like Devin, for healthcare.

We built Sara by fine-tuning Google's MedGemma 1.5 (4B) to adapt it for medical tool use. On MedAgentBench, Sara outperforms models 2x–200x its size and is SOTA on several tasks.

Learn more: nadhari.ai/sara
Demo: sara.nadhari.ai
1
2
3
301
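The pipeline implied here (a fine-tuned model that emits structured tool calls against clinical systems, as MedAgentBench-style tasks require) follows a standard agent loop. Below is a generic sketch of that loop; the tool names, the JSON call format, and the stub results are illustrative assumptions, not Nadhari's or MedAgentBench's actual interfaces.

```python
import json

# Hypothetical read-only clinical tools, standing in for FHIR-backed endpoints.
TOOLS = {
    "get_patient": lambda args: {"id": args["patient_id"], "name": "<stub>"},
    "list_medications": lambda args: [{"rxnorm": "197361", "dose": "5 mg"}],
}

def run_agent(generate, task, max_steps=8):
    """Drive one task. `generate(transcript) -> str` wraps the fine-tuned
    model and returns either a JSON tool call, {"tool": ..., "args": ...},
    or a plain-text final answer."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        out = generate(transcript)
        try:
            call = json.loads(out)  # model chose to call a tool
        except json.JSONDecodeError:
            return out              # plain text: treat as the final answer
        result = TOOLS[call["tool"]](call["args"])
        transcript.append({"role": "assistant", "content": out})
        transcript.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```

A loop like this may be why a small fine-tuned model can beat much larger general ones on such benchmarks: the hard part is emitting well-formed calls in the right order, which targeted fine-tuning optimizes directly.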
Alfaxad @alfxad ·
@jposhaughnessy isn’t this focusing heavily on process(es) rather than outcome?
0
0
0
18
Alfaxad @alfxad ·
@stephenkolesh that's the issue; make it two if you can, and increase the number of generations to 3 or 4 and it'll work
1
0
1
68
Koleshjr @stephenkolesh ·
how long does RL take to kick in? I am about to lose my mind!
2
0
2
304
Koleshjr @stephenkolesh ·
@alfxad I mean, it's training, but no metrics improvement whatsoever, so when do you get that powerful model? SFT still performs better
1
0
0
45
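This exchange reads like GRPO-style RL fine-tuning, where the reward is normalized within a group of sampled generations per prompt: with too few generations, the group baseline is noisy and metrics can sit flat long after training starts. Assuming the trainer in question is something like TRL's GRPOTrainer (an assumption; the thread never names it), the advice maps to the `num_generations` knob. The model and dataset below are placeholders taken from the TRL documentation.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Toy reward (from the TRL docs): prefer ~20-character completions.
    return [-abs(20 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # has a "prompt" column

args = GRPOConfig(
    output_dir="grpo-run",
    num_generations=4,              # the "3 or 4" suggested above; larger
                                    # groups give less noisy advantages
    per_device_train_batch_size=4,  # effective batch size must be divisible
                                    # by num_generations
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder base model
    reward_funcs=reward_len,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```

If SFT still beats the RL checkpoint at this point, common culprits are a reward with near-zero variance within each group (every generation scores the same, so advantages vanish) or simply too few optimization steps for the policy to move.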