Alfaxad

1.6K posts

@alfxad

alfa. working on something beautiful!! @NadhariAI

Kyoto, Japan · Joined March 2020
336 Following · 483 Followers
Pinned Tweet
Alfaxad @alfxad ·
happy new year!!! go make something beautiful!
0
0
6
854
Tibo @thsottiaux ·
I realize yesterday’s Codex reset came at a bit of an unfortunate time, given that the last one was almost exactly a week ago. To really celebrate the 3M, I’ll reset again tomorrow. Thanks for the feedback!
639
294
6.5K
499.3K
Alfaxad reposted
𝚟𝚒𝚎 ⟢ @viemccoy ·
I want to bring beauty into the world. I want to bring back wonder for people on a scale that cannot be fathomed by the modern psyche. I want to show everyone the magic that I've found and help them to integrate it into their own lives. This is a wonderful world, all for us.
7
4
105
3.7K
Alfaxad @alfxad ·
Spring, 2026 📍Kyoto
1
0
2
75
Alfaxad reposted
Google DeepMind @GoogleDeepMind ·
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, they’re released under an Apache 2.0 license. Here’s what’s new 🧵
361
1.2K
8.8K
3.8M
Alfaxad @alfxad ·
just watched project hail mary, it’s a good movie🔥
2
0
2
174
Alfaxad @alfxad ·
@tszzl you are wrong on this one. curious though, who would you cast?
0
0
1
75
roon @tszzl ·
the dune movies were doomed from the start to be good and not great due to the casting of chalamet as paul. he does not have the gravitas for a child-god and is much better suited for kind of silly coming of age movies
1.1K
94
3.3K
3.1M
Alfaxad @alfxad ·
Just watched Marty Supreme for the first time!!! Wow!!!
0
0
1
96
Alfaxad @alfxad ·
it's what i've noticed as well: it gets easier to parallelize experiments and get instant feedback on research direction, but one still needs deep domain understanding to pull off great results. but in some kaggle competitions i'm doing, which have clear objectives, agents just know what to do, and i just let them go off and report results back to me.
0
0
1
77
Kai Arulkumaran @kaixhin ·
Andrej has a very good sense of what's currently doable, and I think this needs to be taken at face value - agents can now go off and do serious ML tuning work (incl. fixing bugs and adding improvements from related research), which is amazing, but also far from the full researcher job. *Taste* is the key - humans need to get out there and decide what avenues are worth pursuing
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually: you come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things, e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

1
0
12
3.1K
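The workflow Karpathy describes is essentially a greedy outer loop around training runs: propose a change, score it with a cheap run, keep it only if validation loss improves, and let the accumulated experiment history steer the next proposal. Here is a deliberately toy sketch of that loop; the knob names, the simulated loss, and `propose_change` are hypothetical stand-ins for the LLM agent and for real training, not Karpathy's actual tooling.

```python
import random

def train_and_eval(config):
    # Stand-in for a short training run (e.g. the depth=12 model) that
    # returns a validation loss; here just a noisy function of the config.
    return (config["wd"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2 + random.gauss(0, 1e-4)

def propose_change(config, history):
    # Stand-in for the agent: perturb one knob. A real agent would read
    # the (change, loss) history and plan the next experiment from it.
    key = random.choice(list(config))
    return key, config[key] * random.uniform(0.8, 1.25)

def autoresearch(base_config, budget=700):
    """Greedy hill-climb on validation loss over `budget` proposed changes."""
    best, best_loss = dict(base_config), train_and_eval(base_config)
    history, accepted = [], []
    for _ in range(budget):
        key, value = propose_change(best, history)
        candidate = {**best, key: value}
        loss = train_and_eval(candidate)
        history.append(((key, value), loss))
        if loss < best_loss:  # keep only changes that improve validation loss
            best, best_loss = candidate, loss
            accepted.append((key, value))
    return best, accepted  # cf. the ~20 accepted changes in "round 1" above

print(autoresearch({"wd": 0.05, "beta2": 0.9}))
```

What the sketch leaves out, and what the thread stresses, is re-verification: accepted changes were re-tested at the larger depth=24 scale before counting as real improvements.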
Google Research @GoogleResearch ·
The MedGemma Impact Challenge has come to a close with a remarkable 850+ submissions in six weeks. We invited developers to prototype human-centered AI applications using Google’s Health AI Developer Foundations (HAI-DEF), and the response was inspiring. These open models are designed to accelerate the development of privacy-focused, adaptable AI for the front lines of care — and the community delivered. Thank you to every participant for helping us bridge the gap between AI research and clinical impact. Stay tuned as we review the submissions and announce the winners. Also be sure to follow our Google Research Kaggle site as we plan more challenges in the coming months: kaggle.com/organizations/…
10
58
465
22.6K
Brendan McCord 🏛️ x 🤖 @Brendan_McCord ·
"4 of the 5 largest defense primes are now customers" Seems like someone's dropping the ball
Molly O’Shea @MollySOShea

BREAKING: Nominal Hits $1B Valuation — Founders Fund Preempts $80M B-2 Acceleration Round

Just 10 months after @Nominal_io's $75M Series B led by Sequoia's Alfred Lin, CEO Cameron McCord & Trae Stephens (Founders Fund + Anduril) join Sourcery to discuss:

- $155M raised in 10 months, nearly $200M total funding
- Early validation at Anduril
- 4 of the 5 largest defense primes are now customers
- Reducing major hardware test campaigns by 50–60% (which can cost ~$3M per day)
- Strategy: platform expansion, potential M&A, and aggressive hiring
- Anduril LORE

All systems Nominal. @CameronLMcCord @traestephens @Alfred_Lin

TIMESTAMPS
(00:00) Trae Stephens & Cameron McCord
(01:11) Nominal raises $80M from Founders Fund
(03:32) Why Founders Fund made the investment
(05:22) Palantir, FF, Anduril: Trae is a "Slashie"
(07:14) Nominal: the GitHub for hardware testing
(12:33) Why Sequoia believed Nominal’s TAM was much bigger
(15:36) Inside Nominal’s growing defense customer base
(17:32) Why government hardware testing still relies on Excel and MATLAB
(22:22) Cutting hardware testing time by up to 60%
(26:55) How AI changes hardware development
(33:35) Nominal's sales strategy
(37:22) Why the government is backing new defense tech companies
(45:44) Competing with legacy software giants
(46:24) Recruiting top engineers
(50:56) Early Anduril LORE: Stories from the desert

6
2
24
4.7K
dar @radbackwards ·
‘You will see your mama’s home’ - Mama Dar
18
13
487
13.7K
Alfaxad @alfxad ·
Excited to finally launch a demo of Sara, a clinical workflow agent that can autonomously orchestrate end-to-end digital clinical tasks. Think of it like Devin, for healthcare.

We built Sara by fine-tuning Google's MedGemma 1.5 (4B) to adapt it for medical tool use. On MedAgentBench, Sara outperforms models 2x–200x its size and is SOTA on several tasks.

Learn more: nadhari.ai/sara
Demo: sara.nadhari.ai
1
2
3
301
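The pipeline implied here (a fine-tuned model that emits structured tool calls against clinical systems, as MedAgentBench-style tasks require) follows a standard agent loop. Below is a generic sketch of that loop; the tool names, the JSON call format, and the stub results are illustrative assumptions, not Nadhari's or MedAgentBench's actual interfaces.

```python
import json

# Hypothetical read-only clinical tools, standing in for FHIR-backed endpoints.
TOOLS = {
    "get_patient": lambda args: {"id": args["patient_id"], "name": "<stub>"},
    "list_medications": lambda args: [{"rxnorm": "197361", "dose": "5 mg"}],
}

def run_agent(generate, task, max_steps=8):
    """Drive one task. `generate(transcript) -> str` wraps the fine-tuned
    model and returns either a JSON tool call, {"tool": ..., "args": ...},
    or a plain-text final answer."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        out = generate(transcript)
        try:
            call = json.loads(out)  # model chose to call a tool
        except json.JSONDecodeError:
            return out              # plain text: treat as the final answer
        result = TOOLS[call["tool"]](call["args"])
        transcript.append({"role": "assistant", "content": out})
        transcript.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```

A loop like this may be why a small fine-tuned model can beat much larger general ones on such benchmarks: the hard part is emitting well-formed calls in the right order, which targeted fine-tuning optimizes directly.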
Alfaxad @alfxad ·
@jposhaughnessy isn’t this focusing heavily on process(es) rather than outcome?
0
0
0
18
Alfaxad @alfxad ·
@stephenkolesh that's the issue; make it two if you can, and increase the number of generations to 3 or 4 and it'll work
1
0
1
68
Koleshjr @stephenkolesh ·
how long does RL take to kick in? I am about to lose my mind!
2
0
2
304
Koleshjr @stephenkolesh ·
@alfxad I mean, it's training, but no metrics improvement whatsoever, so when do you get that powerful model? SFT still performs better
1
0
0
45
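This exchange reads like GRPO-style RL fine-tuning, where the reward is normalized within a group of sampled generations per prompt: with too few generations, the group baseline is noisy and metrics can sit flat long after training starts. Assuming the trainer in question is something like TRL's GRPOTrainer (an assumption; the thread never names it), the advice maps to the `num_generations` knob. The model and dataset below are placeholders taken from the TRL documentation.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    # Toy reward (from the TRL docs): prefer ~20-character completions.
    return [-abs(20 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # has a "prompt" column

args = GRPOConfig(
    output_dir="grpo-run",
    num_generations=4,              # the "3 or 4" suggested above; larger
                                    # groups give less noisy advantages
    per_device_train_batch_size=4,  # effective batch size must be divisible
                                    # by num_generations
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder base model
    reward_funcs=reward_len,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```

If SFT still beats the RL checkpoint at this point, common culprits are a reward with near-zero variance within each group (every generation scores the same, so advantages vanish) or simply too few optimization steps for the policy to move.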