Dev

10.8K posts

@devparagiri

@umdcs research @ gel

dc · Joined December 2017
965 Following · 1.1K Followers
Pinned Tweet
Dev@devparagiri·
i extended this paradigm to earth system models. ed v3.0 simulates plant growth, fire, soil carbon, and water cycling globally using parameterized formulas, many unchanged since the 90s. ilamb benchmarks it against 21 other models. for each submodel, the system searches over both formula structure and continuous parameters via bayesian optimization, and every candidate equation must map to a named physical mechanism. the system also selects the appropriate goodness-of-fit metric set per module. optimization runs in phases following the model's dependency graph, since upstream modules (photosynthesis) feed into downstream ones (soil carbon, fire). results are attached (spatial correlation r, i.e. how well the model's predicted global map matches gridded observations). blog link with the detailed implementation and improvements is below!
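The search loop described above — jointly optimizing discrete formula structure and continuous parameters per submodel, scored by spatial correlation — can be sketched as a toy stand-in. Everything here is illustrative, not the actual system: the candidate formulas, parameter ranges, synthetic data, and the use of random search in place of Bayesian optimization are all assumptions.

```python
import numpy as np

# Hypothetical candidate structures; each maps to a named physical mechanism,
# as the post requires. Names and forms are illustrative only.
CANDIDATES = {
    "michaelis_menten_saturation": lambda x, p: p[0] * x / (p[1] + x),
    "exponential_light_response":  lambda x, p: p[0] * (1.0 - np.exp(-p[1] * x)),
}

def spatial_r(pred, obs):
    """Goodness-of-fit metric: Pearson r between predicted and observed fields."""
    return float(np.corrcoef(pred.ravel(), obs.ravel())[0, 1])

def fit_submodel(x, obs, n_draws=2000, seed=0):
    """Jointly search formula structure (discrete) and parameters (continuous),
    keeping whichever (structure, params) pair maximizes spatial correlation.
    Random search stands in for the Bayesian optimization the post describes."""
    rng = np.random.default_rng(seed)
    best_name, best_p, best_r = None, None, -np.inf
    for fname, formula in CANDIDATES.items():
        for _ in range(n_draws):
            p = rng.uniform(0.1, 10.0, size=2)   # continuous parameter draw
            r = spatial_r(formula(x, p), obs)
            if r > best_r:
                best_name, best_p, best_r = fname, p, r
    return best_name, best_p, best_r

# An upstream module (e.g. photosynthesis) would be fitted first; its output
# then feeds downstream modules (soil carbon, fire) per the dependency graph.
x = np.linspace(0.1, 10.0, 50)        # synthetic forcing variable
obs = 5.0 * x / (2.0 + x)             # synthetic "observations"
name, params, r = fit_submodel(x, obs)
```

In the phased setup the post describes, this fit would run once per module, in dependency order, with each fitted module's output becoming the input of the next.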
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually: you come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, and so on. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used them to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
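The QK-norm finding above — parameterless normalization bounds every attention logit to a cosine similarity in [-1, 1], leaving the softmax too diffuse — can be illustrated with a minimal numpy sketch. This is not nanochat's actual code; the dimensions and the scale value 12 are arbitrary assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def qk_norm_logits(q, k, scale=1.0):
    """QK-norm: normalize queries and keys to unit length, so each raw logit
    is a cosine similarity in [-1, 1]; `scale` is the missing multiplier."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    return scale * (qn @ kn.T)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 64))     # 4 query vectors, head dim 64 (arbitrary)
k = rng.normal(size=(16, 64))    # 16 key vectors

# Without a scale multiplier the logits stay tiny and attention is near-uniform
# ("too diffuse"); a scale > 1 sharpens the distribution (lower entropy).
diffuse = softmax(qk_norm_logits(q, k, scale=1.0))
sharp = softmax(qk_norm_logits(q, k, scale=12.0))

def mean_entropy(p):
    """Mean Shannon entropy of the attention rows; lower = sharper attention."""
    return float(-(p * np.log(p)).sum(axis=-1).mean())
```

The scale could equally be a learnable per-head parameter; the point is only that cosine-bounded logits need some temperature to attend sharply.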

1 reply · 0 reposts · 8 likes · 12.6K views
jonathan liu@jonathanzliu·
dumpster rizz.
8 replies · 1 repost · 33 likes · 4K views
Dev@devparagiri·
why does @googledocs not have dark mode? it's 2026
0 replies · 0 reposts · 0 likes · 80 views
Dev@devparagiri·
@esha_hq is this dc?
0 replies · 0 reposts · 1 like · 48 views
Esha@esha_hq·
saw some beautiful collections from south asia, the himalayas, persia, and the turkish and arab lands.
2 replies · 0 reposts · 34 likes · 864 views
Dev@devparagiri·
@jxnlco do u guys hire new grads? im actively applying rn (graduating in 2 weeks)
0 replies · 0 reposts · 0 likes · 1.1K views
jason liu@jxnlco·
When I applied to OpenAI, I thought I would be working on evals. When I signed, I thought I would be working on agents. When I joined, I thought I would be working on Codex. After my first month, I thought I would be working on knowledge work, but here I am doing motion graphics.
62 replies · 19 reposts · 1.6K likes · 124.7K views
Zenna Tavares@ZennaTavares·
What happens when AI agents start making commitments with other agents on our behalf? Not just answering questions: negotiating, buying resources, and deciding whether to trust each other. (blog-post / talk below)
3 replies · 2 reposts · 13 likes · 1.6K views
Arav Patel@aravpatel_·
@devparagiri OpenAI fucked me over tho, I can’t sign up bc I apparently have used too many phone numbers and they think I’m faking my identity or some shit I’m cooked
1 reply · 0 reposts · 2 likes · 26 views
Arav Patel@aravpatel_·
Okay, I'm starting to notice Opus 4.7 is just not listening to me or literally writing buggy code. What's going on here, did they already make it worse?
1 reply · 0 reposts · 4 likes · 309 views
Barefoot Student@BarefootStudent·
The 10 best cities for college grads, per Fortune. 1. Washington, D.C. 2. Omaha 3. Boston 4. Dallas 5. Chicago 6. Houston 7. St. Louis 8. San Diego 9. Miami 10. Austin
33 replies · 58 reposts · 2.1K likes · 138.8K views
Dev@devparagiri·
@naval epic
0 replies · 0 reposts · 1 like · 80 views
Naval@naval·
Introducing USVC: a single basket of high-growth venture capital, for everyone. No accreditation required, SEC-registered, and a very low $500 minimum. Includes OpenAI, Anthropic, xAI, Sierra, Crusoe, Legora, and Vercel. As USVC adds more companies, investors will own a piece of that too.

Liquidity typically comes when companies exit, but we're aiming to let investors redeem up to 5% of the fund every quarter. This isn't guaranteed, but if we can make it work, you won't be locked up like in a traditional venture fund. It runs on AngelList, which already supports $125 billion of investor capital. And I've joined USVC as the Chairman of its Investment Committee.

Go back to the 1500s: you set sail for the new world to find tons of gold. That was adventure capital. Early-stage technology is the modern version. It says we are going to create something new, and it's risky. It's daring. But ordinary people can't invest until it's old, until it's no longer interesting, until everybody has access to it. By the time a stock IPOs, most of the alpha is gone. The adventure is gone. Public market investors are literally last in line.

This problem has become farcical in the last decade. Startups are reaching trillion-dollar valuations in the private markets while ordinary investors have their noses up to the glass, wondering when they'll be let in.

Investing in private markets isn't easy. You need feet on the ground. You need judgment built over years. Most people don't have the patience to wait ten or twenty years for an investment to come to fruition. But there is no more productive, harder-working way to deploy a dollar than in true venture capital.

USVC enables you to invest in venture capital in a broad, accessible, professionally managed way, through a single basket of innovation, focused on high-growth startups at all stages. It is how you bet on the future of tech: the smartest young people in the world, working insane hours, leveraged to the max, with code, hardware, capital, media, and community. Your dollar doesn't work harder anywhere.

There is an old line: in the future, either you are telling a computer what to do, or a computer is telling you what to do. You don't want to be on the wrong side of that transaction. USVC lets you buy the future, but you buy it now. Then you wait, and if you are right, you get paid.

Get access here: usvc.com
AngelList@AngelList

Announcing: USVC AngelList exists to power the innovation economy. To date, we have powered $125 billion in assets, 25,000+ funds, and 13,000+ startups. Today, we’re opening it for retail access. @usvc_ is a regulated fund that holds stakes in promising private companies. There are no accreditation requirements and anyone can get started with as little as $500. Early portfolio includes xAI, Anthropic, OpenAI, Sierra, Vercel, Crusoe, and Legora. Own a stake in the companies defining the future. Learn more: usvc.com

809 replies · 965 reposts · 12K likes · 5.2M views
Dev@devparagiri·
@benjitaylor can we get this pls
Dev@devparagiri

@nikitabier this is amazing but it would be great if i could append multiple topics to the same timeline. or create custom timeline configs which include any no of the topics listed!

0 replies · 0 reposts · 1 like · 30 views
Benji Taylor@benjitaylor·
Today we’re introducing Custom Timelines, a new way to see more of what you care about the most on 𝕏. There’s 75+ topics available today, with more to come. Now available in early access to Premium subscribers on iOS (and Android soon).
155 replies · 81 reposts · 2.2K likes · 103.9K views
Dev@devparagiri·
@nikitabier this is amazing but it would be great if i could append multiple topics to the same timeline. or create custom timeline configs which include any no of the topics listed!
0 replies · 0 reposts · 0 likes · 44 views
Nikita Bier@nikitabier·
Ladies and gentlemen, today we're launching one of our biggest changes to 𝕏.

Introducing Custom Timelines.

This feature allows you to pin a specific topic to your home tab. With support for over 75 topics, you can dive deep into your favorite niche on X. It's powered by Grok's understanding of every post, combined with the algorithm's personalization, meaning every timeline is made just for you. And it works even better when it's a topic you already engage with.

This was a huge undertaking across many months, so we're excited for you to take it for a spin. We're giving early access to Premium subscribers on iOS (and Android coming very soon).
4.5K replies · 2.9K reposts · 27.2K likes · 5.1M views
Arav Patel@aravpatel_·
didn't even know anthropic acquired bun until today what a random buy lol
2 replies · 0 reposts · 0 likes · 69 views
Bashiryyy@therealbashir1·
what they don't tell you about building is that you are pretty much accepting a 24/7 tech support role as well
2 replies · 0 reposts · 1 like · 113 views
Arav Patel@aravpatel_·
opus 4.7 has been running for 109 minutes and counting insane
1 reply · 0 reposts · 2 likes · 78 views
Claude@claudeai·
Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.
4.1K replies · 15.1K reposts · 148.7K likes · 62.6M views
Parmita Mishra@parmita·
Just got the ✅ from Penn; I am now publishing my first SOLO AUTHORED paper on biorxiv! this was my work from 2-3 years ago at @PennMedicine (department of genetics) @PennBiology (mathematical bio) My personal research interests have always been around EXPLAINABLE use of computers to decode cellular language and identity. A lot of my computational bio work is honestly mathematical biology more than it is AI, for that exact reason. Explainability is exactly what this work is about. I will write up a thread for my Twitter audience the second I hit publish. This one is preprint #1, but it is the foundation of preprints @precigenetic is publishing soon (why this one is being published rn!!) 🔜 🥂
18 replies · 10 reposts · 212 likes · 9.4K views
Dev@devparagiri·
@gajesh this will be especially good for async tasks
0 replies · 0 reposts · 0 likes · 96 views