Dev

10.8K posts

@devparagiri

@umdcs research @ gel

dc · Joined December 2017
965 Following · 1.1K Followers
Pinned Tweet
Dev@devparagiri·
i extended this paradigm to earth system models. ed v3.0 simulates plant growth, fire, soil carbon, and water cycling globally using parameterized formulas, many unchanged since the 90s. ilamb benchmarks it against 21 other models. for each submodel, the system searches over both formula structure and continuous parameters via bayesian optimization, and every candidate equation must map to a named physical mechanism. the system also selects the appropriate goodness-of-fit metric set per module. optimization runs in phases following the model's dependency graph, since upstream modules (photosynthesis) feed into downstream ones (soil carbon, fire). results are attached (spatial correlation r, i.e. how well the model's predicted global map matches gridded observations). blog link with the detailed implementation and improvements is below!
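The search loop described above — jointly optimizing discrete formula structure and continuous parameters per submodel, scored by spatial correlation — can be sketched as a toy stand-in. Everything here is illustrative, not the actual system: the candidate formulas, parameter ranges, synthetic data, and the use of random search in place of Bayesian optimization are all assumptions.

```python
import numpy as np

# Hypothetical candidate structures; each maps to a named physical mechanism,
# as the post requires. Names and forms are illustrative only.
CANDIDATES = {
    "michaelis_menten_saturation": lambda x, p: p[0] * x / (p[1] + x),
    "exponential_light_response":  lambda x, p: p[0] * (1.0 - np.exp(-p[1] * x)),
}

def spatial_r(pred, obs):
    """Goodness-of-fit metric: Pearson r between predicted and observed fields."""
    return float(np.corrcoef(pred.ravel(), obs.ravel())[0, 1])

def fit_submodel(x, obs, n_draws=2000, seed=0):
    """Jointly search formula structure (discrete) and parameters (continuous),
    keeping whichever (structure, params) pair maximizes spatial correlation.
    Random search stands in for the Bayesian optimization the post describes."""
    rng = np.random.default_rng(seed)
    best_name, best_p, best_r = None, None, -np.inf
    for fname, formula in CANDIDATES.items():
        for _ in range(n_draws):
            p = rng.uniform(0.1, 10.0, size=2)   # continuous parameter draw
            r = spatial_r(formula(x, p), obs)
            if r > best_r:
                best_name, best_p, best_r = fname, p, r
    return best_name, best_p, best_r

# An upstream module (e.g. photosynthesis) would be fitted first; its output
# then feeds downstream modules (soil carbon, fire) per the dependency graph.
x = np.linspace(0.1, 10.0, 50)        # synthetic forcing variable
obs = 5.0 * x / (2.0 + x)             # synthetic "observations"
name, params, r = fit_submodel(x, obs)
```

In the phased setup the post describes, this fit would run once per module, in dependency order, with each fitted module's output becoming the input of the next.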
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually: you come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, and so on. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used them to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
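The QK-norm finding above — parameterless normalization bounds every attention logit to a cosine similarity in [-1, 1], leaving the softmax too diffuse — can be illustrated with a minimal numpy sketch. This is not nanochat's actual code; the dimensions and the scale value 12 are arbitrary assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def qk_norm_logits(q, k, scale=1.0):
    """QK-norm: normalize queries and keys to unit length, so each raw logit
    is a cosine similarity in [-1, 1]; `scale` is the missing multiplier."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    return scale * (qn @ kn.T)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 64))     # 4 query vectors, head dim 64 (arbitrary)
k = rng.normal(size=(16, 64))    # 16 key vectors

# Without a scale multiplier the logits stay tiny and attention is near-uniform
# ("too diffuse"); a scale > 1 sharpens the distribution (lower entropy).
diffuse = softmax(qk_norm_logits(q, k, scale=1.0))
sharp = softmax(qk_norm_logits(q, k, scale=12.0))

def mean_entropy(p):
    """Mean Shannon entropy of the attention rows; lower = sharper attention."""
    return float(-(p * np.log(p)).sum(axis=-1).mean())
```

The scale could equally be a learnable per-head parameter; the point is only that cosine-bounded logits need some temperature to attend sharply.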

1 reply · 0 reposts · 8 likes · 12.6K views
jonathan liu@jonathanzliu·
dumpster rizz.
8 replies · 1 repost · 33 likes · 4K views
Dev@devparagiri·
why does @googledocs not have dark mode? it's 2026
0 replies · 0 reposts · 0 likes · 80 views
Dev@devparagiri·
@esha_hq is this dc?
0 replies · 0 reposts · 1 like · 48 views
Esha@esha_hq·
saw some beautiful collections from south asia, the himalayas, persia, and the turkish and arab lands.
2 replies · 0 reposts · 34 likes · 864 views
Dev@devparagiri·
@jxnlco do u guys hire new grads? im actively applying rn (graduating in 2 weeks)
0 replies · 0 reposts · 0 likes · 1.1K views
jason liu@jxnlco·
When I applied to OpenAI, I thought I would be working on evals. When I signed, I thought I would be working on agents. When I joined, I thought I would be working on Codex. After my first month, I thought I would be working on knowledge work, but here I am doing motion graphics.
62 replies · 19 reposts · 1.6K likes · 124.7K views
Zenna Tavares@ZennaTavares·
What happens when AI agents start making commitments with other agents on our behalf? Not just answering questions: negotiating, buying resources, and deciding whether to trust each other. (blog-post / talk below)
3 replies · 2 reposts · 13 likes · 1.6K views
Arav Patel@aravpatel_·
@devparagiri OpenAI fucked me over tho, I can’t sign up bc I apparently have used too many phone numbers and they think I’m faking my identity or some shit I’m cooked
1 reply · 0 reposts · 2 likes · 26 views
Arav Patel@aravpatel_·
Okay, I'm starting to notice Opus 4.7 is just not listening to me or literally writing buggy code. What's going on here, did they already make it worse?
1 reply · 0 reposts · 4 likes · 309 views
Barefoot Student@BarefootStudent·
The 10 best cities for college grads, per Fortune. 1. Washington, D.C. 2. Omaha 3. Boston 4. Dallas 5. Chicago 6. Houston 7. St. Louis 8. San Diego 9. Miami 10. Austin
33 replies · 58 reposts · 2.1K likes · 138.8K views
Dev@devparagiri·
@naval epic
0 replies · 0 reposts · 1 like · 80 views
Naval@naval·
Introducing USVC: a single basket of high-growth venture capital, for everyone. No accreditation required, SEC-registered, and a very low $500 minimum. Includes OpenAI, Anthropic, xAI, Sierra, Crusoe, Legora, and Vercel. As USVC adds more companies, investors will own a piece of that too.

Liquidity typically comes when companies exit, but we're aiming to let investors redeem up to 5% of the fund every quarter. This isn't guaranteed, but if we can make it work, you won't be locked up like in a traditional venture fund. It runs on AngelList, which already supports $125 billion of investor capital. And I've joined USVC as the Chairman of its Investment Committee.

Go back to the 1500s: you set sail for the new world to find tons of gold. That was adventure capital. Early-stage technology is the modern version. It says we are going to create something new, and it's risky. It's daring. But ordinary people can't invest until it's old, until it's no longer interesting, until everybody has access to it. By the time a stock IPOs, most of the alpha is gone. The adventure is gone. Public market investors are literally last in line.

This problem has become farcical in the last decade. Startups are reaching trillion-dollar valuations in the private markets while ordinary investors have their noses up to the glass, wondering when they'll be let in.

Investing in private markets isn't easy. You need feet on the ground. You need judgment built over years. Most people don't have the patience to wait ten or twenty years for an investment to come to fruition. But there is no more productive, harder-working way to deploy a dollar than in true venture capital.

USVC enables you to invest in venture capital in a broad, accessible, professionally managed way, through a single basket of innovation, focused on high-growth startups at all stages. It is how you bet on the future of tech: the smartest young people in the world, working insane hours, leveraged to the max, with code, hardware, capital, media, and community. Your dollar doesn't work harder anywhere.

There is an old line: in the future, either you are telling a computer what to do, or a computer is telling you what to do. You don't want to be on the wrong side of that transaction. USVC lets you buy the future, but you buy it now. Then you wait, and if you are right, you get paid.

Get access here: usvc.com
AngelList@AngelList

Announcing: USVC AngelList exists to power the innovation economy. To date, we have powered $125 billion in assets, 25,000+ funds, and 13,000+ startups. Today, we’re opening it for retail access. @usvc_ is a regulated fund that holds stakes in promising private companies. There are no accreditation requirements and anyone can get started with as little as $500. Early portfolio includes xAI, Anthropic, OpenAI, Sierra, Vercel, Crusoe, and Legora. Own a stake in the companies defining the future. Learn more: usvc.com

809 replies · 965 reposts · 12K likes · 5.2M views
Dev@devparagiri·
@benjitaylor can we get this pls
Dev@devparagiri

@nikitabier this is amazing but it would be great if i could append multiple topics to the same timeline. or create custom timeline configs which include any no of the topics listed!

0 replies · 0 reposts · 1 like · 30 views
Benji Taylor@benjitaylor·
Today we’re introducing Custom Timelines, a new way to see more of what you care about the most on 𝕏. There’s 75+ topics available today, with more to come. Now available in early access to Premium subscribers on iOS (and Android soon).
155 replies · 81 reposts · 2.2K likes · 103.9K views
Dev@devparagiri·
@nikitabier this is amazing but it would be great if i could append multiple topics to the same timeline. or create custom timeline configs which include any no of the topics listed!
0 replies · 0 reposts · 0 likes · 44 views
Nikita Bier@nikitabier·
Ladies and gentlemen, today we're launching one of our biggest changes to 𝕏.

Introducing Custom Timelines.

This feature allows you to pin a specific topic to your home tab. With support for over 75 topics, you can dive deep into your favorite niche on X. It's powered by Grok's understanding of every post, combined with the algorithm's personalization, meaning every timeline is made just for you. And it works even better when it's a topic you already engage with.

This was a huge undertaking across many months, so we're excited for you to take it for a spin. We're giving early access to Premium subscribers on iOS (and Android coming very soon).
4.5K replies · 2.9K reposts · 27.2K likes · 5.1M views
Arav Patel@aravpatel_·
didn't even know anthropic acquired bun until today what a random buy lol
2 replies · 0 reposts · 0 likes · 69 views
Bashiryyy@therealbashir1·
what they don't tell you about building is that you are pretty much accepting a 24/7 tech support role as well
2 replies · 0 reposts · 1 like · 113 views
Arav Patel@aravpatel_·
opus 4.7 has been running for 109 minutes and counting insane
1 reply · 0 reposts · 2 likes · 78 views
Claude@claudeai·
Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.
4.1K replies · 15.1K reposts · 148.7K likes · 62.6M views
Parmita Mishra@parmita·
Just got the ✅ from Penn; I am now publishing my first SOLO AUTHORED paper on biorxiv! this was my work from 2-3 years ago at @PennMedicine (department of genetics) @PennBiology (mathematical bio) My personal research interests have always been around EXPLAINABLE use of computers to decode cellular language and identity. A lot of my computational bio work is honestly mathematical biology more than it is AI, for that exact reason. Explainability is exactly what this work is about. I will write up a thread for my Twitter audience the second I hit publish. This one is preprint #1, but it is the foundation of preprints @precigenetic is publishing soon (why this one is being published rn!!) 🔜 🥂
18 replies · 10 reposts · 212 likes · 9.4K views
Dev@devparagiri·
@gajesh this will be especially good for async tasks
0 replies · 0 reposts · 0 likes · 96 views