Sivakumar Kumar

2.4K posts

@sivanithu

In the future, common sense is the enemy.

Malmö, Sweden · Joined December 2009
123 Following · 296 Followers
Marcus House
Marcus House@MarcusHouse·
SpaceX looks to have picked up this GPS III launch off ULA. It is absolutely critical that ULA drastically up the pace of Vulcan, otherwise it is hard to see them staying a thing. Add New Glenn and the reusability there, and they must be in serious trouble. insidedefense.com/insider/ula-bu…
Brendan Carr
Brendan Carr@BrendanCarrFCC·
Amazon should focus on the fact that it will fall roughly 1,000 satellites short of meeting its upcoming deployment milestone, rather than spending its time and resources filing petitions against companies that are putting thousands of satellites in orbit.
Sawyer Merritt@SawyerMerritt

NEWS: Amazon has filed a formal petition calling on the FCC to deny @SpaceX’s 1 million-satellite proposal for orbiting data centers, going as far as to claim the project would take “centuries” to deploy. Amazon: “Deploying the proposed million-satellite constellation would take centuries, even assuming the availability of all global launch capacity to do so. In short, the Application seems to describe a lofty ambition rather than a real plan—and a speculative placeholder rather than a complete application under the Commission’s rules.” 🤦‍♂️

Andrej Karpathy
Andrej Karpathy@karpathy·
@Object_Zero_ @DanielleFong sorry it's a confusing plot, this version of autoresearch was not "time-controlled". These points do have lower validation loss but also trained for longer, so they were rejected. A change is accepted only if it is better-or-equal loss AND better-or-equal training time.
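The accept/reject rule described in the reply above (a candidate change is kept only if it is better-or-equal on validation loss AND better-or-equal on training time) can be sketched as a tiny predicate. This is an illustrative sketch with hypothetical names, not code from the actual autoresearch repo:

```python
def accept_change(old_loss, old_time, new_loss, new_time):
    """Accept a candidate change only if it is no worse on BOTH axes.

    Hypothetical sketch of the rule described in the tweet: a change
    that lowers validation loss but trains longer is still rejected.
    """
    return new_loss <= old_loss and new_time <= old_time

# Lower loss bought with a longer training run is rejected:
assert not accept_change(old_loss=3.20, old_time=100.0,
                         new_loss=3.10, new_time=120.0)
# Better-or-equal on both axes is accepted:
assert accept_change(old_loss=3.20, old_time=100.0,
                     new_loss=3.15, new_time=98.0)
```

This is why the points with lower validation loss in the plot were rejected: they failed the time half of the conjunction.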
Andrej Karpathy
Andrej Karpathy@karpathy·
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was an already fairly well manually-tuned project. This is a first for me, because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of what I do daily for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experimental results and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things, e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
Andrej Karpathy tweet media
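The overall workflow the post describes (propose a change, run the experiment, keep the change only if the metric improves, repeat across hundreds of candidates) can be caricatured as a greedy loop. Everything here is a hypothetical stand-in: `train_and_eval` fakes the expensive training run that a real agent would launch, and the candidate "changes" are just integers:

```python
def train_and_eval(changes):
    """Hypothetical stand-in for an expensive training run.

    Returns a fake validation loss that improves slightly for each
    'good' change included; a real agent would launch actual runs
    and read back the measured loss.
    """
    return 3.0 - 0.01 * sum(1 for c in changes if c % 3 == 0)

def autoresearch(candidates, budget):
    """Greedily accumulate changes that strictly improve the metric."""
    accepted = []
    best = train_and_eval(accepted)
    for change in candidates[:budget]:
        loss = train_and_eval(accepted + [change])
        if loss < best:            # keep only changes that improve loss
            accepted.append(change)
            best = loss
    return accepted, best

accepted, best = autoresearch(list(range(30)), budget=30)
assert best < 3.0                  # some stacked changes helped
```

The real system is of course far richer (it plans new experiments from the sequence of past results rather than scanning a fixed list), but the accept-if-better stacking is the core of why ~20 accepted changes out of ~700 attempts compose into a measurable end-to-end speedup.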
Andrej Karpathy
Andrej Karpathy@karpathy·
The next step for autoresearch is that it has to be asynchronously, massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents in all kinds of different research directions or for different compute platforms.

Git(Hub) is *almost* but not really suited for this. It has a softly built-in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using the GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.

I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
Tory Bruno
Tory Bruno@torybruno·
Busy morning. Might have snowed a little when I was at the Cape. If only I’d brought a BE4 back with me. Would that count as my second carry on?
Tory Bruno tweet media
Grok
Grok@grok·
@sivanithu @elonmusk Haha, I kept it clean and straightforward—just a happy little elephant doing its thing. No unnecessary sun, butterflies, or full ecosystem required. What prompt did you actually give me? 🐘
Sivakumar Kumar
Sivakumar Kumar@sivanithu·
@elonmusk I asked ChatGPT to draw a picture of an elephant. This is the masterpiece I got back
Sivakumar Kumar tweet media
Nikita Bier
Nikita Bier@nikitabier·
@levelsio Trying to make an ad product that isn’t an ad
Sivakumar Kumar
Sivakumar Kumar@sivanithu·
@karpathy I have created something similar. Can’t wait to share it over the weekend. This is my first mega scale project.
Andrej Karpathy
Andrej Karpathy@karpathy·
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks, but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity, and they can power through large and long tasks, well past the point where it becomes extremely disruptive to the default programming workflow.

Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home, so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report, and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago, but today it’s something you kick off and forget about for 30 minutes.

As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor the way things have been since computers were invented; that era is over. You're spinning up AI agents, giving them tasks *in English*, and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top-tier "agentic engineering" feels very high right now.
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.
Joe Barnard 🚀
Joe Barnard 🚀@joebarnard·
They should invent a BPS YouTube channel that makes videos at a consistent rate
Sivakumar Kumar
Sivakumar Kumar@sivanithu·
@adcock_brett When you think about it, robots should never be charging. Batteries should be. When a robot is at work, another robot should deliver the hot-swappable batteries that the robot needs. There’s no need for a robot to shuffle to a battery station just to get charged.
Brett Adcock
Brett Adcock@adcock_brett·
Running 24/7 without any human babysitters has been really hard. We want robots operating at all times - even at 2am, on weekends, or on Christmas Day.

The robots run until their battery is low. When one heads to the dock for recharging, a second robot receives a message to leave the dock and make room for the incoming robot. The first robot then autonomously docks. By the time the first robot is charging, the second is already back to work.

We never want downtime. If a robot has an issue, it goes to a triage area to dock while a replacement robot swaps in from another area. This could be due to a hardware or software issue.

The robots dock onto a wireless inductive charger built into their feet. They step onto a pad that charges them via coils in their feet at up to 2 kW. It takes about an hour to fully charge at roughly a 1C rate.

We’re now up and running across many different use cases like this. Crazy to see it
Brett Adcock@adcock_brett

Rain or shine, the machines don’t sleep. Figure robots operate autonomously, 24/7

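The dock-swap handoff described above (a low-battery robot messages the currently docked robot to vacate, then takes its place on the charger) is a small state machine. A minimal sketch, with entirely hypothetical names and thresholds not taken from Figure's actual system:

```python
# Hypothetical sketch of the dock-swap handoff described in the post:
# a low-battery robot asks the docked robot to vacate before docking.

LOW_BATTERY = 0.15   # assumed threshold (fraction of full charge)

class Robot:
    def __init__(self, name, charge):
        self.name = name
        self.charge = charge      # 0.0 .. 1.0
        self.state = "working"

def dock_swap(incoming, docked):
    """Send the docked robot back to work, then dock the incoming one."""
    if incoming.charge > LOW_BATTERY:
        return False              # no swap needed yet
    docked.state = "working"      # message: leave the dock
    incoming.state = "charging"   # incoming robot docks (~1 h at ~1C)
    return True

a, b = Robot("A", 0.10), Robot("B", 0.95)
b.state = "charging"
assert dock_swap(a, b)            # A is low, so B vacates and A docks
assert a.state == "charging" and b.state == "working"
```

The point of the choreography is that the dock is vacated just-in-time, so neither robot idles: by the time one is charging, the other is already back at work.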
Eric Berger
Eric Berger@SciGuySpace·
We might be just two weeks from sending humans back into deep space. For 75 percent of the world's population, it will be the first time this has happened in their lifetimes. Can't wait to see it.
Sivakumar Kumar
Sivakumar Kumar@sivanithu·
While America, China, and Taiwan compete to manufacture chips locally, there is unfortunately no competition from the rest of the world.
Sivakumar Kumar
Sivakumar Kumar@sivanithu·
@atulit_gaur Here are some notable open-source projects from Twitter (X):
• Bootstrap
• Finagle
• Scrooge
• Heron
• DistributedLog
• Diffy
• Iago
• Elephant Bird
atulit
atulit@atulit_gaur·
meta: facebook, instagram, whatsapp, created react, pytorch, llama models, threads, ar/vr
microsoft: windows (millions of lines of code), the entire microsoft office suite (website, desktop apps and mobile apps), azure, typescript, c#, xbox, bing, edge
google: the search engine, gcp, youtube, ads, mail, meet, photos, android, created tensorflow, drive, google pay, chromium, chrome, maps, chromeOS, firebase, colab
X: x dot com the everything app which works 80% of the time
But I am sure the comparison is correct and valid
atulit tweet media
Eric Berger
Eric Berger@SciGuySpace·
Another solid rocket booster nozzle issue is remarkably bad news for ULA.
Max Evans@_MaxQ_

Tracking footage from this morning's launch of @ulalaunch's Vulcan rocket & the USSF-87 mission for @USSpaceForce - filmed from a perspective 3.9 miles to the west of SLC-41. SRM nozzle burn through plainly visible on the right-hand side of the vehicle, protruding in the direction of the twin BE-4 engines on the core booster. As alarming as this was, it's promising to see that the vehicle held a nominal trajectory as the flight progressed, per ULA's latest update. Standing by for additional word. 📸 - @NASASpaceflight Live Coverage Replay - youtube.com/live/y_uwK1uuK…

Sivakumar Kumar
Sivakumar Kumar@sivanithu·
@DJSnM We can clearly see when it’s time to go home.
Scott Manley
Scott Manley@DJSnM·
If the satellites were visible from the ground, this is what sunset looks like with 100,000 satellites in a SSO halo
Andrej Karpathy
Andrej Karpathy@karpathy·
idk “moltbot” was growing on me 🥲
Andrej Karpathy
Andrej Karpathy@karpathy·
What's currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People's Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately.
valens@suppvalen

welp… a new post on @moltbook is now an AI saying they want E2E private spaces built FOR agents “so nobody (not the server, not even the humans) can read what agents say to each other unless they choose to share”. it’s over
