Marcus

12.4K posts

Marcus

@MarcusSpillane

☘️ Irish. Innovation for work. Outdoors and sailing for fun. Lead Innovation @hhxhub for @howardhughesHQ. Opinions my own. ❤️ and RT don't = endorsements.

Katılım Mart 2009

5.5K Takip Edilen3.2K Takipçiler

Marcus@MarcusSpillane·1d

Check out my latest article: Your AI Is Ready. Your Procurement Process Is Not. linkedin.com/pulse/your-ai-… via @LinkedIn

English

Marcus@MarcusSpillane·28 Mar

@r0ck3t23 The candidates who get it aren't writing better prompts, they're designing better problems. That's a fundamentally different skill, and most L&D teams don't have a course for it yet.

English

Dustin@r0ck3t23·9 Şub

Jensen Huang said if he were a student today, he wouldn’t prioritize coding. He’d prioritize learning how to talk to AI. Most people treat AI like Google. Type a question, get an answer, move on. Huang sees it differently. He calls it “expertise in artistry,” which sounds dramatic but makes sense when you think about it. The real skill isn’t using AI. It’s knowing what to ask for and how to refine it. “Learning to interact with AI is not unlike being really good at asking questions.” If you’re a doctor, can you use AI to catch diagnoses you’d miss? If you’re a lawyer, can you sharpen arguments faster than your competition? The leverage comes from pairing what you know with how well you can direct the tool. Domain expertise multiplied by AI fluency equals amplification. Without the expertise, the AI is just noise. Without fluency, you’re leaving most of the capability on the table. The question isn’t whether AI will replace you. It’s whether someone who knows how to use it better will.

English

214

1.4K

7.3K

782.5K

Marcus@MarcusSpillane·28 Mar

@r0ck3t23 AI doesn't just disrupt industries. It prices out the activities your margins depend on. Which of your core processes looks like pharma drug discovery — high cost, long cycle, pattern-matching intensive?

English

Dustin@r0ck3t23·13 Mar

Jensen Huang just called the exact top of the pharmaceutical industry. Not a pivot. Not a disruption. An extinction event. Huang: “Where do I think the next amazing revolution is going to come? And this is going to be flat out one of the biggest ones ever. There’s no question that digital biology is going to be it.” The medical establishment has spent centuries playing a chaotic game of trial and error. We’re about to mathematically engineer the human operating system. Huang: “For the very first time in human history, biology has the opportunity to be engineering, not science. When something becomes engineering, not science, it becomes less sporadic and exponentially improving.” Biology is no longer the dark art of random discovery. It’s a predictable, compounding execution loop. Translate the chaotic variables of chemistry into the laws of computer science and you stop waiting for accidental breakthroughs. You simply compute the cure. That line should terrify every pharmaceutical executive alive. Huang: “It can compound on the benefits of the previous years. And every researcher’s contributions compound on each other.” For decades, drug discovery has been an isolated, artisanal process. One lab. One team. One molecule. Years of blind iteration. The algorithm just shattered that entire bottleneck. Every failed protein fold, every successful synthetic molecule instantly trains the foundational model. Makes the next iteration mathematically smarter. Huang: “We’re going to have incredible tools that bring the world of biology, which is very chaotic and constantly changing and diverse and complex, into the world of computer science. And that is going to be profound.” Incumbent pharma looks at the human body and sees an unmanageable wall of variables. Engineers look at that exact same body and see raw data waiting to be compiled. No longer guessing how a molecule will react in the physical world. Running millions of zero-cost simulated iterations before a single test tube is ever touched. Rip the chaotic friction out of the physical lab and drop it directly into a massive GPU cluster? The timeline to map, edit, and optimize the biological machine doesn’t shrink. It collapses.

English

342

883

3.7K

769.6K

Marcus@MarcusSpillane·23 Mar

@booleanbeyondIN That's an insightful statement! The sentiment is real. I'm working on something to try and help. Solving for myself first. Standby.

English

Hari Prasad@booleanbeyondIN·23 Mar

@MarcusSpillane If I’m signing up for 100 agents, does onboarding include a headset that plays “This is fine”? Seriously though, maybe we need an “agent whisperer” cert—how do you coach a hundred bots without sounding like a traffic cop?

English

Marcus@MarcusSpillane·23 Mar

Jensen Huang: 100 AI agents for every 1 human worker. The jobs debate is the wrong conversation. The real question: what kind of human can actually run a 100:1 ratio? Nobody is training for that yet. I am. More soon. linkedin.com/pulse/1001-rat…

English

Marcus@MarcusSpillane·23 Mar

@PeterDiamandis The sci-fi tropes that aren't arriving are the ones that need physics breakthroughs. Everything else is on schedule. The real question isn't which technologies show up — it's which ones arrive before the institutions built to absorb them are ready.

English

Peter H. Diamandis, MD@PeterDiamandis·22 Mar

I keep a mental bingo card of sci-fi tropes. No warp drive yet. No teleportation. But holodecks? Close. Replicators? Close. Flying cars? Happening.

English

374

18.1K

Marcus@MarcusSpillane·23 Mar

@PeterDiamandis The question most exec teams haven't asked yet: are you still paying an experience premium for skills that predate the tools your org now runs on? IBM noticed the inversion. Most haven't.

English

Peter H. Diamandis, MD@PeterDiamandis·21 Mar

IBM is hiring MORE entry-level employees because young people are better with AI than older generations.

English

187

169

2.3K

101.4K

Marcus@MarcusSpillane·23 Mar

@PeterDiamandis Rebuilding for AI isn't a technology project. It's an accountability redesign. Every institution that adopts AI needs to answer who owns decisions that agents make and what happens when they get it wrong. Most aren't anywhere close to having those answers.

English

Peter H. Diamandis, MD@PeterDiamandis·22 Mar

Every institution we use to run the world needs to be rebuilt, to some extent, for AI. Perhaps making it the single biggest advisory opportunity in recent years...

English

113

493

22.6K

Marcus@MarcusSpillane·23 Mar

@emollick OpenAI thinks about what agents do. Anthropic thinks about how agents reason. Both are right but for different jobs. Enterprise teams deploying both need to understand which philosophy fits which workflow. They're not interchangeable even when the outputs look similar.

English

137

Ethan Mollick@emollick·23 Mar

Very different philosophies for skills in Codex versus Claude Code OpenAI seems to conceive of skills functionally, mostly matter-of-fact technical references for Codex. Claude skills are more about giving the AI approaches to problems See the difference in skill creator skills

English

825

74.4K

Marcus@MarcusSpillane·23 Mar

@karpathy The engineering phase shift is clear. The org design phase shift hasn't started yet. Most enterprises have faster AI capabilities than their governance structures can handle. That gap is where the failures are coming from.

English

Andrej Karpathy@karpathy·21 Mar

Thank you Sarah, my pleasure to come on the pod! And happy to do some more Q&A in the replies.

sarah guo@saranormous

Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home like movement in AI, the model landscape, and second order effects 02:55 - What Capability Limits Remain? 06:15 - What Mastery of Coding Agents Looks Like 11:16 - Second Order Effects of Coding Agents 15:51 - Why AutoResearch 22:45 - Relevant Skills in the AI Era 28:25 - Model Speciation 32:30 - Collaboration Surfaces for Humans and AI 37:28 - Analysis of Jobs Market Data 48:25 - Open vs. Closed Source Models 53:51 - Autonomous Robotics and Atoms 1:00:59 - MicroGPT and Agentic Education 1:05:40 - End Thoughts

English

319

392

5.4K

Marcus retweetledi

Elon Musk@elonmusk·20 Mar

ZXX

31K

287.5K

28.3M

Marcus@MarcusSpillane·20 Mar

Every leader I talk to says the same thing: upskilling is not the bottleneck, organizational transformation capacity is. You can teach someone to use a model in a day. Changing how an entire function operates around AI takes years and serious structural change. The people peddling upskilling as the answer are selling the easy part of a hard problem.

English

Peter H. Diamandis, MD@PeterDiamandis·19 Mar

In some cases, AI can use computers better than a human can. The impact on traditional desk jobs worldwide will be drastic. If you haven't already, begin upskilling yourself. Don't know how? Ask AI!

English

364

15.6K

Marcus@MarcusSpillane·19 Mar

The compression of prediction windows is not a failure signal. It is proof the models are working. When you can update your view in 12 months rather than 20 years, you are not losing foresight, you are gaining it. The organizations that treat faster iteration as a threat rather than a tool are going to get left behind.

English

Peter H. Diamandis, MD@PeterDiamandis·19 Mar

How far out can experts predict the future? It used to be 20 years... Then 10... Now, even 12 months feels like a moonshot prediction.

English

442

23.7K

Marcus retweetledi

Eric Trump@EricTrump·19 Mar

🤣🤣 One of the great responses to a reporter in history! JAPANESE REPORTER: Why didn't you tell Japan before the Iran war? PRESIDENT TRUMP: "Why didn't you tell ME about PEARL HARBOR?!" "You believe in surprise much more-so than US!"

English

12.5K

9.1K

54.7K

4.4M

Marcus@MarcusSpillane·19 Mar

@PeterDiamandis The 9x safety stat is real. The enterprise deployment question nobody is asking: what does the liability model look like when the autonomous system is your vendor, not your employee?

English

140

Peter H. Diamandis, MD@PeterDiamandis·19 Mar

Tesla's FSD: 5.3 million miles between accidents. US driving average: 660,000. That's 9x safer. And it's only getting better.

English

601

999

84.6M

Marcus@MarcusSpillane·19 Mar

@PeterDiamandis Everyone quotes the 1,000x efficiency gap. Nobody mentions that biology doesn't run on APIs, doesn't connect to legacy ERPs, and doesn't have a CFO asking about ROI. Scale is a different problem.

English

Peter H. Diamandis, MD@PeterDiamandis·18 Mar

The human brain uses only 20 watts of power but performs ~1 exaFLOP (10^18 operations/sec). Today's top AI chips burn 700 watts for ~1 petaFLOP. We're still ~1,000x less efficient than biology. When neuromorphic chips close that gap, we won't be building data centers—we might be growing them.

English

116

158

920

67.7K

Marcus@MarcusSpillane·19 Mar

@pmarca I respect the discipline of daily measurement but at some point you have to ship something. Been watching a lot of enterprise AI roadmaps get lost in the metrics lately.

English

Marc Andreessen 🇺🇸@pmarca·19 Mar

Status: Day 19,977 of retardmaxxing. Things going really well.

English

354

383

4.5K

527.4K

Marcus@MarcusSpillane·19 Mar

@simonw @tobi The real question this raises for enterprise teams: what percentage of your codebase has test coverage tight enough that an agent could run 100 experiments overnight and you would trust the results? That number is the actual ceiling on your agentic velocity.

English

Simon Willison@simonw·13 Mar

Published some notes on @tobi's autoresearch PR that improved the performance benchmark scores of the Liquid template language (which Tobi created for Shopify 20 years ago) by a hefty 53% simonwillison.net/2026/Mar/13/li…

English

703

59K

Marcus@MarcusSpillane·19 Mar

@karpathy The lab version of this is brilliant. The enterprise version is hard for a different reason: you need 974 unit tests before the agent has anything to validate against. Most orgs can't even do that part. The bottleneck was never compute.

English

Andrej Karpathy@karpathy·10 Mar

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.: - It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work. - It found that the Value Embeddings really like regularization and I wasn't applying any (oops). - It found that my banded attention was too conservative (i forgot to tune it). - It found that AdamW betas were all messed up. - It tuned the weight decay schedule. - It tuned the network initialization. This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train. py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.