Olivier Manuel

2.3K posts


@omanuel

Developing Applied AI Applications | CEO of SnapInstruct | Creator of the SmartTV | My personal twitter

Joined March 2009
834 Following, 266 Followers
Pinned Tweet
Olivier Manuel
Olivier Manuel@omanuel·
The future of knowledge work is here, it's just not evenly distributed.
English
0
0
2
102
Olivier Manuel
Olivier Manuel@omanuel·
@tedcruz Christianity is the best post training for humans we have so far. But alignment is still an unsolved problem.
English
0
0
0
346
Olivier Manuel
Olivier Manuel@omanuel·
@gfodor I don’t think people understand yet the significance of Karpathy’s post
English
0
0
0
23
gfodor.id
gfodor.id@gfodor·
This is why I went through the stages of grief when the o1 evals showed we had an angle of attack on solving programming. ML work has always been like this imo, try a bunch of semi-random stuff based on reasonably justified mathematical intuition, cut losses and let winners ride
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
github.com/karpathy/nanoc…
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

English
4
0
126
7.4K
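The workflow described in the quoted post (propose a change, measure validation loss, keep what helps, repeat) can be illustrated with a toy greedy loop. This is only a sketch under stated assumptions, not the actual autoresearch code: `validation_loss` is a made-up quadratic standing in for a real training run, `propose_change` uses random perturbations where the real agent plans from earlier results, and every name and number below is hypothetical.

```python
import random

def validation_loss(config):
    """Toy stand-in for a training run; lower is better (hypothetical objective)."""
    targets = {"weight_decay": 0.1, "beta2": 0.95, "init_scale": 1.0}
    return sum((config[k] - targets[k]) ** 2 for k in targets)

def propose_change(config, rng):
    """Propose one small change to a single setting (random perturbation here)."""
    key = rng.choice(sorted(config))
    candidate = dict(config)
    candidate[key] = config[key] + rng.uniform(-0.1, 0.1)
    return key, candidate

def autoresearch(config, n_trials, seed=0):
    """Greedy loop: try a change and keep it only if validation loss improves."""
    rng = random.Random(seed)
    best_loss = validation_loss(config)
    accepted = []
    for _ in range(n_trials):
        key, candidate = propose_change(config, rng)
        loss = validation_loss(candidate)
        if loss < best_loss:
            config, best_loss = candidate, loss
            accepted.append(key)
    return config, best_loss, accepted

# Hypothetical starting point; 700 mirrors the trial count quoted in the post.
start = {"weight_decay": 0.0, "beta2": 0.999, "init_scale": 0.5}
final_config, final_loss, accepted = autoresearch(start, n_trials=700)
print(f"accepted {len(accepted)} of 700 changes, loss {final_loss:.6f}")
```

Because a change is kept only when it lowers the loss, accepted changes stack by construction on this toy objective. Whether gains found on a small model transfer to a larger one, as the post reports, is something only real training runs can show.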
Olivier Manuel
Olivier Manuel@omanuel·
I'm considering throwing away my bottle of Worcestershire sauce and emptying my drawer of old computer cables. Temporary insanity? Or good housekeeping?
English
0
0
1
48
Olivier Manuel
Olivier Manuel@omanuel·
“They’ve eclipsed us already” 👀
Jeffrey Emanuel@doodlestein

@bradlishman Yes, if you're not cranking the ambition factor to the max, you're wasting the potential of these frontier models. They've eclipsed us already, you just need to know how to draw it out of them.

English
0
0
0
86
Hiten Shah
Hiten Shah@hnshah·
@omanuel @chrysb @openclaw With so so so much more effort and code involved for the user. Huge friction in comparison. But I hear you and you're not exactly wrong. But the point is the friction.
English
1
0
0
104
Hiten Shah
Hiten Shah@hnshah·
Most people posting about OpenClaw are not running it for real meaningful work use cases. @chrysb is one of the few who is and sharing (more than me) about it. His list is real and realistically even only scratches the surface of what’s possible. If you’re using @openclaw for real meaningful, value-driving work, would love to hear how.
Chrys Bader@chrysb

the folks knocking @openclaw saying there's no real use cases are outing themselves
if you're a founder, here's what you can do today:
daily operations
• morning briefings that aggregate email, slack, calendar, and news into one summary on a cron job
• email triage that filters spam, flags important threads, drafts replies, and clears huge backlogs while you sleep
• daily slack and email summaries that auto-create todos in your task database
• auto flight check-in that finds your next flight, checks in, and picks a window seat while you're driving
meetings & relationships
• auto-pull meeting transcripts, summarize decisions, extract action items
• weekly retro synthesis that spots patterns across all your meetings
• pre-meeting briefings that surface everything you know about who you're about to talk to
• personal crm that auto-logs interactions, flags stale relationships, and suggests follow-ups
research & monitoring
• continuous research agents that crawl reddit, hacker news, x on your topics and keep an evolving knowledge base
• competitor monitoring that tracks uploads, posting cadence, and top-performing content
• automated weekly SEO analysis with ranking reports
• private document q&a over contracts, reports, or proprietary docs without sending data to external apis
content & audience
• content repurposing: turn one blog post or video transcript into x threads, linkedin posts, newsletter snippets, and tiktok scripts automatically
• audience monitoring that surfaces opportunities based on what's working in your space
• end-to-end content pipelines: research trending topics, draft scripts, generate assets, queue into your publishing tools
building & shipping
• overnight coding agent management, delegates to sub-agents while you sleep
• voice-controlled debugging that reviews logs, fixes configs, and redeploys entirely by voice
• full site rebuilds via telegram or whatsapp chat
• app store submission and testflight automation from your phone
• devops watchdog that monitors logs, uptime, and deployments, then opens tickets or runs remediation automatically
finance & admin
• weekly spending reports, subscription audits, anomaly alerts
• receipt forwarding that auto-converts into structured parts lists
• insurance claim filing and repair scheduling through natural language
• automated grocery ordering with saved credentials and MFA handling
• organize lab results, contacts, or any messy data into structured notion databases
some wild things people have reported doing:
• negotiated $4,200 off a car purchase over email while the owner slept
• filed a legal rebuttal to an insurance denial that got a rejected claim reopened without being asked
• cleared 10k emails, reviewed 122 slides, built cli tools, and published npm packages in one session
"yeah but i can do this with zapier/make/n8n"
sure. you can wire together 15 different zaps, pay per task, debug broken integrations across 4 dashboards, and hope the json mapping doesn't break when an api updates.
or you can have one agent that talks to everything, remembers your context, runs on your machine, and you can tweak every part of it because it's open source markdown files. no vendor lock-in, no per-zap pricing, no low-code drag and drop that falls apart the moment you need something custom.
and you own all of it. your data, your memory files, your conversation history, your custom skills. it all lives on your instance. nothing's sitting in someone else's saas database. you can inspect every file, move it anywhere, back it up however you want. that's not a feature, that's the architecture.
the real unlock isn't any single use case. it's one unified experience that compounds context over time. knows your stack, your priorities, your patterns. every week it gets more useful because it's learning you, not just executing a workflow.

English
26
17
345
107.8K
Olivier Manuel
Olivier Manuel@omanuel·
@karpathy "the implied new meta is to write the most maximally forkable repo and then have skills that fork it into any desired more exotic configuration" 🔥
English
0
0
0
32
Andrej Karpathy
Andrej Karpathy@karpathy·
Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused :)
I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to 400K lines of vibe coded monster that is being actively attacked at scale is not very appealing at all. Already seeing reports of exposed instances, RCE vulnerabilities, supply chain poisoning, malicious or compromised skills in the registry, it feels like a complete wild west and a security nightmare. But I do love the concept and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level.
Looking around, and given that the high level idea is clear, there are a lot of smaller Claws starting to pop out. For example, on a quick skim NanoClaw looks really interesting in that the core engine is ~4000 lines of code (fits into both my head and that of AI agents, so it feels manageable, auditable, flexible, etc.) and runs everything in containers by default. I also love their approach to configurability - it's not done via config files it's done via skills! For example, /add-telegram instructs your AI agent how to modify the actual code to integrate Telegram. I haven't come across this yet and it slightly blew my mind earlier today as a new, AI-enabled approach to preventing config mess and if-then-else monsters. Basically - the implied new meta is to write the most maximally forkable repo and then have skills that fork it into any desired more exotic configuration. Very cool.
Anyway there are many others - e.g. nanobot, zeroclaw, ironclaw, picoclaw (lol @ prefixes). There are also cloud-hosted alternatives but tbh I don't love these because it feels much harder to tinker with. In particular, local setup allows easy connection to home automation gadgets on the local network. And I don't know, there is something aesthetically pleasing about there being a physical device 'possessed' by a little ghost of a personal digital house elf.
Not 100% sure what my setup ends up looking like just yet but Claws are an awesome, exciting new layer of the AI stack.
English
1K
1.3K
17.5K
3.4M
Olivier Manuel
Olivier Manuel@omanuel·
2023: AI is going to replace all search (people keep searching)
2024: AI is going to replace artists (people keep doing art)
2025: AI is going to replace work (people keep working even harder)
2026: AI is going to replace all software (people make more software than ever)
English
0
0
1
39
Olivier Manuel
Olivier Manuel@omanuel·
Agent is the new OS
CLI is the new API
Andrej Karpathy@karpathy

Very interested in what the coming era of highly bespoke software might look like.
Example from this morning - I've become a bit loosy goosy with my cardio recently so I decided to do a more srs, regimented experiment to try to lower my Resting Heart Rate from 50 -> 45, over experiment duration of 8 weeks. The primary way to do this is to aspire to a certain sum total minute goals in Zone 2 cardio and 1 HIIT/week.
1 hour later I vibe coded this super custom dashboard for this very specific experiment that shows me how I'm tracking. Claude had to reverse engineer the Woodway treadmill cloud API to pull raw data, process, filter, debug it and create a web UI frontend to track the experiment. It wasn't a fully smooth experience and I had to notice and ask to fix bugs e.g. it screwed up metric vs. imperial system units and it screwed up on the calendar matching up days to dates etc.
But I still feel like the overall direction is clear:
1) There will never be (and shouldn't be) a specific app on the app store for this kind of thing. I shouldn't have to look for, download and use some kind of a "Cardio experiment tracker", when this thing is ~300 lines of code that an LLM agent will give you in seconds. The idea of an "app store" of a long tail of discrete set of apps you choose from feels somehow wrong and outdated when LLM agents can improvise the app on the spot and just for you.
2) Second, the industry has to reconfigure into a set of services of sensors and actuators with agent native ergonomics. My Woodway treadmill is a sensor - it turns physical state into digital knowledge. It shouldn't maintain some human-readable frontend and my LLM agent shouldn't have to reverse engineer it, it should be an API/CLI easily usable by my agent. I'm a little bit disappointed (and my timelines are correspondingly slower) with how slowly this progression is happening in the industry overall. 99% of products/services still don't have an AI-native CLI yet. 99% of products/services maintain .html/.css docs like I won't immediately look for how to copy paste the whole thing to my agent to get something done. They give you a list of instructions on a webpage to open this or that url and click here or there to do a thing. In 2026. What am I a computer? You do it. Or have my agent do it.
So anyway today I am impressed that this random thing took 1 hour (it would have been ~10 hours 2 years ago). But what excites me more is thinking through how this really should have been 1 minute tops. What has to be in place so that it would be 1 minute? So that I could simply say "Hi can you help me track my cardio over the next 8 weeks", and after a very brief Q&A the app would be up. The AI would already have a lot personal context, it would gather the extra needed data, it would reference and search related skill libraries, and maintain all my little apps/automations.
TLDR the "app store" of a set of discrete apps that you choose from is an increasingly outdated concept all by itself. The future are services of AI-native sensors & actuators orchestrated via LLM glue into highly custom, ephemeral apps. It's just not here yet.

English
0
0
0
58
Steve Darlow
Steve Darlow@StevenDarlow·
@AceAdamsCode @steipete @IGLIVISION @Cloudflare That was what I did forever, until I got my 2 openclaws communicating. Now they fix each other as needed. One breaks, talk to the other on telegram. I got it to where I just say “check on your boy” and it knows what I mean. 100% success rate so far 🤷‍♂️
English
8
1
40
7K
Dev Khare
Dev Khare@dkhare·
I’d even broaden it out to any services segment eg
- mgmt consulting (McKinsey, Bain, BCG)
- IT services (Accenture, Infosys)
- BPO (Concentrix, Genpact)
- advertising/media (WPP)
- RCM (Optum, Change HC)
- legal (large law firms, UnitedLex)
- data/content (Lionbridge, TransPerfect)
- market research (Nielsen)
- and so on
English
6
4
120
24.4K
Hunter Horsley
Hunter Horsley@HHorsley·
In 2006, every section of Craigslist was a $1b marketplace startup waiting to happen. In 2026, every section of PWC's website is a $10b AI startup waiting to happen.
English
105
381
3.8K
792.1K
Bob
Bob@Bob366466·
@aakashgupta The main structures should be underground. Temperature swing, dust and leakage won't be an issue. It won't be easy, but doable.
English
1
0
1
66
Aakash Gupta
Aakash Gupta@aakashgupta·
A city on the Moon will cost somewhere between $100B and $500B, require thousands of Starship flights, and demand a decade of nonstop construction in a place where the temperature swings 400°C between day and night, the dust cuts through metal seals like sandpaper, and a single cracked habitat window means everyone inside is dead in about 90 seconds.
Musk just announced SpaceX is doing it anyway. Here’s the actual engineering path.
You build at the south pole. Specifically the rims and floors of craters like Shackleton and Cabeus, where temperatures in permanent shadow drop below -230°C. NASA estimates 600 million metric tons of water ice are buried in these craters under about 40 cm of dry regolith. That water becomes your oxygen supply, your drinking water, your radiation shielding, and 78% of your rocket propellant by mass. The crater rims get near-continuous sunlight for solar power. You build where the resources are.
Getting there is where it gets wild. Every Starship lunar mission requires 10-15 tanker flights to fill 1,200 tons of propellant in Earth orbit before the ship can even leave. One cargo delivery to the lunar surface burns through roughly 12 Starship launches. Starship V3 lands 100 metric tons per trip. The Moon is 2 days away with launch windows every 10 days. Mars gets one window every 26 months with a 6-month flight. That 13x iteration advantage is why Musk pivoted.
The first 20-30 landings are all cargo. No humans. You’re sending solar arrays for the crater rims targeting 100+ kW continuous, nuclear fission reactors for the 14-day lunar night, ISRU rigs that mine ice from regolith and electrolyze it into hydrogen and oxygen, pressurized hab modules, and autonomous rovers that 3D-print structures from lunar soil using concentrated solar heat.
Each landed Starship also stays as a permanent building. 50 meters tall, 9 meters wide, 1,100 cubic meters of pressurized volume. The ISS has 916 cubic meters and took 13 years to assemble. Three Starships on the surface already exceed that.
The economics flip the moment you start producing oxygen on the Moon. You stop shipping 78% of your propellant from Earth. Tanker flights per mission drop from 15 to about 4. Every ton produced locally frees up mass budget on the next inbound Starship for more construction equipment, food systems, and mining hardware. The base starts building the base. That’s what “self-growing” means. Compound logistics where each delivery makes the next delivery cheaper.
2027: first uncrewed Starship lunar landing. SpaceX told investors March 2027.
2028-2030: cargo buildup, 30-50 deliveries, all robotic, ISRU prototypes go operational.
2030-2032: first crews arrive, probably 6-12 people, 6-month rotations, running equipment maintenance and scaling propellant production.
2033-2035: permanent population hits 50-100, propellant depot goes up in low lunar orbit so arriving ships refuel before descent.
2035 onward: population grows past 100, agricultural modules come online, the base becomes partially self-sustaining.
The unsolved problems are real. Lunar dust is electrostatically charged and sharp as broken glass. It shreds seals, clogs machinery, and embeds in lung tissue. Nobody has a long-duration fix. Radiation on the surface runs 200x Earth’s dose. Regolith shelters and water shielding help but add enormous construction overhead. The 14-day night drops temperatures to -173°C and kills all solar power, and the only flight-ready nuclear reactors produce 1-10 kW, far below what a growing base demands. What years of 1/6 gravity do to human bone density and cardiovascular systems is completely unknown.
SpaceX is valued at a trillion dollars and just told investors the Moon comes first. They’re betting that proving lunar logistics at commercial cadence builds the playbook for Mars. The Moon is a 2-day test lab with a 12-day resupply cycle. Mars is a 6-month voyage with a 2.5-year wait if anything breaks. It makes sense.
Elon Musk@elonmusk

For those unaware, SpaceX has already shifted focus to building a self-growing city on the Moon, as we can potentially achieve that in less than 10 years, whereas Mars would take 20+ years. The mission of SpaceX remains the same: extend consciousness and life as we know it to the stars. It is only possible to travel to Mars when the planets align every 26 months (six month trip time), whereas we can launch to the Moon every 10 days (2 day trip time). This means we can iterate much faster to complete a Moon city than a Mars city. That said, SpaceX will also strive to build a Mars city and begin doing so in about 5 to 7 years, but the overriding priority is securing the future of civilization and the Moon is faster.

English
1.1K
961
8.3K
1.7M
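Two of the figures in the post above can be checked with quick arithmetic: the drop in tanker flights once 78% of propellant is made on the Moon, and the pressurized volume of three landed Starships against the ISS. The sketch below uses only numbers quoted in the post. The linear scaling of tanker flights with Earth-shipped propellant is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

# Figures quoted in the post above.
TANKER_FLIGHTS_PER_MISSION = 15    # upper end of the quoted 10-15 range
LOCAL_PROPELLANT_FRACTION = 0.78   # share of propellant mass produced on the Moon
STARSHIP_VOLUME_M3 = 1_100         # pressurized volume per landed Starship
ISS_VOLUME_M3 = 916

def tanker_flights_with_isru(flights, local_fraction):
    """Assumption: tanker flights scale linearly with propellant shipped from Earth."""
    return math.ceil(flights * (1 - local_fraction))

flights_after = tanker_flights_with_isru(
    TANKER_FLIGHTS_PER_MISSION, LOCAL_PROPELLANT_FRACTION
)
volume_ratio = 3 * STARSHIP_VOLUME_M3 / ISS_VOLUME_M3

print(flights_after)            # 4, matching "from 15 to about 4"
print(round(volume_ratio, 1))   # 3.6, three Starships vs. the ISS
```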
Zain Hoda
Zain Hoda@zain_hoda·
@omanuel Interesting! For your customers who don’t have Asana, would you offer “Asana-like” functionality embedded directly in your agent stack at some point?
English
1
0
0
822
Olivier Manuel
Olivier Manuel@omanuel·
@AlexFinn @ConstantHawk When you amortize the cost of the Mac and then add electricity use over time, is the cost edge in the data center? Or local compute. At scale.
English
0
0
0
32
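The comparison raised in the question above comes down to an hourly cost: hardware price spread over its service life, plus electricity. A minimal sketch follows, assuming an always-on machine. Every input is a placeholder rather than a measurement, the function names are hypothetical, and the data center price is whatever hourly rate you are actually quoted.

```python
HOURS_PER_YEAR = 365 * 24

def local_cost_per_hour(hardware_price, lifetime_years, avg_watts, price_per_kwh):
    """Amortized hardware cost plus electricity, per hour of always-on use."""
    amortized = hardware_price / (lifetime_years * HOURS_PER_YEAR)
    electricity = (avg_watts / 1000) * price_per_kwh  # kW times price per kWh
    return amortized + electricity

def cheaper_option(local_hourly, datacenter_hourly):
    """Return which side has the lower hourly cost."""
    return "local" if local_hourly < datacenter_hourly else "data center"

# Placeholder inputs (assumptions): $600 machine, 3-year life, 30 W, $0.15/kWh.
local = local_cost_per_hour(
    hardware_price=600.0, lifetime_years=3, avg_watts=30.0, price_per_kwh=0.15
)
print(f"local: ${local:.4f}/hour")
print(cheaper_option(local, datacenter_hourly=0.05))  # 0.05 is a made-up rate
```

This compares cost per hour only. A fair comparison at scale would also divide by throughput (for example tokens per hour) and account for utilization, which this sketch leaves out.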
Alex Finn
Alex Finn@AlexFinn·
@ConstantHawk the idea isn't to replace claude. it's to supplement it with employees that can work 24/7/365 for free
English
4
0
9
1.9K
Alex Finn
Alex Finn@AlexFinn·
3 THINGS YOU NEED TO BUILD IMMEDIATELY WITH OpenClaw:
1. Activity feed
2. Calendar
3. Global search
All 3 will super power your workflow
• Activity feed actively tracks everything your OpenClaw does. This is critical, because if you have it working autonomously, this will give you insight into EVERY SINGLE THING it does, to make sure it's not wasting tokens
• Calendar lets you see all of OpenClaw's scheduled tasks. Now you can verify when it's going to work proactively for you. Also will let you know when it has scheduled tasks you might not want it to do anymore, saving more tokens
• Global search allows you to search through ALL of OpenClaw's memories, tasks, documents and past conversations. OpenClaw has such incredible memory, but no interface to view any of it. Now you can search through it easily and find old nuggets you talked about.
Steal this prompt to get it all installed:
"I want you to build out 3 things for me. In a Mission Control dashboard, build out an activity feed first. This activity feed will record EVERY SINGLE THING you do for me, so I can see a history of every action and task you've completed. I want a calendar view that shows me in a nicely formatted screen every scheduled task you have in the future in a weekly view. And I want a global search where I can search for any term and you display any relevant memory, document, or task from our workspace. Use NextJS as the framework, Convex as the database, and Codex to code it all out"
English
189
203
2.6K
350.5K
Olivier Manuel
Olivier Manuel@omanuel·
@levie @stevesi …”Not just big things, like everything in your home, but the whole nature of work, collaboration” Yes.
English
0
0
0
102
Aaron Levie
Aaron Levie@levie·
@stevesi “The number of processes and experiences in work and life that are not yet fundamentally improved by software is far greater than the number that have been improved by software.” 💯
English
5
10
257
24.3K