Paul Sutter

919 posts


@paulsutter

CEO at @Neofactoryai. Founded Formlogic, Cofounded Quantcast. paul at neofactory dot ai

SF Bay Area · Joined March 2009
1.6K Following · 1.7K Followers
Pinned Tweet
Paul Sutter@paulsutter·
Slashing costs is the goal, but you need to choose the right market; otherwise it's radial tires all over again. Abundance is about repeatedly 1/10th-ing costs, in markets where demand grows with an even larger multiplier (1/3)
Harry Stebbings@HarryStebbings

The False Promise of Labour Replacement Revenue @immad's take: "AI companies are charging 1/3 of labor costs — but margins will get crushed." Right now, AI pricing looks brilliant on paper. But there's no moat. No network effects. Once competitors show up, margins will disappear. What looks like efficiency today might be a race to the bottom tomorrow. @gdb @Altimor @karpathy — how defensible is labour automation in the long term?

13
50
461
155.1K
Patrick Heizer@PatrickHeizer·
I literally have an ongoing cancer experiment where 100% of the untreated and control animals have had to be euthanized while 100% of the treatment animals are seemingly unaffected. But we're still extremely far away from "proving that it works." Science is hard.
185
70
2.4K
742.5K
Patrick Heizer@PatrickHeizer·
Sorry to be the downer because this is an impressive story in some senses. But it is ~trivially easy to make a single mRNA vaccine. It's not hard. I cure mice of various cancers with various therapeutics all the time. I've made mice lose more weight in a month than tirzepatide does in a year. What is hard and expensive is proving it's BOTH safe AND effective **in a randomized and controlled study in humans** while ALSO manufacturing it at clinical scale and grade. I am happy for this man and his dog. It is impressive. But y'all are overhyping it.
Séb Krier@sebkrier

This is wild. theaustralian.com.au/business/techn…

942
420
5.6K
5M
Paul Sutter@paulsutter·
@zhuokaiz Friston’s Active Inference is more likely a means to generate trainable environments for general world models like Deepmind or JEPA, which are the real candidates to solve physics intelligence
0
0
3
392
Zhuokai Zhao@zhuokaiz·
AMI Labs just raised $1.03B. World Labs raised $1B a few weeks earlier. Both are betting on world models. But almost nobody means the same thing by that term. Here are, in my view, five categories of world models.

---

1. Joint Embedding Predictive Architecture (JEPA)
Representatives: AMI Labs (@ylecun), V-JEPA 2

The central bet here is that pixel reconstruction alone is an inefficient objective for learning the abstractions needed for physical understanding. LeCun has been saying this for years — predicting every pixel of the future is intractable in any stochastic environment. JEPA sidesteps this by predicting in a learned latent space instead.

Concretely, JEPA trains an encoder that maps video patches to representations, then a predictor that forecasts masked regions in that representation space — not in pixel space. This is a crucial design choice. A generative model that reconstructs pixels is forced to commit to low-level details (exact texture, lighting, leaf position) that are inherently unpredictable. By operating on abstract embeddings, JEPA can capture "the ball will fall off the table" without having to hallucinate every frame of it falling.

V-JEPA 2 is the clearest large-scale proof point so far. It's a 1.2B-parameter model pre-trained on 1M+ hours of video via self-supervised masked prediction — no labels, no text. The second training stage is where it gets interesting: just 62 hours of robot data from the DROID dataset is enough to produce an action-conditioned world model that supports zero-shot planning. The robot generates candidate action sequences, rolls them forward through the world model, and picks the one whose predicted outcome best matches a goal image. This works on objects and environments never seen during training.

The data efficiency is the real technical headline. 62 hours is almost nothing. It suggests that self-supervised pre-training on diverse video can bootstrap enough physical prior knowledge that very little domain-specific data is needed downstream. That's a strong argument for the JEPA design — if your representations are good enough, you don't need to brute-force every task from scratch.

AMI Labs is LeCun's effort to push this beyond research. They're targeting healthcare and robotics first, which makes sense given JEPA's strength in physical reasoning with limited data. But this is a long-horizon bet — their CEO has openly said commercial products could be years away.

---

2. Spatial Intelligence (3D World Models)
Representative: World Labs (@drfeifei)

Where JEPA asks "what will happen next," Fei-Fei Li's approach asks "what does the world look like in 3D, and how can I build it?" The thesis is that true understanding requires explicit spatial structure — geometry, depth, persistence, and the ability to re-observe a scene from novel viewpoints — not just temporal prediction. This is a different bet from JEPA: rather than learning abstract dynamics, you learn a structured 3D representation of the environment that you can manipulate directly.

Their product Marble generates persistent 3D environments from images, text, video, or 3D layouts. "Persistent" is the key word — unlike a video generation model that produces a linear sequence of frames, Marble's outputs are actual 3D scenes with spatial coherence. You can orbit the camera, edit objects, export meshes. This puts it closer to a 3D creation tool than to a predictive model, which is deliberate.

For context, this builds on a lineage of neural 3D representation work (NeRFs, 3D Gaussian Splatting) but pushes toward generation rather than reconstruction. Instead of capturing a real scene from multi-view photos, Marble synthesizes plausible new scenes from sparse inputs. The challenge is maintaining physical plausibility — consistent geometry, reasonable lighting, sensible occlusion — across a generated world that never existed.

---

3. Learned Simulation (Generative Video + Latent-Space RL)
Representatives: Google DeepMind (Genie 3, Dreamer V3/V4), Runway GWM-1

This category groups two lineages that are rapidly converging: generative video models that learn to simulate interactive worlds, and RL agents that learn world models to train policies in imagination.

The video generation lineage. DeepMind's Genie 3 is the purest version — text prompt in, navigable environment out, 24 fps at 720p, with consistency for a few minutes. Rather than relying on an explicit hand-built simulator, it learns interactive dynamics from data. The key architectural property is autoregressive generation conditioned on user actions: each frame is generated based on all previous frames plus the current input (move left, look up, etc.). This means the model must maintain an implicit spatial memory — turn away from a tree and turn back, and it needs to still be there. DeepMind reports visual memory extending back about a minute, which is impressive but still far from what you'd need for sustained agent training.

Runway's GWM-1 takes a similar foundation — autoregressive frame prediction built on Gen-4.5 — but splits into three products: Worlds, Robotics, and Avatars. The split suggests the practical generality problem is still being decomposed by action space and use case.

The RL lineage. The Dreamer series has the longer intellectual history. The core idea is clean: learn a latent dynamics model from observations, then roll out imagined trajectories in latent space and optimize a policy via backpropagation through the model's predictions. The agent never needs to interact with the real environment during policy learning. Dreamer V3 was the first AI to get diamonds in Minecraft without human data. Dreamer 4 did the same purely offline — no environment interaction at all.

Architecturally, Dreamer 4 moves from the earlier recurrent-style lineage to a more scalable transformer-based world-model recipe, and introduced "shortcut forcing" — a training objective that lets the model jump from noisy to clean predictions in just 4 steps instead of the 64 typical in diffusion models. This is what makes real-time inference on a single H100 possible.

These two sub-lineages used to feel distinct: video generation produces visual environments, while RL world models produce trained policies. But Dreamer 4 blurred the line — humans can now play inside its world model interactively, and Genie 3 is being used to train DeepMind's SIMA agents. The convergence point is that both need the same thing: a model that can accurately simulate how actions affect environments over extended horizons.

The open question for this whole category is one LeCun keeps raising: does learning to generate pixels that look physically correct actually mean the model understands physics? Or is it pattern-matching appearance? Dreamer 4's ability to get diamonds in Minecraft from pure imagination is a strong empirical counterpoint, but it's also a game with discrete, learnable mechanics — the real world is messier.

---

4. Physical AI Infrastructure (Simulation Platform)
Representative: NVIDIA Cosmos

NVIDIA's play is don't build the world model, build the platform everyone else uses to build theirs. Cosmos launched at CES in January 2025 and covers the full stack — a data curation pipeline (process 20M hours of video in 14 days on Blackwell, vs. 3+ years on CPU), a visual tokenizer with 8x better compression than prior SOTA, model training via NeMo, and deployment through NIM microservices.

The pre-trained world foundation models are trained on 9,000 trillion tokens from 20M hours of real-world video spanning driving, industrial, robotics, and human activity data. They come in two architecture families: diffusion-based (operating on continuous latent tokens) and autoregressive transformer-based (next-token prediction on discretized tokens). Both can be fine-tuned for specific domains.

Three model families sit on top of this. Predict generates future video states from text, image, or video inputs — essentially video forecasting that can be post-trained for specific robot or driving scenarios. Transfer handles sim-to-real domain adaptation, one of the persistent headaches in physical AI — your model works great in simulation but breaks in the real world due to visual and dynamics gaps. Reason (added at GTC 2025) brings chain-of-thought reasoning over physical scenes — spatiotemporal awareness, causal understanding of interactions, video Q&A.

---

5. Active Inference
Representative: VERSES AI (Karl Friston)

This is the outlier on the list — not from the deep learning tradition at all, but from computational neuroscience. Karl Friston's Free Energy Principle says intelligent systems continuously generate predictions about their environment and act to minimize surprise (technically: variational free energy, an upper bound on surprise). Where standard RL is usually framed around reward maximization, active inference frames behavior as minimizing variational / expected free energy, which blends goal-directed preferences with epistemic value. This leads to natural exploration behavior: the agent is drawn to situations where it's uncertain, because resolving uncertainty reduces free energy.

VERSES built AXIOM (Active eXpanding Inference with Object-centric Models) on this foundation. The architecture is fundamentally different from neural network world models. Instead of learning a monolithic function approximator, AXIOM maintains a structured generative model where each entity in the environment is a discrete object with typed attributes and relations. Inference is Bayesian — beliefs are probability distributions that get updated via message passing, not gradient descent. This makes it interpretable (you can inspect what the agent believes about each object), compositional (add a new object type without retraining), and extremely data-efficient.

In their robotics work, they've shown a hierarchical multi-agent setup where each joint of a robot arm is its own active inference agent. The joint-level agents handle local motor control while higher-level agents handle task planning, all coordinating through shared beliefs in a hierarchy. The whole system adapts in real time to unfamiliar environments without retraining — you move the target object and the agent re-plans immediately, because it's doing online inference, not executing a fixed policy.

They shipped a commercial product (Genius) in April 2025, and the AXIOM benchmarks against RL baselines are competitive on standard control tasks while using orders of magnitude less data.

---

imo, these five categories aren't really competing — they're solving different sub-problems. JEPA compresses physical understanding. Spatial intelligence reconstructs 3D structure. Learned simulation trains agents through generated experience. NVIDIA provides the picks and shovels. Active inference offers a fundamentally different computational theory of intelligence. My guess is the lines between them blur fast.
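The planning recipe the thread describes for both V-JEPA 2 and Dreamer (sample candidate action sequences, roll them forward through the learned model, keep the one whose predicted outcome is closest to the goal) can be sketched as a random-shooting planner. The "world model" below is a toy 2-D stand-in, not any lab's API; names and numbers are illustrative assumptions only.

```python
import random

def world_model(state, action):
    # Toy stand-in for a learned latent dynamics model:
    # state and action are (x, y) tuples; real systems predict embeddings.
    return (state[0] + action[0], state[1] + action[1])

def plan(state, goal, horizon=5, n_candidates=256):
    """Random-shooting MPC: sample action sequences, simulate, keep the best."""
    best_seq, best_dist = None, float("inf")
    for _ in range(n_candidates):
        seq = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(horizon)]
        s = state
        for a in seq:  # imagined rollout -- no real environment interaction
            s = world_model(s, a)
        dist = (s[0] - goal[0]) ** 2 + (s[1] - goal[1]) ** 2
        if dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq

random.seed(0)
actions = plan(state=(0.0, 0.0), goal=(3.0, 2.0))
print(len(actions))  # a horizon-length action sequence
```

In practice the robot executes only the first action and re-plans each step (receding horizon); the point of the sketch is that planning quality is bounded entirely by the world model's predictive accuracy.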
57
230
1.5K
311.2K
Paul Sutter@paulsutter·
@ismailhozain Do a GR&R test to find out, it’s the only way to know (and also necessary). Rule of thumb is that measurement should be 10x better than tolerances checked.
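The 10x rule of thumb in this reply is often expressed as a precision-to-tolerance (P/T) ratio. A real gauge R&R study (e.g. the AIAG ANOVA method) uses repeated operator/part trials; this is just a back-of-envelope check of the ratio itself, with made-up example numbers.

```python
def precision_to_tolerance_ratio(measurement_sigma, tol_lo, tol_hi, k=6.0):
    """P/T ratio: k-sigma measurement spread as a fraction of the tolerance band."""
    return k * measurement_sigma / (tol_hi - tol_lo)

# Example: checking +/-0.001 in (1 thou) with a scanner whose repeatability
# sigma is 0.0003 in -- illustrative numbers, not any vendor's spec.
ratio = precision_to_tolerance_ratio(0.0003, -0.001, 0.001)
print(f"P/T = {ratio:.0%}")  # prints "P/T = 90%"
```

A 10x-better measurement system corresponds to P/T around 10%; at 90%, nearly the whole tolerance band is consumed by measurement noise, so the gauge can't usefully disposition parts.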
1
0
4
190
Ismail Hozain@ismailhozain·
Are 3d laser scanners good enough to use for QC of machined parts (to within +-1 thou) yet? The given creality specs seem to say yes, but I'm wondering if anyone here has had experience with this workflow. Saw a guy recently using a Keyence laser scanning system, said it was much faster than his CMM.
16
1
23
3K
The Information@theinformation·
.@wolfejosh, co-founder of Lux Capital, discusses why the massive build-out of AI data centers may be overextended: "The amount of spend, the amount of CapEx, the amount of build for these multi-gigawatt data centers, it to me does not make sense." “I'm just not that optimistic that all this compute is actually going to be needed."
10
9
79
109.8K
Peter Holderith@_baldtires·
how do you guys move 4000 lb machines around ur shops without a big fucking forklift
138
1
148
18.1K
Paul Sutter@paulsutter·
@ry_paddy Check back on Founders Fund II a year from now, after they distribute SpaceX
0
0
0
15
Patrick Ryan@ry_paddy·
It's at least ~3x better than a16z Fund III and close to 2x Founders Fund II. Granted it was a much smaller vehicle than both - big caveat - but this is also important, it shows emerging managers that a smaller fund can be a good thing
2
0
8
1.4K
Patrick Ryan@ry_paddy·
Mucker Capital Fund I returned somewhere between 43 and 53x gross TVPI. In cash terms, they turned $12m into over $500m ($636m at the top end of our estimates). This is the rarefied air of the 99.9th percentile.
Patrick Ryan tweet media
9
14
167
20.7K
Paul Sutter@paulsutter·
@EricVallieres84 Trochoidal toolpaths if you want to run stainless production on less rigid machines
0
0
0
61
Eric Vallieres@EricVallieres84·
Anyone here running production levels of stainless in their Brother? I have questions, like is this a bad idea?
Eric Vallieres tweet media
21
0
62
5.2K
Paul Sutter@paulsutter·
@emm0sh Watch this space. The starting point is mini-models that understand one manufacturing process.
0
0
0
28
Paul Sutter@paulsutter·
@ManzTrades Well for one thing they buy GPUs out of COGS and still control the billions of GPUs they’ve sold
0
0
0
68
Manz🌪@ManzTrades·
Every once in a while, I ask myself why the 2nd most valuable company in the world is completely sitting out the largest capex investment cycle ever?
staysaasy@staysaasy

Think different

601
297
11.4K
2.8M
Paul Sutter@paulsutter·
@hamandcheese Actually the work of manufacturing experts is better captured by modeling process physics than by observation.
0
0
0
35
Samuel Hammond 🦉@hamandcheese·
AGI won't know everything out of the box. Rather it will have the general competency to learn in context or via a few demonstrations. For manufacturing, this last mile of training data is embedded in the tacit knowledge of skilled workforces -- the ultimate moat, but nothing China can't extract with a few million go-pros.
2
0
54
5.4K
Samuel Hammond 🦉@hamandcheese·
It's worse than that. A pure software singularity could cause a sudden reversal of fortunes for the US: our comparative advantage in high value-added knowledge sectors radically deflates, leaving China to translate our innovation in bits to their innovation in atoms.
Sholto Douglas@_sholtodouglas

Default case right now is a software only singularity, we need to scale robots and automated labs dramatically in 28/29, or the physical world will fall far behind the digital one - and the US won’t be competitive unless we put in the investment now (fab, solar panel, actuator supply chains).

35
58
830
78.1K
Paul Sutter@paulsutter·
Or you could just sell something that people actually need
Termsheetinator@termsheetinator

Most B2B deals are WON by understanding these:
- Kahneman & Tversky
- Prospect Theory
- Status Quo Bias
- Inertia Effects

IF YOU SELL B2B.. DOUBLE BOOKMARK THIS

Decades of peer-reviewed research in behavioral economics show that people are more motivated to avoid losses than to pursue equivalent gains. Loss is more powerful than gain:
- people overweight what they're already paying
- they underweight potential gains that require change
- time and effort get treated like fixed taxes
- inefficiency becomes invisible once teams adapt around it

This bias leads people to avoid situations that feel like a loss, even when the rational outcome is equal or better.

Loss aversion signals (protecting the current state):
- We spend a lot of time on this
- It's expensive, but it's already budgeted
- We've built the team around this process
- Changing this would be disruptive

These signal perceived loss tied to change: time, money, effort, political capital.

Inertia signals (defaulting to the status quo):
- It's not ideal, but it works
- That's just how the process is
- Everyone's used to it now
- We've learned how to work around it

These signal adaptation.

Step 1 - listen for adaptation (random examples, find your real signals based on your services). Any time a prospect describes friction casually, you've found it:
- Wasted time
- Extra steps
- Manual work
- Senior people doing junior tasks

If it's described calmly, it's been normalized.

Step 2 - double-click until it hurts. You don't accept abstraction. You ask (examples, find your real questions):
- How much time is actually going into that?
- Who's doing that work?
- What percentage of their week is this?
- What happens if that doesn't change this quarter?

Step 3 - ask the reallocation question. This is the most important part: "If you won that time back… where would you allocate it instead?" This forces comparison. Now the prospect tells you:
- what the loss is really worth
- what's being sacrificed today
- how expensive 'doing nothing' actually is

At this point, you're not selling.. you're exposing cost.

This is where most scopes go wrong... They're ONLY anchored in gain:
- More leads
- More deals
- More volume
- More upside

But the buyer's motivation lives somewhere else. It lives in:
- time/cash being burned
- attention being diluted
- effort being misallocated
- risk being quietly absorbed

This is why revisions start to lose energy when you run our Process Selling™ systems. Nothing new is being surfaced... Nothing painful is being relieved.

Here's how loss aversion should show up in your scopes: You don't start with deliverables. You start with what stops the bleed:
- Time reclaimed per week
- Senior bandwidth recovered
- Manual effort eliminated upstream
- Noise removed before it hits the team

Only then do you connect that reclaimed capacity to outcomes. Now something changes psychologically. The buyer stops asking: "Will this work?" And starts thinking: "We're already paying for the current inefficiency." That's the flip.

Loss aversion works in B2B because: executives defend downside before chasing upside; budgets exist to stop leakage before funding growth; organizations move faster to remove pain than to pursue gain.

If you're not double-clicking on normalized pain, you're leaving the real buying motive untouched. If your scopes don't reflect loss prevention, they'll always feel semi-optional. And if your discovery calls don't surface this, you'll keep revising instead of closing.

Top closers don't convince. They reveal what's already being paid, quietly.. every single day. That's loss aversion in B2B

0
0
2
283
Paul Sutter@paulsutter·
@zanehengsperger It’s easier to improve the performance of easy-to-buy machines than it is to wait for the perfect machine
2
0
6
917
Philip Johnston@PhilipJohnston·
New request for startups: Methane production. For anyone with R&D chops regarding the DAC + electrolysis + Sabatier process, now is a good time to start a company!
24
8
111
21.7K
Paul Sutter@paulsutter·
@Object_Zero_ @tszzl With islanded power (no grid, no turbines), reliability is set by storage. Multi-day planning margins push batteries to the terawatt-hour scale, where they cost more than Starship launches and exceed global production capacity.
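The terawatt-hour claim in this reply follows from simple arithmetic: storage energy = load × ride-through hours. The load and margin below are illustrative assumptions (not figures from the thread), chosen only to show the order of magnitude.

```python
def storage_needed_twh(load_gw, backup_hours):
    """Energy storage to ride through `backup_hours` at constant load (GW*h -> TWh)."""
    return load_gw * backup_hours / 1000.0

# Hypothetical 10 GW islanded compute campus with 3 days of planning margin:
need = storage_needed_twh(load_gw=10, backup_hours=72)
print(f"{need:.2f} TWh")  # prints "0.72 TWh"
```

For scale, roughly 1 TWh of cells is on the order of a full year of recent global lithium-ion production, which is the sense in which multi-day margins "exceed global production capacity."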
0
0
0
33
Object Zero@Object_Zero_·
@paulsutter @tszzl But why not run the satellites on the ground? What’s the reason for launching them into space? Just costs money and makes them harder to maintain. If satellites are so great, why not put them in the desert?
4
0
8
358
roon@tszzl·
if space compute proves to be economically superior to ground based, I would be pretty worried. that’s one of the scenarios that ends up with extreme monopolistic concentration of hyperscale compute
169
37
1.6K
261.8K
Object Zero@Object_Zero_·
@tszzl If satellites are so cheap, why not run them on the ground?
9
0
11
3.6K