Anders Fredriksson

6.1K posts

Anders Fredriksson

@andefred

Entrepreneur & investor, Author writing “Speed - Optimizing Startups for Speed of Iteration”. I like AI, and to debunk Tesla FUD. 500 & Alchemist alumni

Gothenburg, Sweden Katılım Nisan 2008

824 Takip Edilen887 Takipçiler

Anders Fredriksson retweetledi

Tongzhou Mu 🤖🦾🦿@tongzhou_mu·4d

Everyone is talking about "World Models" for robotics, following the buzz from GTC 2026. But the research landscape is shifting so fast it’s difficult to keep up. In my view, here are the two dominant paradigms currently grounding the video world models in robot control. --- Paradigm 1: Use the Video Model as a Simulator The first major approach is using video world models to simulate reality. In this framework, the model predicts "what happens next" in either pixel space or latent space, conditioned on text prompts or robot actions. Much like traditional analytical simulators (e.g., IsaacSim, MuJoCo, ManiSkill), these learned simulators are used for data synthesis, planning, and evaluation. 1.1 Synthesizing Data for Policy Training A representative work is DreamGen [1]. Given an initial frame and a language instruction, a fine-tuned video model synthesizes clips of a robot completing a task. An inverse dynamics model then labels these videos with actions to train a separate robot policy. GR00T N1 [2] uses a similar strategy. Alternatively, models can act as interactive simulators where agents (like UniSim [4]) or humans (like Interactive World Simulator [3]) generate data through interaction. Key Advantages: Thousands of hours of "synthetic experience" at a lower cost and the ability to safely simulate rare, dangerous edge cases. 1.2 Inference-Time Planning Instead of following a fixed path, robots can use video models to "imagine" multiple future outcomes. In V-JEPA 2 [5], an action-conditioned video model evaluates different action sequences to find the best next step. This "imagination-based planning" is also a core theme in CLASP [6], SWIM [7], VLP [8], GPC [9], DreamDojo [10], and Cosmos Policy [11]. The challenge remains fitting this heavy computation into real-time control budgets. 1.3 Policy Evaluation Video models allow us to test policies before they ever touch physical hardware. Veo Robotics [12] demonstrates that these models can accurately predict relative performance and perform "red teaming" to expose safety violations. This approach is also seen in IRASim [13], 1XWM [14], Ctrl-World [15], and others. Summary of Paradigm 1: While powerful, there is no "free lunch." These methods depend on prediction accuracy. Our physical world is complex, and teaching video models to handle every edge case without hallucinating physics remains a significant challenge. --- Paradigm 2: Use the Video Model as a Policy The second, more integrated paradigm is using the generative video model as the policy (decision-maker) itself. Because the native outputs are videos rather than robot actions, several methods have been developed to obtain control signals. 2.1 Generating Video and Action Jointly A straightforward idea is to add an action decoder to the video model backbone and run video and action denoising jointly during inference. Representative works include DreamZero [16], Cosmos Policy [11], Motus [17], PAD [18], GR-1 [19], and GR-2 [20] (note that the GR series are not diffusion models). This method leverages the rich spatiotemporal priors of pre-trained models with minimal architecture changes. 2.2 Extracting Visual Representations for Action Generation Rather than full generation, many methods use video models to extract deep visual representations to guide action generation. Example works include VPDD [21], VPP [22], UVA [23], UWM [24], Video Policy [25], and DiT4DiT [26]. A major advantage here is that you don’t necessarily need to run multiple denoising steps on giant models, making real-time control easier, though it remains unclear if the full potential of the video models is being utilized. 2.3 Open-loop Video Generation + Video-to-Action Translation A rising trend involves generating a "desired future" video and using a separate inverse dynamics model to translate that video into actions. UniPi [27] pioneered this, followed by This&That [28], TesserAct [29], and 1XWM Self-Learning [30]. Some methods generate videos of humans completing tasks (Dreamitate [31], Gen2Act [32], LVP [33]) and translate those to robot actions. This approach allows video models to do exactly what they were trained for: video generation. 2.4 Closed-loop Video Generation + Video-to-Action Translation Open-loop generation often leads to hallucinations: the model might "see" the robot picking up an apple that isn't actually there. Closed-loop generation avoids this by constantly conditioning on the latest real-world observations, replacing generated frames with real ones in the next call. Recently, mimic-video [34] and LingBot-VA [35] reached real-time speeds using KV caching and partial denoising. Most notably, the DVA [36] model released this month manages real-time generation with full video denoising, which means denoising pure noise all the way to clean video for every step. This approach seems really promising to me, because it reduces robot control into a problem of real-time video generation, which can directly benefit from large-scale video pre-training. --- To me, the key takeaway from this evolution is how we have begun bridging the gap between the digital and physical worlds. Instead of trying to manually program every physical law, we are leveraging the implicit physics embedded in billions of web videos. Whether we use these models as simulators or as direct policies, the objective is the same: providing robots with a “physical common sense.” By reformulating robot control as a challenge of real-time video generation, we may be on the verge of a new scaling law for embodied intelligence. [References in the comment]

English

553

33.4K

Anders Fredriksson retweetledi

InSpatio@InSpatio_AI·5d

We don’t generate videos. 🎬 We generate worlds from videos. 🌍 Introducing InSpatio-World — the world's first open-source real-time 4D world model‼️ Your input: a video clip Our output: a dynamic, navigable, persistent world 🕹️ explore freely across viewpoints ⏪ control time forward and backward 🔓 open-source and ready to build on :) Live demo: 🔗 world.inspatio.com Code & weights: 🔗 github.com/inspatio/inspa… Project page: 🔗 inspatio.github.io/inspatio-world

English

127

733

105.1K

Anders Fredriksson@andefred·5d

@LarsenJensenUSA @wholemars Maybe this guy is for you? Long family tradition! youtu.be/T4Upf_B9RLQ?si…

YouTube

English

Larsen Jensen@LarsenJensenUSA·5d

We are looking to invest in founders who are highly skilled in Retardmaxximization.

English

124

514

49.1K

Anders Fredriksson retweetledi

WebGL / WebGPU@webgl_webgpu·6d

🎬 Applied Showcase: Meet MasterSelects, a full video editor that runs entirely on WebGPU. No native app, no CPU roundtrip. 37 blend modes and 3D rotation in a single 618-line WGSL shader. GPU-accelerated scopes, optical flow scene detection, SAM2 segmentation on-device, and export straight from the GPU canvas. 13 dependencies total. MIT licensed. 🔗 webgpu.com/showcase/maste… #WebGPU #VideoEdit #Film #AI #Creative

English

4.6K

Anders Fredriksson@andefred·17 Mar

I guess that is quite dependent on what types of systems you are running. If stability is the top priority then speed will always be suffering. I usually say that speed is inversely proportional to the cost of an error. If errors are not critical it is usually better to opt for faster speed since deploying a fix or rolling back is faster than adding very slow QA for everything. I do agree with that slop is adding a lot of instability. But I think that problems come from not properly giving the agents the ability to verify and validate their work. I also think it’s important that we shift our frame of mind from how we worked a few months ago when ai helped write code that then was handed over to humans. The new mindset required will be code is no longer for humans to read, it’s only ever handed over to other agents. And if you have that mindset it becomes incredibly important to manage agents context and the ability to understand the code they see and tasks they receive.

English

Manthan Gupta@manthanguptaa·17 Mar

I would respectfully disagree. A lot of vibe-coded software today is more unstable, not less. Haven't seen this many crashes, outages, and bad UX across IDEs/CLIs in a while. "If it works, it works" doesn’t hold in real systems. Code isn’t just for execution, it’s for debugging, extending, and operating at scale. "Just rewrite it" ignores the cost of regressions and lost context. Speed isn’t free. You are paying for it with reliability.

English

452

Manthan Gupta@manthanguptaa·17 Mar

LLMs have made code cheap. So now people are spinning up 10 agents working on 10 features in parallel. Sounds productive. But the tradeoff is obvious: the code quality is often spaghetti + over-engineered. LLMs behave like over-eager interns. They will do more than asked, add abstractions you didn’t need, and optimize for "completeness" over simplicity. Which means you end up babysitting anyway. For anything non-trivial, I have found you still need to spend 1–3 hours upfront: • defining scope • writing clear specs • thinking through system boundaries • setting constraints Otherwise, the system drifts. And even after that, you have to review the code. They still hallucinate patterns, introduce unnecessary layers, or miss edge cases, even with detailed instructions. A lot of people advocate "just let agents cook." In practice, you're often getting 60-70% unnecessary code that increases: • cognitive load • onboarding time • surface area for bugs • long-term maintenance cost For side projects, this is fine. But for real systems with shared codebases, multiple engineers, and production traffic, this compounds fast. We are already seeing: • unstable tools • memory leaks • constant crashes • frequent rewrites This isn't just "early days", it’s a direct result of speed > discipline. Spinning up 10 agents feels like productivity. But you are often just pulling forward the cost into refactoring hell. I would rather: build slower → keep systems simple → refactor less frequently Good engineering is still about what you choose not to build.

David Cramer@zeeg

im fully convinced that LLMs are not an actual net productivity boost (today) they remove the barrier to get started, but they create increasingly complex software which does not appear to be maintainable so far, in my situations, they appear to slow down long term velocity

English

274

25.2K

Anders Fredriksson retweetledi

TBPN@tbpn·14 Mar

FULL INTERVIEW: @travisk joins TBPN to discuss his new company Atoms, physical AI, Uber, and more: 01:18 - Why he's been building in stealth for 8 years 04:32 - Atoms and the future of physical AI 08:10 - Creating a culture of builders 12:05 - Lessons from Uber 24:30 - The vision for physical AI and robotics 31:15 - Why humans will be the main beneficiaries of AI 38:20 - Mining, autonomous robots, automation 47:05 - Why Travis moved to Texas

English

287

2.3K

618.1K

Anders Fredriksson retweetledi

🌸🎵 Beautiful Melody 🎶💖@Ducnghia16·14 Mar

What a voice! ❤️❤️❤️ #Backyardsessions #Miley #Jolene

English

416

19K

Anders Fredriksson retweetledi

DrKnowItAll@DrKnowItAll16·11 Mar

Ah dayum. Ternary! I have been wanting to do a video on the advantages of ternary logic for a while now. Guess this is my sign to do so.

Guri Singh@heygurisingh

Holy shit... Microsoft open sourced an inference framework that runs a 100B parameter LLM on a single CPU. It's called BitNet. And it does what was supposed to be impossible. No GPU. No cloud. No $10K hardware setup. Just your laptop running a 100-billion parameter model at human reading speed. Here's how it works: Every other LLM stores weights in 32-bit or 16-bit floats. BitNet uses 1.58 bits. Weights are ternary just -1, 0, or +1. That's it. No floats. No expensive matrix math. Pure integer operations your CPU was already built for. The result: - 100B model runs on a single CPU at 5-7 tokens/second - 2.37x to 6.17x faster than llama.cpp on x86 - 82% lower energy consumption on x86 CPUs - 1.37x to 5.07x speedup on ARM (your MacBook) - Memory drops by 16-32x vs full-precision models The wildest part: Accuracy barely moves. BitNet b1.58 2B4T their flagship model was trained on 4 trillion tokens and benchmarks competitively against full-precision models of the same size. The quantization isn't destroying quality. It's just removing the bloat. What this actually means: - Run AI completely offline. Your data never leaves your machine - Deploy LLMs on phones, IoT devices, edge hardware - No more cloud API bills for inference - AI in regions with no reliable internet The model supports ARM and x86. Works on your MacBook, your Linux box, your Windows machine. 27.4K GitHub stars. 2.2K forks. Built by Microsoft Research. 100% Open Source. MIT License.

English

2.2K

Anders Fredriksson@andefred·12 Mar

It’s honestly crazy to get to build a company in this new world of software engineering! Today I shipped 27k lines of code, new core features of our system at @staerai and all this while handling a small crisis at one of our partners facilities in Germany and remotely hot fixed a ”robot brain” to speak a new protocol that I’ve never seen before, all from the comfort of my home office… my colleague created a few new rust kernels allowing us to generate new custom 3d ui in multiple options as an internal prototyping tool. There are no limits anymore (besides time). 1 year ago, the changes I made today would have taken 2-4 weeks to do at least. 3 years ago it would have taken 4 months…

English

Anders Fredriksson retweetledi

Bojan Tunguz@tunguz·11 Mar

AI is tuning nondevelopers into developers, developers into 10X developers, and 10X developers into 1000X developers.

English

170

16.3K

Anders Fredriksson retweetledi

Peppe Silletti@peppesilletti·10 Mar

The single most powerful exercise for shipping faster: (It takes 30 minutes and a spreadsheet.) Mapping your bottleneck. Map every single step from "someone has an idea" to "that idea is running in production." Multiply how often each step happens by how long it takes. The biggest number is your bottleneck. Fix that one. Repeat. @andefred shared this exercise on my podcast last week, and it'll help you reframe how I think about shipping speed. Here's a step by step guide 👇

English

Anders Fredriksson retweetledi

Junyi Zhang@junyi42·9 Mar

𝗢𝗻𝗲 𝗺𝗲𝗺𝗼𝗿𝘆 𝗰𝗮𝗻’𝘁 𝗿𝘂𝗹𝗲 𝘁𝗵𝗲𝗺 𝗮𝗹𝗹. We present 𝗟𝗼𝗚𝗲𝗥, a new 𝗵𝘆𝗯𝗿𝗶𝗱 𝗺𝗲𝗺𝗼𝗿𝘆 architecture for long-context geometric reconstruction. LoGeR enables stable reconstruction over up to 𝟭𝟬𝗸 𝗳𝗿𝗮𝗺𝗲𝘀 / 𝗸𝗶𝗹𝗼𝗺𝗲𝘁𝗲𝗿 𝘀𝗰𝗮𝗹𝗲, with 𝗹𝗶𝗻𝗲𝗮𝗿-𝘁𝗶𝗺𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 in sequence length, 𝗳𝘂𝗹𝗹𝘆 𝗳𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 inference, and 𝗻𝗼 𝗽𝗼𝘀𝘁-𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻. Yet it matches or surpasses strong optimization-based pipelines. (1/5) @GoogleDeepMind @Berkeley_AI

English

449

3.4K

549.5K

Anders Fredriksson@andefred·7 Mar

@tomazstolfa @andreasklinger Hey, that’s what I’m building right now… staer.ai

English

Tomaž Štolfa@tomazstolfa·7 Mar

@andreasklinger Hey, that’s my job…

English

118

Andreas Klinger 🦾@andreasklinger·7 Mar

hey! that's my line…

NXT EU@NXT4EU

Nvidia CEO Jensen Huang advices Europe to go full in on Physical AI and robotics. "Your industrial base is so strong, this is your once in a generation opportunity"

English

128

8.2K

Anders Fredriksson@andefred·4 Mar

Looks like a great framework, but anything that takes more than a day to ship is not an MVP in 2026. Iterations are counted in minutes and hours.

Prajwal Tomar@PrajwalTomar_

Before we build ANY MVP, we use the MoSCoW method to define scope. MoSCoW = Must have, Should have, Could have, Won't have. We categorize every feature: → Must have = Core features for phase 1 → Should have = Nice to have but not critical → Could have = Future roadmap → Won't have = Out of scope This simple framework gives us a clear feature list, a lean scope, and zero ambiguity. It's the difference between shipping in 3 weeks vs getting stuck for 6 months.

English

Anders Fredriksson retweetledi

Lucas Crupi@lucas_crupi·4 Mar

Worked at Tesla. Now I'm building a wire harness factory The difference between Tesla and the other OEMs? The engineers actually think about what they're designing. Everyone else just copies what they did last time with a few tweaks. Had a customer using a super expensive multi-conductor cable. Asked why. "That's what the engineer before me used." Switched them to single conductors. Literally 90% cheaper, works the same. We made less money but they became more competitive, can continue to build cool shit and hopefully grow the pie of atoms that go to market.

Nic Cruz Patane@niccruzpatane

Ford CEO Jim Farley, in a new interview, says he realized Ford had been doing EVs all wrong after his team ripped apart a Tesla: “When we ripped apart a Tesla, I was just absolutely flabbergasted. The Mach-E's wiring harness was 70 pounds heavier and 1.6 kilometers longer. We didn't know what was going on in [Tesla engineers' ] minds. But now we understand. They had no prejudice. We had prejudice. We'd gone to our supply-chain person and said, "Buy another wiring harness." [Tesla] said, "Let's design the vehicle for the lowest, smallest battery." Totally different approach.”

English

1.6K

178.5K

Anders Fredriksson retweetledi