Harald Schäfer
@___Harald___

Let’s create life. CTO at @comma_ai

775 posts · Joined November 2022
229 Following · 5.4K Followers

Pinned Tweet
Harald Schäfer@___Harald___·
The world is not zero sum. People can create something out of nothing, and many do. Should be encouraged more.
Dean McKee@deanmckee757·
Running my first fully fledged RL project and am wildly disappointed that this is where we are. Most time is spent burning CPU cycles on rollouts. Brought over some representation learning tricks that help sample efficiency, but good god it feels brute force and boring. Am very much hoping I’m simply missing something.
davinci@leothecurious·
scaling off-policy teleop data is boring. it's also an uphill climb, not a flywheel. i want to see on-policy self-improving robotic models work. i want to see robots that flail around, try to do things badly, learn from mistakes, do them better on the next try, and before u know it, achieve superhuman competence at a task. i want to see robots that are goal-conditioned. ones that explore optimal methods for satisfying task requirements, not just mimicking human ones. if the success of ur robotic model depends on perpetually scaling expert demonstrations, u're in for a rude awakening a few years down the line.
Harald Schäfer@___Harald___·
Sad to see this. I wish more small companies would stay independent and sell services instead of getting acquired. High quality open-source tooling is incredible for society. I hope I'm wrong and the Astral team continues to do great open-source work at OpenAI!
Charlie Marsh@charliermarsh

We've entered into an agreement to join OpenAI as part of the Codex team. I'm incredibly proud of the work we've done so far, incredibly grateful to everyone that's supported us, and incredibly excited to keep building tools that make programming feel different.

Harald Schäfer@___Harald___·
I'll try to explain how this fits in with other training approaches for self-driving, and why I think this milestone is so important, both for us and for robotics in general.

Training an end-to-end agent with RL in a fully learned simulator (aka world model) is the holy grail of robotics. It's a very generic strategy, expected to scale to all of robotics with very few caveats. Nobody has shipped a robotics product like this to users, but I believe we are currently the closest.

An end-to-end agent is just one that takes in all available inputs (video, IMU, ...) and directly outputs the actions to take. This isn't controversial anymore, but how to train those actions is the hard part. The first instinct is to just collect data of human experts, and have your agent learn to predict the expert actions for the corresponding inputs, aka imitation learning. For driving we define those actions as acceleration and steering curvature. This is a good start, but an agent trained like this will completely fail in the real world. Why this happens is the subject of much debate, but my summary is that an agent needs to be exposed to its own mistakes during training to be able to recover from them. For our driving models trained this way, this manifests as drifting out of lane with the agent making no attempt to recover.

One solution to this problem is to fine-tune on a curated dataset of recoveries. One example of this is letting humans label the "ideal" place to be on an image of a road (or top-down view), and having an MPC system generate trajectories to get there smoothly. Another is to just let your broken agent drive in the real world, and let a human supervisor take over when it makes mistakes and correct them. You can then add those corrections to the dataset, retrain, and ship the updated model out. If you do this iteration enough times you get a good agent. These strategies are how several self-driving companies have gotten great capability.
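The imitation-learning step described in the thread can be sketched as plain behavioral cloning: regress the expert's logged actions from the corresponding observations. This is a minimal illustration, not comma's actual model; the feature dimensions, the linear policy, and the synthetic "expert" data are all assumptions made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged expert data: observation features in, actions out.
# Dimensions are illustrative (think: encoded camera + IMU features -> 
# [acceleration, steering curvature]).
X = rng.normal(size=(512, 32))     # observations
W_true = rng.normal(size=(32, 2))  # stand-in "expert behavior"
Y = X @ W_true                     # expert actions to imitate

# Behavioral cloning: gradient descent on the mean-squared error
# between the policy's predicted actions and the expert's actions.
W = np.zeros((32, 2))
lr = 0.05
for _ in range(500):
    pred = X @ W
    grad = 2 * X.T @ (pred - Y) / len(X)
    W -= lr * grad

mse = float(np.mean((X @ W - Y) ** 2))
```

As the thread notes, a policy trained only on this objective never sees its own off-expert states during training, so small errors compound at deployment time with no learned recovery behavior.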
But these strategies are expensive, because they require humans in the loop or even real-world mistakes. Training in simulation allows you to do this training without needing real-world disengagements or human labeling. This is the strategy we've focused on for quite a while. You need a learned simulator that can match the diversity and fidelity of reality, which means video-game-type simulations with assets are inadequate. Many companies have now trained world models as simulators for driving that do this quite well.

To my knowledge, no self-driving system shipped today other than ours has trained its agents on-policy in such a simulator to achieve its capability. I would love to hear more if I'm wrong about this; I'm not always up to date with what other companies are doing.

Ideally we would train our agents on-policy in such a simulator with RL on a good reward function. For example, a good reward function would be a GAN-style approach where a discriminator says whether the agent's driving is similar to that of a known good human driver. State-of-the-art RL doesn't seem good enough for this yet; we have not succeeded at using RL in this way. Instead we train on-policy in the learned simulator, but still provide ground-truth actions. How these actions are generated is not trivial to describe, and is explained in detail in our 2025 CVPR paper.

We hope to move to reward-based learning soon. Learning based on rewards should allow us to train policies that are smarter, particularly at low-level control, which is a big limitation of our current approach. Reward-based learning will also scale better to generic robotic tasks other than driving. blog.comma.ai/011release/
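The GAN-style reward idea mentioned in the thread can be sketched as a discriminator trained to separate human driving segments from the agent's rollouts, with its "human-likeness" score used as the RL reward. Everything here is a made-up toy: the segment features, the synthetic data, and the logistic-regression discriminator are illustrative assumptions, not comma's method.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical features summarizing short driving segments (e.g. jerk,
# lane-offset statistics); the two classes are deliberately separated.
human = rng.normal(loc=0.0, size=(256, 8))  # "known good" human segments
agent = rng.normal(loc=1.0, size=(256, 8))  # current policy's rollouts

# Logistic-regression discriminator: output near 1 = human-like.
X = np.vstack([human, agent])
y = np.concatenate([np.ones(256), np.zeros(256)])
w = np.zeros(8)
b = 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.1 * X.T @ (p - y) / len(y)   # cross-entropy gradient step
    b -= 0.1 * float(np.mean(p - y))

def reward(segment_feats):
    """GAN-style reward: how human-like does this rollout segment look?"""
    return float(sigmoid(segment_feats @ w + b))
```

A policy optimized against this reward is pushed toward behavior the discriminator cannot distinguish from the human data, which is the adversarial dynamic the thread is pointing at.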
Harald Schäfer@___Harald___·
@xroma__ Yeah good chance, just not high priority at the moment. We released a very early version already.
xavi@xroma__·
@___Harald___ any chance you will release the world model?
Harald Schäfer@___Harald___·
I’ve only been “agentic engineering” for a couple months and I already feel the need to detox. Today is “agent-free-Friday”, code will be written by hand only. Wish me luck.
Harald Schäfer@___Harald___·
@0xSero Open source will win. In fact it already is winning. Even the frontier labs rely on countless open-source projects. The 2026 open-source stack is far ahead of 2023's best stuff. If we keep open source free and only 3 years behind, that's pretty damn amazing.
0xSero@0xSero·
Open source must win.
Harald Schäfer@___Harald___·
@FrameworkPuter I kept the controller wired, because I didn't want to jinx my great experience lol. Really can't express how smooth this was. Steam even instantly auto-detects the controller type.
Harald Schäfer@___Harald___·
Good Linux gaming is my favorite new technology. I bought a @FrameworkPuter desktop, installed Ubuntu, installed Steam, and plugged in a controller. Then I started playing Black Myth: Wukong. No config or complicated setup. Flawless experience.
Harald Schäfer@___Harald___·
I started working on this in 2017; this timeline contains my entire professional life! Very proud we are finally shipping models trained in a learned simulator. I believe comma is the first to do this. Fun to look back and see how it went from idea to reality over 10 years.
comma@comma_ai

10 years of shipping

Harald Schäfer@___Harald___·
@anatolykim8 Yeah it's a lot of fun! I think it's a huge net win. There's just gonna be a lot of weird problems. Personally I find 95% of work much more fun, and 5% very frustrating. My instinct is to use agents for everything, and when they fail I waste time and I'm not learning either.
Anatoly Kim@anatolykim8·
@___Harald___ I haven't had so much fun while actually shipping some useful stuff for the last ~20 years as I had for the last 3 weeks. I am feeling so back. Just saying.
Harald Schäfer@___Harald___·
It's hard to express how much software engineering has changed in the last 6 months. This clearly seems like a huge win: so many tedious tasks solved. But I suspect there will be serious negative side-effects. Software engineers have developed a culture of taste around good code over many decades. We wince when we see a value hardcoded in multiple places, because we know that's fragile. But what is a good prompt? Should I be concise or redundant? Mean or friendly? I genuinely have no idea. We will have to do it wrong to find out.
Harald Schäfer@___Harald___·
@EncodedInsight Definitely! It's still engineering. It's just a different flavor. We don't really know how to build stable long-term projects with them yet, but we already rely on them. A strange combo.
EncodedInsight@EncodedInsight·
@___Harald___ There is still a lot of engineering to structure and build repeatable, verifiable systems using these models. But, yes, it is representing taste in a different way.