Harald Schäfer
@___Harald___

Let’s create life. CTO at @comma_ai

775 posts · Joined November 2022
229 Following · 5.4K Followers

Pinned Tweet
Harald Schäfer@___Harald___·
The world is not zero sum. People can create something out of nothing, and many do. Should be encouraged more.
Dean McKee@deanmckee757·
Running my first fully fledged RL project and am wildly disappointed that this is where we are. Most time is spent burning CPU cycles on rollouts. Brought over some representation learning tricks that help sample efficiency, but good god it feels brute force and boring. Am very much hoping I’m simply missing something.
davinci@leothecurious·
scaling off-policy teleop data is boring. it's also an uphill climb, not a flywheel. i want to see on-policy self-improving robotic models work. i want to see robots that flail around, try to do things badly, learn from mistakes, do them better on the next try, and before u know it, achieve superhuman competence at a task. i want to see robots that are goal-conditioned. ones that explore optimal methods for satisfying task requirements, not just mimicking human ones. if the success of ur robotic model depends on perpetually scaling expert demonstrations, u're in for a rude awakening a few years down the line.
Harald Schäfer@___Harald___·
Sad to see this. I wish more small companies would stay independent and sell services instead of getting acquired. High quality open-source tooling is incredible for society. I hope I'm wrong and the Astral team continues to do great open-source work at OpenAI!
Charlie Marsh@charliermarsh

We've entered into an agreement to join OpenAI as part of the Codex team. I'm incredibly proud of the work we've done so far, incredibly grateful to everyone that's supported us, and incredibly excited to keep building tools that make programming feel different.

Harald Schäfer@___Harald___·
I'll try to explain how this fits in with other training approaches for self-driving, and why I think this milestone is so important, both for us and for robotics in general.

Training an end-to-end agent with RL in a fully learned simulator (aka world model) is the holy grail of robotics. It's a very generic strategy, expected to scale to all of robotics with very few caveats. Nobody has shipped a robotics product like this to users, but I believe we are currently the closest.

An end-to-end agent is just one that takes in all available inputs (video, IMU, ...) and directly outputs the actions to take. This isn't controversial anymore, but how to train those actions is the hard part. The first instinct is to just collect data of human experts, and have your agent learn to predict the expert actions for the corresponding inputs, aka imitation learning. For driving we define those actions as acceleration and steering curvature. This is a good start, but an agent trained like this will completely fail in the real world. Why this happens is the subject of much debate, but my summary is that an agent needs to be exposed to its own mistakes during training to be able to recover from them. For our driving models trained this way, this manifests as drifting out of lane with the agent making no attempt to recover.

One solution to this problem is to fine-tune on a curated dataset of recoveries. One example of this is letting humans label the "ideal" place to be on an image of a road (or top-down view), and having an MPC system generate trajectories to get there smoothly. Another is to just let your broken agent drive in the real world, and let a human supervisor take over when it makes mistakes and correct them. You can then add those corrections to the dataset, retrain, and ship the updated model out. If you do this iteration enough times you get a good agent. These strategies are how several self-driving companies have gotten great capability.
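The imitation-learning step described in the thread can be sketched as plain behavioral cloning: regress the expert's logged actions from the corresponding observations. This is a minimal illustration, not comma's actual model; the feature dimensions, the linear policy, and the synthetic "expert" data are all assumptions made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged expert data: observation features in, actions out.
# Dimensions are illustrative (think: encoded camera + IMU features -> 
# [acceleration, steering curvature]).
X = rng.normal(size=(512, 32))     # observations
W_true = rng.normal(size=(32, 2))  # stand-in "expert behavior"
Y = X @ W_true                     # expert actions to imitate

# Behavioral cloning: gradient descent on the mean-squared error
# between the policy's predicted actions and the expert's actions.
W = np.zeros((32, 2))
lr = 0.05
for _ in range(500):
    pred = X @ W
    grad = 2 * X.T @ (pred - Y) / len(X)
    W -= lr * grad

mse = float(np.mean((X @ W - Y) ** 2))
```

As the thread notes, a policy trained only on this objective never sees its own off-expert states during training, so small errors compound at deployment time with no learned recovery behavior.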
But these strategies are expensive, because they require humans in the loop or even real-world mistakes. Training in simulation allows you to do this training without needing real-world disengagements or human labeling. This is the strategy we've focused on for quite a while. You need a learned simulator that can match the diversity and fidelity of reality, which means video-game-type simulations with assets are inadequate. Many companies have now trained world models as simulators for driving that do this quite well.

To my knowledge, no self-driving system shipped today other than ours has trained its agents on-policy in such a simulator to achieve its capability. I would love to hear more if I'm wrong about this; I'm not always up to date with what other companies are doing.

Ideally we would train our agents on-policy in such a simulator with RL on a good reward function. For example, a good reward function would be a GAN-style approach where a discriminator says whether the agent's driving is similar to that of a known good human driver. State-of-the-art RL doesn't seem good enough for this yet; we have not succeeded at using RL in this way. Instead we train on-policy in the learned simulator, but still provide ground-truth actions. How these actions are generated is not trivial to describe, and is explained in detail in our 2025 CVPR paper.

We hope to move to reward-based learning soon. Learning based on rewards should allow us to train policies that are smarter, particularly at low-level control, which is a big limitation of our current approach. Reward-based learning will also scale better to generic robotic tasks other than driving. blog.comma.ai/011release/
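The GAN-style reward idea mentioned in the thread can be sketched as a discriminator trained to separate human driving segments from the agent's rollouts, with its "human-likeness" score used as the RL reward. Everything here is a made-up toy: the segment features, the synthetic data, and the logistic-regression discriminator are illustrative assumptions, not comma's method.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical features summarizing short driving segments (e.g. jerk,
# lane-offset statistics); the two classes are deliberately separated.
human = rng.normal(loc=0.0, size=(256, 8))  # "known good" human segments
agent = rng.normal(loc=1.0, size=(256, 8))  # current policy's rollouts

# Logistic-regression discriminator: output near 1 = human-like.
X = np.vstack([human, agent])
y = np.concatenate([np.ones(256), np.zeros(256)])
w = np.zeros(8)
b = 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.1 * X.T @ (p - y) / len(y)   # cross-entropy gradient step
    b -= 0.1 * float(np.mean(p - y))

def reward(segment_feats):
    """GAN-style reward: how human-like does this rollout segment look?"""
    return float(sigmoid(segment_feats @ w + b))
```

A policy optimized against this reward is pushed toward behavior the discriminator cannot distinguish from the human data, which is the adversarial dynamic the thread is pointing at.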
Harald Schäfer@___Harald___·
@xroma__ Yeah good chance, just not high priority at the moment. We released a very early version already.
xavi@xroma__·
@___Harald___ any chance you will release the world model?
Harald Schäfer@___Harald___·
I’ve only been “agentic engineering” for a couple months and I already feel the need to detox. Today is “agent-free-Friday”, code will be written by hand only. Wish me luck.
Harald Schäfer@___Harald___·
@0xSero Open source will win. In fact it already is winning. Even the frontier labs rely on countless open-source projects. The 2026 open-source stack is far ahead of 2023's best stuff. If we keep open source free and only 3 years behind, that's pretty damn amazing.
0xSero@0xSero·
Open source must win.
Harald Schäfer@___Harald___·
@FrameworkPuter I kept the controller wired, because I didn't want to jinx my great experience lol. Really can't express how smooth this was. Steam even instantly auto-detects the controller type.
Harald Schäfer@___Harald___·
Good Linux gaming is my favorite new technology. I bought a @FrameworkPuter desktop, installed Ubuntu, installed Steam, and plugged in a controller. Then I started playing Black Myth: Wukong. No config or complicated setup. Flawless experience.
Harald Schäfer@___Harald___·
I started working on this in 2017; this timeline contains my entire professional life! Very proud we are finally shipping models trained in a learned simulator. I believe comma is the first to do this. Fun to look back and see how it went from idea to reality over 10 years.
comma@comma_ai

10 years of shipping

Harald Schäfer@___Harald___·
@anatolykim8 Yeah it's a lot of fun! I think it's a huge net win. There's just gonna be a lot of weird problems. Personally I find 95% of work much more fun, and 5% very frustrating. My instinct is to use agents for everything, and when they fail I waste time and I'm not learning either.
Anatoly Kim@anatolykim8·
@___Harald___ I haven't had so much fun while actually shipping some useful stuff for the last ~20 years as I had for the last 3 weeks. I am feeling so back. Just saying.
Harald Schäfer@___Harald___·
It's hard to express how much software engineering has changed in the last 6 months. This clearly seems like a huge win: so many tedious tasks solved. But I suspect there will be serious negative side-effects. Software engineers have developed a culture of taste around good code over many decades. We wince when we see a value hardcoded in multiple places, because we know that's fragile. But what is a good prompt? Should I be concise or redundant? Mean or friendly? I genuinely have no idea. We will have to do it wrong to find out.
Harald Schäfer@___Harald___·
@EncodedInsight Definitely! It's still engineering. It's just a different flavor. We don't really know how to build stable long-term projects with them yet, but we already rely on them. A strange combo.
EncodedInsight@EncodedInsight·
@___Harald___ There is still a lot of engineering to structure and build repeatable, verifiable systems using these models. But, yes, it is representing taste in a different way.