
@__tinygrad__ The operators serve hyper-specialized implementations of each model. How good is tinygrad at fusing high-level ops?
Even with some advanced compiler magic, hand-tuned kernels with nit-picked fusions are hard to beat. It's a pristine model blueprint vs. a tuned Franken-model.


















