serdarml (@cs_serdar) · 122 posts
Aspiring AI researcher, undergrad student @TU_Muenchen

There's a quadrillion-dollar question at the heart of AI: why are humans so much more sample-efficient than LLMs?

There are three possible answers:
1. Architecture and hyperparameters (i.e., transformers vs. whatever ‘algorithm’ cortical columns are implementing)
2. Learning rule (backprop vs. whatever the brain is doing)
3. Reward function

@AdamMarblestone believes the answer is the reward function. ML likes to use pretty simple loss functions, like cross-entropy. These are easy to work with, but they might be too simple for sample-efficient learning. Adam thinks that, in humans, the large number of highly specialised cell types in the ‘lizard brain’ might actually be encoding the information for sophisticated loss functions, used for ‘training’ the more sophisticated areas like the cortex and amygdala.

Consider: the human genome is barely 3 gigabytes (compare that to the terabytes of parameters that encode frontier LLM weights). So how can it include all the information necessary to build highly intelligent learners? Well, if the key to sample-efficient learning resides in the loss function, even very complicated loss functions can still be expressed in a couple hundred lines of Python code.
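To make the "complicated loss functions are still short programs" point concrete, here is a minimal sketch: standard cross-entropy is a few lines, and a "shaped" composite loss that mixes in extra innate signals is barely longer. The `composite_loss` terms and weights are illustrative assumptions, not Marblestone's actual proposal.

```python
import numpy as np

def cross_entropy(logits, target):
    """Standard cross-entropy loss for one example: a few lines of code."""
    shifted = logits - logits.max()                     # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum()) # log-softmax
    return -log_probs[target]

def composite_loss(logits, target, prediction_error, novelty,
                   w=(1.0, 0.5, 0.1)):
    """Hypothetical 'shaped' loss: cross-entropy plus extra hand-designed
    signals. The extra terms and weights are illustrative assumptions."""
    return (w[0] * cross_entropy(logits, target)
            + w[1] * prediction_error   # penalise surprise
            - w[2] * novelty)           # reward novelty -> lower loss

logits = np.array([2.0, 0.5, -1.0])
print(round(float(cross_entropy(logits, 0)), 4))  # ≈ 0.2413
```

Even a loss with dozens of such terms would stay well under the "couple hundred lines" budget the tweet mentions.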

Announcing Talkie: a new, open-weight historical LLM! We trained and fine-tuned a 13B model on a newly curated dataset containing only pre-1930 data. Try it below! With @AlecRad and @status_effects 🧵

Open Source Bavarian AI Foundation Model is coming soon!

xLSTM is more expressive than Transformers and Mamba: arxiv.org/abs/2603.03612
* nonlinear RNNs: sLSTM, LSTM
* DPLR linear RNNs: mLSTM, RWKV, DeltaNet
* Non PNC1: Mamba, Transformer
“fundamental expressivity gaps between linear and nonlinear RNNs”
World models require nonlinear RNNs.
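The linear-vs-nonlinear distinction the tweet cites can be illustrated in a few lines. This toy sketch (my own example, not from the paper) contrasts a linear recurrence, whose state is a linear function of the input history, with an LSTM-style nonlinear one:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) * 0.3  # state-transition matrix
B = rng.normal(size=(4, 2)) * 0.3  # input projection

def linear_rnn(xs):
    """Linear recurrence: h_t = A h_{t-1} + B x_t. The final state is a
    linear function of the inputs (the class containing mLSTM/Mamba-style
    state-space updates, up to their gating)."""
    h = np.zeros(4)
    for x in xs:
        h = A @ h + B @ x
    return h

def nonlinear_rnn(xs):
    """Nonlinear recurrence (LSTM/sLSTM-style): the tanh makes the state a
    genuinely nonlinear function of the input history."""
    h = np.zeros(4)
    for x in xs:
        h = np.tanh(A @ h + B @ x)
    return h

xs = rng.normal(size=(5, 2))
# Doubling the inputs doubles the linear RNN's state exactly...
print(np.allclose(linear_rnn(2 * xs), 2 * linear_rnn(xs)))
# ...but not the nonlinear RNN's state.
print(np.allclose(nonlinear_rnn(2 * xs), 2 * nonlinear_rnn(xs)))
```

The paper's expressivity argument is about formal language recognition, not this scaling property, but the homogeneity check above is the simplest way to see that a linear recurrence is a strictly smaller function class.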