Boochi 🇳🇬
824 posts

Boochi 🇳🇬
@boochi_dot_dev
• ICU Nurse • Computer Scientist • NeuralMind • ML Engineer

What’s a better way to evaluate a base model? If you have an ensemble of pre-trained LLMs/LLM checkpoints, perplexity is the most reliable metric for determining which one performs strongest on the language whose ability you want to further improve via post-training. Cursor is a software coding tool, and using perplexity across a set of programming-language corpora seems like the most stable option.
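A minimal sketch of the comparison being described: perplexity is just the exponential of the mean per-token negative log-likelihood, so ranking checkpoints reduces to comparing that number on the same held-out corpus. The per-token log-probs below are made-up illustrative values, not real model outputs.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token). Lower is better."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probs from two checkpoints scored
# on the same held-out programming-language corpus.
ckpt_a = [-1.2, -0.8, -2.1, -0.5]
ckpt_b = [-0.9, -0.7, -1.4, -0.6]

best = min(("ckpt_a", perplexity(ckpt_a)), ("ckpt_b", perplexity(ckpt_b)),
           key=lambda pair: pair[1])
print(best[0])  # the checkpoint with the lower perplexity
```

The key caveat for a fair ranking: every checkpoint must be scored with the same tokenizer and the same evaluation set, otherwise the per-token averages aren’t comparable.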


she is cute, should i text her?

Perplexity-based evals? In 2026?

There’s more to the RL vs. SFT debate than just evaluative vs. instructive feedback. In most LLM post-training, RL is specifically designed to improve performance “on-policy”. SFT just mimics a static distribution (i.e., “off-policy”), while RL optimizes the model’s own generated policy against a reward signal (typically with a KL-divergence penalty to keep it from drifting too far from the base model). This reward-maximizing objective pushes the model to explore and master complex strategies that simple supervised imitation can’t capture.
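The KL-penalized objective mentioned above is often implemented by shaping the reward itself: the task reward minus a per-token penalty proportional to how far the policy’s log-prob has drifted from the reference (base) model’s. A minimal sketch, with `beta` as an assumed penalty coefficient:

```python
import math

def shaped_reward(task_reward, logp_policy, logp_ref, beta=0.1):
    """KL-penalized reward used in RLHF-style post-training:
    r_total = r_task - beta * (log pi(a|s) - log pi_ref(a|s)).
    The log-prob difference is a single-sample estimate of the KL term."""
    return task_reward - beta * (logp_policy - logp_ref)

# If the policy hasn't drifted, there is no penalty.
no_drift = shaped_reward(1.0, logp_policy=-0.5, logp_ref=-0.5)

# If the policy assigns more probability than the reference
# (logp_policy > logp_ref), the reward is docked.
drifted = shaped_reward(1.0, logp_policy=-0.2, logp_ref=-0.8)
```

This is why the model can’t just reward-hack its way into degenerate outputs: every token that the base model would have found very unlikely costs reward.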



I love rage-baiting my professors in class. Today I asked my CV professor if he thinks supervised learning is actually just a boring special case of reinforcement learning where the environment is static and rewards are immediate. He thought I was going insane, but we had a nice discussion about evaluative vs. instructive feedback. 😆
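The “special case” framing in that discussion can actually be checked numerically: for a categorical policy, the gradient of the cross-entropy loss on a labeled example is identical to the REINFORCE policy gradient when the “action” is the label and the reward is a constant 1. A small sketch, assuming a 5-way softmax classifier:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=5)
label = 2            # SFT: the target class; RL: the "expert action"
p = softmax(logits)
onehot = np.eye(5)[label]

# SFT: gradient of cross-entropy -log p[label] w.r.t. the logits.
grad_sft = p - onehot

# RL: REINFORCE gradient of -reward * log pi(action) with constant
# reward 1 and an immediate, single-step "episode".
reward = 1.0
grad_rl = reward * (p - onehot)
```

The two gradients match exactly, which is the sense in which SFT is RL with a static environment and immediate, constant rewards; evaluative feedback only becomes interesting when the reward varies with the model’s own actions.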


GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…
