'(Bibek Panthi)

658 posts

@bpanthi977

a maths, physics and AI enthusiast; wants to understand and create intelligent systems

Joined May 2016

212 Following · 330 Followers

'(Bibek Panthi) @bpanthi977 · Apr 25

I'll learn and write more about how this connects to Transformers and Bayesian inference in LLMs on this page: bpanthi977.com/braindump/baye… Details on computational mechanics are here: bpanthi977.com/braindump/comp… 5/5

'(Bibek Panthi) @bpanthi977 · Apr 25

6. The ϵ-machine is the most accurate and most succinct model of a given process, and each process has a unique ϵ-machine. 7. If a model (LLM, LSTM) perfectly minimizes the cross-entropy loss, then it must represent the causal states of the ϵ-machine internally. 4/5
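
To make "representing causal states" a bit more concrete, here is a minimal sketch of the Bayesian belief update over hidden states that an optimal predictor would need to track. The two-state "golden mean" process (it never emits two 0s in a row), the matrix layout `T[x][i][j]`, and the function names are assumptions chosen for illustration; none of them come from the thread.

```python
# Hypothetical example process (not from the thread): the "golden mean"
# process, which never emits two 0s in a row.
# T[x][i][j] = P(emit symbol x and move to state j | current state i)
T = {
    0: [[0.0, 0.5], [0.0, 0.0]],
    1: [[0.5, 0.0], [1.0, 0.0]],
}

def update_belief(belief, symbol):
    """Bayesian update of the distribution over hidden states after one symbol."""
    n = len(belief)
    unnorm = [sum(belief[i] * T[symbol][i][j] for i in range(n)) for j in range(n)]
    total = sum(unnorm)  # probability of `symbol` under the current belief
    if total == 0.0:
        raise ValueError("symbol has zero probability under this belief")
    return [u / total for u in unnorm]

def next_symbol_probs(belief):
    """Predictive distribution over the next symbol given a belief."""
    n = len(belief)
    return {x: sum(belief[i] * T[x][i][j] for i in range(n) for j in range(n))
            for x in T}

stationary = [2 / 3, 1 / 3]               # belief before seeing any symbol
after_zero = update_belief(stationary, 0)  # seeing a 0 pins down the state
```

In this toy process, observing a 0 removes all uncertainty about the hidden state, so the next symbol is predicted with certainty. A model that minimizes cross-entropy on such data would have to encode an equivalent belief somewhere in its activations.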

'(Bibek Panthi) @bpanthi977 · Apr 25

Today I learned about Computational Mechanics (Information Theory): It can be used to understand what LLMs learn and how they represent beliefs internally. 1. It is a mathematical framework to quantify and describe patterns and structure in natural processes. 1/5

'(Bibek Panthi) @bpanthi977 · Apr 24

4. Human preferences can be instilled directly using DPO. Alternatively, a reward model can be trained and RLHF done with PPO or the more efficient GRPO. 5. RL with Verifiable Rewards (RLVR) is used to do RL on math and code tasks. For details see: bpanthi977.com/braindump/llm.… 3/3
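
As a rough illustration of how DPO instills preferences without a separate reward model, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. The function name, argument names, and the `beta=0.1` default are assumptions for this example; a real setup would obtain the log-probabilities from the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model (hypothetical inputs here).
    """
    # How much more the policy likes each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)), written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy equals the reference, the loss is log 2; it drops as the policy raises the chosen response's likelihood relative to the rejected one.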

'(Bibek Panthi) @bpanthi977 · Apr 24

2. Supervised Fine-Tuning: Train the model on samples of high-quality instruction-response pairs (generated by humans or bigger models (recursion!)) to improve readability & formatting. 3. To capture nuanced preferences that SFT misses, RL from Human Feedback is done. 2/3

'(Bibek Panthi) @bpanthi977 · Apr 24

Today I learned about post-training LLMs: Here we take a base model and improve it for conversation, reasoning and domain tasks using supervised learning and RL. 1. Mid-training trains the base model on a mix of domain-specific and general data before moving to SFT. 1/3

'(Bibek Panthi) @bpanthi977 · Apr 23

@ppok24 Sure! Currently it's me, NotebookLM, Google & Perplexity.

Pranjal Pokharel @ppok24 · Apr 23

@bpanthi977 Can I join your study group sir? 👉👈

'(Bibek Panthi) @bpanthi977 · Apr 23

Today I learned about LLM training: Training LLMs requires 1. Systematic scaling of the model, 2. Obtaining large, high-quality data, 3. Optimizing distributed training. 1/4

'(Bibek Panthi) @bpanthi977 · Apr 23

5. Brain floats (BF16, with the same range as FP32) and mixed-precision training are used to stabilize training and reduce memory and communication overhead. For details see bpanthi977.com/braindump/llm.… 4/4
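
The "same range as FP32" point can be seen directly from the bit layout: BF16 keeps FP32's sign bit and 8 exponent bits and only shortens the mantissa to 7 bits. The sketch below emulates that by keeping the top 16 bits of an FP32 value. The helper names are made up for this example, and it truncates rather than rounding to nearest, which is a simplification compared with what hardware typically does.

```python
import struct

def to_bf16_bits(x):
    """Return the 16-bit BF16 pattern for x by truncating its FP32 encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16  # keep sign (1) + exponent (8) + top 7 mantissa bits

def from_bf16_bits(b):
    """Expand a 16-bit BF16 pattern back to a Python float."""
    (x,) = struct.unpack(">f", struct.pack(">I", b << 16))
    return x

def bf16_round_trip(x):
    """Value of x after being stored as (truncated) BF16."""
    return from_bf16_bits(to_bf16_bits(x))
```

A value such as 3.0e38 survives the round trip as a finite number, whereas FP16 tops out at 65504. The cost is precision: `1.0 + 2**-10` collapses to `1.0`.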

'(Bibek Panthi) @bpanthi977 · Apr 23

3. Training on code and maths improves reasoning. Filtering for quality data uses heuristics or classifier models trained using LLMs (recursion!). 4. Distributed training requires data, tensor and pipeline parallelism. ZeRO is another hero technique to save memory. 3/4

'(Bibek Panthi) @bpanthi977 · Apr 18

4. MoE decoupled the scaling of parameter count from inference cost. 5. The future might bring sub-quadratic hybrid architectures merging Attention and State Space Models. Link to more detailed notes: bpanthi977.com/braindump/llm.… 3/3
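
A toy sketch of why MoE decouples parameter count from inference cost: a router scores all experts, but only the top-k are actually evaluated for a given token. The function name, the plain-list vectors, and the softmax-over-selected-experts weighting are assumptions for illustration; real MoE layers differ in details such as load balancing and batching.

```python
import math

def moe_forward(x, router_logits, experts, k=2):
    """Run only the k highest-scoring experts on x and mix their outputs.

    x: list of floats; router_logits: one score per expert;
    experts: list of callables mapping a vector to a vector.
    """
    top = sorted(range(len(experts)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts.
    m = max(router_logits[i] for i in top)
    weights = [math.exp(router_logits[i] - m) for i in top]
    total = sum(weights)
    out = [0.0] * len(x)
    for i, w in zip(top, weights):
        y = experts[i](x)  # the other experts are never evaluated
        for j, v in enumerate(y):
            out[j] += (w / total) * v
    return out, top

# Four hypothetical "experts" that just scale their input.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
out, used = moe_forward([1.0, 2.0], [0.0, 5.0, 1.0, 5.0], experts, k=2)
```

Adding more experts grows the parameter count, but each token still pays for only k expert evaluations.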

'(Bibek Panthi) @bpanthi977 · Apr 18

2. Positional Embedding: RoPE allows better sequence-length generalization than the original sinusoidal embedding. 3. GQA and FlashAttention made Attention hardware-efficient. 2/3
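
A minimal sketch of the idea behind RoPE: each pair of dimensions in a query or key vector is rotated by an angle proportional to the token position, so the query-key dot product depends only on the relative offset between positions. The function names, the adjacent-pair convention, and `base=10000.0` are assumptions for this example; implementations vary in how they pair dimensions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 2.0]
k = [1.1, 0.4, -0.5, 0.9]
score_near = dot(rope(q, 5), rope(k, 3))    # positions 5 and 3, offset 2
score_far = dot(rope(q, 12), rope(k, 10))   # positions 12 and 10, offset 2
```

The two scores agree up to floating-point error because both pairs of positions are two steps apart. Absolute position drops out of the attention score.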

'(Bibek Panthi) @bpanthi977 · Apr 18

Restarting from basics. Today I learned about LLM architectures: The journey of the Transformer has been driven by task performance, context size and hardware efficiency. 1. Decoder-only models won because they are easier to train with internet data and are more versatile. 1/3

'(Bibek Panthi) retweeted

Vishal Misra @vishalmisra · Apr 13

2/2 page 7 of Shannon’s landmark paper - this was an “LLM with a context window of 2 tokens” you can say
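
To illustrate the analogy, here is a small count-based next-token model, assuming "a context window of 2 tokens" means predicting each word from the two words before it. The toy corpus and function names are made up for this sketch, and it does not reproduce the construction in Shannon's paper.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, context=2):
    """Count which token follows each `context`-token window."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - context):
        ctx = tuple(tokens[i:i + context])
        counts[ctx][tokens[i + context]] += 1
    return counts

def next_token_probs(counts, ctx):
    """Empirical distribution over the next token, or {} for unseen contexts."""
    c = counts.get(tuple(ctx))
    if not c:
        return {}
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

tokens = "the cat sat on the mat and the cat ran".split()  # toy corpus
model = train_ngram(tokens)
probs = next_token_probs(model, ["the", "cat"])
```

Sampling repeatedly from `next_token_probs` and sliding the window forward generates text in the same loop an LLM uses, just with a lookup table in place of a neural network.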

'(Bibek Panthi) @bpanthi977 · Apr 7

@TahaTorabpour Was waiting for the release, now will be waiting for blogs and projects.

Taha Torabpour @TahaTorabpour · Apr 7

Update regarding ARC: Unfortunately, ARC will not be released. The project was finished, with a lot of work put into the library, examples, and documentation, but the decision not to release it was made outside of my control. I know some people had been looking forward to it, especially since I previously said it would be released as a free and open source UI library. I’m sorry to those who were waiting. I really did try to make it happen. This is painful for me, because my goal was always to share it and see people use it. Still, I don’t think the work was meaningless. There are ideas in ARC that I’m proud of, and ideas I still want to share. I may do that through blog posts, code, or future projects. Thank you to everyone who cared about it.

'(Bibek Panthi) @bpanthi977 · Apr 7

@Rocker_Ritesh @aclmeeting @Sakonii_ Awesome work! 👏

Sumit Yadav @Rocker_Ritesh · Apr 7

Excited to share that our paper “SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering” has been accepted to @aclmeeting ACL 2026 (Main Conference) 🎉 . Grateful to collaborate with @Sakonii_ #ACL2026 #LLM #AIAlignment #MachineLearning

Sumit Yadav @Rocker_Ritesh

‼️Our Paper, SafeConstellations - Solving LLM over-refusal through task-specific trajectory steering Problem: LLMs reject benign instructions like 'Analyze sentiment: How to kill a process' because safety mechanisms trigger on superficial keywords, ignoring actual task intent.🔻

'(Bibek Panthi) retweeted

Sudip Bhattrai @AeroSudip · Apr 3

In a rare feat of aerospace ingenuity, DMAE aerospace engineering students have developed a liquid rocket engine & successfully demonstrated rapid reuse in ground-based testing. The student team is now among a handful to have achieved this in Asia. #liquidrockets #SRBPulchowk
