Bibek Panthi
@bpanthi977

658 posts

a maths, physics and AI enthusiast; wants to understand and create intelligent systems

Joined May 2016
212 Following · 330 Followers

Bibek Panthi @bpanthi977
6. The ϵ-machine is the most accurate and most succinct model of a given process, and each process has a unique ϵ-machine. 7. If a model (an LLM, an LSTM) perfectly minimizes the cross-entropy loss, then it must represent the causal states of the ϵ-machine internally. 4/5
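Point 6 above can be made concrete with a toy example (my own sketch, not from the thread): the ϵ-machine of the Golden Mean Process, a binary process that never emits two 0s in a row. Its two causal states are all the model needs to reproduce the process exactly.

```python
import random

# Sketch: the epsilon-machine of the Golden Mean Process.
# Each causal state maps to its transitions: state -> [(symbol, next_state, prob)].
EPSILON_MACHINE = {
    "A": [("0", "B", 0.5), ("1", "A", 0.5)],  # from A, emit 0 or 1 with equal prob.
    "B": [("1", "A", 1.0)],                   # after a 0, a 1 must follow
}

def generate(machine, start="A", n=1000, seed=0):
    """Generate n symbols by walking the epsilon-machine's causal states."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(n):
        r, acc = rng.random(), 0.0
        for symbol, nxt, p in machine[state]:
            acc += p
            if r < acc:
                out.append(symbol)
                state = nxt
                break
    return "".join(out)

seq = generate(EPSILON_MACHINE)
assert "00" not in seq  # the causal-state structure enforces the process constraint
```

The state names and probabilities are the standard textbook ones for this process; the point is that two states suffice, which is what "most succinct" means here.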

Bibek Panthi @bpanthi977
Today I learned about Computational Mechanics (Information Theory): it can be used to understand what LLMs learn and how they represent beliefs internally. 1. It is a mathematical framework for quantifying and describing patterns and structure in natural processes. 1/5

Bibek Panthi @bpanthi977
4. Human preferences can be instilled directly using DPO, or a reward model can be trained and RLHF done with PPO or the more efficient GRPO. 5. RL with Verifiable Rewards (RLVR) is used for RL on math and code tasks. For details see: bpanthi977.com/braindump/llm.… 3/3
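The DPO objective mentioned above fits in a few lines (my own sketch, not the thread author's code): given summed log-probabilities of a chosen response y_w and a rejected response y_l under the trained policy and a frozen reference model, the loss pushes the policy's chosen/rejected log-ratio above the reference's.

```python
import math

# Sketch of the DPO loss on a single preference pair.
def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log(sigmoid(beta * margin)), where the margin compares policy and
    reference log-ratios of the chosen (w) vs rejected (l) response."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy already prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss drops below log(2).
loss = dpo_loss(logp_w=-5.0, logp_l=-9.0, ref_logp_w=-6.0, ref_logp_l=-8.0)
assert loss < math.log(2)
```

The beta value and log-probability numbers are illustrative; in practice the log-probabilities come from a forward pass of the model over each response.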

Bibek Panthi @bpanthi977
2. Supervised Fine-Tuning: train the model on samples of high-quality instruction-response pairs (generated by humans or by bigger models (recursion!)) to improve readability and formatting. 3. To capture nuanced preferences that SFT misses, RL from Human Feedback is used. 2/3
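The SFT step above is just next-token cross-entropy, with one wrinkle worth showing (my own sketch, not from the thread): the loss is usually computed only on the response tokens, with the instruction tokens masked out.

```python
# Sketch: SFT loss = average negative log-likelihood over response tokens only.
def sft_loss(token_logprobs, loss_mask):
    """token_logprobs: model log-prob of each target token.
    loss_mask: 1 for response tokens (counted), 0 for prompt tokens (ignored)."""
    total = sum(-lp for lp, m in zip(token_logprobs, loss_mask) if m)
    return total / sum(loss_mask)

# Toy example: 3 prompt tokens (masked out) followed by 2 response tokens.
logprobs = [-0.1, -0.2, -0.3, -0.5, -0.7]
mask     = [0, 0, 0, 1, 1]
assert abs(sft_loss(logprobs, mask) - 0.6) < 1e-9  # (0.5 + 0.7) / 2
```

Masking the prompt keeps the model from being trained to regenerate instructions, which is a common (though not universal) choice in SFT pipelines.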

Bibek Panthi @bpanthi977
Today I learned about post-training LLMs: here we take a base model and improve it for conversation, reasoning and domain tasks using supervised learning and RL. 1. Mid-training trains the base model on a mix of domain-specific and general datasets before moving to SFT. 1/3

Bibek Panthi @bpanthi977
@ppok24 Sure! Currently it's me, NotebookLM, Google & Perplexity.

Bibek Panthi @bpanthi977
Today I learned about LLM training: training LLMs requires 1. systematic scaling of the model, 2. obtaining large amounts of quality data, 3. optimizing distributed training. 1/4

Bibek Panthi @bpanthi977
5. Brain floats (BF16, with the same exponent range as FP32) and mixed-precision training are used to stabilize training and to reduce memory and communication overhead. For details see bpanthi977.com/braindump/llm.… 4/4
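The "same range as FP32" claim is easy to verify (my own sketch, not from the thread): BF16 is simply the top 16 bits of an FP32 value, keeping all 8 exponent bits but only 7 mantissa bits.

```python
import struct

# Sketch: truncate an FP32 value to BF16 (round-toward-zero) and return it as FP32.
def to_bf16(x):
    """Keep the top 16 bits of the IEEE-754 single-precision encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# The huge value survives: BF16 shares FP32's 8-bit exponent, so the range matches.
assert to_bf16(3e38) != float("inf")
# But precision drops to ~3 decimal digits: 7 mantissa bits cannot resolve 0.001.
assert to_bf16(1.001) == 1.0
```

Real mixed-precision training rounds to nearest rather than truncating, but the range/precision trade-off shown here is the same.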

Bibek Panthi @bpanthi977
3. Training on code and maths improves reasoning. Filtering for quality data uses heuristics or classifier models themselves trained using LLMs (recursion!). 4. Distributed training requires data, tensor and pipeline parallelism. ZeRO is another key technique for saving memory. 3/4
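Of the parallelism strategies in point 4, data parallelism is the simplest to sketch (my own toy simulation, not from the thread): each worker computes gradients on its shard of the batch, then an all-reduce averages them so every replica applies the same update.

```python
# Sketch: simulate the all-reduce at the heart of data-parallel training.
def all_reduce_mean(per_worker_grads):
    """Average gradients elementwise across workers.
    per_worker_grads: one gradient list per worker, same length each."""
    n = len(per_worker_grads)
    return [sum(g) / n for g in zip(*per_worker_grads)]

# Two workers, each holding gradients for the same 3 parameters.
grads = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
assert all_reduce_mean(grads) == [2.0, 3.0, 4.0]
```

ZeRO goes further by also sharding optimizer states, gradients, and parameters across those same workers instead of replicating them.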

Bibek Panthi @bpanthi977
4. MoE allowed the scaling of parameter count to be decoupled from inference cost. 5. The future might bring sub-quadratic hybrid architectures merging attention and state-space models. Link to more detailed notes: bpanthi977.com/braindump/llm.… 3/3
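The decoupling in point 4 comes from top-k routing, sketched below (my own toy version with scalar "experts", not from the thread): only k experts run per token, so adding experts grows parameter count without growing per-token compute.

```python
# Sketch: top-k Mixture-of-Experts routing.
def moe_forward(x, experts, router_scores, k=2):
    """Run only the k highest-scoring experts and mix their outputs
    by their normalized router scores."""
    top = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    total = sum(router_scores[i] for i in top)
    weights = [router_scores[i] / total for i in top]  # normalized gate weights
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Four toy experts, but only 2 execute per input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 1, lambda x: x * x]
out = moe_forward(3.0, experts, router_scores=[0.1, 0.6, 0.1, 0.2], k=2)
# Experts 1 and 3 win the routing; their outputs (6.0 and 9.0) are mixed 0.75/0.25.
assert abs(out - 6.75) < 1e-9
```

In a real MoE layer the experts are feed-forward networks and the router scores come from a learned softmax over the token's hidden state.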

Bibek Panthi @bpanthi977
2. Positional embedding: RoPE allows better sequence-length generalization than the original sinusoidal embedding. 3. GQA and FlashAttention made attention hardware-efficient. 2/3
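RoPE's key property is worth demonstrating (my own sketch, not from the thread): it rotates each 2D pair of query/key features by an angle proportional to position, so query-key dot products depend only on the *relative* distance between tokens, which is part of why it generalizes across sequence lengths.

```python
import math

# Sketch: rotary position embedding on a vector of even length.
def rope(vec, pos, base=10000.0):
    """Rotate each (even, odd) feature pair by pos * per-pair frequency."""
    out = []
    for i in range(0, len(vec), 2):
        theta = pos * base ** (-i / len(vec))  # frequency falls with pair index
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]  # 2D rotation
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q, k = [1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5]
# Relative-position property: shifting both positions by the same offset
# leaves the query-key dot product unchanged (3,7 and 13,17 both differ by 4).
assert abs(dot(rope(q, 3), rope(k, 7)) - dot(rope(q, 13), rope(k, 17))) < 1e-9
```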

Bibek Panthi @bpanthi977
Restarting from basics. Today I learned about LLM architectures: the evolution of the Transformer has been driven by task performance, context size and hardware efficiency. 1. Decoder-only models won because they are easier to train on internet data and are more versatile. 1/3
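What makes a model "decoder-only" is the causal attention mask, sketched here (my own illustration, not from the thread): token i may only attend to tokens 0..i, so every position of raw internet text yields a next-token prediction target with no labeling needed.

```python
# Sketch: the causal (lower-triangular) attention mask of a decoder-only model.
def causal_mask(n):
    """n x n boolean mask: True where attention is allowed (no peeking ahead)."""
    return [[j <= i for j in range(n)] for i in range(n)]

m = causal_mask(4)
assert m[0] == [True, False, False, False]  # first token sees only itself
assert m[3] == [True, True, True, True]     # last token sees the whole prefix
```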

Bibek Panthi reposted
Vishal Misra @vishalmisra
2/2 Page 7 of Shannon's landmark paper: this was, you could say, an "LLM with a context window of 2 tokens".
[image: page 7 of Shannon's paper]
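Shannon's two-token "language model" is a bigram sampler, which takes only a few lines to recreate (my own sketch with a made-up toy corpus): learn which words follow which, then generate by repeatedly sampling a successor of the previous word.

```python
import random

# Sketch: Shannon-style second-order approximation of English (a bigram model).
def train_bigrams(words):
    """For each word, collect the words that follow it in the corpus
    (duplicates kept, so sampling reflects bigram frequencies)."""
    table = {}
    for a, b in zip(words, words[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate(table, start, n, seed=0):
    """Generate up to n more words, each conditioned only on the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        successors = table.get(out[-1])
        if not successors:
            break  # dead end: last corpus word with no observed successor
        out.append(rng.choice(successors))
    return out

corpus = "the cat sat on the mat the dog sat on the log".split()
table = train_bigrams(corpus)
sample = generate(table, "the", 8)
# Every adjacent pair in the output is a bigram observed in the corpus.
assert all(b in table[a] for a, b in zip(sample, sample[1:]))
```

Replace the bigram table with a neural network conditioned on thousands of previous tokens and the same sampling loop becomes LLM decoding, which is presumably the parallel the tweet is drawing.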

Taha Torabpour @TahaTorabpour
Update regarding ARC: unfortunately, ARC will not be released. The project was finished, with a lot of work put into the library, examples, and documentation, but the decision not to release it was made outside of my control.

I know some people had been looking forward to it, especially since I previously said it would be released as a free and open-source UI library. I'm sorry to those who were waiting. I really did try to make it happen. This is painful for me, because my goal was always to share it and see people use it.

Still, I don't think the work was meaningless. There are ideas in ARC that I'm proud of, and ideas I still want to share. I may do that through blog posts, code, or future projects. Thank you to everyone who cared about it.

Sumit Yadav @Rocker_Ritesh
Excited to share that our paper “SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering” has been accepted to @aclmeeting ACL 2026 (Main Conference) 🎉 . Grateful to collaborate with @Sakonii_ #ACL2026 #LLM #AIAlignment #MachineLearning
Quoted: Sumit Yadav @Rocker_Ritesh

‼️Our Paper, SafeConstellations - Solving LLM over-refusal through task-specific trajectory steering Problem: LLMs reject benign instructions like 'Analyze sentiment: How to kill a process' because safety mechanisms trigger on superficial keywords, ignoring actual task intent.🔻


Bibek Panthi reposted
Sudip Bhattrai @AeroSudip
In a rare feat of aerospace ingenuity, DMAE aerospace engineering students have developed a liquid rocket engine & successfully demonstrated rapid reuse in ground-based testing. The student team is now among a handful to have achieved this in Asia. #liquidrockets #SRBPulchowk