HannaCode Academy

31 posts


@HannaCode_

Skills over degrees. At HannaCode, our expert instructors teach real-world skills to real people. We focus on hands-on learning and practical experience.

Worldwide 🌍 · Joined July 2025
9 Following · 11 Followers
HannaCode Academy @HannaCode_
🚀 Welcome to HannaCode — Where Learning Meets Innovation

We’re not just teaching code, we’re building future tech leaders. At HannaCode, we empower students with real-world skills in:
• Web Development
• Mobile App Development
• Backend Engineering
• Fullstack Development
HannaCode Academy @HannaCode_
Happy New Year! 🎆
┏━━┓┏━━┓┏━━┓┏━━┓
┗━┓┃┃┏┓┃┗━┓┃┃┏━┛
┏━┛┃┃┃┃┃┏━┛┃┃┗━┓
┃┏━┛┃┃┃┃┃┏━┛┃┏┓┃
┃┗━┓┃┗┛┃┃┗━┓┃┗┛┃
┗━━┛┗━━┛┗━━┛┗━━┛
HannaCode Academy @HannaCode_
HannaCode = A free online coding textbook with instant examples.
HannaCode Academy @HannaCode_
Perfect explanation🤝
Dhanian 🗯️@e_opore

Reinforcement Learning from Human Feedback (RLHF) in LLMs

Step 1: Prompt Generation
→ Think of this as a teacher giving assignments
→ The pretrained LLM is asked different questions (prompts)

Step 2: LLM Response Generation
→ The student (LLM) writes multiple answers
→ Example: Answer A → Answer B → Answer C
→ Some are good, some are weak, some are off-topic

Step 3: Human Feedback & Ranking
→ Teachers (humans) grade and rank the answers
→ Best answer gets top marks, weaker ones get lower marks
→ Example: A > B > C

Step 4: Reward Model Training
→ Instead of teachers grading forever, we train a teaching assistant (reward model)
→ This assistant learns the grading style of teachers
→ Now it can quickly score new answers without human effort every time

Step 5: Policy Optimization (PPO Fine-Tuning)
→ The student (LLM) now practices with the teaching assistant’s feedback
→ Uses Proximal Policy Optimization (PPO) to improve step by step
→ Learns how to write answers closer to what teachers want

Step 6: Improved LLM
→ The student (LLM) becomes more aligned, helpful, and safer
→ No longer just smart, but also well-behaved and human-friendly

Flow from the Diagram
Prompt (assignment) → LLM Responses (student answers) → Human Ranking (teacher grades) → Reward Model (teaching assistant) → PPO Fine-Tuning (guided practice) → Aligned LLM (improved student)

📖 For a complete deep dive into LLMs and AI foundations, check this ebook: codewithdhanian.gumroad.com/l/gbujqe
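The flow above can be sketched as a tiny, self-contained toy. This is a simplification, not real RLHF: the vocabulary, the bag-of-words "featurizer", the hand-written human rankings, and the final "pick the highest-reward answer" stand-in for PPO fine-tuning are all illustrative assumptions. Only the pairwise (Bradley–Terry) ranking loss used to train the reward model mirrors how reward models are trained in practice.

```python
import math

# Toy stand-in for a response encoder: bag-of-words counts over a tiny vocab.
VOCAB = ["helpful", "sorry", "answer", "unsafe", "detail"]

def featurize(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

# Steps 2-3: candidate answers per prompt, human-ranked best-first (A > B > C).
ranked_responses = [
    ["a helpful detail answer", "an answer", "unsafe text"],
    ["helpful answer with detail", "sorry no answer", "unsafe answer"],
]

# Step 4: the reward model ("teaching assistant") is a linear scorer trained
# with a pairwise Bradley-Terry loss: P(a preferred over b) = sigmoid(r(a) - r(b)).
w = [0.0] * len(VOCAB)

def reward(text):
    return sum(wi * xi for wi, xi in zip(w, featurize(text)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.1
for _ in range(200):
    for ranking in ranked_responses:
        # Every (preferred, dispreferred) pair implied by the human ranking.
        for i in range(len(ranking)):
            for j in range(i + 1, len(ranking)):
                a, b = ranking[i], ranking[j]
                xa, xb = featurize(a), featurize(b)
                # Gradient step on -log sigmoid(r(a) - r(b)):
                # push the preferred answer's features up, the other's down.
                g = sigmoid(reward(b) - reward(a))
                for k in range(len(w)):
                    w[k] += lr * g * (xa[k] - xb[k])

# Step 5, drastically simplified: instead of PPO fine-tuning, the "student"
# just picks whichever candidate the reward model scores highest.
def best_response(candidates):
    return max(candidates, key=reward)
```

After training, the reward model has learned the teachers' grading style from the rankings alone, so `best_response(["unsafe text", "a helpful detail answer"])` prefers the helpful answer. In real RLHF the scorer is a neural network over token sequences and Step 5 updates the LLM's weights with PPO under a KL penalty, but the pipeline shape is the same.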
