Lezz Not

182 posts

@lezz_not

Joined September 2020
553 Following, 22 Followers
Lezz Not retweeted
Ahmad @TheAhmadOsman
BREAKING: Elon Musk endorsed my Top 26 Essential Papers for Mastering LLMs and Transformers. Implement those and you’ve captured ~90% of the alpha behind modern LLMs. Everything else is garnish. This list bridges the Transformer foundations with the reasoning, MoE, and agentic shift.

Recommended Reading Order

1. Attention Is All You Need (Vaswani et al., 2017)
> The original Transformer paper. Covers self-attention, multi-head attention, and the encoder-decoder structure (even though most modern LLMs are decoder-only). A minimal attention sketch follows this list.
2. The Illustrated Transformer (Jay Alammar, 2018)
> Great intuition builder for understanding attention and tensor flow before diving into implementations.
3. BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018)
> Encoder-side fundamentals, masked language modeling, and representation learning that still shape modern architectures.
4. Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020)
> Established in-context learning as a real capability and shifted how prompting is understood.
5. Scaling Laws for Neural Language Models (Kaplan et al., 2020)
> First clean empirical scaling framework for parameters, data, and compute. Read alongside Chinchilla to understand why most models were undertrained.
6. Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022)
> Demonstrated that token count matters more than parameter count for a fixed compute budget. See the back-of-envelope calculation after this list.
7. LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023)
> The paper that triggered the open-weight era. Introduced architectural defaults like RMSNorm, SwiGLU, and RoPE as standard practice.
8. RoFormer: Rotary Position Embedding (Su et al., 2021)
> Positional encoding that became the modern default for long-context LLMs.
9. FlashAttention (Dao et al., 2022)
> Memory-efficient attention that enabled long context windows and high-throughput inference by optimizing GPU memory access.
10. Retrieval-Augmented Generation (RAG) (Lewis et al., 2020)
> Combines parametric models with external knowledge sources. Foundational for grounded and enterprise systems.
11. Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (Ouyang et al., 2022)
> The modern post-training and alignment blueprint that instruction-tuned models follow.
12. Direct Preference Optimization (DPO) (Rafailov et al., 2023)
> A simpler and more stable alternative to PPO-based RLHF. Preference alignment via the loss function; see the loss sketch after this list.
13. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)
> Demonstrated that reasoning can be elicited through prompting alone and laid the groundwork for later reasoning-focused training.
14. ReAct: Reasoning and Acting (Yao et al., 2022 / ICLR 2023)
> The foundation of agentic systems. Combines reasoning traces with tool use and environment interaction.
15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Guo et al., 2025)
> The R1 paper. Proved that large-scale reinforcement learning without supervised data can induce self-verification and structured reasoning behavior.
16. Qwen3 Technical Report (Yang et al., 2025)
> A lightweight overview of a modern architecture. Introduced unified MoE with Thinking Mode and Non-Thinking Mode to dynamically trade off cost and reasoning depth.
17. Outrageously Large Neural Networks: Sparsely-Gated Mixture of Experts (Shazeer et al., 2017)
> The modern MoE ignition point. Conditional computation at scale.
18. Switch Transformers (Fedus et al., 2021)
> Simplified MoE routing using single-expert activation. Key to stabilizing trillion-parameter training.
19. Mixtral of Experts (Mistral AI, 2024)
> Open-weight MoE that proved sparse models can match dense quality while running at small-model inference cost.
20. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Komatsuzaki et al., 2022 / ICLR 2023)
> Practical technique for converting dense checkpoints into MoE models. Critical for compute reuse and iterative scaling.
21. The Platonic Representation Hypothesis (Huh et al., 2024)
> Evidence that scaled models converge toward shared internal representations across modalities.
22. Textbooks Are All You Need (Gunasekar et al., 2023)
> Demonstrated that high-quality synthetic data allows small models to outperform much larger ones.
23. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024)
> The biggest leap in mechanistic interpretability. Decomposes neural networks into millions of interpretable features.
24. PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022)
> A masterclass in large-scale training orchestration across thousands of accelerators.
25. GLaM: Generalist Language Model (Du et al., 2022)
> Validated MoE scaling economics with massive total parameters but small active parameter counts.
26. The Smol Training Playbook (Hugging Face, 2025)
> Practical end-to-end handbook for efficiently training language models.

Bonus Material
> T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
> Toolformer (Schick et al., 2023)
> GShard (Lepikhin et al., 2020)
> Adaptive Mixtures of Local Experts (Jacobs et al., 1991)
> Hierarchical Mixtures of Experts (Jordan and Jacobs, 1994)

If you deeply understand these fundamentals (Transformer core, scaling laws, FlashAttention, instruction tuning, R1-style reasoning, and MoE upcycling), you already understand LLMs better than most. Time to lock in, good luck!
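A minimal sketch of the scaled dot-product self-attention described in paper 1, in plain NumPy. The shapes, weight names, and toy inputs are illustrative assumptions, not anything from the paper or a specific library; multi-head attention simply runs several of these heads in parallel and concatenates the outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # project tokens to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # pairwise similarity, scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)     # each row is a distribution over tokens
    return weights @ V                     # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

Decoder-only models add a causal mask (set scores above the diagonal to -inf before the softmax) so each token attends only to earlier positions.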
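To make the trade-off in papers 5 and 6 concrete, here is a back-of-envelope calculation. It assumes the widely quoted approximation C ≈ 6ND for training FLOPs and Chinchilla's roughly 20-tokens-per-parameter rule of thumb; both are simplifications of the fitted laws in the papers.

```python
def compute_optimal(flops_budget, tokens_per_param=20):
    """Solve C = 6*N*D with D = r*N for parameter count N and token count D."""
    n_params = (flops_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own reported budget (~5.76e23 FLOPs) recovers roughly its
# actual configuration: ~70B parameters trained on ~1.4T tokens.
n, d = compute_optimal(5.76e23)
print(f"~{n/1e9:.0f}B params, ~{d/1e12:.1f}T tokens")
```

A GPT-3-sized run (175B parameters on ~300B tokens) sits far below the D/N ≈ 20 line, which is the sense in which earlier models were undertrained.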
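For paper 12, a sketch of the DPO objective in PyTorch. It assumes you already have summed log-probabilities of each chosen/rejected response under the policy being trained and under a frozen reference model; the batch values below are made-up numbers to show the shapes, not real data.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Log-ratios: how much more (or less) the policy likes each response
    # than the reference model does.
    chosen = policy_chosen_logp - ref_chosen_logp
    rejected = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): widen the gap between chosen and rejected.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

# Toy batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -9.2]))
print(loss.item())
```

The alignment step becomes a classification-style loss on preference pairs, which is why the paper can drop the separate reward model and PPO loop used in RLHF.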
Lezz Not @lezz_not
Starting a mind, body, and soul regime: a mix of learning, workouts, changing identity, and intention. LFG!!
Lezz Not retweeted
Akintola Steve @Akintola_steve
I’m looking for people who are ready to lock in with zero distractions for the next 3 months. Let’s commit to this roadmap and actually follow through. Forget the AI noise for now. Focus on becoming solid in backend engineering and system design. It might sound unrealistic, but you’ll be surprised at how much you grow by the end of June / early July. Read below 👇🏿
Lezz Not @lezz_not
30/30 - 5:30 am - walk - GOD - LLD - DSA. Feels good restarting a new 30.
Lezz Not @lezz_not
29/30 - LLD - 5:30 am - GOD - walk - DSA
Lezz Not @lezz_not
Feeling pretty ugh. Gonna start a 30-day transformation challenge.
Akash Singh | Hiring top tier SDEs
A recruiter friend is hiring for an SDE III – Backend role in Noida (5 days WFO). Comp is ~₹1 Cr (base + variable).
Looking for someone with:
7+ years experience
Strong system design
Product company background
Please reach out only if you strictly match the requirement. Happy to refer!
Akash Singh | Hiring top tier SDEs
🚀 We’re hiring at DoorDash — India (Hyderabad & Pune)
We’re growing our Engineering teams and hiring across Data, Backend, Frontend, and iOS. If you’re passionate about building reliable, large-scale systems and making an impact, we’d love to hear from you.
Open roles:
🔥 Engineering Manager, Data — Hyderabad — careersatdoordash.com/jobs/engineeri…
🔥 Software Engineer, Data Engineering — Hyderabad — careersatdoordash.com/jobs/staff-sof…
🔥 Senior Software Engineer, Data Engineering — Hyderabad — careersatdoordash.com/jobs/senior-so…
🔥 Software Engineer, Data Engineer II — Hyderabad — careersatdoordash.com/jobs/software-…
🔥 Software Engineer, Frontend — Pune — careersatdoordash.com/jobs/software-…
🔥 Senior Software Engineer, Backend — Pune — careersatdoordash.com/jobs/senior-so…
🔥 Software Engineer, Backend — Pune — careersatdoordash.com/jobs/software-…
🔥 Software Engineer, iOS — Pune — careersatdoordash.com/jobs/software-…
Apply via DoorDash Careers (links above) or DM me for a direct intro.
Please like and retweet to help us reach great candidates! Or tag the best fits.
#Hiring #Engineering #DataEngineering #Backend #Frontend #iOS
Lezz Not retweeted
Garry Tan @garrytan
I guess the amazing thing that my haters don't understand is you have no idea how much I eat your hate for breakfast. I am uniquely a person who is driven by all the energy you give me in particular.
Lezz Not retweeted
Claude @claudeai
You can now enable Claude to use your computer to complete tasks. It opens your apps, navigates your browser, fills in spreadsheets—anything you'd do sitting at your desk. Research preview in Claude Cowork and Claude Code, macOS only.
Lezz Not retweeted
Utkarsh Sharma @techxutkarsh
A senior Google engineer just dropped a 421-page doc called Agentic Design Patterns. Every chapter is code-backed and covers the frontier of AI systems:
→ Prompt chaining, routing, memory
→ MCP & multi-agent coordination
→ Guardrails, reasoning, planning
This isn’t a blog post. It’s a curriculum. And it’s free.
Lezz Not retweeted
Lydia Hallie ✨ @lydiahallie
Claude Code now supports agent teams (in research preview). Instead of a single agent working through a task sequentially, a lead agent can delegate to multiple teammates that work in parallel to research, debug, and build while coordinating with each other. Try it out today by enabling agent teams in your settings.json!
Lezz Not retweeted
Parimal @Fintech03
A few fun facts for a slow Saturday:
1. Dijkstra famously hated the term Software Engineering.
2. He believed programming should be a branch of pure mathematics.
3. He refused to use a computer for most of his life, preferring to write his thoughts with a fountain pen (his famous EWD manuscripts).
4. He once said, "Testing shows the presence of bugs, but never their absence."
Robbert Leusink @robbertleusink

Every navigation app on earth finds its route using an algorithm invented by a Dutch computer scientist in 1956. Edsger Dijkstra solved the shortest-path problem in twenty minutes at a café in Amsterdam, without paper; he did it in his head. Google Maps, Uber, and every GPS system alive run on a Dutch mathematician's coffee break.
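Since the thread is about the algorithm itself, here is a minimal Dijkstra sketch in Python, using heapq as the priority queue. The road graph and node names are invented for illustration; real routing engines run heavily engineered variants (A*, contraction hierarchies) rather than this literal code.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]} with non-negative weights."""
    dist = {source: 0}
    heap = [(0, source)]                    # (distance-so-far, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                        # stale entry; shorter path already known
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w             # found a shorter path to v
                heapq.heappush(heap, (dist[v], v))
    return dist

roads = {"A": [("B", 4), ("C", 1)], "B": [("D", 1)],
         "C": [("B", 2), ("D", 5)], "D": []}
print(dijkstra(roads, "A"))                 # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```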

Lezz Not retweeted
Alex Hormozi @AlexHormozi
Losers become losers by being afraid of losing.
Lezz Not retweeted
pulkit mittal @pulkit_mittal_
Hard truth no one tells you: you have to go through a grinding phase at least once in your life to succeed. And the trick is, the earlier you face it, the easier your life becomes.
If you study hard and build a strong base in school, you sweep through difficult entrance exams later -> better opportunities.
Otherwise, you have to grind during your JEE years, and you get good opportunities later without much hustle.
Miss that? You have to work hard for 4 years in college, and you get to start your career with a good job.
Miss that too? You have to juggle your job and studies for a better switch, which is much harder.
The thing is, life keeps giving you chances. But the more you delay, the more you lose the compounding effect and the more difficult it gets.
Choose your hustle. Act fast. Act now.
Lezz Not retweeted
Katyayani Shukla @aibytekat
🚨 BREAKING: Claude can now prep you for FAANG interviews like a $1,000/hour executive career coach. For free. Here are 18 prompts that get you past the final round within 14 days:
Lezz Not retweeted
Carl Jung Archive @QuoteJung
Carl Jung was not playing around when he wrote: “No matter how isolated you are and how lonely you feel, if you do your work truly and conscientiously, unknown allies will come and seek you.”