legendarylibrary

16.3K posts

legendarylibrary

@LegendaryLibr

crypto/ ai Art/ music

Offdachain เข้าร่วม Ekim 2021

2.1K กำลังติดตาม5.1K ผู้ติดตาม

legendarylibrary@LegendaryLibr·7h

@TheAhmadOsman Continuous training. Collects data and trains the small base model. Continuous fine-tuning beyond that

English

legendarylibrary@LegendaryLibr·7h

@TheAhmadOsman Finally getting some decent outputs on small model trainings. Huggingface soon

English

205

Ahmad@TheAhmadOsman·11h

BREAKING Elon Musk endorsed my Top 26 Essential Papers for Mastering LLMs and Transformers Implement those and you’ve captured ~90% of the alpha behind modern LLMs. Everything else is garnish. This list bridges the Transformer foundations with the reasoning, MoE, and agentic shift Recommended Reading Order 1. Attention Is All You Need (Vaswani et al., 2017) > The original Transformer paper. Covers self-attention, > multi-head attention, and the encoder-decoder structure > (even though most modern LLMs are decoder-only.) 2. The Illustrated Transformer (Jay Alammar, 2018) > Great intuition builder for understanding > attention and tensor flow before diving into implementations 3. BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018) > Encoder-side fundamentals, masked language modeling, > and representation learning that still shape modern architectures 4. Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020) > Established in-context learning as a real > capability and shifted how prompting is understood 5. Scaling Laws for Neural Language Models (Kaplan et al., 2020) > First clean empirical scaling framework for parameters, data, and compute > Read alongside Chinchilla to understand why most models were undertrained 6. Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022) > Demonstrated that token count matters more than > parameter count for a fixed compute budget 7. LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023) > The paper that triggered the open-weight era > Introduced architectural defaults like RMSNorm, SwiGLU > and RoPE as standard practice 8. RoFormer: Rotary Position Embedding (Su et al., 2021) > Positional encoding that became the modern default for long-context LLMs 9. FlashAttention (Dao et al., 2022) > Memory-efficient attention that enabled long context windows > and high-throughput inference by optimizing GPU memory access. 10. Retrieval-Augmented Generation (RAG) (Lewis et al., 2020) > Combines parametric models with external knowledge sources > Foundational for grounded and enterprise systems 11. Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (Ouyang et al., 2022) > The modern post-training and alignment blueprint > that instruction-tuned models follow 12. Direct Preference Optimization (DPO) (Rafailov et al., 2023) > A simpler and more stable alternative to PPO-based RLHF > Preference alignment via the loss function 13. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) > Demonstrated that reasoning can be elicited through prompting > alone and laid the groundwork for later reasoning-focused training 14. ReAct: Reasoning and Acting (Yao et al., 2022 / ICLR 2023) > The foundation of agentic systems > Combines reasoning traces with tool use and environment interaction 15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Guo et al., 2025) > The R1 paper. Proved that large-scale reinforcement learning without > supervised data can induce self-verification and structured reasoning behavior 16. Qwen3 Technical Report (Yang et al., 2025) > A modern architecture lightweight overview > Introduced unified MoE with Thinking Mode and Non-Thinking > Mode to dynamically trade off cost and reasoning depth 17. Outrageously Large Neural Networks: Sparsely-Gated Mixture of Experts (Shazeer et al., 2017) > The modern MoE ignition point > Conditional computation at scale 18. Switch Transformers (Fedus et al., 2021) > Simplified MoE routing using single-expert activation > Key to stabilizing trillion-parameter training 19. Mixtral of Experts (Mistral AI, 2024) > Open-weight MoE that proved sparse models can match dense quality > while running at small-model inference cost 20. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Komatsuzaki et al., 2022 / ICLR 2023) > Practical technique for converting dense checkpoints into MoE models > Critical for compute reuse and iterative scaling 21. The Platonic Representation Hypothesis (Huh et al., 2024) > Evidence that scaled models converge toward shared > internal representations across modalities 22. Textbooks Are All You Need (Gunasekar et al., 2023) > Demonstrated that high-quality synthetic data allows > small models to outperform much larger ones 23. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024) > The biggest leap in mechanistic interpretability > Decomposes neural networks into millions of interpretable features 24. PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022) > A masterclass in large-scale training > orchestration across thousands of accelerators 25. GLaM: Generalist Language Model (Du et al., 2022) > Validated MoE scaling economics with massive > total parameters but small active parameter counts 26. The Smol Training Playbook (Hugging Face, 2025) > Practical end-to-end handbook for efficiently training language models Bonus Material > T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019) > Toolformer (Schick et al., 2023) > GShard (Lepikhin et al., 2020) > Adaptive Mixtures of Local Experts (Jacobs et al., 1991) > Hierarchical Mixtures of Experts (Jordan and Jacobs, 1994) If you deeply understand these fundamentals; Transformer core, scaling laws, FlashAttention, instruction tuning, R1-style reasoning, and MoE upcycling, you already understand LLMs better than most Time to lock-in, good luck!

English

832

33.1K

legendarylibrary รีทวีตแล้ว

0xSero@0xSero·17h

Qwen3.5-35B compressed 20% with 1%~ performance drop on average. Now you can fit this (4bits) with full context on 24GB of VRAM 700$~ or 1x 3090 huggingface.co/0xSero/Qwen-3.…

English

126

1.9K

111.3K

legendarylibrary รีทวีตแล้ว

Luca Maxim@ChildOfKhan_·17h

remember your purpose

English

191

1.8K

16.8K

712.8K

legendarylibrary รีทวีตแล้ว

Sam ꕤ@AI_Video_Dumps·21h

Driifting in Cosmos

Eesti

101

1.7K

6.7K

131.3K

legendarylibrary รีทวีตแล้ว

bone@boneGPT·1d

just put $1000 on this you do not fuck with FWC

MyFWC@MyFWC

The FWC is aware of a video depicting individuals in the Everglades on an airboat who appear to be discharging firearms at an alligator. FWC officers are looking into the incident and will provide additional information when available. To report wildlife violations, call the Wildlife Alert Hotline at 888-404-3922.

English

3.5K

364.8K

legendarylibrary@LegendaryLibr·16h

@FrankHassleYT @Breaking911 @TMZ You can only use a bang stick and need tags

English

1.5K

Frank Hassle@FrankHassleYT·16h

@Breaking911 @TMZ My understanding is that it was already dead, why is that an issue? Did they kill the alligator?

English

301

27.1K

Breaking911@Breaking911·19h

Kick streamer “Clavicular”, was arrested after mag dumping a gator in Florida. 📸: @TMZ

English

1.2K

532

12.9K

3.8M

legendarylibrary รีทวีตแล้ว

Darkfarms㊙️@Darkfarms1·21h

gm, frens

English

117

238

3.4K

legendarylibrary รีทวีตแล้ว

don't Buy@dontbuy_·2d

ZXX

2.8K

17K

403.1K

legendarylibrary รีทวีตแล้ว

dax@thdxr·1d

man the responses to the new claude max limits are crazy everyones expectations are so out of whack it's kind of embarrassing to get this mad, i'd just be like damn ok i'll cancel were you guys depending on this shit to keep your grandma alive i don't get it

English

218

2.7K

161.3K

legendarylibrary รีทวีตแล้ว

Andrej Karpathy@karpathy·1d

When I built menugen ~1 year ago, I observed that the hardest part by far was not the code itself, it was the plethora of services you have to assemble like IKEA furniture to make it real, the DevOps: services, payments, auth, database, security, domain names, etc... I am really looking forward to a day where I could simply tell my agent: "build menugen" (referencing the post) and it would just work. The whole thing up to the deployed web page. The agent would have to browse a number of services, read the docs, get all the api keys, make everything work, debug it in dev, and deploy to prod. This is the actually hard part, not the code itself. Or rather, the better way to think about it is that the entire DevOps lifecycle has to become code, in addition to the necessary sensors/actuators of the CLIs/APIs with agent-native ergonomics. And there should be no need to visit web pages, click buttons, or anything like that for the human. It's easy to state, it's now just barely technically possible and expected to work maybe, but it definitely requires from-scratch re-design, work and thought. Very exciting direction!

Patrick Collison@patrickc

When @karpathy built MenuGen (karpathy.bearblog.dev/vibe-coding-me…), he said: "Vibe coding menugen was exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Building a modern app is a bit like assembling IKEA future. There are all these services, docs, API keys, configurations, dev/prod deployments, team and security features, rate limits, pricing tiers." We've all run into this issue when building with agents: you have to scurry off to establish accounts, clicking things in the browser as though it's the antediluvian days of 2023, in order to unblock its superintelligent progress. So we decided to build Stripe Projects to help agents instantly provision services from the CLI. For example, simply run: $ stripe projects add posthog/analytics And it'll create a PostHog account, get an API key, and (as needed) set up billing. Projects is launching today as a developer preview. You can register for access (we'll make it available to everyone soon) at projects.dev. We're also rolling out support for many new providers over the coming weeks. (Get in touch if you'd like to make your service available.) projects.dev

English

527

490

5.8K

legendarylibrary รีทวีตแล้ว

goo.vision@goo_vision·1d

Legendary workstation

English

915

5.6K

97.8K

legendarylibrary รีทวีตแล้ว

Ostris@ostrisai·1d

I trained this @ltx_model LTX 2.3 LoRA of George Costanza at home on my 5090 in about a day with AI Toolkit. I generated this 30 second video with @ComfyUI on my 5090 in 6 minutes. Open source is, always has been, and always will be, the future of generative AI. (SOUND ON)

English

264

578

5.2K

360.4K

legendarylibrary รีทวีตแล้ว

ComfyUI@ComfyUI·2d

Upgrading your RAM is now unnecessary. Introducing our new ComfyUI Dynamic VRAM optimization. Running local models is now possible on even the most memory constrained hardware. Read more here: blog.comfy.org/p/dynamic-vram…

English

311

2.9K

420K

legendarylibrary รีทวีตแล้ว

batzdu@batzdu·1d

gm 🐸

246

3.5K

legendarylibrary รีทวีตแล้ว

Darkfarms㊙️@Darkfarms1·2d

GM, frens

English

131

259

legendarylibrary รีทวีตแล้ว

“paula”@paularambles·2d

“this is a significant refactor” just put the tokens in the bag lil bro

English

322

7.6K

214.7K

legendarylibrary@LegendaryLibr·2d

arxiv.org/abs/2603.00179

ZXX

legendarylibrary@LegendaryLibr·2d

Like how you may not use apple id login, but it's still there

English

legendarylibrary@LegendaryLibr·2d

An attestation does not reveal privacy, it cryptographically proves it. I would never get my iris scanned, but those that did could use an opt in attestion to prove they are human. Had ai suggest to add this to a previous repo. Realized my implementation (anti sybil for airdrops) wasn't really needed and took it down. But attestations for face id login and other biometrics will probably be used to combat bots. With proper implementation, attestations are a privacy preserving primitive. It could eliminate personal data storage on centralized servers with proper implemention. Verify and discard attestation after. Not something I'm working on but lots of misconception at the time and even still

English

100

ค้นพบ

@TheAhmadOsman @FrankHassleYT @Breaking911 @TMZ @ltx_model @ComfyUI @elonmusk @BarackObama