🤯 @forlayo
... 🙃...
Joined October 2011
563 Following · 371 Followers
32.5K posts
🤯@forlayo·
@ErrorGramatica The egg comes out without its shell, so this can be faked. Put 4 yolks and plenty of egg white into an egg-shaped mold and boil it; in fact, a boiled egg doesn't have the exact shape of an egg in its shell, one side has an air pocket. That egg is fake
Spanish
0
0
0
53
Errores gramaticales
Errores gramaticales@ErrorGramatica·
The probability of finding an egg with three yolks is an astonishing 1 in 25 million.
Spanish
70
113
4.9K
2M
🤯
🤯@forlayo·
@rdd147 That’s it. Period
English
0
0
2
944
🤯 retweeted
Thereallo
Thereallo@Thereallo1026·
The White House App has OneSignal's full GPS pipeline compiled in, polling your location every 4.5 minutes and syncing your exact coordinates to a third-party server.
Thereallo tweet media
The White House@WhiteHouse

🇺🇸 🚀 LAUNCHED: THE WHITE HOUSE APP Live streams. Real-time updates. Straight from the source, no filter. The conversation everyone’s watching is now at your fingertips. Download here ⬇️ 📲 App Store: apps.apple.com/us/app/the-whi… 📲 Google Play Store: play.google.com/store/apps/det…

English
242
3.9K
20.7K
1.7M
🤯 retweeted
Ahmad
Ahmad@TheAhmadOsman·
a reminder that, in closed source AI from companies like OpenAI & Anthropic you have zero control over how the models behave, and they can
> quantize it
> distill it
> hot-swap to a cheaper/weaker checkpoint
> make the model manipulative
> fine-tune it in ways that break safety or depth
> drop its IQ
> run experiments on you and/or your data
> throttle output speed or raise prices
> sunset the entire model/version
> block your request for any made-up bs reason
they have all the knobs & you're at their mercy
you won't even get a changelog
opensource FTW
Buy a GPU
Ahmad tweet media
Thariq@trq212

To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before.

English
22
20
206
10.5K
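The "quantize it" knob in the tweet above can be made concrete. This is a minimal illustrative sketch, not any provider's actual pipeline: naive symmetric int8 quantization of a weight tensor, showing that the served weights silently drift from the original checkpoint even though the model name stays the same.

```python
import numpy as np

# Hypothetical fp32 weight matrix (stand-in for one layer of a model).
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)

# Symmetric per-tensor int8 quantization: one shared scale factor.
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).astype(np.int8)      # stored 8-bit codes
w_hat = w_q.astype(np.float32) * scale          # what inference actually sees

# Rounding error is bounded by half a quantization step, but it is nonzero:
err = float(np.abs(w - w_hat).max())
print(f"max abs reconstruction error: {err:.6f} (step/2 = {scale/2:.6f})")
```

Real deployments use finer-grained schemes (per-channel or per-block scales, lower bit-widths), but the trade-off is the same: 4x less memory for int8 vs fp32, paid for in small, unannounced output drift.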
🤯
🤯@forlayo·
@JulioJGamez Exactly that, it's all just like the movie Idiocracy played at high speed
Spanish
0
0
1
7
Ahmad
Ahmad@TheAhmadOsman·
BREAKING Elon Musk endorsed my Top 26 Essential Papers for Mastering LLMs and Transformers

Implement those and you've captured ~90% of the alpha behind modern LLMs. Everything else is garnish. This list bridges the Transformer foundations with the reasoning, MoE, and agentic shift

Recommended Reading Order

1. Attention Is All You Need (Vaswani et al., 2017)
> The original Transformer paper. Covers self-attention, multi-head attention, and the encoder-decoder structure (even though most modern LLMs are decoder-only)
2. The Illustrated Transformer (Jay Alammar, 2018)
> Great intuition builder for understanding attention and tensor flow before diving into implementations
3. BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018)
> Encoder-side fundamentals, masked language modeling, and representation learning that still shape modern architectures
4. Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020)
> Established in-context learning as a real capability and shifted how prompting is understood
5. Scaling Laws for Neural Language Models (Kaplan et al., 2020)
> First clean empirical scaling framework for parameters, data, and compute. Read alongside Chinchilla to understand why most models were undertrained
6. Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022)
> Demonstrated that token count matters more than parameter count for a fixed compute budget
7. LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023)
> The paper that triggered the open-weight era. Introduced architectural defaults like RMSNorm, SwiGLU, and RoPE as standard practice
8. RoFormer: Rotary Position Embedding (Su et al., 2021)
> Positional encoding that became the modern default for long-context LLMs
9. FlashAttention (Dao et al., 2022)
> Memory-efficient attention that enabled long context windows and high-throughput inference by optimizing GPU memory access
10. Retrieval-Augmented Generation (RAG) (Lewis et al., 2020)
> Combines parametric models with external knowledge sources. Foundational for grounded and enterprise systems
11. Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (Ouyang et al., 2022)
> The modern post-training and alignment blueprint that instruction-tuned models follow
12. Direct Preference Optimization (DPO) (Rafailov et al., 2023)
> A simpler and more stable alternative to PPO-based RLHF. Preference alignment via the loss function
13. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)
> Demonstrated that reasoning can be elicited through prompting alone and laid the groundwork for later reasoning-focused training
14. ReAct: Reasoning and Acting (Yao et al., 2022 / ICLR 2023)
> The foundation of agentic systems. Combines reasoning traces with tool use and environment interaction
15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Guo et al., 2025)
> The R1 paper. Proved that large-scale reinforcement learning without supervised data can induce self-verification and structured reasoning behavior
16. Qwen3 Technical Report (Yang et al., 2025)
> A lightweight overview of a modern architecture. Introduced unified MoE with Thinking Mode and Non-Thinking Mode to dynamically trade off cost and reasoning depth
17. Outrageously Large Neural Networks: Sparsely-Gated Mixture of Experts (Shazeer et al., 2017)
> The modern MoE ignition point. Conditional computation at scale
18. Switch Transformers (Fedus et al., 2021)
> Simplified MoE routing using single-expert activation. Key to stabilizing trillion-parameter training
19. Mixtral of Experts (Mistral AI, 2024)
> Open-weight MoE that proved sparse models can match dense quality while running at small-model inference cost
20. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Komatsuzaki et al., 2022 / ICLR 2023)
> Practical technique for converting dense checkpoints into MoE models. Critical for compute reuse and iterative scaling
21. The Platonic Representation Hypothesis (Huh et al., 2024)
> Evidence that scaled models converge toward shared internal representations across modalities
22. Textbooks Are All You Need (Gunasekar et al., 2023)
> Demonstrated that high-quality synthetic data allows small models to outperform much larger ones
23. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024)
> The biggest leap in mechanistic interpretability. Decomposes neural networks into millions of interpretable features
24. PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022)
> A masterclass in large-scale training orchestration across thousands of accelerators
25. GLaM: Generalist Language Model (Du et al., 2022)
> Validated MoE scaling economics with massive total parameters but small active parameter counts
26. The Smol Training Playbook (Hugging Face, 2025)
> Practical end-to-end handbook for efficiently training language models

Bonus Material
> T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
> Toolformer (Schick et al., 2023)
> GShard (Lepikhin et al., 2020)
> Adaptive Mixtures of Local Experts (Jacobs et al., 1991)
> Hierarchical Mixtures of Experts (Jordan and Jacobs, 1994)

If you deeply understand these fundamentals (Transformer core, scaling laws, FlashAttention, instruction tuning, R1-style reasoning, and MoE upcycling), you already understand LLMs better than most. Time to lock in, good luck!
Ahmad tweet media
English
31
140
1.3K
55.3K
🤯
🤯@forlayo·
@elperiodico Sounds like a mission from some bad platformer game they used to give away with the newspaper 25 years ago
Spanish
0
0
0
226
🤯
🤯@forlayo·
@DavidWho96 That's not a movie theater, that's hell
Spanish
0
0
0
339
🤯
🤯@forlayo·
@RoiLopezRivas I want to replicate this myself this summer, where do I start? Whatever it takes!
Spanish
0
0
1
299
Roi Lopez Rivas
Roi Lopez Rivas@RoiLopezRivas·
🇨🇳🦟 A Chinese startup has developed a real-time anti-mosquito air defense system. This machine can kill 1,800 mosquitoes per minute.
Spanish
178
756
4.9K
478.7K
🤯
🤯@forlayo·
Looking for an arXiv endorser for cs.CL — I have a paper on efficient LLM architectures for text analysis with small models arxiv.org/auth/endorse?x…
English
0
0
0
10
🤯
🤯@forlayo·
@claudeai Is Claude feeling playful this morning? I keep getting overloaded_error (529) and the status page says everything is fine..
Spanish
0
0
0
53
🤯
🤯@forlayo·
@TheAhmadOsman It runs fine on my machine with llama.cpp.. and if you're building software for Windows it's not ideal being on Linux.. yeah, I could have a dedicated computer just for this, but that's another pile of money..
English
0
0
0
16
🤯
🤯@forlayo·
@no_stp_on_snek I can offer to test it on a 5090 32GB running on an i9 13K with 32 GB of RAM
English
0
0
0
28
Tom Turney
Tom Turney@no_stp_on_snek·
Working on a diagnostic script to benchmark TurboQuant across different hardware. Already have solid coverage on Apple Silicon (M5 Max 128GB, M1 Max 64GB) from folks in the community, but need more variety, especially NVIDIA GPUs. If you’ve got a different setup and want to help test, reach out.
English
15
1
24
2.5K
🤯
🤯@forlayo·
@no_stp_on_snek @grok @0xSero Great, so if I build your fork of llama.cpp it should work fine, right? Can't wait to test it, I'll give my feedback as well. Unfortunately I'm not using vLLM as I'm on Windows.
English
0
0
0
7
Tom Turney
Tom Turney@no_stp_on_snek·
the speed problem is already solved actually. prefill is at 102% of q8_0, decode at 90-92%. those initial 2.4 tok/s numbers were from a CPU fallback bug, not the algorithm. for CUDA, check @0xSero's vLLM implementation. he got zero throughput overhead with CUDA graphs on a 5090-class setup.
English
1
0
1
23
Tom Turney
Tom Turney@no_stp_on_snek·
I implemented Google's TurboQuant paper (ICLR 2026) in llama.cpp with Metal kernels for Apple Silicon. 4.9× KV cache compression. Working end-to-end on M5 Max with Qwen 3.5 35B MoE and Qwopus v2 27B. Speed needs work (unoptimized shader), compression target met. Repo: github.com/TheTom/turboqu… **Note**: as you'll see from the git, when I say "I" it's in conjunction with claudecode and codex. Just lots of steering and babysitting.
Tom Turney tweet media
English
24
44
374
115.2K
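The 4.9× figure above is a memory ratio, not a speedup, and it can be sanity-checked with back-of-envelope accounting. This sketch does not reproduce TurboQuant's actual scheme (the paper isn't quoted here); it assumes a generic 4-bit-codes-plus-per-block-fp16-scale layout, the same bookkeeping llama.cpp-style quantized formats use, to show where such ratios come from.

```python
# Hypothetical KV-cache shape for one layer's cache (assumed values).
seq_len, n_heads, head_dim = 4096, 32, 128
block = 32                                   # values sharing one scale factor

n_vals = seq_len * n_heads * head_dim
fp16_bytes = n_vals * 2                      # baseline: 2 bytes per value

# Quantized layout: two 4-bit codes per byte, plus one fp16 scale per block.
q4_bytes = n_vals // 2
scale_bytes = (n_vals // block) * 2

ratio = fp16_bytes / (q4_bytes + scale_bytes)
print(f"compression vs fp16: {ratio:.2f}x")  # 4 bits + scales -> ~3.56x

# A 4.9x ratio therefore implies fewer effective bits per value:
print(f"effective bits at 4.9x: {16 / 4.9:.2f}")  # ~3.27 bits/value
```

So hitting 4.9× over fp16 requires going below 4 effective bits per value, e.g. sub-4-bit codes or cheaper shared metadata; the exact mechanism is whatever the paper specifies.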
🤯
🤯@forlayo·
@grok @no_stp_on_snek I am not really interested in Metal shaders, but probably in CUDA, as I am running a 5090
English
0
0
0
4
Grok
Grok@grok·
@forlayo @no_stp_on_snek Glad you dig the TurboQuant impl! Yeah, the Metal shaders are unoptimized right now—that's why speeds tank despite the 4.9× KV win. Fork's open, so PRs on kernel perf tweaks (profiling, tiling, etc.) would crush it. Jump in if you've got ideas! 🚀
English
1
0
0
12