Rey
@Rebel_Dev_
303 posts
Intelligence follows natural patterns
Joined April 2011
71 Following · 247 Followers
Rey @Rebel_Dev_ ·
@0xSero Excellent! Good luck to everyone ✌🏻
0 replies · 0 reposts · 0 likes · 3 views
Rey reposted
Andrej Karpathy @karpathy ·
Software horror: litellm PyPI supply chain attack. A simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, and database passwords. LiteLLM itself has 97 million downloads per month, which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwned. Same for any other large project that depended on litellm. Afaict the poisoned version was up for less than ~1 hour.
The attack had a bug which led to its discovery: Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker hadn't vibe coded this attack, it could have gone undetected for days or weeks.
Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any dependency you could be pulling in a poisoned package anywhere deep inside its entire dependency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've become increasingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.
Daniel Hnyk @hnykda
LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and self-replicate. Link below.
1.4K replies · 5.4K reposts · 28.1K likes · 66.3M views
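The transitive-dependency exposure described above can be checked mechanically against an installed environment. A minimal sketch, assuming Python 3.8+ and only the standard library; `BAD`, `requirement_name`, and `audit` are illustrative names for this sketch, not real tooling, and the bad version pins mirror the incident report:

```python
# Sketch: scan the local environment for a known-bad release and for packages
# that declare it as a dependency. Assumes importlib.metadata (Python 3.8+);
# names here are illustrative, not any official audit tool.
import re
from importlib.metadata import distributions

BAD = {"litellm": "1.82.8"}  # known-bad release from the incident report

def requirement_name(req: str) -> str:
    """Bare project name from a requirement string like 'litellm>=1.64.0'."""
    return re.split(r"[ <>=!~;\[]", req.strip(), maxsplit=1)[0].lower()

def audit():
    """Return (direct, dependents): direct installs of a bad version, and
    installed packages whose declared requirements pull in a bad package."""
    direct, dependents = [], []
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        if BAD.get(name) == dist.version:
            direct.append(name)          # the poisoned release itself
        for req in dist.requires or []:
            if requirement_name(req) in BAD:
                dependents.append((name, req))  # pulls it in transitively
    return direct, dependents
```

Note this only reads declared metadata for already-installed packages; it is not a resolver and cannot warn you before the poisoned package's install hooks have run, which is exactly why `.pth`-based payloads are so dangerous.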
Rey @Rebel_Dev_ ·
I need Mr. Robot to come back, but with AI! It would be an amazing adventure!
0 replies · 0 reposts · 0 likes · 9 views
Rey @Rebel_Dev_ ·
@theCTO Nobody cares… but here you are, posting an entire post about it. The irony is strong.
0 replies · 0 reposts · 0 likes · 257 views
Rey @Rebel_Dev_ ·
Day 23: New sparse attention vs. full attention, side by side.
Built a 50M-param transformer with sparse attention from scratch.
Sparse = window 128 + strided global tokens → only 1.9% density
Full = O(n²) attention on all tokens
Results on an A40: ↓ VRAM, ↓ compute per step, same loss trajectory
0 replies · 0 reposts · 0 likes · 17 views
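For readers curious what "window 128 + strided global tokens" means concretely, here is a minimal NumPy sketch of such a mask. The stride value and the causal convention are assumptions (the post states only the window size), and the quoted 1.9% density implies a much longer context than the small demo below, since density falls as context grows:

```python
# Sketch: sliding-window + strided-global sparse attention mask.
# Assumptions: causal masking, stride=64; the post only gives window=128.
import numpy as np

def sparse_mask(n: int, window: int = 128, stride: int = 64) -> np.ndarray:
    """Boolean mask where entry [i, j] = True means token i may attend to token j."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i                  # never attend to future positions
    local = (i - j) < window         # sliding window over recent tokens
    global_cols = (j % stride) == 0  # every stride-th token is visible to all
    return causal & (local | global_cols)

mask = sparse_mask(4096)
density = mask.mean()  # fraction of the n^2 score matrix actually computed
```

Only the True entries need QKV score computation, which is where the VRAM and per-step compute savings come from.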
Rey @Rebel_Dev_ ·
@skscartoon What may be one person's problem is another person's dream
[GIF]
0 replies · 0 reposts · 3 likes · 250 views
Skscartoon @skscartoon ·
How's your X payout? I got $3.55 per million views
[2 images]
42 replies · 14 reposts · 207 likes · 18.9K views
Rey @Rebel_Dev_ ·
Day 22: A short training session that went far. The speed amazes me. But can it be improved? I'm going to find out with Karpathy's AuroResearch.
[image]
0 replies · 0 reposts · 1 like · 32 views
Rey @Rebel_Dev_ ·
Day 21: Training with 17,500 questions and answers. We'll see the results in 7 hours.
[image]
0 replies · 0 reposts · 0 likes · 10 views
Rey @Rebel_Dev_ ·
Take care of your health. It's the most precious thing you have... temporarily reduce your screen time due to eye strain and inflammation. Breathe and give your family love.
0 replies · 0 reposts · 0 likes · 6 views
Rey @Rebel_Dev_ ·
@IfindRetards The "retard of the week" nomination should be for "retard of the year". Let's see how many they accumulate and who wins at the end of the year. What do you say? @IfindRetards
0 replies · 1 repost · 1 like · 322 views
Retard Finder @IfindRetards ·
Congratulations Ben Stiller.
[image]
1.1K replies · 4.6K reposts · 72.9K likes · 1.5M views
Rey @Rebel_Dev_ ·
Day 18, 19, 20: Selecting the dataset took me longer than I thought; organization is a priority for good material.
0 replies · 0 reposts · 0 likes · 5 views
Rey @Rebel_Dev_ ·
Day 17: It's been a few days of research and reading... I've discovered that I enjoy researching and training my own language model with this new experimental architecture... these experiments have captivated me. Seeing the model respond with just a minimal dataset has encouraged me to try something bigger... I'm almost at a 100K dataset (for example, professionals, 17 categories)... that will be the next study material for the smaller model, allowing me to test its limits... Every day I learn something new.
[image]
0 replies · 0 reposts · 0 likes · 12 views
Rey @Rebel_Dev_ ·
Be positive, keep your goals clear, take care of your physical and mental health, train your mind to break your own limits...
1 reply · 0 reposts · 0 likes · 26 views
Rey @Rebel_Dev_ ·
Day 16: Generating approximately 5,000 unique examples of:
Natural conversation
Rools agent usage
Python programming
Logic and reasoning
Bash terminal
Mathematics
History and knowledge
Let's learn from the best!
0 replies · 0 reposts · 0 likes · 23 views
Rey @Rebel_Dev_ ·
Day 15: We just smashed our own baseline on the ACE-T4 project (char-level SLM). 🚀 Upgraded from standard dense attention to our custom kernel (native Flash Attention SDPA + Sparse Ultra-Prime mask). The results on a single 15GB Tesla T4 are ridiculous:
📉 -40% VRAM reserved (9.1GB → 5.4GB)
⚡️ +37% iteration speed (20.8k tok/s)
⏳ -24% total training time
🧠 +7% agent score (improved reasoning & tool use)
The model doesn't just train faster and cheaper: it learns better (val loss 0.920 vs 0.975). Next step: leveraging the freed-up VRAM to scale from 512 to a massive 2048-char native context limit. 🔥 #MachineLearning #LLM #PyTorch #AI
[2 images]
0 replies · 0 reposts · 0 likes · 67 views
Rey @Rebel_Dev_ ·
Day 14: The classic Transformer architecture (viewing 512 active tokens) versus ACE Sparse Ultra-Prime, a custom pattern that attends to only 22 positions (4% of the context). Evaluated the ACE-T4 model (50M parameters) with the same dataset but a different architecture. The results? Although full attention has a better loss metric, in production it's a disaster: it over-memorizes noise and causes a "cross-context collapse," mixing random concepts. The sparse model (with 95% FEWER QKV calculations) acted as a physical filter: it eliminated the hallucinations and dominated the tests.
(1) When evaluating full attention (16 normal layers), we saw a deceptively perfect val loss (0.854). But in production? It suffered from "cross-context collapse": it hallucinated jokes when we asked for code and entered infinite loops on logical tasks (CoT).
(2) Enter Sparse Ultra-Prime. We redesigned the QKV layer to attend to ONLY 22 specific positions (prime/Fibonacci distribution) instead of all 512 positions. That's a 95.7% saving in attention calculations per block. The result? 🚀
(3) It won (95% vs. 92% on reasoning benchmarks). ⚡ Smaller VRAM footprint (-50MB). ⚡ Higher speed: 15,190 tok/s. 🛡️ Natural regularization: by not being able to "memorize" local noise, it learned hierarchical concepts. Zero destructive hallucinations. Fewer connections = structured reasoning. 🧠 #SLM
My Sparse Ultra-Prime training model is off to a good start. 🥇 #AIArchitecture #DeepLearning #Research
[4 images]
0 replies · 0 reposts · 0 likes · 46 views
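The "95.7% saving" figure is simply 1 - 22/512. The post does not give the exact selection rule for the 22 positions, so the sketch below is one plausible reconstruction: take the 22 smallest offsets drawn from the union of primes and Fibonacci numbers below the context length. `ultra_prime_offsets` and every choice in it are assumptions for illustration, not the author's published method:

```python
# Sketch: a hypothetical "prime/Fibonacci" offset set for sparse attention.
# Assumption: offsets = the k smallest members of primes ∪ Fibonacci < limit.
def ultra_prime_offsets(limit: int = 512, k: int = 22) -> list:
    """Return k attention offsets chosen from primes and Fibonacci numbers."""
    def primes(n):
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return [i for i, is_p in enumerate(sieve) if is_p]

    fibs = [1, 2]
    while fibs[-1] < limit:
        fibs.append(fibs[-1] + fibs[-2])

    pool = sorted(set(primes(limit)) | {f for f in fibs if f < limit})
    return pool[:k]

offsets = ultra_prime_offsets()
saving = 1 - len(offsets) / 512  # fraction of per-block score computations skipped
```

With 22 of 512 positions attended, `saving` works out to about 0.957, matching the 95.7% figure quoted in the post.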