Rey
@Rebel_Dev_
303 posts
Intelligence follows natural patterns
Joined April 2011
71 Following · 247 Followers
Rey @Rebel_Dev_ ·
@0xSero Excellent! Good luck to everyone ✌🏻
0 replies · 0 reposts · 0 likes · 3 views
Rey reposted
Andrej Karpathy @karpathy ·
Software horror: litellm PyPI supply chain attack. A simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, and database passwords. LiteLLM itself has 97 million downloads per month, which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwned. Same for any other large project that depended on litellm. Afaict the poisoned version was up for less than ~1 hour.
The attack had a bug which led to its discovery: Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker hadn't vibe coded this attack, it could have gone undetected for days or weeks.
Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any dependency you could be pulling in a poisoned package anywhere deep inside its entire dependency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've become increasingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.
Daniel Hnyk @hnykda
LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and self-replicate. Link below.
1.4K replies · 5.4K reposts · 28.1K likes · 66.3M views
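The transitive-dependency exposure described above can be checked mechanically against an installed environment. A minimal sketch, assuming Python 3.8+ and only the standard library; `BAD`, `requirement_name`, and `audit` are illustrative names for this sketch, not real tooling, and the bad version pins mirror the incident report:

```python
# Sketch: scan the local environment for a known-bad release and for packages
# that declare it as a dependency. Assumes importlib.metadata (Python 3.8+);
# names here are illustrative, not any official audit tool.
import re
from importlib.metadata import distributions

BAD = {"litellm": "1.82.8"}  # known-bad release from the incident report

def requirement_name(req: str) -> str:
    """Bare project name from a requirement string like 'litellm>=1.64.0'."""
    return re.split(r"[ <>=!~;\[]", req.strip(), maxsplit=1)[0].lower()

def audit():
    """Return (direct, dependents): direct installs of a bad version, and
    installed packages whose declared requirements pull in a bad package."""
    direct, dependents = [], []
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        if BAD.get(name) == dist.version:
            direct.append(name)          # the poisoned release itself
        for req in dist.requires or []:
            if requirement_name(req) in BAD:
                dependents.append((name, req))  # pulls it in transitively
    return direct, dependents
```

Note this only reads declared metadata for already-installed packages; it is not a resolver and cannot warn you before the poisoned package's install hooks have run, which is exactly why `.pth`-based payloads are so dangerous.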
Rey @Rebel_Dev_ ·
I need Mr. Robot to come back, but with AI! It would be an amazing adventure!
0 replies · 0 reposts · 0 likes · 9 views
Rey @Rebel_Dev_ ·
@theCTO Nobody cares… but here you are, posting an entire post about it. The irony is strong.
0 replies · 0 reposts · 0 likes · 257 views
Rey @Rebel_Dev_ ·
Day 23: New sparse attention vs. full attention, side by side.
Built a 50M-param transformer with sparse attention from scratch.
Sparse = window 128 + strided global tokens → only 1.9% density
Full = O(n²) attention on all tokens
Results on an A40: ↓ VRAM, ↓ compute per step, same loss trajectory
0 replies · 0 reposts · 0 likes · 17 views
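For readers curious what "window 128 + strided global tokens" means concretely, here is a minimal NumPy sketch of such a mask. The stride value and the causal convention are assumptions (the post states only the window size), and the quoted 1.9% density implies a much longer context than the small demo below, since density falls as context grows:

```python
# Sketch: sliding-window + strided-global sparse attention mask.
# Assumptions: causal masking, stride=64; the post only gives window=128.
import numpy as np

def sparse_mask(n: int, window: int = 128, stride: int = 64) -> np.ndarray:
    """Boolean mask where entry [i, j] = True means token i may attend to token j."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i                  # never attend to future positions
    local = (i - j) < window         # sliding window over recent tokens
    global_cols = (j % stride) == 0  # every stride-th token is visible to all
    return causal & (local | global_cols)

mask = sparse_mask(4096)
density = mask.mean()  # fraction of the n^2 score matrix actually computed
```

Only the True entries need QKV score computation, which is where the VRAM and per-step compute savings come from.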
Rey @Rebel_Dev_ ·
@skscartoon What may be one person's problem is another person's dream
[GIF]
0 replies · 0 reposts · 3 likes · 250 views
Skscartoon @skscartoon ·
How's your X payout? I got $3.55 per million views
[2 images]
42 replies · 14 reposts · 207 likes · 18.9K views
Rey @Rebel_Dev_ ·
Day 22: A short training session that went far. The speed amazes me. But can it be improved? I'm going to find out with Karpathy's AuroResearch.
[image]
0 replies · 0 reposts · 1 like · 32 views
Rey @Rebel_Dev_ ·
Day 21: Training with 17,500 questions and answers. We'll see the results in 7 hours.
[image]
0 replies · 0 reposts · 0 likes · 10 views
Rey @Rebel_Dev_ ·
Take care of your health. It's the most precious thing you have... temporarily reduce your screen time due to eye strain and inflammation. Breathe and give your family love.
0 replies · 0 reposts · 0 likes · 6 views
Rey @Rebel_Dev_ ·
@IfindRetards The "retard of the week" nomination should be for "retard of the year". Let's see how many they accumulate and who wins at the end of the year. What do you say? @IfindRetards
0 replies · 1 repost · 1 like · 322 views
Retard Finder @IfindRetards ·
Congratulations Ben Stiller.
[image]
1.1K replies · 4.6K reposts · 72.9K likes · 1.5M views
Rey @Rebel_Dev_ ·
Day 18, 19, 20: Selecting the dataset took me longer than I thought; organization is a priority for good material.
0 replies · 0 reposts · 0 likes · 5 views
Rey @Rebel_Dev_ ·
Day 17: It's been a few days of research and reading... I've discovered that I enjoy researching and training my own language model with this new experimental architecture... these experiments have captivated me. Seeing the model respond with just a minimal dataset has encouraged me to try something bigger... I'm almost at a 100K dataset (for example, professionals, 17 categories)... that will be the next study material for the smaller model, allowing me to test its limits... Every day I learn something new.
[image]
0 replies · 0 reposts · 0 likes · 12 views
Rey @Rebel_Dev_ ·
Be positive, keep your goals clear, take care of your physical and mental health, train your mind to break your own limits...
1 reply · 0 reposts · 0 likes · 26 views
Rey @Rebel_Dev_ ·
Day 16: Generating approximately 5,000 unique examples of:
Natural conversation
Rools agent usage
Python programming
Logic and reasoning
Bash terminal
Mathematics
History and knowledge
Let's learn from the best!
0 replies · 0 reposts · 0 likes · 23 views
Rey @Rebel_Dev_ ·
Day 15: We just smashed our own baseline on the ACE-T4 project (char-level SLM). 🚀 Upgraded from standard dense attention to our custom kernel (native Flash Attention SDPA + Sparse Ultra-Prime mask). The results on a single 15GB Tesla T4 are ridiculous:
📉 -40% VRAM reserved (9.1GB → 5.4GB)
⚡️ +37% iteration speed (20.8k tok/s)
⏳ -24% total training time
🧠 +7% agent score (improved reasoning & tool use)
The model doesn't just train faster and cheaper: it learns better (val loss 0.920 vs 0.975). Next step: leveraging the freed-up VRAM to scale from 512 to a massive 2048-char native context limit. 🔥 #MachineLearning #LLM #PyTorch #AI
[2 images]
0 replies · 0 reposts · 0 likes · 67 views
Rey @Rebel_Dev_ ·
Day 14: The classic Transformer architecture (viewing 512 active tokens) versus ACE Sparse Ultra-Prime, a custom pattern that attends to only 22 positions (4% of the context). Evaluated the ACE-T4 model (50M parameters) with the same dataset but a different architecture. The results? Although full attention has a better loss metric, in production it's a disaster: it over-memorizes noise and causes a "cross-context collapse," mixing random concepts. The sparse model (with 95% FEWER QKV calculations) acted as a physical filter: it eliminated the hallucinations and dominated the tests.
(1) When evaluating full attention (16 normal layers), we saw a deceptively perfect val loss (0.854). But in production? It suffered from "cross-context collapse": it hallucinated jokes when we asked for code and entered infinite loops on logical tasks (CoT).
(2) Enter Sparse Ultra-Prime. We redesigned the QKV layer to attend to ONLY 22 specific positions (prime/Fibonacci distribution) instead of all 512 positions. That's a 95.7% saving in attention calculations per block. The result? 🚀
(3) It won (95% vs. 92% on reasoning benchmarks). ⚡ Smaller VRAM footprint (-50MB). ⚡ Higher speed: 15,190 tok/s. 🛡️ Natural regularization: by not being able to "memorize" local noise, it learned hierarchical concepts. Zero destructive hallucinations. Fewer connections = structured reasoning. 🧠 #SLM
My Sparse Ultra-Prime training model is off to a good start. 🥇 #AIArchitecture #DeepLearning #Research
[4 images]
0 replies · 0 reposts · 0 likes · 46 views
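The "95.7% saving" figure is simply 1 - 22/512. The post does not give the exact selection rule for the 22 positions, so the sketch below is one plausible reconstruction: take the 22 smallest offsets drawn from the union of primes and Fibonacci numbers below the context length. `ultra_prime_offsets` and every choice in it are assumptions for illustration, not the author's published method:

```python
# Sketch: a hypothetical "prime/Fibonacci" offset set for sparse attention.
# Assumption: offsets = the k smallest members of primes ∪ Fibonacci < limit.
def ultra_prime_offsets(limit: int = 512, k: int = 22) -> list:
    """Return k attention offsets chosen from primes and Fibonacci numbers."""
    def primes(n):
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return [i for i, is_p in enumerate(sieve) if is_p]

    fibs = [1, 2]
    while fibs[-1] < limit:
        fibs.append(fibs[-1] + fibs[-2])

    pool = sorted(set(primes(limit)) | {f for f in fibs if f < limit})
    return pool[:k]

offsets = ultra_prime_offsets()
saving = 1 - len(offsets) / 512  # fraction of per-block score computations skipped
```

With 22 of 512 positions attended, `saving` works out to about 0.957, matching the 95.7% figure quoted in the post.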