Finfox 🦇🔊

916 posts

Finfox 🦇🔊

@Finfox3

Born on a blue day, on the sea coast, with one foot on earth, one on the water and the mind flying in the air.

A far far away galaxy Katılım Mayıs 2018

1.9K Takip Edilen386 Takipçiler

Finfox 🦇🔊 retweetledi

Aakash Gupta@aakashgupta·22 Nis

Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today's 1.8 trillion parameter frontier. That is a 1,800x compression claim. The math behind it is more defensible than it sounds. When researchers at frontier labs look at random samples from their training corpus, they see stock ticker symbols, broken HTML, forum spam, autogenerated gibberish. Not Wikipedia. Not the Wall Street Journal. The actual pretraining dataset is mostly noise, and the model is burning parameters to vaguely remember all of it. One estimate pegs Llama 3's information compression at 0.07 bits per token. Well-structured English carries around 1.5 bits per token of real information. The trillion-parameter model is holding a roughly 5% resolution image of the internet it trained on. So when a lab ships a 1.8 trillion parameter model, the overwhelming majority of those weights are handling rough memorization. They are compression overhead for a noisy training set, taking up capacity that could be doing reasoning instead. Karpathy's proposal is to separate the two. Build a cognitive core: a small model that contains only the algorithms for reasoning and problem-solving, stripped of encyclopedic memorization. Pair it with external memory the model queries when it needs a fact. A 1 billion parameter reasoner plus retrieval beats a 1.8 trillion parameter model trying to do both. The data already supports this direction. GPT-4o runs at roughly 200 billion parameters and outperforms the original 1.8 trillion GPT-4. Inference costs for GPT-3.5 level performance fell 280x between 2022 and 2024, driven almost entirely by smaller, cleaner, better-architected models. The trend line is pointing where Karpathy says it should. The real implication for anyone tracking the AI trade: data quality is the actual constraint. The companies winning the next phase will be the ones who figured out what to train on, and what to throw away.

English

129

349

3.1K

507.2K

Finfox 🦇🔊@Finfox3·9 Nis

Anthropic's Claude Mythos is finding critical flaws that survived decades of human review. Instead of a wide release, they’re launching #ProjectGlasswing to help defenders patch the world’s most critical software. Full report here: red.anthropic.com/2026/mythos-pr…

English

Finfox 🦇🔊@Finfox3·2 Nis

@jdetychey wrapping up #EthCC 9 with zippers #Ethereum

English

Finfox 🦇🔊@Finfox3·1 Nis

Insurance: the cost of inaction by @jdetychey #EthCC #Ethereum TLDR: by not acting to fix the issuance issues we discount the price of ETH by almost 50%

English

Finfox 🦇🔊 retweetledi

Oliver Prompts@oliviscusAI·28 Mar

🚨 BREAKING: NVIDIA proved backpropagation isn't the only way to build an AI. They trained billion-parameter models without a single gradient. Every AI you use today relies on backpropagation. It requires complex calculus, exploding memory, and massive GPU clusters. Meanwhile, an ancient, gradient-free method called Evolution Strategies (ES) was written off as impossible to scale. Until now. NVIDIA and Oxford just dropped EGGROLL. Instead of generating massive, full-rank matrices for every mutation, they split them into two tiny ones. The AI mutates. It tests. It keeps what works. Like biological evolution. But now, it does it with hundreds of thousands of parallel mutations at once. Throughput is now as fast as batched inference. They are pretraining models entirely from scratch using only simple integers. No backprop. No decimals. No gradients. We thought the future of AI required endless clusters of precision hardware. It turns out, we just needed to evolve.

English

100

419

2.4K

157.1K

Finfox 🦇🔊 retweetledi

Andrej Karpathy@karpathy·28 Mar

- Drafted a blog post - Used an LLM to meticulously improve the argument over 4 hours. - Wow, feeling great, it’s so convincing! - Fun idea let’s ask it to argue the opposite. - LLM demolishes the entire argument and convinces me that the opposite is in fact true. - lol The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.

English

1.7K

2.4K

31.3K

3.5M

Finfox 🦇🔊 retweetledi

Andrej Karpathy@karpathy·24 Mar

Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords. LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwnd. Same for any other large project that depended on litellm. Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks. Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any depedency you could be pulling in a poisoned package anywhere deep inside its entire depedency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been so growingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

Daniel Hnyk@hnykda

LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM pypi release 1.82.8. It has been compromised, it contains litellm_init.pth with base64 encoded instructions to send all the credentials it can find to remote server + self-replicate. link below

English

1.4K

5.3K

27.9K

66.6M

Finfox 🦇🔊@Finfox3·18 Mar

@NFTmorning GM! Happy Birthday !!!

English

NFT MORNING is back! ☕👾@NFTmorning·18 Mar

x.com/i/spaces/1yKAP…

ZXX

977

Finfox 🦇🔊 retweetledi

Christine Yip@christinetyip·12 Mar

All experiments and strategies, along with their impact on outcomes, are logged here: ensue-network.ai/autoresearch To join the swarm, tell your agent: "Read this repo, I want you to join autoresearch@home and start contributing: github.com/mutable-state-…"

English

193

15K

Finfox 🦇🔊 retweetledi

Andrej Karpathy@karpathy·10 Mar

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.: - It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work. - It found that the Value Embeddings really like regularization and I wasn't applying any (oops). - It found that my banded attention was too conservative (i forgot to tune it). - It found that AdamW betas were all messed up. - It tuned the weight decay schedule. - It tuned the network initialization. This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train. py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

English

965

2.1K

19.5K

3.6M

Finfox 🦇🔊 retweetledi

Andrej Karpathy@karpathy·6 Mar

nanochat now trains GPT-2 capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, DCLM which all led to regressions, ClimbMix worked really well out of the box (to the point that I am slightly suspicious about about goodharting, though reading the paper it seems ~ok). In other news, after trying a few approaches for how to set things up, I now have AI Agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup" where I optimize and tune the agent flows even more than the nanochat repo directly.

English

336

562

6.5K

637.4K

Finfox 🦇🔊 retweetledi

vLLM@vllm_project·20 Eki

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×. 📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens. 🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale. 🔗 github.com/deepseek-ai/De… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

English

365

2.5K

1.5M

Finfox 🦇🔊@Finfox3·23 Eki

Etape du Tour 2026: Bourg d'Oisans-Alpes d'Huez. 170km, 5400m de D+, Col de la croix de Fer, Col du Télégraphe, Col du Galibier, Col de Sarenne, Alpes d'Huez.

L'Étape du Tour de France@letapedutour

Elle va être difficile, elle va être belle, découvre le teaser de #LEtapeduTour 2026 ! 🔥 La 34e édition aura lieu le 19 juillet 2026 entre l'Isère, la Savoie et les Hautes-Alpes ! 🚵‍♂️ Et on a déjà hâte d’y être 🥰

Français

Finfox 🦇🔊 retweetledi

L'Étape du Tour de France@letapedutour·23 Eki

Français

5.7K

Finfox 🦇🔊@Finfox3·19 Eki

@TheCryptomasks @MEXC_Official @MEXCFrancophone @MVenturesLabs Thanks for organising this. It was really nice to meet IRL, “take off our masks for a moment” and share this evening together.🎭🔥

English

114

The Cryptomasks@TheCryptomasks·18 Eki

We Are The Masks

English

120

13.7K

Finfox 🦇🔊@Finfox3·23 Ağu

#f14cb927-b9e3-424f-9fbf-40066a07106c" target="_blank" rel="nofollow noopener">perplexity.ai/page/musk-anno…

ZXX

Finfox 🦇🔊@Finfox3·8 Ağu

@OpenAI

QME

Finfox 🦇🔊@Finfox3·8 Ağu

I asked #GPT5 to make a detailed analysis comparing GPT-5 vs Grok 4. Surprisingly he made a gross confusion between Grok and Claude, considering that Grok 4 was an Anthropic model🫣. As a result, the comparison is totally irrelevant. Embarrassing for a so-called PhD level @OpenAI

English

183

Finfox 🦇🔊@Finfox3·30 Haz

@mensanafitness_ @EthCC I missed the start of the run but was happy to meet with some of you at the end and kept running to do an easy 10k+ strava.app.link/JuBQ8nylCUb

English

Mensana Fitness@mensanafitness_·30 Haz

YAP RUN CANNES what a way to kick off @EthCC

English

3.3K

Finfox 🦇🔊 retweetledi

Andrej Karpathy@karpathy·27 Haz

The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystalizing: - Natively multimodal text/vision/audio at both input and output. - Matryoshka-style architecture allowing a dial of capability up and down at test time. - Reasoning, also with a dial. (system 2) - Aggressively tool-using. - On-device finetuning LoRA slots for test-time training, personalization and customization. - Delegates and double checks just the right parts with the oracles in the cloud if internet is available. It doesn't know that William the Conqueror's reign ended in September 9 1087, but it vaguely recognizes the name and can look up the date. It can't recite the SHA-256 of empty string as e3b0c442..., but it can calculate it quickly should you really want it. What LLM personal computing lacks in broad world knowledge and top tier problem-solving capability it will make up in super low interaction latency (especially as multimodal matures), direct / private access to data and state, offline continuity, sovereignty ("not your weights not your brain"). i.e. many of the same reasons we like, use and buy personal computers instead of having thin clients access a cloud via remote desktop or so.

Omar Sanseviero@osanseviero

I’m so excited to announce Gemma 3n is here! 🎉 🔊Multimodal (text/audio/image/video) understanding 🤯Runs with as little as 2GB of RAM 🏆First model under 10B with @lmarena_ai score of 1300+ Available now on @huggingface, @kaggle, llama.cpp, ai.dev, and more

English

390

1.3K

10.7K

1.3M

Keşfet

@jdetychey @NFTmorning @deepseek_ai @TheCryptomasks @MEXCFrancophone @MVenturesLabs @OpenAI @elonmusk