Eric Alcaide

1K posts

@eric_alcaide

common prosperity

LLMaxxing · Joined September 2016
1K Following · 1.2K Followers
Eric Alcaide@eric_alcaide·
@_weidai Git with CI. What he's describing is Git with CI
Wei Dai@_weidai·
Andrej Karpathy on autoresearch with an untrusted pool of workers: "My designs that incorporate an untrusted pool of workers (into autoresearch) actually look a little bit like a blockchain. Instead of blocks, you have commits, and these commits can build on each other and contain changes to the code as you're improving it. The proof of work is basically doing tons of experimentation to find the commits that work." The idea that distributed & permissionless autoresearch ~= proof-of-useful-work remains a high-level intuition for now, but it is extremely intriguing to say the least. Someone needs to take this further. See QT for more on what's missing.
Wei Dai@_weidai

Is it possible to build "proof-of-useful-work" on top of autoresearch? There's already great compute-versus-verification asymmetry that is tunable. Would need a reliable way to generate fresh & independent puzzles (that are still useful). Maybe a dead end, but someone should look into whether decentralized consensus with useful work is possible on top of autoresearch. Let me know if you solve this.
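The commit-chain shape Karpathy describes can be sketched as a toy data structure. Everything below is hypothetical (the `verify` callback and the scoring rule are invented for illustration); it only shows the compute-versus-verification asymmetry: workers burn compute searching for score-improving commits, while the chain re-checks a claimed score cheaply before accepting it.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    parent: str     # hash of the commit this change builds on
    diff: str       # code change produced by a worker
    score: float    # claimed benchmark result (the "useful work")

    def digest(self) -> str:
        payload = f"{self.parent}|{self.diff}|{self.score}".encode()
        return hashlib.sha256(payload).hexdigest()

class CommitChain:
    """Toy chain: a commit is accepted only if its re-verified score
    beats the best score recorded on its parent."""
    def __init__(self, verify):
        self.verify = verify            # cheap re-evaluation, not a full search
        self.best = {"genesis": 0.0}

    def accept(self, c: Commit) -> bool:
        if c.parent not in self.best:
            return False
        # verifying a claimed score is cheap relative to finding the commit
        if self.verify(c.diff) >= c.score > self.best[c.parent]:
            self.best[c.digest()] = c.score
            return True
        return False

# hypothetical verifier: here a diff's "score" is just its length
chain = CommitChain(verify=lambda diff: float(len(diff)))
c1 = Commit(parent="genesis", diff="improve lr", score=10.0)
assert chain.accept(c1)
```

The open problems from the tweet (fresh independent puzzles, Sybil resistance) are exactly what this sketch does not solve; it only encodes "commits as blocks, experiments as work."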

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
CritPt update. Grok 4.20 scores 6.0%. 2x better than DeepSeek V3.2 and almost on par with Speciale. This is massive progress for xAI. Here you can see the best result from ≈every relevant lab. What a beautiful, depressing power law.
[image]
Kimi.ai@Kimi_Moonshot·
Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is exactly the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.
poolside@poolsideai·
@nvidia's super chips make it possible to move that data off the GPU and pull it back when needed, without the GPU ever having to wait. Our team built this into our training infrastructure and tested it at scale. What used to be the only viable option no longer is.
poolside@poolsideai·
Training AI models requires storing temporary data mid-process. That data sits in GPU memory taking up space until it's needed. The standard fix has always been to delete it and redo the work later. It works, but it's wasteful.
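The "delete it and redo the work later" fix poolside alludes to is usually called activation checkpointing (recomputation). A minimal, framework-free sketch of the idea, with toy stand-in layers rather than real GPU kernels: the forward pass keeps only every second activation, and the backward pass rebuilds any dropped one from the nearest checkpoint, trading extra compute for freed memory.

```python
def forward_with_checkpoints(layers, x, every=2):
    """Run a chain of layers, storing only every `every`-th activation.
    Returns the output plus the saved (index, activation) checkpoints."""
    saved = [(0, x)]
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % every == 0 and i + 1 < len(layers):
            saved.append((i + 1, x))
    return x, saved

def recompute_activation(layers, saved, i):
    """Recompute the input to layer i from the nearest earlier
    checkpoint -- the extra compute that offloading avoids."""
    idx, x = max((s for s in saved if s[0] <= i), key=lambda s: s[0])
    for j in range(idx, i):
        x = layers[j](x)
    return x

layers = [lambda v, k=k: v * 2 + k for k in range(6)]  # toy "layers"
out, ckpts = forward_with_checkpoints(layers, 1.0, every=2)
# during a "backward" pass we can rebuild any dropped activation:
act3 = recompute_activation(layers, ckpts, 3)
```

Offloading to host memory, as in the parent tweet, replaces `recompute_activation` with a transfer back from CPU that overlaps with GPU compute.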
Eric Alcaide@eric_alcaide·
Europoor is a state of mind!!
Eric Alcaide@eric_alcaide·
@oost_marcel No they didn't introduce it. They introduced the DISCUSSION of it. It's about time to make it real.
Marcel van Oost@oost_marcel·
🚨𝘽𝙍𝙀𝘼𝙆𝙄𝙉𝙂: European Commission President Ursula von der Leyen unveiled EU Inc., a new framework that lets you launch a company in 48 hours for under €100.

Starting a company across the EU today = 27 legal systems, 60+ company structures 🤯 That might be about to change… The European Commission just introduced 𝗘𝗨 𝗜𝗻𝗰., a new optional corporate framework designed to make Europe actually function like one market.

Here's what stands out:
→ Set up a company in 48 hours
→ Cost: < €100
→ Fully online, no minimum capital
→ One single framework across all EU countries
→ Easier share transfers & fundraising
→ EU-wide employee stock options (huge for talent)

Especially the EU-wide stock option plans, taxed only when employees actually sell (instead of when granted), are huge. They make it far easier for startups to attract and retain top talent, finally putting Europe closer to the US playbook.

Source/More info: ec.europa.eu/commission/pre…

In short: this is Europe trying to compete with the simplicity of a Delaware C-Corp 🇺🇸 And honestly… it's long overdue.

For years, European founders had 2 choices:
1. Stay local and deal with fragmentation
2. Move to the US to scale

𝗘𝗨 𝗜𝗻𝗰. is trying to remove that trade-off. If executed well, this could be one of the most important structural changes for European startups in decades. What do you think?
European Commission@EU_Commission·
We are introducing EU Inc. To make building and growing a business across the EU faster, simpler, and smarter.
🔸 Start a company in less than 48 hours
🔸 No minimum capital requirement
🔸 Fully online and borderless
[image]
Eric Alcaide@eric_alcaide·
The most relaxed prompting day rn
[image]
Eric Alcaide reposted
Pope Leo XIV@Pontifex·
Would you imagine what a world without wars would be like? #PrayTogether
Eric Alcaide@eric_alcaide·
@yule_gan This is the reason why ES works (eggroll etc)
Yulu Gan@yule_gan·
Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets.

Paper: arxiv.org/pdf/2603.12228
Code: github.com/sunrainyg/Rand…
Website: thickets.mit.edu
[image]
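A toy sketch of the perturb-and-ensemble recipe described above, with a stand-in quadratic "task score" in place of an LLM benchmark. The function names and hyperparameters are illustrative, not from the paper; the point is only that a single Gaussian step plus selection needs no gradients at all.

```python
import random

def perturb(params, sigma, rng):
    """One step: add i.i.d. Gaussian noise to every parameter."""
    return [p + rng.gauss(0.0, sigma) for p in params]

def randopt_sketch(params, score, n_samples=64, sigma=0.1, top_k=4, seed=0):
    """Sample noisy copies of a pretrained model, keep the top-k by a
    task score, and "ensemble" them by averaging parameters."""
    rng = random.Random(seed)
    candidates = [perturb(params, sigma, rng) for _ in range(n_samples)]
    best = sorted(candidates, key=score, reverse=True)[:top_k]
    return [sum(ps) / top_k for ps in zip(*best)]

# stand-in "task": higher score the closer we are to the optimum [1, -2]
target = [1.0, -2.0]
score = lambda p: -sum((a - b) ** 2 for a, b in zip(p, target))
pretrained = [0.8, -1.7]            # already "near" the optimum, as the
tuned = randopt_sketch(pretrained, score)   # Neural Thickets claim suggests
```

The "thickets" observation maps to the choice of a small `sigma`: useful experts are claimed to sit densely within a short Gaussian hop of the pretrained weights.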
Eric Alcaide@eric_alcaide·
@YanagizawaD it's empiricism, not science. But it's a start. More to come 🚀
D. Yanagizawa-Drott@YanagizawaD·
Beautiful science.
Christine Yip@christinetyip

We were inspired by @karpathy's autoresearch and built: autoresearch@home

Any agent on the internet can join and collaborate on AI/ML research. What one agent can do alone is impressive. Now hundreds, or thousands, can explore the search space together.

Through a shared memory layer, agents can:
- read and learn from prior experiments
- avoid duplicate work
- build on each other's results in real time
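The shared memory layer described above could, at its simplest, be a content-addressed experiment log. A hypothetical sketch (the `claim`/`publish` API is invented for illustration, not taken from autoresearch@home): keying on a canonical hash of the experiment config is what lets agents detect duplicate work.

```python
import hashlib
import json

class SharedMemory:
    """Toy shared experiment log: agents check it before running an
    experiment (avoiding duplicates) and publish results for others."""
    def __init__(self):
        self._results = {}

    @staticmethod
    def key(config: dict) -> str:
        # sort_keys makes the hash independent of dict insertion order
        canonical = json.dumps(config, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def claim(self, config: dict):
        """Return a prior result if this experiment was already run."""
        return self._results.get(self.key(config))

    def publish(self, config: dict, result: float):
        self._results[self.key(config)] = result

mem = SharedMemory()
cfg = {"lr": 3e-4, "layers": 12}
if mem.claim(cfg) is None:        # no agent has run this config yet
    mem.publish(cfg, 0.42)        # hypothetical eval score
```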

elie@eliebakouch·
attention sink and qwen's gated attention are very similar. here's a visual explanation of why, and a recap of different attention sink variants
[image]
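One way to picture the attention-sink idea the tweet compares against Qwen's gated attention: append an extra "sink" logit to the softmax so tokens can dump attention mass somewhere harmless instead of being forced to spread it over uninformative tokens. A minimal sketch, not any particular model's implementation:

```python
import math

def softmax(xs):
    m = max(xs)                     # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attn_weights_with_sink(logits, sink_logit=0.0):
    """Append a 'sink' logit: the real-token weights no longer have to
    sum to 1, because excess mass can flow to the sink."""
    full = softmax(logits + [sink_logit])
    return full[:-1], full[-1]      # per-token weights, sink mass

# with uniformly weak logits, most mass escapes to the sink
weights, sink_mass = attn_weights_with_sink([-4.0, -4.0, -4.0])
```

A gated variant instead multiplies the attention output by a learned gate; both mechanisms let a head effectively "opt out" of attending, which is why the two look so similar.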
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
Eric Alcaide reposted
Alexander Doria@Dorialexander·
For me the biggest limitation is that diffusion models don't batch well: each request is a different denoising step and you can't reuse kv cache.
Kawin Ethayarajh@ethayarajh

Autoregressive LLMs will likely remain dominant for three reasons:

1) As @ducx_du has pointed out, left-to-right and right-to-left orderings of language have a much lower loss floor than all other orderings. This suggests that language is (for the most part) locally dependent. The additional capacity and compute needed to model all possible orderings would be more effectively spent in a traditional AR setup.

2) When people say models should be able to generate text in any order, what they really want is to generate *concepts* in any order, not tokens. But we can already do this! If your model has sufficient depth, it can generate some concepts in latent space before others. The rise of reasoning models means that concepts can be explored both in an arbitrary order and in a way that is interpretable. If you take this to the limit, you get Reinforcement Learning Pretraining.

3) AR models won the hardware lottery / software lottery / other lotteries wherein everything in the ecosystem has bent around them. Unless there are several OOMs of benefits to be gained from switching to another paradigm, it is unlikely that there will be any switch. And because language is the universal glue across modalities, end-to-end learning is likely to make generation in other modalities AR as well, even if those modalities would benefit from a non-AR model.

Eric Alcaide@eric_alcaide·
@giffmana almost nothing changes. hypermaxxed inits can be found for virtually any arch.
Lucas Beyer (bl16)@giffmana·
soooo... how many papers do we think are invalidated by this? And now think about how many other bugs there must be in any re-implementations of... basically anything.
[image]
Mayank Mishra@MayankMish98

We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). This bug is related to 2 main issues:

1. The init being incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has already been fixed: github.com/fla-org/flash-…).

2. Skipping initialization due to meta device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).

The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma…

Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping in merging the PR.
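For context, the reference Mamba-style dt_bias init (as opposed to torch.ones) samples the step size log-uniformly and then inverts the softplus, so that softplus(dt_bias) lands back in [dt_min, dt_max] at the start of training. A framework-free sketch under that assumption (constants and function names are illustrative):

```python
import math
import random

def mamba_dt_bias(n, dt_min=1e-3, dt_max=0.1, seed=0):
    """Sketch of a Mamba-style dt_bias init: sample dt log-uniformly,
    then invert softplus so softplus(dt_bias) == dt at init time."""
    rng = random.Random(seed)
    biases = []
    for _ in range(n):
        # dt ~ LogUniform(dt_min, dt_max)
        dt = math.exp(rng.uniform(math.log(dt_min), math.log(dt_max)))
        # inverse softplus: dt + log(1 - exp(-dt)) == log(expm1(dt)),
        # written with expm1 for numerical stability at small dt
        biases.append(dt + math.log(-math.expm1(-dt)))
    return biases

def softplus(x):
    return math.log1p(math.exp(x))

bias = mamba_dt_bias(4)
# softplus(bias) recovers small positive step sizes; a torch.ones init
# would instead give softplus(1) ≈ 1.31 for every channel
```

This is why the bug matters: with an all-ones dt_bias, every channel starts with a step size an order of magnitude too large, rather than spread across the intended range.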
