Ted - 🥖/acc
@ted_engineer
🇫🇷 / 25yo / dad / deep learning chef @duonlabshq

Last week I had a great call with @_weidai about the robot economy. VCs clearly understand the market opportunity, but the sector still needs real use cases. Are there any agents ready to be deployed on Backed, with real iterations?

wow, google might've popped the ai bubble, memory stocks down massively today:

their new algorithm shrinks an AI model's memory by 6x WITHOUT reducing its intelligence, making it 8x faster with the SAME # of GPUs. if this works, we don't need as many GPUs to train AI.

- kv-cache is basically a model's short-term memory. it gets massive pretty quickly = larger, slower, more expensive ai
- google's algo compresses it to just 3 bits with ZERO loss in accuracy (usually models are like 32-bit)

the combined market cap of Micron and SanDisk is $527 billion, and I'm not even factoring in SK hynix and Samsung. ai has driven up memory prices by 500%+ over the last few months. if google's algo scales, this might crash.
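For intuition on what the KV-cache claim means, here is a minimal PyTorch sketch of low-bit KV-cache quantization. This is plain per-channel rounding for illustration only; the post doesn't describe Google's actual algorithm, and real 3-bit schemes pack the codes and use much smarter scaling.

```python
import torch

def quantize_kv(x: torch.Tensor, bits: int = 3):
    """Toy per-channel quantization of a KV-cache tensor to `bits` per value.
    Codes are stored in uint8 here for simplicity; a real kernel would pack them."""
    qmax = 2 ** bits - 1                         # max code value: 0..7 for 3-bit
    lo = x.amin(dim=-2, keepdim=True)            # per-channel min over the sequence axis
    hi = x.amax(dim=-2, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = ((x - lo) / scale).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    return q.to(scale.dtype) * scale + lo

# 16-bit floats -> 3-bit codes is roughly the ~6x memory shrink the post describes.
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16)   # (batch, heads, seq, head_dim)
q, scale, lo = quantize_kv(k.float())
k_hat = dequantize_kv(q, scale, lo).half()
print((k - k_hat).abs().max())   # reconstruction error of this naive scheme
```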


🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Not buying the 2007 scenario...

introducing Flywheel: the infrastructure for autonomous research.

insane sequence of statements buried in an Alibaba tech report

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/
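For context on why exp/softmax sits on the critical path at all, here is the textbook online-softmax attention recurrence in plain PyTorch. This is the generic FlashAttention-style loop, not the FA4 kernel or its new pipeline; every KV tile requires a rescale by exp(m - m_new), which is the exponential work the post says no longer dictates speed.

```python
import torch

def online_softmax_attention(q, k, v, block=128):
    """Attention over KV tiles with streaming (online) softmax statistics.
    Illustrative PyTorch loop of the FlashAttention recurrence, not a CUDA kernel."""
    scale = q.shape[-1] ** -0.5
    m = torch.full(q.shape[:-1], float("-inf"))   # running row max
    l = torch.zeros(q.shape[:-1])                 # running softmax denominator
    o = torch.zeros_like(q)                       # running output accumulator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                    # scores for this KV tile
        m_new = torch.maximum(m, s.max(dim=-1).values)
        alpha = torch.exp(m - m_new)              # rescale old stats: exp on the critical path
        p = torch.exp(s - m_new[..., None])
        l = l * alpha + p.sum(dim=-1)
        o = o * alpha[..., None] + p @ vb
        m = m_new
    return o / l[..., None]

q = torch.randn(64, 64); k = torch.randn(1024, 64); v = torch.randn(1024, 64)
ref = torch.softmax((q @ k.T) / 64 ** 0.5, dim=-1) @ v
print(torch.allclose(online_softmax_attention(q, k, v), ref, atol=1e-5))
```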


We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). The bug comes down to 2 main issues:
1. The init is incorrect (torch.ones) if Mamba-2 layers are used in isolation, without the Mamba2ForCausalLM model class (this has already been fixed: github.com/fla-org/flash-…).
2. Initialization is skipped due to meta-device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).
The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma…
Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping to merge the PR.
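For reference, a sketch of what the dt_bias init is meant to look like, following the reference Mamba-2 recipe (sample dt log-uniformly, then store its inverse softplus). The dt_min/dt_max/floor constants below are the common repo defaults, not values taken from the post.

```python
import math
import torch
import torch.nn as nn

def init_dt_bias(nheads: int, dt_min: float = 1e-3, dt_max: float = 0.1,
                 dt_init_floor: float = 1e-4) -> nn.Parameter:
    """Sketch of the reference Mamba-2 dt_bias init: sample dt log-uniformly in
    [dt_min, dt_max], then store inverse-softplus(dt) so softplus(dt_bias) = dt."""
    dt = torch.exp(
        torch.rand(nheads) * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
    ).clamp(min=dt_init_floor)
    inv_dt = dt + torch.log(-torch.expm1(-dt))   # inverse of softplus
    return nn.Parameter(inv_dt)

# The bug described above is effectively replacing this with torch.ones(nheads),
# which pushes softplus(dt_bias) toward ~1.3 for every head instead of small,
# spread-out timestep values.
```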


actually wtf, somebody wrote a paper about the 491-parameter transformer they trained for 10-digit addition. turns out Codex can one-shot the task: 100% with only 343 parameters. the solution is a single function, 'hand_set_weights_magic', and it looks like this:
