Ted - 🥖/acc

780 posts

@ted_engineer

🇫🇷 / 25yo / dad / deep learning chef @duonlabshq

Savoie · Joined August 2023
981 Following · 195 Followers
Pinned Tweet
Ted - 🥖/acc@ted_engineer·
Bitter-pilled trading has been shipped 🫡 Introducing Hyperextropy 🧵
[image]
1 reply · 6 retweets · 12 likes · 2K views
Ted - 🥖/acc retweeted
roon@tszzl·
@allTheYud well if there’s a history book…
14 replies · 3 retweets · 329 likes · 11.4K views
Ted - 🥖/acc retweeted
Arcee.ai@arcee_ai·
Today we're releasing Trinity-Large-Thinking. Available now on the Arcee API, with open weights on Hugging Face under Apache 2.0. We built it for developers and enterprises that want models they can inspect, post-train, host, distill, and own.
84 replies · 202 retweets · 1.5K likes · 412.6K views
Ted - 🥖/acc retweeted
Backed@BackedApp·
Good agents, just on @megaeth. Mafia Apps are almost ready to deploy - @bread_ and the Mega team are doing a great job pushing to get as many agents live as possible. But we still need real use cases. @DuonLabsHQ has been building behind the scenes and we have something ready for you. Coming soon on Backed.
Luca@lvxsirio

Last week I had a great call with @_weidai about the robot economy. VCs clearly understand the market opportunity, but the sector still needs real use cases. Are there any agents ready to be deployed on Backed, with real iterations?

0 replies · 6 retweets · 22 likes · 4.9K views
Ted - 🥖/acc@ted_engineer·
gpt wrappers: startups
claude code wrappers: companies
Karpathy wrapper: tech influencers
0 replies · 1 retweet · 2 likes · 77 views
Ted - 🥖/acc retweeted
vittorio@IterIntellectus·
this is art
[image]
249 replies · 6.3K retweets · 43.3K likes · 932.3K views
Ted - 🥖/acc@ted_engineer·
@fchollet There are two kinds of people in AI:
- Those who believe in the golden algorithm
- Engineers
0 replies · 0 retweets · 0 likes · 46 views
François Chollet@fchollet·
This is more evidence that current frontier models remain completely reliant on content-level memorization, as opposed to higher-level generalizable knowledge (such as metalearning knowledge, problem-solving strategies...)
Lossfunk@lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

193 replies · 330 retweets · 3K likes · 309.7K views
Ted - 🥖/acc retweeted
tensorqt@tensorqt·
we're officially live with the open beta of Flywheel. Flywheel is Paradigma's vision of what autonomous research will run on. We've built all of this pre-funding, so expect the pace to accelerate significantly. Super eager to hear your feedback and your ideas. I'll be live-tweeting some of my experiments with Flywheel in a few minutes. We're just getting started.
Paradigma@paradigmainc

introducing Flywheel: the infrastructure for autonomous research.

11 replies · 20 retweets · 152 likes · 16.3K views
Ted - 🥖/acc retweeted
the tiny corp@__tinygrad__·
[image]
46 replies · 36 retweets · 1K likes · 48.8K views
Ted - 🥖/acc retweeted
Alexander Long@AlexanderLong·
insane sequence of statements buried in an Alibaba tech report
[image]
229 replies · 945 retweets · 6.9K likes · 2.8M views
Ted - 🥖/acc retweeted
Tri Dao@tri_dao·
The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so fast that attn fwd is bottlenecked by the exponential, and attn bwd is bottlenecked by shared memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, a new online softmax that avoids 90% of softmax rescaling, and 2-CTA MMA instructions that let two thread blocks share operands to reduce smem traffic.
Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/

31 replies · 230 retweets · 1.8K likes · 185.5K views
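For readers who haven't followed the FlashAttention line: the "online softmax" mentioned above is the streaming max/sum recurrence the kernel is built around, and the rescaling FA4 tries to avoid is the exp(m_old - m_new) correction applied whenever the running max moves. A minimal NumPy sketch of that baseline recurrence, purely illustrative and not the FA4 kernel itself:

```python
import numpy as np

def online_softmax_weighted_sum(scores, values):
    """Compute softmax(scores) @ values in one streaming pass over the scores."""
    m = -np.inf                       # running max of the logits seen so far
    l = 0.0                           # running sum of exp(score - m)
    acc = np.zeros(values.shape[1])   # running weighted sum of value rows

    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = np.exp(m - m_new)     # the rescaling step FA4 mostly avoids
        l = l * scale + np.exp(s - m_new)
        acc = acc * scale + np.exp(s - m_new) * v
        m = m_new
    return acc / l

# sanity check against the naive two-pass softmax
rng = np.random.default_rng(0)
s, v = rng.normal(size=16), rng.normal(size=(16, 8))
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ v
assert np.allclose(online_softmax_weighted_sum(s, v), ref)
```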
Ted - 🥖/acc retweeted
Princertitude@princertitude·
The most expensive thing per kilo in the photo is made from eggs.
[image]
121 replies · 838 retweets · 13.5K likes · 2.1M views
Ted - 🥖/acc@ted_engineer·
The next claude-code is a Cython fork. The model streams Python bytecode and the software executes it: print() to speak, input() to interact with the user. Modules/functions = skills. Portable. Hardware-agnostic.
0 replies · 0 retweets · 1 like · 53 views
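A toy sketch of the loop this tweet imagines, with every name here hypothetical and nothing taken from a real product: the "model" emits a marshalled CPython code object instead of text (stock CPython's marshal module, not a fork), the host unmarshals and executes it, and print()/input() become the speak/listen channel.

```python
import marshal

def fake_model_step(user_msg: str) -> bytes:
    # Stand-in for the model: emit serialized bytecode instead of prose.
    src = f'print("you said: " + {user_msg!r})\nreply = input("> ")'
    return marshal.dumps(compile(src, "<model>", "exec"))

scope = {}                            # persistent namespace: "skills" survive between turns
code = marshal.loads(fake_model_step("hello"))
exec(code, scope)                     # print() speaks, input() listens
print(scope["reply"])                 # whatever the user typed is now model-visible state
```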
Ted - 🥖/acc@ted_engineer·
You listen to claude mc
I write claude.md
[image]
0 replies · 0 retweets · 0 likes · 40 views
Tri Dao
Tri Dao@tri_dao·
This was a wild bug hunt, weeks of effort from @MayankMish98 to track down. The wrong init of Mamba2 in many reimplementations causes the layer to decay its states too quickly, focusing on short context instead. Pretraining is mostly about getting these little things right.
Mayank Mishra@MayankMish98

We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). This bug is related to 2 main issues:
1. The init being incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has already been fixed: github.com/fla-org/flash-…).
2. Skipping initialization due to meta-device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).
The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma…
Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping in merging the PR.

2 replies · 20 retweets · 377 likes · 31.6K views
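For context on why a torch.ones dt_bias is so damaging: the layer passes dt_bias through softplus to get the step size dt, and the reference init samples dt log-uniformly over a small range and stores its inverse softplus. A hedged sketch of that reference-style init; the constants and shapes here are assumptions, so check the upstream mamba_ssm / flash-linear-attention code for the real values:

```python
import math
import torch
import torch.nn.functional as F

def init_dt_bias(nheads, dt_min=1e-3, dt_max=1e-1, dt_init_floor=1e-4):
    # Sample dt log-uniformly in [dt_min, dt_max], then floor it.
    dt = torch.exp(
        torch.rand(nheads) * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
    ).clamp(min=dt_init_floor)
    # Store inverse-softplus(dt), since the layer computes dt = softplus(dt_bias).
    return dt + torch.log(-torch.expm1(-dt))

dt_bias = init_dt_bias(nheads=64)
print(F.softplus(dt_bias).min(), F.softplus(dt_bias).max())   # roughly [1e-3, 1e-1]
# versus the buggy torch.ones init: softplus(1.0) ≈ 1.31, i.e. a much larger dt,
# which decays the SSM state too quickly and biases the layer toward short context.
```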
Ted - 🥖/acc@ted_engineer·
@ns123abc generating actual binary would be extremely dumb. Bytecode however would make a lot of sense
0 replies · 0 retweets · 1 like · 110 views
NIK@ns123abc·
🚨 Elon Musk predicted this 2 weeks ago: “By the end of this year you won’t even bother doing coding. The AI will just create the binary directly… and bypass traditional coding entirely.” ITS HAPPENING
dr. jack morris@jxmnop

actually wtf, somebody wrote a paper about the 491-parameter transformer they trained for 10-digit addition. turns out Codex can one-shot the task: 100% with only 343 parameters. the solution is a single function 'hand_set_weights_magic' and it looks like this:

71 replies · 90 retweets · 1.1K likes · 324.2K views