Tensor Fiend

441 posts

@tensorfiend

life revolving around tensors

Local Minima · Joined September 2025
137 Following · 23 Followers
Pinned Tweet
Tensor Fiend @tensorfiend
[image]
0 replies · 0 reposts · 3 likes · 451 views
Kimi.ai @Kimi_Moonshot
Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.
Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…
[image]
326 replies · 2K reposts · 13.4K likes · 4.8M views
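To make the idea in the Kimi.ai post concrete, here is a minimal sketch in plain Python. It is not Moonshot's implementation. It contrasts fixed, uniform accumulation (a plain sum of earlier layer outputs) with a softmax-weighted mix over the same outputs. The function names, the single learned `query` vector per layer, and the scaled dot-product scoring are all assumptions made for illustration; the report linked in the post defines the actual method, and Block AttnRes is not shown here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def standard_residual(layer_outputs):
    """Fixed, uniform accumulation: every earlier output gets weight 1."""
    dim = len(layer_outputs[0])
    return [sum(out[d] for out in layer_outputs) for d in range(dim)]


def attention_residual(layer_outputs, query):
    """Softmax-weighted mix of earlier outputs (hypothetical formulation).

    `query` stands in for a learned per-layer vector (an assumption).
    The scores depend on the layer outputs themselves, so the weights
    change with the input.
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, out)) / math.sqrt(dim)
        for out in layer_outputs
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * out[d] for w, out in zip(weights, layer_outputs))
        for d in range(dim)
    ]
    return mixed, weights


# Toy example: three earlier layer outputs with hidden size 2.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # made-up values standing in for learned parameters

summed = standard_residual(outputs)
mixed, weights = attention_residual(outputs, query)
print("uniform sum:    ", summed)
print("attention mix:  ", mixed)
print("depth weights:  ", weights)
```

Because the softmax weights sum to 1, the mixed state is a convex combination of the earlier outputs, so its magnitude stays bounded as layers are stacked, whereas the plain sum keeps growing. That is one way to read the post's point about hidden-state growth.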
Tensor Fiend @tensorfiend
Training a Qwen 3-style 35M-parameter model on @jarvislabsai. Training looks OK so far. @LightningAI makes everything super simple 🙇🏽‍♂️
[2 images]
Tensor Fiend @tensorfiend
Simple Thought Experiments: a tiny dataset for end-to-end LLM training. First step of building a tiny LLM (inspired by @karpathy). Learned a lot while creating the synthetic data. Tried my best to improve the data within my budget. tensorwrites.com/posts/ste-data…
1 reply · 0 reposts · 3 likes · 161 views
Tensor Fiend @tensorfiend
You don’t choose your company name, domain availability chooses it for you.
[GIF]
0 replies · 0 reposts · 0 likes · 27 views
Tensor Fiend @tensorfiend
@rasbt is the guy every researcher wants to be 🫡. He is the GOAT.
0 replies · 0 reposts · 1 like · 7 views
Rahul @selfawareatom
Now that our 15-member LLM team is infamous, time to expand for next time! If you have done one or more of the following, then please reach out.
- pretrained a model of any size, from scratch
- post-trained any base model, end to end (data curation, SFT, RL)
- are a PyTorch wizard
- are a CUDA kernel master
- you have any other relevant skills and work to back it up
firstnamesarvamai
34 replies · 35 reposts · 701 likes · 81.7K views
Tensor Fiend @tensorfiend
Working on a cool-looking project.
0 replies · 0 reposts · 1 like · 115 views
Tensor Fiend @tensorfiend
@skydotcs Sleep - Wake up for the “veg/non-veg” sound - Eat - Sleep
0 replies · 0 reposts · 1 like · 42 views
sky @skydotcs
what does one do on a 15 hour flight
16 replies · 0 reposts · 12 likes · 3.1K views
Andrej Karpathy @karpathy
@JTMcG3 looks great! :) TinyStories is the right thing to train on for very small models / Apple Silicon, where you can actually get somewhere. I might even make a note about that in the README. I would use this dataset in particular, it's the cleanest one afaik huggingface.co/datasets/karpa…
25 replies · 49 reposts · 736 likes · 42.7K views