Tensor Fiend

441 posts

@tensorfiend

life revolving around tensors

Local Minima · Joined September 2025

137 Following · 23 Followers

Pinned Tweet

Tensor Fiend@tensorfiend·13 Oct

449

Tensor Fiend@tensorfiend·2d

2 - added ✅. 32 more to go ⏳. arxiviz.com

Tensor Fiend@tensorfiend·4d

@Kimi_Moonshot @zephyr_z9 @rasbt could be added to the last available empty space in your LLM collage 😅

1.1K

Kimi.ai@Kimi_Moonshot·4d

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers. 🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. 🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale. 🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead. 🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains. 🔗Full report: github.com/MoonshotAI/Att…

326

13.4K

4.8M
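
The idea in the announcement above (replacing uniform accumulation with input-dependent attention over earlier layers) can be illustrated with a toy sketch. This is not the Attention Residuals implementation from the linked report: the function names, the use of a raw query vector with no learned projections, and the example numbers are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_residual(layer_outputs):
    # Standard residual stream: every earlier output is added with weight 1.
    dim = len(layer_outputs[0])
    return [sum(h[i] for h in layer_outputs) for i in range(dim)]

def attention_residual(layer_outputs, query):
    # Hypothetical simplification: score each earlier layer output against
    # a query derived from the current input, then mix with softmax weights.
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, h)) / math.sqrt(dim)
        for h in layer_outputs
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * h[i] for w, h in zip(weights, layer_outputs))
        for i in range(dim)
    ]
    return mixed, weights

# Made-up outputs of three earlier layers and a made-up query.
layer_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

summed = uniform_residual(layer_outputs)
mixed, weights = attention_residual(layer_outputs, query)
```

In this toy setup the weights sum to 1, so the mixed state stays on the scale of a single layer output, while the uniform sum grows with depth. That is one way to read the "dilution and hidden-state growth" remark in the post.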

Tensor Fiend@tensorfiend·4d

@rasbt Wowww, makes my job easier to add models in arxiviz.com 🙌🏾 x.com/tensorfiend/st…

Tensor Fiend@tensorfiend

Working on some cool looking project.

Sebastian Raschka@rasbt·4d

I (finally) put together a new LLM Architecture Gallery that collects the architecture figures all in one place! sebastianraschka.com/llm-architectu…

198

1.5K

8.2K

691.3K

Tensor Fiend@tensorfiend·4d

@jarvislabsai @LightningAI Damnn, CLI makes things easier. Will try it out 🙌🏾

JarvisLabsAI@jarvislabsai·4d

@tensorfiend @LightningAI Take a look at our CLI - github.com/jarvislabsai/j… It's in beta, we hope you will love it.

Tensor Fiend@tensorfiend·5d

Training a Qwen 3 style 35M parameter model on @jarvislabsai . Training looks ok so far. @LightningAI makes everything super simple 🙇🏽‍♂️

Tensor Fiend@tensorfiend

Simple Thought Experiments: A tiny dataset for end to end LLM training. First step of building a tiny LLM (inspired by @karpathy). Had many learnings in the process of creating synthetic data. Tried my best to make the data better in my budget. tensorwrites.com/posts/ste-data…

161

Tensor Fiend@tensorfiend·4d

You don’t choose your company name, domain availability chooses it for you.

Tensor Fiend@tensorfiend·4d

@gordic_aleksa's blog on Flash Attention is a pretty good introduction for anyone who is new to Flash Attention topic. gordicaleksa.medium.com/eli5-flash-att…

Tensor Fiend@tensorfiend·5d

@AnthropicAI @ChatGPTapp @antigravity @antigravity, @ChatGPTapp, @GeminiApp, @vercel (🙇🏽‍♂️), and Cloudflare have been very useful for developing and deploying. Great time to build whatever you think of!!!

Tensor Fiend@tensorfiend·5d

@AnthropicAI @ChatGPTapp @antigravity For now, only Computer Vision models are available. But I will be adding more models every time I have time. Check out arxiviz.com and do give feedback 🙂

Tensor Fiend@tensorfiend·5d

Introducing ArxiViz - an interactive platform to understand SOTA AI Models Visually, Mathematically, and in Code. I have been thinking about building this since I made the ML Roadmap a few years back (github.com/shanmukh05/Mac…).

Tensor Fiend@tensorfiend

Working on some cool looking project.

Tensor Fiend@tensorfiend·5d

Model config. Tie Embeddings and GQA feels so illegal😂, they definitely are helping this 35M model

Tensor Fiend@tensorfiend

Generated samples after 5000 steps. Actually these are better than what my prev experiment generated
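
Why tied embeddings and GQA matter for a small model can be shown with rough parameter arithmetic. The sizes below are hypothetical, not the config from the post, and biases and norm parameters are ignored.

```python
def embedding_params(vocab_size, d_model, tied):
    # Input embedding table plus output projection; tying shares one matrix.
    table = vocab_size * d_model
    return table if tied else 2 * table

def attention_params(d_model, n_heads, n_kv_heads, head_dim):
    # Q and output projections use all heads; K and V use only n_kv_heads.
    q = d_model * n_heads * head_dim
    kv = 2 * d_model * n_kv_heads * head_dim
    o = n_heads * head_dim * d_model
    return q + kv + o

# Hypothetical sizes chosen only for illustration.
VOCAB, D_MODEL, N_HEADS, HEAD_DIM = 8192, 256, 8, 32

untied = embedding_params(VOCAB, D_MODEL, tied=False)
tied = embedding_params(VOCAB, D_MODEL, tied=True)

# Standard multi-head attention is the case n_kv_heads == n_heads.
mha = attention_params(D_MODEL, N_HEADS, n_kv_heads=N_HEADS, head_dim=HEAD_DIM)
gqa = attention_params(D_MODEL, N_HEADS, n_kv_heads=2, head_dim=HEAD_DIM)

print(f"embeddings: untied={untied:,} tied={tied:,}")
print(f"attention per layer: mha={mha:,} gqa={gqa:,}")
```

With these assumed sizes, tying halves the embedding parameters and GQA shrinks only the K and V projections, which is a large share of the budget at this scale.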

Tensor Fiend@tensorfiend·5d

Generated samples after 5000 steps. Actually these are better than what my prev experiment generated

Tensor Fiend@tensorfiend

Training a Qwen 3 style 35M parameter model on @jarvislabsai . Training looks ok so far. @LightningAI makes everything super simple 🙇🏽‍♂️

Tensor Fiend@tensorfiend·6d

It could never come up with an actual solution that works (research problem). But it helps to test 99 other ideas that won’t work very quickly, which eventually leads us to the actual solution faster

neural nets.@cneuralnetwork

Have been using Claude Code with Opus 4.6 for few days I am in fear of losing my job

Tensor Fiend@tensorfiend·12 Mar

@rasbt is the guy who every researcher wants to be🫡. He is the GOAT

Tensor Fiend@tensorfiend·11 Mar

I myself feel proud when I don’t use AI tools for certain things. I was learning RL and could have used @NotebookLM but decided to read the RL book. Same happened when I was implementing Qwen 3 from scratch. Idk if it’s same for others, but urge to use AI is so real

Raj Dabre@prajdabre

Doomer post! It's over! It's JOE over! Last night I installed antigravity and simply asked it this: implement a mini transformer model optimized with kernels for my M4 MacBook. In 15 mins it gave me a fully tested, well optimized implementation. I saw the damn thing make mistakes, test, fix mistakes, log information, pull up system information, search online, think, on and on till it gave me exactly what I wanted. I spent 15 mins with my jaw wide open. It would have taken me at least a week 2 years ago. Now it's 15 mins. Now all you need is a verifiable idea. Don't even get me started on the internal version of Antigravity with internal models optimized for Googlers. It's a new era. Maybe I was too hard on openclaw.

Tensor Fiend@tensorfiend·10 Mar

@selfawareatom Working on it. Definitely will reach out with results (not for job 😬) x.com/tensorfiend/st…

Tensor Fiend@tensorfiend

@karpathy @JTMcG3 I was inspired by TinyStories and built one synthetic dataset myself (still not open sourced but will do it soon). Currently working on pretraining (this was inspired by nanochat 🫡)! tensorwrites.com/posts/ste-data…

1.1K

Rahul@selfawareatom·10 Mar

Now that our 15 member llm team is infamous, time to expand for next time! If you have done one or more of the following, then please reach out.
- pretrained a model of any size, from scratch
- posttrained any base model, end to end (data curation, sft, rl)
- are a pytorch wizard
- are a cuda kernel master
- you have any other relevant skills and work to back it up
firstnamesarvamai

701

81.7K

Tensor Fiend@tensorfiend·9 Mar

Working on some cool looking project.

113

Tensor Fiend@tensorfiend·9 Mar

@skydotcs Sleep - Wakeup for “veg/non-veg” sound - Eat - Sleep

sky@skydotcs·9 Mar

what does one do on a 15 hour flight

3.1K

Tensor Fiend@tensorfiend·9 Mar

1.3K

Andrej Karpathy@karpathy·9 Mar

@JTMcG3 looks great! :) TinyStories is the right thing to train on for very small models / Apple Silicon, where you can actually get somewhere. I might even make a note about that in the README. I would use this dataset in particular, it's the cleanest one afaik huggingface.co/datasets/karpa…

736

42.7K

Jim McGinley@JTMcG3·9 Mar

Ran a customized version of this locally using MLX, TinyStories & a 20M-parameter model on my Mac - super fun!! “Once upon a time, there was a big family who was very nice to the table. One day, the girl wanted to go on an adventure to be copters.” 10/10

Andrej Karpathy@karpathy

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)

360

68.6K
