Tensor Cruncher

310 posts

@tensorcruncher

Open source | Mechanistic Interpretability and Model Internals Other interests: Math, Systems, Music

Mumbai · Joined December 2025

437 Following · 23 Followers

Tensor Cruncher@tensorcruncher·14h

@rishdotuk 😂

Rishu Kumar@rishdotuk·17h

@tensorcruncher I hope they implement it, because Gemini still keeps remembering that I work out with a resistance band. I asked it once, almost a year ago.

Tensor Cruncher@tensorcruncher·17h

Is there a feature where you can set your fav chatbot to omit context that is beyond a certain point in the past? I don't want to start a new session and miss out on the context from the near past. Or is this already done?
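
One way to picture the feature asked about here is a filter that drops chat history older than a chosen age before it is sent to the model. The sketch below is only an illustration: `Message`, `recent_context`, and the sample dates are hypothetical names and values, not any chatbot's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    sent_at: datetime  # when the message was written
    role: str          # "user" or "assistant"
    text: str


def recent_context(history, now, max_age):
    """Keep only messages no older than max_age, preserving their order."""
    cutoff = now - max_age
    return [m for m in history if m.sent_at >= cutoff]


# Hypothetical history: one stale message and one recent message.
now = datetime(2026, 7, 10)
history = [
    Message(datetime(2025, 8, 1), "user", "I work out with a resistance band."),
    Message(datetime(2026, 7, 8), "user", "Plan my week."),
]
kept = recent_context(history, now, timedelta(days=30))
print([m.text for m in kept])
```

A cutoff like this keeps the near past in the same session while leaving out the year-old detail, which is the trade-off the post describes.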

Tensor Cruncher reposted

Paul Graham@paulg·5d

A Brown professor gave his students a take-home midterm exam. After suspecting many cheated using AI, he made the final in-person. The orange dots are the midterm scores and the gray dots are the final scores. Looks like all but 3 cheated on the midterm.

Tensor Cruncher@tensorcruncher·4d

If I didn't need to work and didn't have responsibilities I would just travel the world and study math

Tensor Cruncher@tensorcruncher·4d

@ariG23498 @stevhliu More proof that MLE is more E than ML

Aritra 🤗@ariG23498·6d

I have always admired @stevhliu's work. I consider his technical writeups to be among the best there are. In the latest series he covers how `transformers` loads a model. This might seem very simple, but there are a lot of moving parts involved, which together make loading insanely fast:
> meta device
> safetensors lazy loading
> cuda cache allocator warmup
> device and dtype prediction
> weight fusing
If you want to know how a SoTA library's core component works, I definitely recommend reading the blogs.

Tensor Cruncher@tensorcruncher·4d

The person who does graphics for hugging face needs a raise

Aritra 🤗@ariG23498

Read about the changes: huggingface.co/blog/native-sp…

Tensor Cruncher@tensorcruncher·Jul 6

@ariG23498 @hmellor_ 😂

Aritra 🤗@ariG23498·Jul 6

@hmellor_ @tensorcruncher Hate it when that happens.

Aritra 🤗@ariG23498·Jul 6

vLLM deep dive anyone? 🧑‍🍳

Tensor Cruncher reposted

Tech Bro Memes@techbromemes·Jul 4

Tensor Cruncher@tensorcruncher·Jul 4

This makes so much sense now. Going through @liquidai's LFM2 modelling file in the @huggingface transformers repo atm. It uses a mix of some attention and mostly double gated conv1D in its decoder layers. See the architecture diagram from the LFM2 technical report. Currently not dealing with MoE, there's another architecture folder for that.
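
As a rough illustration of what "double gated conv1D" can mean, the sketch below gates the input elementwise, applies a short causal convolution, then gates the result again. It is a single-channel, pure-Python toy and not the code in the `transformers` LFM2 modelling file: the function names, the precomputed gate sequences, and the omission of the input and output linear projections are all assumptions made for brevity.

```python
def causal_conv1d(seq, kernel):
    """Short causal convolution: output t only sees inputs at steps <= t.

    kernel[-1] weights the current step, kernel[0] the oldest step.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)  # left-pad so no future leaks in
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(seq))
    ]


def double_gated_conv(x, gate_in, gate_out, kernel):
    """Gate the input, mix along time with a short conv, gate the output."""
    gated = [b * v for b, v in zip(gate_in, x)]    # first (input) gate
    mixed = causal_conv1d(gated, kernel)           # local token mixing
    return [c * m for c, m in zip(gate_out, mixed)]  # second (output) gate


# Toy sequence with hypothetical gate values and a 2-tap kernel.
y = double_gated_conv(
    x=[1.0, 2.0, 3.0],
    gate_in=[1.0, 1.0, 1.0],
    gate_out=[1.0, 0.0, 2.0],
    kernel=[1.0, 0.0],  # pure one-step delay
)
print(y)
```

In a real layer the two gates and the convolution input would presumably come from a learned projection of the hidden state; here they are passed in directly so the data flow stays visible.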

Tensor Cruncher@tensorcruncher·Jul 3

This is basically how I've gotten as far as I have.

Adam Mainz@MainzOnX

People always ask for a list of what they should learn to get into ML systems, inference, etc. Any of these lists you see for someone starting out is WRONG, so let me tell you how you should be thinking.
First off, the reason they are wrong, even if the list is genuinely accurate, is the size of the space. In ML terms, this is like having a set of hyperparameters that is far larger than necessary, with a search space in each that is overly large. Sure, you might find the right combination eventually, but you are more likely to run out of compute or hit a local minimum.
The real way to break into any space or to learn more is thinking in layers. You want:
1. Strong fundamentals in ML if you are going to do anything in the space
2. Strong programming fundamentals
OK cool, so you are through layer one, now what? From here we need to learn the specialization. This comes in T-shaped development. You need to learn a small amount of everything you can (not too deep) and then really triple down on one or two topics. You might branch out and find interesting things you want to learn along the way, and you can explore those a bit too. Find your path.
So next time you see a giant scary list of topics, don't freak out and don't really listen to all of it. Find what YOU want to focus on and don't freak out.
The only other piece here is keeping up. There is so much new tech that I would highly suggest having a small bit of familiarity with it. It is still too much to learn, but just know about what's going on.
Good luck friends. If you want advice on small sets of things to learn based on your interests, let me know. Besides that, don't listen to any silly engagement bait lists.

Tensor Cruncher@tensorcruncher·Jul 3

This is insane

ℏεsam@Hesamation

Meta burns $2.65B a year on AI tokens. at $300K for a Meta engineer, that's enough to pay ~9,000 engineers for a full year. now ask yourself: since the layoffs, has Meta shipped anything that feels like 9,000 engineers’ worth of output?

Tensor Cruncher reposted

Indra@IndraVahan·Jun 30

local gym owner warns people lifting weights at home are on a “very dangerous path”

First Squawk@FirstSquawk

ANTHROPIC CEO WARNS OPEN-SOURCE AI IS ON A "VERY DANGEROUS PATH"

Tensor Cruncher reposted

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·Jun 30

"how do these seemingly random Chinese companies have the resources/talent to make their own models?" …

orcus108@orcus108

a TRILLION parameter model by a FOOD DELIVERY startup btw. maybe a dumb question, but how do these seemingly random Chinese companies have the resources/talent to make their own models?

Tensor Cruncher@tensorcruncher·Jun 29

Just the kind of thing that makes me feel warm and fuzzy on a Mumbai monsoon morning

clem 🤗@ClementDelangue

The HF science team just made async RL weight sync ~100x cheaper on bandwidth, and you don't need a shared cluster anymore.
The problem: every RL step, the trainer typically has to sync fresh weights to the inference engine. for a 7B in bf16 that's ~14GB. for a frontier 1T fp8 checkpoint, that's ~1TB; in bf16 it would be ~2TB. per sync.
The insight: between two RL steps, ~99% of bf16 weights are bit-identical. at RL learning rates, the optimizer is whispering and bf16 literally cannot hear most of it. the stored bf16 bits don't change.
What they shipped in TRL: only the changed elements get encoded as a sparse safetensors file, dropped into a Hugging Face Bucket, and fetched by vLLM. on Qwen3-0.6B, per-step payload goes from 1.2 GB to 20 to 35 MB.
This is exactly what we built Buckets for: S3-like object storage on the Hub, Xet-backed (so even full snapshots only transfer the changed chunks).
The cherry on top: we ran a FULL disaggregated training where:
- the trainer lived on one box
- vLLM ran inside a Hugging Face Space
- the Wordle environment ran in another Space
- weights flowed through one Hub bucket
no shared cluster. no RDMA. no VPN. no NCCL across clouds. just HTTPS and a bucket.
one GPU + a Hugging Face account is now enough to do real disaggregated RL. multi-replica inference fleets across regions become a small devops exercise, not a research project.
Full write-up: huggingface.co/blog/delta-wei…
Open source RL keeps eating the moat!
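
The size arithmetic and the "bf16 cannot hear small updates" point in the post above can be checked with a small pure-Python sketch. This is not the TRL implementation or its file format: `to_bf16_bits`, `sparse_delta`, `apply_delta`, and `checkpoint_bytes` are hypothetical helpers, the delta is a plain dict rather than a safetensors file, and NaN handling is ignored.

```python
import struct


def checkpoint_bytes(n_params, bytes_per_param):
    """Size of a dense checkpoint: one fixed-width value per parameter."""
    return n_params * bytes_per_param


def to_bf16_bits(x):
    """Round a float to bfloat16 and return its 16-bit pattern as an int."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits being dropped.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def sparse_delta(old_bits, new_bits):
    """Map index -> new bit pattern, only where the stored bits changed."""
    return {i: n for i, (o, n) in enumerate(zip(old_bits, new_bits)) if o != n}


def apply_delta(old_bits, delta):
    """Rebuild the new weights on the receiving side from old bits + delta."""
    out = list(old_bits)
    for i, bits in delta.items():
        out[i] = bits
    return out


# Dense sizes quoted in the post: 2 bytes per bf16 weight, 1 per fp8 weight.
print(checkpoint_bytes(7_000_000_000, 2))  # 7B in bf16, in bytes
print(checkpoint_bytes(600_000_000, 2))    # Qwen3-0.6B in bf16, in bytes

# Toy update: three tiny steps that bf16 rounds away, one large step.
old = [1.0, 0.5, -2.0, 3.0]
new = [1.0 - 1e-6, 0.5 - 1e-6, -2.0 - 1e-6, 3.5]
old_bits = [to_bf16_bits(w) for w in old]
new_bits = [to_bf16_bits(w) for w in new]
delta = sparse_delta(old_bits, new_bits)
print(len(delta), "of", len(old), "weights changed")
```

Only the entries whose stored bits differ need to travel, which is the idea behind the payload reduction described above; the actual ratio depends on the model and learning rate.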

Tensor Cruncher reposted

Sumner L Norman@SumnerLN·Jun 26

Tl;dr -- Aleph's result is mega-cool. A vial of microbubbles + a chip the size of a coin mapping a living human brain. It’s built on a decade-long revolution in ultrasound science and technology. And we’re just getting started 🔊🔊🔊

Tensor Cruncher@tensorcruncher·Jun 27

Beautiful. For some reason this evokes the image of the sophon from the 3 body problem being constructed.

YB@yb_effect

and OpenAI's pre-training team acknowledge how the mother of all pre-training runs will eventually have to be decentralized protocols.

Tensor Cruncher@tensorcruncher·Jun 26

Beautiful

Aleph@alephneuro

We recently obtained the highest-resolution 3D images of the human brain ever taken from outside the skull. This is the first look. Introducing Aleph, a research lab building brain interfaces for the telepathic future. (1/n)

Tensor Cruncher reposted

Aritra 🤗@ariG23498·Jun 26

[HF ML Club India] We are proud to announce the second IRL event from the HF ML Club India in Bengaluru. This time we have partnered with the @RedHat_AI India team (particularly their PyTorch Engg team). It is happening on the 25th of July. Talks:
> @adithya_s_k speaks on OpenEnv
> @adarshxs talks about SGLang
> I cover torch profiling
> Mansi (RedHat) takes on torch distributed
> Arkadip (RedHat) provides in-depth knowledge about gpu2gpu comms in distributed setups
This is also a golden opportunity for folks who were not accepted for the first event to come and enjoy the talks and network with peers.

Tensor Cruncher@tensorcruncher·Jun 25

@ariG23498 Registered. Not skipping this one lmao

Aritra 🤗@ariG23498·Jun 25

If you know where to find this, I really respect you. Formal details dropping tomorrow 🤗
