Shiv

5

Ashh!! 🧋@AnshikaK7·1d

Does the process know I'm trusting it ? 😶‍🌫️

English

3

1

18

270

Shiv@TensorTunesAI·7h

Hey @lambdaviking Recently trained a small transformer (~472M tokens) with BPE, RoPE, GQA, etc., and now I’m exploring applying similar ideas to Indian classical music. Specifically looking at representing ragas as sequence data (starting with MIDI, possibly moving to audio later). Curious about whether transformers can actually capture deeper raga structure , not just note sequences, but progression, mood, and inherent constraints. do you think this is something transformers can learn with scale, or would it require a different modeling approach / inductive bias? Or i focus on our classic ml , cnn-rnn-lstm ? Would love your perspective x.com/i/status/20331…

English

10

William Merrill@lambdaviking·11h

[8/8] Paper link: arxiv.org/abs/2603.03612

English

6

313

William Merrill@lambdaviking·11h

[1/8] New paper with Hongjian Jiang, @YanhongLi2062, Anthony Lin, @Ashish_S_AI: 📜Why Are Linear RNNs More Parallelizable? We identify expressivity differences between linear/nonlinear RNNs and, conversely, barriers to parallelizing nonlinear RNNs 🧵👇

English

3

18

118

8.2K

Shiv@TensorTunesAI·7h

Recently trained a small transformer (~472M tokens) with BPE, RoPE, GQA, etc., and now I’m exploring applying similar ideas to Indian classical music. Specifically looking at representing ragas as sequence data (starting with MIDI, possibly moving to audio later). Curious about whether transformers can actually capture deeper raga structure , not just note sequences, but progression, mood, and inherent constraints. do you think this is something transformers can learn with scale, or would it require a different modeling approach / inductive bias? Or i focus on our classic ml , cnn-rnn-lstm ? Would love your perspective. x.com/i/status/20331…

English

William Merrill@lambdaviking

17

Mayank Mishra@MayankMish98·8h

RNNs are cooking today! 🚀🚀🚀

[1/8] New paper with Hongjian Jiang, @YanhongLi2062, Anthony Lin, @Ashish_S_AI: 📜Why Are Linear RNNs More Parallelizable? We identify expressivity differences between linear/nonlinear RNNs and, conversely, barriers to parallelizing nonlinear RNNs 🧵👇

English

1

23

1.8K

Shiv@TensorTunesAI·7h

@MayankMish98 Let's connect

English

5

Mayank Mishra@MayankMish98·1 Ara

Big news! 🎉 TPU support for pretraining is now live on lm-engine, powered by PyTorch-XLA. Faster, scalable training is just a clone away: github.com/open-lm-engine… (tested on TPU v6e)

English

2

13

903

Shiv@TensorTunesAI·9h

@jojokompella Let's connect

English

7

Ramakrishna kompella@jojokompella·2d

I did some tests myself, putting it out soon. Expected it to be significantly better than the competition for Indian languages. For lower resource ones, it is. But not for high resource. Sarvam 30B is not significantly worse than 105B though

nullptr@resetptr

ran some quick weekend experiments on @SarvamAI's 105B model on a subset of the IndicMMLU-Pro dataset Sarvam's model is really good at reasoning efficiency. uses ~2.5x less tokens to reach ~same accuracy

English

0

2

575

Shiv@TensorTunesAI·9h

@ashanviii what abt sleep🙂

English

17

Ashanvi@ashanviii·10h

@TensorTunesAI Thank you for your kind words sir

GIF

English

0

1

52

Ashanvi@ashanviii·11h

i tried designing something in 20 mins and yeah… not really sure what direction i was going in, kinda just winged it

English

11

0

36

445

Shiv@TensorTunesAI·10h

@ItsRoboki Good explaination

English

1

10

Jagrit@ItsRoboki·13h

"99% of Javascript developers think async is parallel" -> Javascript is single threaded which means that any operation will block the main thread -> Even "console.log" blocks the main thread if spammed -> Worker threads is the only way to get parallelism in javascript How does Javascript work? -> Javascript internally uses a Call Stack (LIFO), last on the stack will be executed first. -> If the function takes too long it will block the thread So why does the UI not freeze? -> While javascript is single threaded, the environment it runs on (Chrome, Firefox, Node.js) are multi threaded. -> For the most part JS hands over the task to the Browsers APIs, allowing async tasks to run. -> Once task is finished it is moved to a queue (eg. macro task queue) -> And this process keeps going. Through this JS can stay single threaded while the browser does all the task. And why is this not considered parallel? -> Because this is not parallel execution -> But task assignment and waiting for it to finish is a totally different concept known as Event Loop which is not parallel So where does the issue come? -> When we're not using the Browser APIs. -> As then the JS needs to run code on it's own in a single threaded environment. -> If the task is big the thread is blocked! -> Examples: "JSON.parse", or maps on huge arrays. are simple silent killers

English

4

1

16

181

Shiv@TensorTunesAI·12h

@mihirss2 @lossfunk Heyyy man , nice to meet you . Me too a Certified Sitar and tabla player (8 yrs) i just wanted to mix this Music of mine into equations and machines

English

0

2

28

mihir@mihirss2·12h

@TensorTunesAI @lossfunk Yeah I am an Indian classical vocalist. A Raga is complex. Yet, it has many elements of adaptation from past renditions; lot of scope for 'learning.' Students borrow styles from their gurus all the time. It is a curious problem how this notion translates to LMs/ML in general.

English

0

1

28

Lossfunk@lossfunk·15h

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

English

111

210

1.6K

757.7K

Shiv@TensorTunesAI·12h

@mihirss2 @lossfunk you too into music ?

English

0

50

mihir@mihirss2·12h

@TensorTunesAI @lossfunk Would love to explore the idea of representing ragas in models further with you. This is literally what I was trying to brainstorm over yesterday lol. Lmk

English

0

1

50

Shiv@TensorTunesAI·13h

x.com/TensorTunesAI/…

Shiv@TensorTunesAI

Day 15 of building Neural Networks from first principles Today: Backpropagation through categorical cross-entropy (step-by-step) -> Notes and Documentation attached Here’s the intuition Loss function L = −∑ y · log(ŷ) Since y is one-hot, only ONE term survives → L = −log(ŷ_correct) Key gradient ∂L/∂ŷ = −y / ŷ Meaning: • Only the correct class gets gradient • Smaller ŷ → larger penalty → stronger learning Batch version (vectorized) d_inputs = −y_true / y_pred Stabilization Divide by number of samples (N) → prevents exploding gradients This is the exact signal that flows backward and updates weights in the network

ZXX

2

68

Shiv@TensorTunesAI·13h

Derivation:

English

0

2

72

Shiv@TensorTunesAI·13h

Day 16 of building Neural Networks from first principles Today: Why Softmax + Cross-Entropy gives such a clean gradient → Notes and NumPy implementation attached Most people memorize this: gradient = (y_pred - y_true) But here’s the real insight: Softmax + Cross-Entropy aren’t separate. They’re designed to collapse the chain rule. Instead of messy derivatives across layers, everything simplifies to: → error signal = prediction − truth That’s why: • Only the correct class gets strong correction • Wrong classes get proportional penalties • Training becomes stable and efficient Batch version: → (y_pred - y_true) / N This is the exact signal that flows backward and updates the network.

English

0

7

268

Shiv@TensorTunesAI·14h

@TensorTonic Well i did it withOUT Pytorch and loss.backward() 😎 x.com/i/status/20339…

Shiv@TensorTunesAI

Day 15 of building Neural Networks from first principles Today: Backpropagation through categorical cross-entropy (step-by-step) -> Notes and Documentation attached Here’s the intuition Loss function L = −∑ y · log(ŷ) Since y is one-hot, only ONE term survives → L = −log(ŷ_correct) Key gradient ∂L/∂ŷ = −y / ŷ Meaning: • Only the correct class gets gradient • Smaller ŷ → larger penalty → stronger learning Batch version (vectorized) d_inputs = −y_true / y_pred Stabilization Divide by number of samples (N) → prevents exploding gradients This is the exact signal that flows backward and updates weights in the network

English

1

7

TensorTonic@TensorTonic·15h

Ever wondered how PyTorch actually handles backpropagation? It builds a computational graph. Every operation you write, every multiply, every add, gets recorded as a node. Then it walks backward through that graph, applying the chain rule at every step. That's autograd. Most people treat it like magic. They call loss.backward() and move on. Read more here on TensorTonic: tensortonic.com/ml-math/graph-…

English

3

7

94

4.5K

Shiv@TensorTunesAI·16h

@LearnerA24985 hey , thanks man

English

27

Krishna@LearnerA24985·16h

@TensorTunesAI Keep it up 💪

English

0

1

34

Shiv@TensorTunesAI·16h

Starting a 10-day mini series on NLP Day 1 of learning NLP in 2 mins Starting with the basics. Before any model,we clean and structure the text. Bad text → bad model. Next: Text preprocessing and tokenization

English

0

8

302

Shiv@TensorTunesAI·16h

ZXX