Shiv

2.8K posts


@TensorTunesAI

Agentic AI | Learning ML and DL in public | Sharing resources & daily notes

Bangalore, Karnataka · Joined September 2024
510 Following · 406 Followers
Pinned Tweet
Shiv@TensorTunesAI·
I just trained my first LLM from scratch. No APIs or pre-trained models: a real transformer training pipeline. But the interesting part isn't the model, it's the hours of debugging before it finally worked. I wanted to deeply understand how modern language models actually work.

Model: huggingface.co/hey-shiv/mini-…

So I built a mini LLM pipeline myself:
Dataset → BPE Tokenizer → Transformer → Training loop → Text generation

This journey was inspired by Rishab sir's AI classes, which pushed me to actually implement these systems instead of just learning the theory.

Step 1: Dataset
I used the TinyStories dataset, which contains millions of short stories designed for training small language models.

Challenges I ran into:
• dataset shards (~250 MB each)
• slow downloads
• broken scripts
• environment issues

Eventually everything was merged into one training corpus.
Final dataset: ~1.78 GB of text, ~472M training tokens.

Step 2: BPE Tokenizer
Before the model can read text, it must be converted into tokens, so I trained a Byte Pair Encoding (BPE) tokenizer.
Vocabulary size: ~2,000 tokens
Example: "I love machine learning" → [41, 893, 176, 512]
Tokenization is critical because it defines how the model sees language.

Step 3: Modern Transformer Architecture
The model itself is a small transformer implemented in PyTorch, with several modern improvements used in today's LLMs:
• RoPE (Rotary Positional Embeddings)
• RMSNorm
• SwiGLU feed-forward layers
• Grouped Query Attention (GQA)
Even though the model is small, the architecture follows modern LLM design patterns.

Step 4: Training
Training setup:
Dataset size: 1.78 GB
Training tokens: 472M
Validation tokens: 52M
Vocabulary size: ~2,000
Hardware: Apple Silicon GPU (MPS)

Then came the best moment: watching the model actually learn.
Training loss: 7.63 at the start → ~2.29 after training.
That starting loss is no accident: ln(2000) ≈ 7.6 is the loss of a uniform guess over the vocabulary, so the drop means the model is genuinely learning patterns from the dataset.
Biggest lesson: building ML systems is 90% debugging infrastructure, not fancy models. You spend most of your time fighting with:
• datasets
• tokenizers
• training pipelines
• environment issues
But that's where the real learning happens.

Next experiments:
• try different datasets
• scale the model
• improve tokenizer quality
• experiment with RL approaches

This was just the beginning. Huge thanks to @rishabh10x sir for the classes that inspired me to build this from scratch instead of just using APIs. If you're learning AI, try building at least one model pipeline yourself. It's chaotic.
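The BPE step (Step 2) can be sketched with a toy merge loop. This is a from-scratch illustration, not the actual training code: the function names are made up for this sketch, and the token IDs in the tweet's example are illustrative, so real IDs will differ.

```python
from collections import Counter

def learn_merges(corpus, num_merges):
    """Learn BPE merge rules: repeatedly merge the most frequent adjacent pair."""
    words = [list(word) for line in corpus for word in line.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))  # count adjacent symbol pairs
        if not pairs:
            break  # every word is a single symbol; nothing left to merge
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        words = [apply_merge(w, best) for w in words]
    return merges

def apply_merge(symbols, pair):
    """Replace every occurrence of `pair` in `symbols` with the fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def build_vocab(corpus, merges):
    """Vocabulary = base characters plus every merged symbol."""
    symbols = {ch for line in corpus for word in line.split() for ch in word}
    symbols.update(a + b for a, b in merges)
    return {s: i for i, s in enumerate(sorted(symbols))}

def encode(text, merges, vocab):
    """Apply learned merges in order, then map symbols to integer IDs."""
    ids = []
    for word in text.split():
        symbols = list(word)
        for pair in merges:
            symbols = apply_merge(symbols, pair)
        ids.extend(vocab[s] for s in symbols)
    return ids

corpus = ["i love machine learning", "machines learn language", "deep learning models"]
merges = learn_merges(corpus, num_merges=40)
vocab = build_vocab(corpus, merges)
print(encode("i love machine learning", merges, vocab))
```

A real tokenizer (e.g. the Hugging Face `tokenizers` BPE trainer) works at the byte level and handles a ~2,000-symbol vocabulary efficiently, but the merge logic is the same idea.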
[image]
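Two of the Step 3 components are compact enough to show inline. This is a hedged sketch of standard RMSNorm and SwiGLU definitions (LLaMA-style), not the author's actual code; RoPE and GQA are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the features; no mean-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# One pre-norm feed-forward sub-block over a (batch, seq, dim) activation:
x = torch.randn(2, 16, 64)
out = x + SwiGLU(64, 4 * 64)(RMSNorm(64)(x))
```

The residual-plus-pre-norm pattern shown in the last line is what modern decoder blocks wrap around both the attention and feed-forward sub-layers.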
Ashh!! 🧋@AnshikaK7·
Does the process know I'm trusting it? 😶‍🌫️
Shiv@TensorTunesAI·
Hey @lambdaviking, I recently trained a small transformer (~472M tokens) with BPE, RoPE, GQA, etc., and now I'm exploring applying similar ideas to Indian classical music. Specifically, I'm looking at representing ragas as sequence data (starting with MIDI, possibly moving to audio later). Curious whether transformers can actually capture deeper raga structure: not just note sequences, but progression, mood, and inherent constraints. Do you think this is something transformers can learn with scale, or would it require a different modeling approach / inductive bias? Or should I focus on classic ML: CNNs, RNNs, LSTMs? Would love your perspective. x.com/i/status/20331…
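One very rough starting point for the MIDI idea is an event-token vocabulary, as commonly used for symbolic music models. Everything below is hypothetical: the function name, the token scheme, and the pitch mapping (Sa fixed at C4 = MIDI 60 with a Bilawal-like scale) are assumptions for illustration, not a standard.

```python
# Hypothetical sketch: serialize a raga phrase as interleaved pitch/duration tokens.
# Assumption: Sa = C4 (MIDI 60), other shuddha swaras mapped Bilawal-style.
SWARA_TO_MIDI = {"S": 60, "R": 62, "G": 64, "M": 65, "P": 67, "D": 69, "N": 71}

def phrase_to_tokens(swaras, durations):
    """Emit one NOTE_<midi> token and one DUR_<beats> token per swara."""
    tokens = []
    for swara, beats in zip(swaras, durations):
        tokens.append(f"NOTE_{SWARA_TO_MIDI[swara]}")
        tokens.append(f"DUR_{beats}")
    return tokens

# An ascending (aaroha-like) phrase, held longer on the last note:
tokens = phrase_to_tokens(list("SRGMPDN"), [1, 1, 1, 1, 1, 1, 2])
```

A sequence model trained on such tokens can only see what the tokenization exposes, so the open question in the tweet (progression, mood, constraints beyond note order) is really a question about whether richer events (ornaments, microtonal shruti, phrase boundaries) get their own tokens.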
William Merrill@lambdaviking·
[1/8] New paper with Hongjian Jiang, @YanhongLi2062, Anthony Lin, @Ashish_S_AI: 📜Why Are Linear RNNs More Parallelizable? We identify expressivity differences between linear/nonlinear RNNs and, conversely, barriers to parallelizing nonlinear RNNs 🧵👇
[image]
Shiv@TensorTunesAI·
Recently trained a small transformer (~472M tokens) with BPE, RoPE, GQA, etc., and now I'm exploring applying similar ideas to Indian classical music. Specifically, I'm looking at representing ragas as sequence data (starting with MIDI, possibly moving to audio later). Curious whether transformers can actually capture deeper raga structure: not just note sequences, but progression, mood, and inherent constraints. Do you think this is something transformers can learn with scale, or would it require a different modeling approach / inductive bias? Or should I focus on classic ML: CNNs, RNNs, LSTMs? Would love your perspective. x.com/i/status/20331…
Mayank Mishra@MayankMish98·
Big news! 🎉 TPU support for pretraining is now live on lm-engine, powered by PyTorch-XLA. Faster, scalable training is just a clone away: github.com/open-lm-engine… (tested on TPU v6e)
Ramakrishna kompella@jojokompella·
I did some tests myself; putting it out soon. I expected it to be significantly better than the competition for Indian languages. For lower-resource ones, it is, but not for high-resource ones. Sarvam 30B is not significantly worse than 105B, though.
nullptr@resetptr

Ran some quick weekend experiments on @SarvamAI's 105B model on a subset of the IndicMMLU-Pro dataset. Sarvam's model is really good at reasoning efficiency: it uses ~2.5x fewer tokens to reach ~the same accuracy.

Ashanvi@ashanviii·
i tried designing something in 20 mins and yeah… not really sure what direction i was going in, kinda just winged it
[image]
mihir@mihirss2·
@TensorTunesAI @lossfunk Yeah, I am an Indian classical vocalist. A raga is complex. Yet it has many elements of adaptation from past renditions; a lot of scope for 'learning.' Students borrow styles from their gurus all the time. It is a curious problem how this notion translates to LMs/ML in general.
Lossfunk@lossfunk·
🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵
mihir@mihirss2·
@TensorTunesAI @lossfunk Would love to explore the idea of representing ragas in models further with you. This is literally what I was trying to brainstorm over yesterday lol. Lmk
Shiv@TensorTunesAI·
Starting a 10-day mini-series on NLP. Day 1 of learning NLP in 2 mins: starting with the basics. Before any model, we clean and structure the text. Bad text → bad model. Next: text preprocessing and tokenization.
[image]
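The "clean and structure the text" step from Day 1 can be sketched as follows. This is a generic minimal example (lowercase, strip punctuation, collapse whitespace), not the exact pipeline from the notes.

```python
import re
import string

def preprocess(text: str) -> str:
    """Minimal cleaning: lowercase, strip ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text: str) -> list[str]:
    """Whitespace tokenization over the cleaned text."""
    return preprocess(text).split()

tokens = tokenize("Bad  text ->  Bad Model!!")
# → ['bad', 'text', 'bad', 'model']
```

Real pipelines layer more on top (Unicode normalization, handling numbers and emoji, language-specific rules), but every step follows this same text-in, cleaner-text-out shape.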
Shiv@TensorTunesAI·
[image]