Shiv

3.3K posts

Shiv
@TensorTunesAI

Agentic AI, building PolyMind. Core interests: music and deep reinforcement learning.

Music · Joined September 2024
509 Following · 379 Followers

Pinned Tweet
Shiv@TensorTunesAI·
I just trained my first LLM from scratch. No APIs or pre-trained models. A real transformer training pipeline. But the interesting part isn't the model. It's the hours of debugging before it finally worked.

I wanted to deeply understand how modern language models actually work. Model: huggingface.co/hey-shiv/mini-…

So I built a mini LLM pipeline myself:
Dataset → BPE Tokenizer → Transformer → Training loop → Text generation

This journey was inspired by Rishab sir's AI classes, which pushed me to actually implement these systems instead of just learning the theory.

Step 1 — Dataset
I used the TinyStories dataset, which contains millions of short stories designed for training small language models.
Challenges I ran into:
• dataset shards (~250 MB each)
• slow downloads
• broken scripts
• environment issues
Eventually everything was merged into one training corpus.
Final dataset: ~1.78 GB of text, ~472M training tokens.

Step 2 — BPE Tokenizer
Before the model can read text, it must convert it into tokens. I trained a Byte Pair Encoding (BPE) tokenizer.
Vocabulary size: ~2,000 tokens
Example: "I love machine learning" → [41, 893, 176, 512]
Tokenization is critical because it defines how the model sees language.

Step 3 — Modern Transformer Architecture
The model itself is a small transformer implemented in PyTorch. I implemented several modern improvements used in today's LLMs:
• RoPE (Rotary Positional Embeddings)
• RMSNorm
• SwiGLU feed-forward layers
• Grouped Query Attention (GQA)
Even though the model is small, the architecture follows modern LLM design patterns.

Step 4 — Training
Training setup:
Dataset size: 1.78 GB
Training tokens: 472M
Validation tokens: 52M
Vocabulary size: ~2,000
Hardware: Apple Silicon GPU (MPS)

Then came the best moment. Watching the model actually learn.
Training loss: 7.63 at the start → ~2.29 after training.
That drop means the model is actually learning patterns from the dataset.
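As a side note, that starting loss can be sanity-checked with a one-liner: before training, a model's next-token distribution is roughly uniform over the vocabulary, so the expected cross-entropy loss at step 0 is about ln(vocab_size). With the ~2,000-token vocabulary above, that predicts ≈7.60, very close to the reported 7.63.

```python
import math

# An untrained LM assigns roughly uniform probability 1/V to each of V tokens,
# so the expected cross-entropy loss at step 0 is about ln(V).
vocab_size = 2000
initial_loss = math.log(vocab_size)
print(round(initial_loss, 2))  # 7.6, close to the observed 7.63
```

This is a handy check when building a pipeline: if the first logged loss is far from ln(V), something (labels, shifting, reduction) is likely wired up wrong.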
Biggest lesson: building ML systems is 90% debugging infrastructure, not fancy models. You spend most of your time fighting with:
• datasets
• tokenizers
• training pipelines
• environment issues
But that's where the real learning happens.

Next experiments:
• try different datasets
• scale the model
• improve tokenizer quality
• experiment with RL approaches

This was just the beginning. Huge thanks to @rishabh10x sir for the classes that inspired me to build this from scratch instead of just using APIs.

If you're learning AI, try building at least one model pipeline yourself. It's chaotic.
Shiv tweet media
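For readers tempted to try the pipeline above, the tokenizer step is a good entry point. Below is a minimal sketch of BPE training in plain Python, character-level rather than byte-level for brevity; the names `train_bpe`, `apply_merge`, and `encode` are illustrative helpers, not the thread author's actual code.

```python
from collections import Counter

def apply_merge(tokens, pair):
    """Replace every occurrence of the adjacent `pair` with one fused token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn BPE merge rules: repeatedly fuse the most frequent adjacent pair."""
    tokens = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        tokens = apply_merge(tokens, best)
    return merges

def encode(text, merges):
    """Tokenize new text by replaying the learned merges in order."""
    tokens = list(text)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens

merges = train_bpe("low lower lowest low low", 3)
print(merges)                     # learned merge rules, in order
print(encode("low low", merges))  # ['low', ' low']
```

A real tokenizer for a ~2,000-token vocabulary would run thousands of merges over bytes and map each token string to an integer id, but the merge loop is the same idea.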
Shiv@TensorTunesAI·
@Pseudo_Sid26 New insight and fairly agreeable 🤞
Siddharth@Pseudo_Sid26·
My friend has a 20k/month job. The role she is in is very easily replaceable by AI: all she has to do is 6-8 hours of session monitoring and log all of it in an Excel file (being non-biased plays a huge role here). You could easily spin up an agent and automate this task, and I'm sure there are many more employees in the company doing the same thing at scale, because the company is pretty big.

But will she actually get replaced? Maybe not!

Reason: the company might be burning some amount X on n employees for this task. If they opt for an AI tool, running it 8 hours a day for continuous screen monitoring would cost them much more than they are spending right now.

Sometimes it is about affordability and profitability as well, instead of blindly following trends. Most AI companies are burning VC money for this same reason. Opting for AI solutions where they are not necessarily needed leads to nothing but cash burn. Very few companies understand this.
Shiv@TensorTunesAI·
@ashanviii @uncagedspirit_ My real life on my timeline now (graduation next year, but the struggle is already on)
Ashanvi@ashanviii·
for some reason, a graduate struggling to get a job is peak entertainment for the uneducated
Shiv@TensorTunesAI·
Zen mode on (end-sem exams). I will be finishing two entire subjects in 2 days; every minute will be utilized, as it is next to impossible to do this, but I like the feeling of taking up this challenge.
Ashanvi@ashanviii·
at least it wasn't colleen hoover by it ends with us
Ashanvi tweet media
Karan Vaidya@KaranVaidya6·
In BLR. Whom should I meet?
Shiv@TensorTunesAI·
@itsOmSarraf_ This will keep happening anyway; how's the hackathon going 👊
Kirti Sharma@SsharmaKirti·
FINALLY !!! Bangalore is bangaloring 😅
Shiv@TensorTunesAI·
@cneuralnetwork Wait till you come to my hometown, and then watch how quickly money just disappears in the air 😂
neural nets.@cneuralnetwork·
17k gone just on shopping in BLR, damn
Shiv@TensorTunesAI·
@SsharmaKirti Never had this in my entire life even once
Kirti Sharma@SsharmaKirti·
Look who's here 🤙
Kirti Sharma tweet media
Shiv@TensorTunesAI·
@ashanviii Couldn't sleep, so I quit coffee. Still can't sleep 🫤
Ashanvi@ashanviii·
drink coffee and I get sleepy; skip coffee and I still get sleepy
Ishrat Noori@ishratn00ri·
how it feels to say okay instead of arguing
GIF
Shiv@TensorTunesAI·
@ashanviii That's the real beauty.
Shiv@TensorTunesAI·
@HarveenChadha Sir, can we please do it after May 18 (I have end-sem exams)? I'm from Bangalore itself and really want to attend.
Isha@ishadotio·
Isha has drunk LIZOL
Isha tweet media
Shiv@TensorTunesAI·
@amaashvi thank you for the morning dose
Aashvi@amaashvi·
Gm chat! Go get it done, crush it all!!
Shiv@TensorTunesAI·
Really inspired by this guy. So I just prompted out articles aligning with my interests. I have end sems next week, but just 30 mins a day is enough, so I will try my best, and even if not, idc, I can do it after exams.

There are 5 sets. This is the First Set - AI × Mind × Consciousness:
• Consciousness, Creativity, and a Godlike AI: nautil.us/consciousness-…
• We must build AI for people; not to be a person: mustafa-suleyman.ai/seemingly-cons…
• The Claude Bliss Attractor: astralcodexten.com/p/the-claude-b…
• Conscious AI You Say? Here Are Six Models of Consciousness: scienceandculture.com/2026/03/consci…
• The Road to Honest AI: astralcodexten.com/p/the-road-to-…
• Eleos AI welfare research: 80000hours.org/podcast/episod…
Shiv tweet media
Mrinal@Hi_Mrinal

Yoo, I know I am a bit late with the list tho... I hope you are able to squeeze enough time to dedicate at least 30 mins of your day to read 1 article per day, or maybe 2. I have also added a few of my fav developer blogs/articles too.

Companies

# Dropbox
- Why we chose Apache Superset as our data exploration platform: [dropbox.tech/application/wh…]
- How low-bit inference enables efficient AI: [dropbox.tech/machine-learni…]
- Reducing our monorepo size to improve developer velocity: [dropbox.tech/infrastructure…]
- Selecting a model for semantic search at Dropbox scale: [dropbox.tech/machine-learni…]
- What's new with Robinhood, our in-house load balancing service: [dropbox.tech/infrastructure…]

# Twilio
- Build an AI Video Analysis App with FastAPI, OpenAI, and SendGrid: [twilio.com/en-us/blog/dev…]
- Calculating Character Count of RCS Messages: [twilio.com/en-us/blog/dev…]
- Making Alt Text Fast: How Twilio Scaled Docs Accessibility with Automation: [twilio.com/en-us/blog/dev…]

# Reddit
- Evolving Signals-Joiner with Custom Joins in Apache Flink: [reddit.com/r/RedditEng/co…]
- Query Autocomplete from LLMs: [reddit.com/r/RedditEng/co…]
- Evolution of Reddit's In-house P0 Media Detection: [reddit.com/r/RedditEng/co…]
- An In-Depth Look at the Notifications Recommender System: [reddit.com/r/RedditEng/co…]

# Pinterest
- How Pinterest Built a Real-Time Radar for Violative Content using AI: [medium.com/pinterest-engi…]
- Next-Level Personalization: How 16k+ Lifelong User Actions Supercharge Pinterest's Recommendations: [medium.com/pinterest-engi…]

# Bumble
- Who, where, when: a components system for allotting team member responsibilities: [medium.com/bumble-tech/wh…]

Developer blogs

Some developer blogs I love to go through for maximum experience learning... What is experience learning? Something an engineer learns only while experiencing new problems, new limitations, and new constraints in different situations...

# Pragmatic Engineer
- Cloudflare rewrites Next.js as AI rewrites commercial open source: [blog.pragmaticengineer.com/the-pulse-clou…]
- Is the FDE role becoming less desirable?: [blog.pragmaticengineer.com/is-the-fde-rol…]

# Julia Evans (1/3 top favs of my list)
- Examples for the tcpdump and dig man pages: [jvns.ca/blog/2026/03/1…]
- Using `make` to compile C programs (for non-C-programmers): [jvns.ca/blog/2025/06/1…]
- What helps people get comfortable on the command line?: [jvns.ca/blog/2023/08/0…]

# Dan Luu
- Tracing vs Sampling: [danluu.com/perf-tracing/]
- How corporate engineering blogs are written: [danluu.com/corp-eng-blogs/]

# Martin Fowler (2/3 top favs)
- APIs should not be copyrightable: [martinfowler.com/articles/copyr…]
- Refactoring Module Dependencies: [martinfowler.com/articles/refac…]
