Bharrguv Vakharia
@Bharrguv3
1.2K posts

Passionate Blogger & Tech Enthusiast | Exploring Blockchain & Web3.0 | DevOps Enthusiast | DSA

Joined October 2022
2.4K Following · 276 Followers

Prithvi Yadav @Mr_PrithviYadav:
My first SaaS payment just hit my bank. Built it. Shipped it. Sold it. ₹60,000 from my first client. I’ve never earned like this before. Not from something I created. This isn’t just money, it’s proof. Proof that building something real works. The journey has officially begun.
[image attached]

om chillure @OmChillure:
So here is my journey from an NPC to landing a job at xAI.
1. The start
- Started coding in May 2023, after my CET exams.
- Like every beginner, started with the MERN stack, web stuff.
- Achieved nothing much.
- Made a freelance project at the end of 1st year, that’s it.

Bharrguv Vakharia @Bharrguv3:
7/ Before a model understands language, it first turns meaning into geometry. That idea made LLMs much easier for me to understand.

Bharrguv Vakharia @Bharrguv3:
6/ One important thing: embeddings aren’t fully fixed. Context changes them. That’s how the model understands the difference between “river bank” and “bank account.”

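One way to see that contextuality directly (a sketch; the Hugging Face transformers library and bert-base-uncased are my choices for illustration, not something from the thread):

```python
# Compare the contextual vector of "bank" in two different sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence: str) -> torch.Tensor:
    """Return the contextual embedding of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, hidden)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v_river = bank_vector("I sat on the river bank.")
v_money = bank_vector("I opened a bank account.")
# Same token id going in, different vectors coming out: context moved them.
print(torch.cosine_similarity(v_river, v_money, dim=0).item())
```
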
Bharrguv Vakharia @Bharrguv3:
1/ LLMs don’t understand words. They understand vectors. Before a model processes language, every token becomes a list of numbers called an embedding.

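A minimal sketch of that lookup in PyTorch (the vocabulary size and dimensions are toy values picked for illustration):

```python
# A token id becomes a vector via a learned embedding-table lookup.
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512
embedding = nn.Embedding(vocab_size, d_model)   # the lookup table

token_ids = torch.tensor([[17, 4021, 998]])     # an already-tokenized sentence
vectors = embedding(token_ids)                  # shape: (1, 3, 512)
print(vectors.shape)                            # each token is now 512 numbers
```
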
Bharrguv Vakharia @Bharrguv3:
(9/n) The more I learn about AI, the more I realize: LLMs feel less magical when you understand the building blocks. And once the building blocks click… the architecture starts making sense.
(10/n) What Transformer concept took you the longest to understand?

Bharrguv Vakharia @Bharrguv3:
(8/n) Another big insight: GPT does NOT use the full original Transformer. GPT is decoder-only. That means it focuses on one task: predicting the next token.

Bharrguv Vakharia @Bharrguv3:
(7/n) One misconception I had: those arrows in Transformer diagrams are not just visual decoration. They are residual (skip) connections. They help gradients flow through deep networks and make training stable. Without them, very deep Transformers become much harder to train.

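A sketch of what those arrows do in code, assuming a pre-norm block layout (the pre-norm choice and all sizes are assumptions for illustration):

```python
# Each sub-layer's output is ADDED to its input: x = x + sublayer(x).
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        # The "x +" is the arrow in the diagram: the input skips around the
        # sub-layer, giving gradients a direct path through the network's depth.
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.norm2(x))
        return x

out = Block()(torch.randn(1, 10, 512))   # same shape in and out: (1, 10, 512)
```
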
Bharrguv Vakharia @Bharrguv3:
(6/n) Positional Encoding
Transformers don’t naturally understand order. So position information is added to the embeddings. Without positional encoding, the model would know the words…
…but not whether “dog bites man” or “man bites dog.”

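A sketch of the sinusoidal encoding from the original paper (assumes an even d_model; sizes are illustrative):

```python
# Build position-dependent sin/cos waves and add them to the embeddings.
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(seq_len).unsqueeze(1)        # (seq_len, 1)
    i = torch.arange(0, d_model, 2)                 # even dimension indices
    angle = pos / (10_000 ** (i / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                  # even dims get sine
    pe[:, 1::2] = torch.cos(angle)                  # odd dims get cosine
    return pe

# x = token_embeddings + positional_encoding(seq_len, d_model)
# The same tokens in a different order now produce different inputs.
```
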
Bharrguv Vakharia @Bharrguv3:
(5/n) Feed-Forward Networks (FFN)
A simple way to think about it: attention is the model’s listening; FFN is the model’s thinking. After gathering context, each token passes through a small neural network to transform that information.

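A sketch of that per-token network (the 4x expansion follows the original paper; the numbers are otherwise illustrative):

```python
# The FFN is applied to every position independently: (batch, seq, d_model)
# in, same shape out. No token looks at any other token here.
import torch.nn as nn

d_model, d_ff = 512, 2048
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),    # expand
    nn.ReLU(),                   # nonlinearity: the "thinking" step
    nn.Linear(d_ff, d_model),    # project back to the model dimension
)
```
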
Bharrguv Vakharia @Bharrguv3:
(4/n) Multi-Head Attention
Attention doesn’t run once. It runs multiple times in parallel. Why? Because different heads can learn different relationships:
• grammar
• long-range dependencies
• word associations
• contextual meaning

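A sketch of how one attention call becomes several heads; the reshape below is the standard trick, and all sizes are toy values:

```python
# Split d_model into n_heads independent subspaces, attend in each, merge back.
import torch

batch, seq, d_model, n_heads = 2, 10, 512, 8
d_head = d_model // n_heads                        # 64 dims per head

q = torch.randn(batch, seq, d_model)
k = torch.randn(batch, seq, d_model)
v = torch.randn(batch, seq, d_model)

def split_heads(x: torch.Tensor) -> torch.Tensor:
    # (batch, seq, d_model) -> (batch, n_heads, seq, d_head)
    return x.view(batch, seq, n_heads, d_head).transpose(1, 2)

q, k, v = map(split_heads, (q, k, v))
scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # each head scores on its own
out = scores.softmax(dim=-1) @ v                   # (batch, n_heads, seq, d_head)
out = out.transpose(1, 2).reshape(batch, seq, d_model)   # concatenate heads
# Each head attends in its own subspace, so each is free to specialize.
```
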
Bharrguv Vakharia @Bharrguv3:
(3/n) The original Transformer has 2 main parts:
Encoder → turns input text into contextual representations
Decoder → generates output one token at a time
But what really matters is what happens inside.

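A sketch of that data flow using PyTorch's built-in nn.Transformer, just to show which half consumes what (sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, batch_first=True)

src = torch.randn(1, 12, 512)   # encoder input: the embedded source text
tgt = torch.randn(1, 5, 512)    # decoder input: the output generated so far
out = model(src, tgt)           # encoder contextualizes src; decoder attends to it
print(out.shape)                # (1, 5, 512): one representation per output token
```
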
Bharrguv Vakharia @Bharrguv3:
(1/n) One of the biggest breakthroughs in modern AI came from a simple shift: models stopped processing language one word at a time. That shift was the Transformer.

Bharrguv Vakharia @Bharrguv3:
(2/n) Older Seq2Seq models relied on recurrence. They read tokens sequentially. Transformers changed the game by using attention. Instead of asking “what came just before this?”, the model asks: “Which other words matter most right now?”

Nick @imNiKkiiY:
Career update: I joined a YC startup one month ago. $1000/month.
[image attached]

Bharrguv Vakharia @Bharrguv3:
Autoregression
Definition: the process where the model’s output at step T becomes part of the input for step T + 1.

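A sketch of that loop with greedy decoding (`model` is a hypothetical stand-in for any next-token predictor that returns logits, not a specific library API):

```python
import torch

def generate(model, token_ids: torch.Tensor, steps: int) -> torch.Tensor:
    """token_ids: (1, seq_len) prompt; returns the prompt plus `steps` new tokens."""
    for _ in range(steps):
        logits = model(token_ids)                  # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()           # greedy: most likely next token
        # The output at step T is appended and becomes input at step T + 1.
        token_ids = torch.cat([token_ids, next_id.view(1, 1)], dim=1)
    return token_ids
```
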
Bharrguv Vakharia @Bharrguv3:
Self-Attention
What: a mechanism that allows the model to “score” every other word in a sentence to see how relevant each one is to the current word.
Example: in “The bank of the river,” attention helps the model know “bank” refers to land, not money.

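A sketch of that scoring as scaled dot-product attention (single head, toy shapes; in a real model q, k, v come from learned projections of the token embeddings):

```python
import torch

def self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    scores = q @ k.T / q.shape[-1] ** 0.5   # every word scored against every word
    weights = scores.softmax(dim=-1)        # each row sums to 1
    return weights @ v                      # context-weighted mix of the values

x = torch.randn(5, 64)                      # e.g. the 5 tokens of "The bank of the river"
out = self_attention(x, x, x)               # "bank" now mixes in "river"
```
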
Bharrguv Vakharia @Bharrguv3:
GPT (Generative Pre-trained Transformer) is a decoder-only Transformer architecture. It consists of a stack of identical layers, each containing two main sub-layers: Masked Multi-Head Self-Attention and a Position-wise Feed-Forward Network. Unlike the original Transformer paper, it drops the encoder stack and the cross-attention that connected to it.

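A sketch of the causal mask that makes that self-attention “masked” (seq_len is a toy value):

```python
# Position i may only attend to positions <= i, so the model cannot peek
# at the tokens it is supposed to predict.
import torch

seq_len = 5
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(mask)
# Added to the attention scores before softmax: the -inf entries become
# weight 0, leaving each token only its past.
```
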