Bharrguv Vakharia
@Bharrguv3
1.2K posts

Passionate Blogger & Tech Enthusiast | Exploring Blockchain & Web3.0 | DevOps Enthusiast | DSA

Joined October 2022
2.4K Following · 276 Followers

Prithvi Yadav @Mr_PrithviYadav:
My first SaaS payment just hit my bank. Built it. Shipped it. Sold it. ₹60,000 from my first client. I’ve never earned like this before. Not from something I created. This isn’t just money, it’s proof. Proof that building something real works. The journey has officially begun.
[image attached]

om chillure @OmChillure:
So here is my journey from an NPC to landing a job at xAI.
1. The start
- Started coding in May 2023, after my CET exams.
- Like every beginner, started with the MERN stack, web stuff.
- Achieved nothing much.
- Made a freelance project at the end of 1st year, that’s it.

Bharrguv Vakharia @Bharrguv3:
7/ Before a model understands language, it first turns meaning into geometry. That idea made LLMs much easier for me to understand.

Bharrguv Vakharia @Bharrguv3:
6/ One important thing: embeddings aren’t fully fixed. Context changes them. That’s how the model understands the difference between “river bank” and “bank account.”

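One way to see that contextuality directly (a sketch; the Hugging Face transformers library and bert-base-uncased are my choices for illustration, not something from the thread):

```python
# Compare the contextual vector of "bank" in two different sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence: str) -> torch.Tensor:
    """Return the contextual embedding of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, hidden)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v_river = bank_vector("I sat on the river bank.")
v_money = bank_vector("I opened a bank account.")
# Same token id going in, different vectors coming out: context moved them.
print(torch.cosine_similarity(v_river, v_money, dim=0).item())
```
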
Bharrguv Vakharia @Bharrguv3:
1/ LLMs don’t understand words. They understand vectors. Before a model processes language, every token becomes a list of numbers called an embedding.

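A minimal sketch of that lookup in PyTorch (the vocabulary size and dimensions are toy values picked for illustration):

```python
# A token id becomes a vector via a learned embedding-table lookup.
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512
embedding = nn.Embedding(vocab_size, d_model)   # the lookup table

token_ids = torch.tensor([[17, 4021, 998]])     # an already-tokenized sentence
vectors = embedding(token_ids)                  # shape: (1, 3, 512)
print(vectors.shape)                            # each token is now 512 numbers
```
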
Bharrguv Vakharia @Bharrguv3:
(9/n) The more I learn about AI, the more I realize: LLMs feel less magical when you understand the building blocks. And once the building blocks click… the architecture starts making sense.
(10/n) What Transformer concept took you the longest to understand?

Bharrguv Vakharia @Bharrguv3:
(8/n) Another big insight: GPT does NOT use the full original Transformer. GPT is decoder-only. That means it focuses on one task: predicting the next token.

Bharrguv Vakharia @Bharrguv3:
(7/n) One misconception I had: those arrows in Transformer diagrams are not just visual decoration. They are residual (skip) connections. They help gradients flow through deep networks and make training stable. Without them, very deep Transformers become much harder to train.

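A sketch of what those arrows do in code, assuming a pre-norm block layout (the pre-norm choice and all sizes are assumptions for illustration):

```python
# Each sub-layer's output is ADDED to its input: x = x + sublayer(x).
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        # The "x +" is the arrow in the diagram: the input skips around the
        # sub-layer, giving gradients a direct path through the network's depth.
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.norm2(x))
        return x

out = Block()(torch.randn(1, 10, 512))   # same shape in and out: (1, 10, 512)
```
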
Bharrguv Vakharia @Bharrguv3:
(6/n) Positional Encoding
Transformers don’t naturally understand order. So position information is added to the embeddings. Without positional encoding, the model would know the words…
…but not whether “dog bites man” or “man bites dog.”

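A sketch of the sinusoidal encoding from the original paper (assumes an even d_model; sizes are illustrative):

```python
# Build position-dependent sin/cos waves and add them to the embeddings.
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(seq_len).unsqueeze(1)        # (seq_len, 1)
    i = torch.arange(0, d_model, 2)                 # even dimension indices
    angle = pos / (10_000 ** (i / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                  # even dims get sine
    pe[:, 1::2] = torch.cos(angle)                  # odd dims get cosine
    return pe

# x = token_embeddings + positional_encoding(seq_len, d_model)
# The same tokens in a different order now produce different inputs.
```
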
Bharrguv Vakharia @Bharrguv3:
(5/n) Feed-Forward Networks (FFN)
A simple way to think about it: attention is the model’s listening; FFN is the model’s thinking. After gathering context, each token passes through a small neural network to transform that information.

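A sketch of that per-token network (the 4x expansion follows the original paper; the numbers are otherwise illustrative):

```python
# The FFN is applied to every position independently: (batch, seq, d_model)
# in, same shape out. No token looks at any other token here.
import torch.nn as nn

d_model, d_ff = 512, 2048
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),    # expand
    nn.ReLU(),                   # nonlinearity: the "thinking" step
    nn.Linear(d_ff, d_model),    # project back to the model dimension
)
```
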
Bharrguv Vakharia @Bharrguv3:
(4/n) Multi-Head Attention
Attention doesn’t run once. It runs multiple times in parallel. Why? Because different heads can learn different relationships:
• grammar
• long-range dependencies
• word associations
• contextual meaning

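A sketch of how one attention call becomes several heads; the reshape below is the standard trick, and all sizes are toy values:

```python
# Split d_model into n_heads independent subspaces, attend in each, merge back.
import torch

batch, seq, d_model, n_heads = 2, 10, 512, 8
d_head = d_model // n_heads                        # 64 dims per head

q = torch.randn(batch, seq, d_model)
k = torch.randn(batch, seq, d_model)
v = torch.randn(batch, seq, d_model)

def split_heads(x: torch.Tensor) -> torch.Tensor:
    # (batch, seq, d_model) -> (batch, n_heads, seq, d_head)
    return x.view(batch, seq, n_heads, d_head).transpose(1, 2)

q, k, v = map(split_heads, (q, k, v))
scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # each head scores on its own
out = scores.softmax(dim=-1) @ v                   # (batch, n_heads, seq, d_head)
out = out.transpose(1, 2).reshape(batch, seq, d_model)   # concatenate heads
# Each head attends in its own subspace, so each is free to specialize.
```
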
Bharrguv Vakharia @Bharrguv3:
(3/n) The original Transformer has 2 main parts:
Encoder → turns input text into contextual representations
Decoder → generates output one token at a time
But what really matters is what happens inside.

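A sketch of that data flow using PyTorch's built-in nn.Transformer, just to show which half consumes what (sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, batch_first=True)

src = torch.randn(1, 12, 512)   # encoder input: the embedded source text
tgt = torch.randn(1, 5, 512)    # decoder input: the output generated so far
out = model(src, tgt)           # encoder contextualizes src; decoder attends to it
print(out.shape)                # (1, 5, 512): one representation per output token
```
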
Bharrguv Vakharia @Bharrguv3:
(1/n) One of the biggest breakthroughs in modern AI came from a simple shift: models stopped processing language one word at a time. That shift was the Transformer.

Bharrguv Vakharia @Bharrguv3:
(2/n) Older Seq2Seq models relied on recurrence. They read tokens sequentially. Transformers changed the game by using attention. Instead of asking “what came just before this?”, the model asks: “Which other words matter most right now?”

Nick @imNiKkiiY:
Career update: I joined a YC startup one month ago. $1000/month.
[image attached]

Bharrguv Vakharia @Bharrguv3:
Autoregression
Definition: the process where the model’s output at step T becomes part of the input for step T + 1.

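A sketch of that loop with greedy decoding (`model` is a hypothetical stand-in for any next-token predictor that returns logits, not a specific library API):

```python
import torch

def generate(model, token_ids: torch.Tensor, steps: int) -> torch.Tensor:
    """token_ids: (1, seq_len) prompt; returns the prompt plus `steps` new tokens."""
    for _ in range(steps):
        logits = model(token_ids)                  # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()           # greedy: most likely next token
        # The output at step T is appended and becomes input at step T + 1.
        token_ids = torch.cat([token_ids, next_id.view(1, 1)], dim=1)
    return token_ids
```
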
Bharrguv Vakharia @Bharrguv3:
Self-Attention
What: a mechanism that allows the model to “score” every other word in a sentence to see how relevant each one is to the current word.
Example: in “The bank of the river,” attention helps the model know “bank” refers to land, not money.

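A sketch of that scoring as scaled dot-product attention (single head, toy shapes; in a real model q, k, v come from learned projections of the token embeddings):

```python
import torch

def self_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    scores = q @ k.T / q.shape[-1] ** 0.5   # every word scored against every word
    weights = scores.softmax(dim=-1)        # each row sums to 1
    return weights @ v                      # context-weighted mix of the values

x = torch.randn(5, 64)                      # e.g. the 5 tokens of "The bank of the river"
out = self_attention(x, x, x)               # "bank" now mixes in "river"
```
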
Bharrguv Vakharia @Bharrguv3:
GPT (Generative Pre-trained Transformer) is a decoder-only Transformer architecture. It consists of a stack of identical layers, each containing two main sub-layers: Masked Multi-Head Self-Attention and a Position-wise Feed-Forward Network. Unlike the original Transformer paper, it drops the encoder stack and the cross-attention that connected to it.

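A sketch of the causal mask that makes that self-attention “masked” (seq_len is a toy value):

```python
# Position i may only attend to positions <= i, so the model cannot peek
# at the tokens it is supposed to predict.
import torch

seq_len = 5
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(mask)
# Added to the attention scores before softmax: the -inf entries become
# weight 0, leaving each token only its past.
```
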