Positional Encoding allows us to inject information about the order of the sequence into an otherwise position-indifferent architecture. The original Transformer does this by injecting sinusoids into the word embeddings.
youtu.be/LSCsfeEELso #ArtificialIntelligence
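The sinusoidal scheme described above can be sketched as follows; this is a minimal NumPy illustration of the original Transformer's formula (function and variable names are my own, not taken from the video):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)                     # even dimensions
    pe[:, 1::2] = np.cos(positions / div)                     # odd dimensions
    return pe
```

The encoding is simply added to the word embeddings (`embeddings + sinusoidal_positional_encoding(seq_len, d_model)`), giving each position a unique, smoothly varying signature.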
Transformers: Residual Connections, Layer Normalization, and Position Wise Feedforward Networks
In this episode, we take a look at the components shared across all Transformer sublayers.
youtu.be/Wc8qJxoD52g #ArtificialIntelligence #nlproc
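The shared sublayer wiring mentioned above can be sketched in a few lines of NumPy; this is an illustrative simplification (no learned layer-norm gain/bias, helper names are assumptions), not the video's code:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each position's feature vector to zero mean, unit variance
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def position_wise_ffn(x, w1, b1, w2, b2):
    # the same two-layer MLP applied independently at every position
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def sublayer(x, fn):
    # residual connection around the sublayer, followed by layer normalization
    return layer_norm(x + fn(x))
```

Every Transformer sublayer (self-attention or feedforward) is wrapped this way, so gradients can flow through the residual path and activations stay well-scaled.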
Transformers: Self-attention
In this video tutorial series, we take a look at the heart of the Transformer, the self-attention mechanism. After an intuitive breakdown of how it works, we code up the mechanism.
youtu.be/1BFE1Tfs8tM #NLProc #ArtificialIntelligence #ML #Data
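A minimal single-head sketch of the scaled dot-product self-attention the episode covers, written in NumPy for brevity (the projection-matrix arguments are illustrative placeholders, not the video's exact code):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    # scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # numerically stable softmax over each row of scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Each output position is a convex combination of the value vectors, weighted by how strongly its query matches every key.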