It is a language model based on the 2017 paper ‘Attention Is All You Need’.
Below is an image taken from the paper, showing the Transformer model architecture.
Interfaces for Explaining Transformer Language Models
Demystifying Transformers Architecture in Machine Learning
CS480/680 Lecture 19: Attention and Transformer Networks
Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!
Introduction to Attention Mechanism - Blog by Kemal Erdem
The math behind Attention: Keys, Queries, and Values matrices
Intuition Behind Self-Attention Mechanism in Transformer Networks
Illustrated Guide to Transformers Neural Network: A step by step explanation