What is it?

A transformer is a deep learning architecture built around the multi-head attention mechanism, which lets the model weigh the relevance of every token in a sequence to every other token.

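To make the idea concrete, here is a minimal NumPy sketch of scaled dot-product attention and the multi-head wrapper around it. The function names, shapes, and random toy weights below are purely illustrative assumptions, not any particular library's API, and real implementations add masking, batching, and learned parameters.

```python
# Minimal sketch of multi-head attention (illustrative, not a library API).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_head)
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)        # similarity of every query to every key
    weights = softmax(scores, axis=-1)        # attention weights sum to 1 per query
    return weights @ v                        # weighted sum of the values

def multi_head_attention(x, num_heads, w_q, w_k, w_v, w_o):
    # x: (seq_len, d_model); projection matrices: (d_model, d_model)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Split into heads, attend in each head independently, then concatenate.
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        heads.append(scaled_dot_product_attention(q[:, sl], k[:, sl], v[:, sl]))
    return np.concatenate(heads, axis=-1) @ w_o

# Toy usage with random weights (a trained model learns these).
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_attention(x, num_heads, *w)
print(out.shape)  # (4, 8): one updated vector per input position
```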
Transformers were introduced in the 2017 paper “Attention Is All You Need”, written by eight researchers at Google, and the paper is widely regarded as a turning point in modern artificial intelligence.

Why was it created?

Earlier architectures such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks process a sequence one token at a time, with each step depending on the previous one, which made them much slower to train.

By attending to all positions in a sequence at once, transformers can be trained far more efficiently on parallel hardware, which in turn made possible the wave of large language models (LLMs) we have access to today.
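The contrast can be sketched in a few lines of NumPy. This is an illustrative toy comparison under assumed shapes and random weights, not a benchmark: the recurrent update has to loop over the sequence in order, while the attention-style update touches every pair of positions with a single matrix product that parallel hardware handles well.

```python
# Toy contrast: sequential recurrent update vs. one-shot attention-style update.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 4
x = rng.normal(size=(seq_len, d))

# Recurrent processing: each hidden state depends on the previous one,
# so this loop cannot be parallelised across the sequence.
w_h, w_x = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(h @ w_h + x[t] @ w_x)

# Attention-style processing: all pairwise interactions computed at once.
scores = x @ x.T / np.sqrt(d)                              # (seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax
out = weights @ x                                          # every position updated in parallel
print(h.shape, out.shape)  # (4,) (6, 4)
```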