Foundation Models

A type of ML model trained on a broad dataset. These models are built for general or broad use cases and can be adapted to specific tasks through fine-tuning. Foundation models like the ones powering OpenAI's ChatGPT can cost hundreds of millions of dollars to train. The term was coined in August 2021 by researchers at Stanford.
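A minimal sketch of what "adaptable through fine-tuning" can look like in practice: a frozen placeholder backbone stands in for a real pretrained foundation model, and only a small task-specific head is trained. The class, layer sizes, and data here are illustrative assumptions, not any particular model's API.

```python
import torch
import torch.nn as nn

class FineTunedClassifier(nn.Module):
    """Wraps a frozen pretrained backbone with a small trainable task head."""
    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the expensive pretrained weights
            p.requires_grad = False
        self.head = nn.Linear(feature_dim, num_classes)  # only this part is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))     # reuse general-purpose representations

# Toy stand-in for a real pretrained foundation model.
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
model = FineTunedClassifier(backbone, feature_dim=64, num_classes=2)
optimizer = torch.optim.AdamW(model.head.parameters(), lr=1e-4)

x = torch.randn(8, 128)                        # toy batch of inputs
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), labels)
loss.backward()                                # gradients flow only into the head
optimizer.step()
```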

Mamba Model

Why? Foundation models are near universally based on Transformers, whose attention mechanism scales quadratically with sequence length. Subquadratic alternatives such as linear attention, gated convolution, and recurrent models were developed to address this inefficiency; Mamba, built on selective state space models, is one such alternative.
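For intuition, here is a minimal sketch of the plain linear state space recurrence (h_t = A h_{t-1} + B x_t, y_t = C h_t) that Mamba-style models build on; its cost grows linearly with sequence length, unlike attention's quadratic cost. Mamba itself adds input-dependent (selective) parameters and a hardware-aware parallel scan, which this toy loop omits; all shapes and values below are illustrative.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Plain (non-selective) state space recurrence over a sequence.

    h_t = A h_{t-1} + B x_t   (update hidden state from the new input)
    y_t = C h_t               (read out an output for this step)
    """
    seq_len, _ = x.shape
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(seq_len):       # one pass over the sequence: linear in length
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))       # sequence of 16 tokens, 4 features each
A = np.eye(8) * 0.9                # toy state transition (stable decay)
B = rng.normal(size=(8, 4)) * 0.1
C = rng.normal(size=(2, 8))
y = ssm_scan(x, A, B, C)           # shape (16, 2): one output per token
```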

Transformers

What is it? A deep learning architecture based on multi-head attention. Transformers were introduced in a 2017 paper by eight scientists at Google, "Attention Is All You Need", which is widely seen as a turning point for modern artificial intelligence.

Why was it created? Earlier architectures such as recurrent networks and long short-term memory (LSTM) models process tokens sequentially and take much longer to train. Transformers process entire sequences in parallel, enabling far more efficient training, which in turn made possible the wave of LLMs (Large Language Models) we have access to today.
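The core of the architecture is scaled dot-product attention run across several parallel heads. Below is a simplified PyTorch sketch of multi-head self-attention (no masking, dropout, or positional encodings); the dimensions and names are illustrative, not taken from any specific library.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Scaled dot-product self-attention split across several heads (simplified)."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # project to queries, keys, values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Reshape to (batch, heads, seq_len, d_head) so each head attends independently.
        def split(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)

        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (batch, heads, seq, seq)
        weights = scores.softmax(dim=-1)                       # attention weights per head
        context = weights @ v                                  # weighted sum of values
        context = context.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out(context)

attn = MultiHeadSelfAttention(d_model=64, num_heads=8)
tokens = torch.randn(2, 10, 64)    # batch of 2 sequences, 10 tokens, 64-dim embeddings
out = attn(tokens)                 # same shape as the input: (2, 10, 64)
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed in parallel during training, which is the efficiency gain over step-by-step recurrent models described above.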