Generative AI - How it works

A gentle introduction to the principles of how generative AI works. This is a non-technical overview.

What is Generative AI?

Generative AI is just one branch of Artificial Intelligence and Machine Learning, but it is the branch most readily available to non-technical users and the one receiving the most attention. It is revolutionizing how we create and interact with digital content: it uses machine learning algorithms, trained on large existing datasets, to generate new data, mimicking human-like problem solving and creativity.

In this post, we will delve a little into how it works, which will help us to assess the benefits and drawbacks, and make informed choices about when and how to use it.

Training

Machine Learning, in general, trains a software model on sets of relevant data so that it can extrapolate to new data in the same domain and provide answers to various questions.

Generative AI models, such as those behind ChatGPT or Gemini, are known as Large Language Models (LLMs) and are trained on massive datasets. These datasets help the model learn the intricacies of language, patterns, and context. The process involves several key steps, the first of which is model training.

Initially, large amounts of data are fed into the machine learning model. The data is sourced from many types of material, e.g. text from books, articles, and websites, and sometimes transcripts of audio sources that may themselves have been transcribed by machine learning tools. This helps the model to understand language structure, grammar, and varied contexts.

Transformer Models

One of the most significant advancements in generative AI was the development of a class of models known as transformer models, such as GPT (Generative Pre-trained Transformer). These models use an architecture known as the Transformer, which relies on mechanisms known as "attention heads". The attention mechanism allows a model to focus on specific parts of the input data when generating responses, weighing the importance of different words or phrases based on the context, which leads to remarkably coherent and relevant answers.
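To make "weighing the importance of different words" a little more concrete, here is a minimal sketch of a single attention head using toy numbers. This is a simplification: real models learn separate query, key, and value projections and run many heads in parallel, whereas here we attend the input vectors over themselves directly.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: each query scores every key, the scores
    are turned into weights via softmax, and the output is a
    weighted average of the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights

# Three toy token vectors standing in for the embeddings of a 3-word input.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)  # self-attention
```

Each row of `w` shows how much one token "attends to" each of the others, and `out` is the resulting context-aware representation.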

Pre-training and Fine-tuning

Many models use two main stages, known as pre-training and fine-tuning.

Pre-training

During the pre-training phase, the model learns to predict the next word in a sentence, given the previous words. This helps the AI to understand the general structure and flow of language.
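The idea of learning to predict the next word can be illustrated with a deliberately tiny stand-in: counting which word follows which in a toy corpus. Real pre-training optimises billions of neural-network parameters over vast text collections, but the objective is the same in spirit.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# For every word, count which words were seen to follow it.
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen in training."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("sat"))  # -> "on"
```

Even this crude model has absorbed a fragment of language structure: "sat" is usually followed by "on". An LLM does the same thing at enormous scale, with context far longer than a single word.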

Fine-tuning

After pre-training, the model can be fine-tuned on a narrower dataset, which is often more specific to a desired application (e.g., customer support, medical advice). This enhances the general model's ability to provide accurate and context-specific responses.

Generating Responses

When you provide a model with a prompt or series of prompts, the pre-learned knowledge of the model is used to generate a contextual response. The context of the input prompt(s) is used in applying the attention mechanisms, and the model predicts the most likely next words, phrases or sentences to form a coherent answer. The response is generated token by token (a token is a word or a piece of a word), with each new token conditioned on the prompt and on the tokens generated so far, which keeps the output aligned with the prompt.
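The token-by-token loop described above can be sketched as follows. The probability table here is invented for illustration and stands in for a trained model's output; this also shows the simplest decoding strategy (always pick the most likely token, known as greedy decoding), whereas real systems often sample to add variety.

```python
# Toy next-token probabilities standing in for a trained model.
probs = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the":     {"cat": 0.6, "dog": 0.4},
    "cat":     {"sat": 0.8, "ran": 0.2},
    "sat":     {"<end>": 1.0},
    "dog":     {"<end>": 1.0},
    "ran":     {"<end>": 1.0},
}

def generate(token="<start>", max_tokens=10):
    """Greedy decoding: repeatedly append the most likely next token,
    stopping when the model predicts the end of the sequence."""
    output = []
    for _ in range(max_tokens):
        token = max(probs[token], key=probs[token].get)
        if token == "<end>":
            break
        output.append(token)
    return " ".join(output)

print(generate())  # -> "the cat sat"
```

Each step conditions only on the previous token here; a real LLM conditions on the entire prompt plus everything generated so far.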

Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a technique for combining a large language model with additional authoritative data from one or more domain-specific data sources. For example, we could augment the model with the complete works of Chaucer and ask for answers to be provided in Middle English.

Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. It is a very useful approach to keeping LLM output relevant, accurate, and useful in contexts on which the LLM was not specifically trained.
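A RAG pipeline can be reduced to two steps: retrieve the most relevant documents, then hand them to the model alongside the question. The sketch below uses crude word overlap as the relevance score and an invented three-document knowledge base; production systems typically use vector embeddings for retrieval, and the final prompt would be sent to an actual LLM.

```python
def overlap(a, b):
    """Crude relevance score: number of words the two texts share."""
    return len(set(a.lower().split()) & set(b.lower().split()))

# A tiny stand-in for an organization's knowledge base.
documents = [
    "Our office is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 14 days of a return.",
    "Support tickets are answered within one business day.",
]

def build_rag_prompt(question, docs, k=1):
    """Retrieve the k most relevant documents and prepend them to the
    question, so the model can ground its answer in that context."""
    ranked = sorted(docs, key=lambda d: overlap(question, d), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_rag_prompt("How long do refunds take?", documents)
```

The model never needs retraining: the domain knowledge arrives at answer time, inside the prompt.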

Applications

The practical applications of this technology are diverse and impactful. Generative AI can help us to be more productive by handling some of our more mundane tasks. It is already being used in many domains, some examples of which we've listed below.

  • Converse with users in natural language

  • Generate creative content like stories, poetry, and articles

  • Assist with coding by providing suggestions and debugging help

  • Enhance customer service with automated, intelligent responses

  • Support educational tools and personalized learning experiences

Summing Up

In essence, the core of generative AI is a prediction engine that, given an input and a context, can predict with surprising effectiveness what is most likely to come next.

By gaining a basic understanding of how generative AI works, we can better appreciate the intricacies behind how it provides intelligent, context-aware answers, and make more informed decisions about when and where it is appropriate to use it.

We will return to Generative AI in a future post, discussing how we are using it at Wayside and in Nualang.

Nualang
3-4 Joshua Lane, Dublin 2, Ireland D02 C856

info@nualang.com

© Nualang™, 2024. All rights reserved.


Wayside Publishing

2 Stonewood Drive, Freeport, ME 04032, United States