**How are foundation models trained differently from traditional AI systems?** One key innovation of foundation models is their training paradigm. They generally use a two-stage training process: first, they go through a pre-training phase; second, they are adapted through mechanisms such as fine-tuning or scaffolding to perform specific tasks. Rather than learning from human-labeled examples for specific tasks, these models learn by finding patterns in huge amounts of unlabeled data. This shift toward self-supervised learning on massive datasets fundamentally changes not just how models learn, but also what kinds of capabilities and risks might emerge ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258)). From a safety perspective, this means we need to understand both how these training methods work and how they might lead to unexpected behaviors.

<figure markdown="span">
{ loading=lazy }
<figcaption markdown="1"><b>Figure 1.14:</b> On the Opportunities and Risks of Foundation Models ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258))</figcaption>
</figure>

**What is pre-training?** Pre-training is the initial phase where the model learns general patterns and knowledge from massive datasets of millions or billions of examples. During this phase, the model isn't trained for any specific task - instead, it develops broad capabilities that can later be specialized. This generality is both powerful and concerning from a safety perspective. While it enables the model to adapt to many different tasks, it also means we can't easily predict or constrain what the model might learn to do ([Hendrycks et al., 2022](https://arxiv.org/abs/2109.13916)).

<figure markdown="span">
{ loading=lazy }
<figcaption markdown="1"><b>Figure 1.15:</b> On the Opportunities and Risks of Foundation Models ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258))</figcaption>
</figure>

**How does self-supervised learning enable pre-training?** Self-supervised learning (SSL) is the key technical innovation that makes foundation models possible; it is how the pre-training phase is actually implemented. Unlike traditional supervised learning, which requires human-labeled data, SSL leverages the inherent structure of the data itself to create training signals. For example, instead of manually labeling images, we might hide part of an image we already have and ask the model to predict what the rest should be. It might predict the bottom half of an image given the top half, learning which objects tend to appear together. For instance, it might learn that images with trees and grass at the top often have more grass, or perhaps a path, at the bottom. It learns about objects and their context - trees and grass often appear in parks, dogs are often found in these environments, paths are usually horizontal, and so on. These learned representations can then be used for a wide variety of tasks the model was never explicitly trained for, like identifying dogs in images or recognizing parks - all without any human-provided labels! The same concept applies to language: a model might predict the next word in a sentence such as "The cat sat on the …", learning grammar, syntax, and context simply by repeating this prediction over huge amounts of text.
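
To make this concrete, here is a minimal sketch of the self-supervised idea in language: the training targets are carved directly out of raw text, so no human labels are needed. The toy corpus and the bigram counter below are simplifications invented for illustration - real foundation models use large neural networks trained on vastly more data - but the source of the training signal is the same.

```python
# A toy illustration of self-supervised learning: every word in raw text
# serves as the "label" for the words that precede it, so no human
# annotation is required. The corpus and bigram "model" are stand-ins.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat slept on the mat ."
).split()

# Build (input, target) pairs directly from the data itself.
pairs = [(corpus[i], corpus[i + 1]) for i in range(len(corpus) - 1)]

# A toy next-word predictor: count how often each word follows each context word.
counts = defaultdict(Counter)
for context, target in pairs:
    counts[context][target] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation seen during 'pre-training'."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # e.g. 'cat' - learned purely from raw text
```

The point is only that the labels come for free from the data; scaling this idea up to neural networks and internet-scale corpora is what pre-training does.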

**What is fine-tuning?** After pre-training, foundation models can be adapted through two main approaches: fine-tuning and prompting. Fine-tuning involves additional training on a specific task or dataset to specialize the model's capabilities. For example, we might use Reinforcement Learning from Human Feedback (RLHF) to make language models better at following instructions or being more helpful. Prompting, on the other hand, involves providing the model with carefully crafted inputs that guide it toward desired behaviors without additional training. We'll discuss these adaptation methods in more detail in Chapter 8 when we explore scalable oversight.
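
As a rough illustration of the difference between these two routes, the sketch below fine-tunes a small task-specific head on top of a frozen stand-in "pre-trained" network in PyTorch, while prompting leaves every weight untouched. The tiny network, the synthetic data, and the example prompt are all placeholders for illustration, not a real foundation model or an RLHF pipeline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a network whose weights were already learned during pre-training.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
task_head = nn.Linear(32, 2)  # new, task-specific output layer

# Fine-tuning: additional gradient updates on a small labeled dataset.
# Here the backbone is frozen and only the head is trained (one common variant).
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))  # toy labeled task data

for _ in range(200):
    logits = task_head(backbone(x))
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final fine-tuning loss: {loss.item():.3f}")

# Prompting, by contrast, updates no weights at all: the task is specified
# entirely through the input that a pre-trained language model completes.
prompt = "Translate to French: 'The cat sat on the mat.'"
```

Freezing the backbone is only one variant; full fine-tuning and RLHF update far more parameters, which is part of why adaptation is cheaper than pre-training but still nontrivial.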

**Why does generality matter?** Generalization in foundation models works differently than in traditional AI systems. Rather than just generalizing within a narrow domain, these models can generalize capabilities across domains in surprising ways. However, this generalization of capabilities often happens without a corresponding generalization of goals or constraints - a critical safety concern we'll explore in detail in our chapter on goal misgeneralization. For example, a model might generalize its ability to manipulate text in unexpected ways without maintaining the safety constraints we intended ([Hendrycks et al., 2022](https://arxiv.org/abs/2109.13916)).

<figure markdown="span">
{ loading=lazy }
<figcaption markdown="1"><b>Figure 1.16:</b> On the Opportunities and Risks of Foundation Models ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258))</figcaption>
</figure>

**How do resource requirements limit development and access?** Training foundation models requires massive computational resources, creating a delicate balance between cost and accessibility. While adapting an existing model might be relatively affordable, the substantial initial training costs risk centralizing power among a few well-resourced entities. This concentration of power raises important questions about oversight and responsible development, which we'll address in our chapter on governance. For example, a single training run for a GPT-4 sized model can cost tens or hundreds of millions of dollars, effectively limiting who can participate in developing such models. Continued scaling has also raised many concerns about the environmental impact of AI training runs ([Patterson et al., 2023](https://arxiv.org/abs/2104.10350)).

<figure markdown="span">
{ loading=lazy }
<figcaption markdown="1"><b>Figure 1.17:</b> The rising costs of training frontier AI models ([Cottier et al., 2024](https://arxiv.org/abs/2405.21015))</figcaption>
</figure>
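
To give a sense of scale, here is a rough back-of-envelope estimate based on the common heuristic that training compute is roughly 6 FLOPs per parameter per training token. Every number below - model size, token count, effective hardware throughput, and price per accelerator-hour - is an assumption chosen for illustration rather than a reported figure for any particular model.

```python
# Back-of-envelope estimate of frontier training costs. All inputs are
# illustrative assumptions, not reported figures for any specific model.
params = 1e12                 # assumed model size: 1 trillion parameters
tokens = 10e12                # assumed training data: 10 trillion tokens
flops = 6 * params * tokens   # heuristic: ~6 FLOPs per parameter per token

effective_flops_per_s = 4e14  # assumed effective throughput per accelerator
dollars_per_gpu_hour = 2.0    # assumed price per accelerator-hour

accelerator_hours = flops / effective_flops_per_s / 3600
cost = accelerator_hours * dollars_per_gpu_hour

print(f"Training compute:  {flops:.1e} FLOPs")
print(f"Accelerator-hours: {accelerator_hours:,.0f}")
print(f"Rough cost:        ${cost:,.0f}")  # on the order of tens of millions
```

Under these assumptions the run lands in the tens of millions of dollars before accounting for staff, experiments, and failed runs, which is consistent with the order of magnitude discussed above.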