
Commit 5651a50

Merge branch 'main' into pr-1
2 parents b9deb8a + 67f7021 commit 5651a50

112 files changed: +634 -961 lines changed

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-appendix: 52
-core: 93
+appendix: 51
+core: 94

docs/chapters/01/01.md

Lines changed: 2 additions & 2 deletions
@@ -76,7 +76,7 @@ Here are some of the capabilities that have been emerging in the last few years:
 - **Scientific & Mathematical ability**. In mathematics, AIs have assisted in the subfield of automatic theorem proving for decades. Today's models continue to assist in solving complex problems. AI can even approach gold-medal level at the International Mathematical Olympiad by solving geometry problems ([Trinh et al., 2024](https://www.nature.com/articles/s41586-023-06747-5)).
 
 <figure markdown="span">
-![Enter image alt description](Images/Fjk_Image_5.png){ loading=lazy }
+![Enter image alt description](Images/PQy_Image_5.png){ loading=lazy }
 <figcaption markdown="1"><b>Figure 1.5:</b> Note also the large jump from GPT-3.5 to GPT-4 in human percentile on these tests, often from well below the median human to the very top of the human range ([Aschenbrenner, 2024](https://situational-awareness.ai/from-gpt-4-to-agi/); [OpenAI, 2023](https://arxiv.org/abs/2303.08774)). Keep in mind that the jump from GPT-3.5 to GPT-4 happened in a single year.</figcaption>
 </figure>

@@ -110,7 +110,7 @@ AI systems are becoming increasingly multimodal. This means that they can proces
 **Multi-modality**. A model is called multi-modal when its inputs and outputs can span more than one modality, e.g. audio-to-text, video-to-text, text-to-image, etc.
 
 <figure markdown="span">
-![Enter image alt description](Images/7zs_Image_9.png){ loading=lazy }
+![Enter image alt description](Images/DjZ_Image_9.png){ loading=lazy }
 <figcaption markdown="1"><b>Figure 1.9:</b> Image-to-text and text-to-image multimodality from the Flamingo model. ([Alayrac et al., 2022](https://arxiv.org/abs/2204.14198))</figcaption>
 </figure>

docs/chapters/01/02.md

Lines changed: 5 additions & 5 deletions
@@ -41,18 +41,18 @@
 **How are foundation models trained differently from traditional AI systems?** One key innovation of foundation models is their training paradigm. Generally, foundation models use a two-stage training process. First, they go through what we call pre-training, and second, they can be adapted through various mechanisms like fine-tuning or scaffolding to perform specific tasks. Rather than learning from human-labeled examples for specific tasks, these models learn by finding patterns in huge amounts of unlabeled data. This shift toward self-supervised learning on massive datasets fundamentally changes not just how models learn, but also what kinds of capabilities and risks might emerge ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258)). From a safety perspective, this means we need to understand both how these training methods work and how they might lead to unexpected behaviors.
 
 <figure markdown="span">
-![Enter image alt description](Images/hVR_Image_14.png){ loading=lazy }
+![Enter image alt description](Images/IE4_Image_14.png){ loading=lazy }
 <figcaption markdown="1"><b>Figure 1.14:</b> On the Opportunities and Risks of Foundation Models ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258))</figcaption>
 </figure>
 
 **What is pre-training?** Pre-training is the initial phase where the model learns general patterns and knowledge from massive datasets of millions or billions of examples. During this phase, the model isn't trained for any specific task - instead, it develops broad capabilities that can later be specialized. This generality is both powerful and concerning from a safety perspective. While it enables the model to adapt to many different tasks, it also means we can't easily predict or constrain what the model might learn to do ([Hendrycks et al., 2022](https://arxiv.org/abs/2109.13916)).
 
 <figure markdown="span">
-![Enter image alt description](Images/0Rx_Image_15.png){ loading=lazy }
+![Enter image alt description](Images/qbA_Image_15.png){ loading=lazy }
 <figcaption markdown="1"><b>Figure 1.15:</b> On the Opportunities and Risks of Foundation Models ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258))</figcaption>
 </figure>
 
-**How does self-supervised learning enable pre-training?** Self-supervised learning (SSL) is the key technical innovation that makes foundation models possible. This is how we actually implement the pre-training phase. Unlike traditional supervised learning, which requires human-labeled data, SSL leverages the inherent structure of the data itself to create training signals. For example, instead of manually labeling images, we might just hide part of a full image we already have and ask a model to predict what the rest should be. So it might predict the bottom half of an image given the top half, learning about which objects often appear together. As an example, it might learn that images with trees and grass at the top often have more grass, or maybe a path, at the bottom. It learns about objects and their context — trees and grass often appear in parks, dogs are often found in these environments, paths are usually horizontal, and so on. These learned representations can then be used for a wide variety of tasks that the model was not explicitly trained for, like identifying dogs in images, or recognizing parks - all without any human-provided labels! The same concept applies in language, a model might predict the next word in a sentence, such as "The cat sat on the ____," learning grammar, syntax, and context as long as we repeat this over huge amounts of text.
+**How does self-supervised learning enable pre-training?** Self-supervised learning (SSL) is the key technical innovation that makes foundation models possible. This is how we actually implement the pre-training phase. Unlike traditional supervised learning, which requires human-labeled data, SSL leverages the inherent structure of the data itself to create training signals. For example, instead of manually labeling images, we might just hide part of a full image we already have and ask a model to predict what the rest should be. So it might predict the bottom half of an image given the top half, learning about which objects often appear together. As an example, it might learn that images with trees and grass at the top often have more grass, or maybe a path, at the bottom. It learns about objects and their context - trees and grass often appear in parks, dogs are often found in these environments, paths are usually horizontal, and so on. These learned representations can then be used for a wide variety of tasks that the model was not explicitly trained for, like identifying dogs in images, or recognizing parks - all without any human-provided labels! The same concept applies in language: a model might predict the next word in a sentence, such as "The cat sat on the … ," learning grammar, syntax, and context as long as we repeat this over huge amounts of text.

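To make the next-word-prediction objective described above concrete, here is a minimal illustrative sketch (not from the chapter; the tiny corpus is made up) of how unlabeled text alone can be turned into training examples:

```python
# A self-supervised training signal needs no human labels:
# the "label" for each position is simply the word that comes next.

corpus = [
    "the cat sat on the mat",
    "dogs are often found in parks",
]

def next_word_examples(sentence: str):
    """Turn one unlabeled sentence into (context, next_word) training pairs."""
    words = sentence.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

training_pairs = [pair for sentence in corpus for pair in next_word_examples(sentence)]

for context, target in training_pairs[:3]:
    print(f"context={context} -> predict {target!r}")
# A language model is trained to assign high probability to each target word
# given its context, repeated over enormous amounts of text.
```

In a real pre-training run the same idea is applied to subword tokens rather than whole words, and to vastly more data, but the training signal is constructed in essentially this label-free way.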

 **What is fine-tuning?** After pre-training, foundation models can be adapted through two main approaches: fine-tuning and prompting. Fine-tuning involves additional training on a specific task or dataset to specialize the model's capabilities. For example, we might use Reinforcement Learning from Human Feedback (RLHF) to make language models better at following instructions or being more helpful. Prompting, on the other hand, involves providing the model with carefully crafted inputs that guide it toward desired behaviors without additional training. We'll discuss these adaptation methods in more detail in Chapter 8 when we explore scalable oversight.
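As a rough, deliberately toy illustration of the two adaptation routes just described (this is not how foundation models are actually implemented; the bigram "model" below is a stand-in), fine-tuning means doing more training on task-specific data, while prompting only changes the input:

```python
from collections import Counter, defaultdict

# Toy stand-in for a language model: bigram counts over words.
model = defaultdict(Counter)

def train(counts, text):
    """'Training' here just means updating counts from data (the 'weights' change)."""
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def generate_next(counts, prompt_word):
    """Inference: predict the most likely next word given the prompt."""
    options = counts.get(prompt_word)
    return options.most_common(1)[0][0] if options else "<unknown>"

# Stage 1 (pre-training): broad, unlabeled data.
train(model, "the cat sat on the mat and the dog sat on the rug")

# Adaptation route 1 (fine-tuning): further training on task-specific data;
# the model itself changes. RLHF is a far more elaborate version of this idea.
train(model, "the assistant answered the question politely")

# Adaptation route 2 (prompting): no training at all; we only change the input.
print(generate_next(model, "the"))
print(generate_next(model, "assistant"))
```

The analogy is loose, but it captures the key distinction drawn here: fine-tuning alters the model, while prompting leaves it untouched.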

@@ -69,7 +69,7 @@
 **Why is generality important?** Generalization in foundation models works differently from traditional AI systems. Rather than just generalizing within a narrow domain, these models can generalize capabilities across domains in surprising ways. However, this generalization of capabilities often happens without a corresponding generalization of goals or constraints - a critical safety concern we'll explore in detail in our chapter on goal misgeneralization. For example, a model might generalize its ability to manipulate text in unexpected ways without maintaining the safety constraints we intended ([Hendrycks et al., 2022](https://arxiv.org/abs/2109.13916)).
 
 <figure markdown="span">
-![Enter image alt description](Images/HeN_Image_16.png){ loading=lazy }
+![Enter image alt description](Images/JXm_Image_16.png){ loading=lazy }
 <figcaption markdown="1"><b>Figure 1.16:</b> On the Opportunities and Risks of Foundation Models ([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258))</figcaption>
 </figure>

@@ -90,7 +90,7 @@
 **How do resource requirements limit development and access?** Training foundation models requires massive computational resources, creating a delicate balance between cost and accessibility. While adapting an existing model might be relatively affordable, the substantial initial training costs risk centralizing power among a few well-resourced entities. This concentration of power raises important questions about oversight and responsible development that we'll address in our chapter on governance. For example, a single training run of GPT-4-sized models can cost tens or hundreds of millions of dollars, effectively limiting who can participate in their development. Continued scaling has also raised concerns about the environmental impact of AI training runs ([Patterson et al., 2021](https://arxiv.org/abs/2104.10350)).
 
 <figure markdown="span">
-![Enter image alt description](Images/ux2_Image_17.png){ loading=lazy }
+![Enter image alt description](Images/Erm_Image_17.png){ loading=lazy }
 <figcaption markdown="1"><b>Figure 1.17:</b> The rising costs of training frontier AI models ([Cottier et al., 2024](https://arxiv.org/abs/2405.21015))</figcaption>
 </figure>

docs/chapters/01/03.md

Lines changed: 7 additions & 7 deletions
@@ -45,20 +45,20 @@ In our next subsection, we will explore how we can concretely define and measure
 
 ## 1.3.2 Measuring {: #02}
 
-!!! quote "Lord Kelvin ([Oxford essential quotations, 2016](https://www.oxfordreference.com/display/10.1093/acref/9780191826719.001.0001/q-oro-ed4-00006236))"
+!!! quote "Lord Kelvin ([Oxford Reference, 2016](https://www.oxfordreference.com/display/10.1093/acref/9780191826719.001.0001/q-oro-ed4-00006236))"
 
 
 When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science.
 
 
-**Why do traditional definitions of AGI fall short?** In the previous section, we explored how foundation models are becoming increasingly powerful and versatile. But before we can meaningfully discuss risks and safety implications, or make predictions about future progress, we need clear ways to measure and track AI capabilities. This section introduces frameworks for measuring progress toward artificial general intelligence (AGI) and understanding the relationship between capabilities, autonomy, and risk. For example, OpenAI's definition of AGI as "*systems that outperform humans at most economically valuable work*" ([OpenAI, 2014](https://openai.com/charter/)), or the commonly used definition "*Intelligence measures an agent’s ability to achieve goals in a wide range of environments.*" ([Legg and Hutter, 2007](https://arxiv.org/abs/0706.3639)) and many others are not specific enough to be operationalizable. Which humans? Which goals? Which tasks are economically valuable? What about systems that exceed human performance on some tasks but only for short durations?
+**Why do traditional definitions of AGI fall short?** In the previous section, we explored how foundation models are becoming increasingly powerful and versatile. But before we can meaningfully discuss risks and safety implications, or make predictions about future progress, we need clear ways to measure and track AI capabilities. This section introduces frameworks for measuring progress toward artificial general intelligence (AGI) and understanding the relationship between capabilities, autonomy, and risk. For example, OpenAI's definition of AGI as "*systems that outperform humans at most economically valuable work*" ([OpenAI, 2018](https://openai.com/charter/)), or the commonly used definition "*Intelligence measures an agent’s ability to achieve goals in a wide range of environments.*" ([Legg & Hutter, 2007](https://arxiv.org/abs/0706.3639)) and many others are not specific enough to be operationalizable. Which humans? Which goals? Which tasks are economically valuable? What about systems that exceed human performance on some tasks but only for short durations?
 
 **Why do we need better measurement frameworks?** Historically, discussions about AGI have often relied on binary thresholds - systems were categorized as either "narrow" or "general", "weak" or "strong", "sub-human" or "human-level." While these distinctions helped frame early discussions about AI, they become increasingly inadequate as AI systems grow more sophisticated. Just like we sidestepped debates around whether AIs display "true intelligence" or "real understanding" in favor of a more practical framework that focuses on capabilities, we similarly want to avoid debates around whether a system is "human-level" or not. It is much more pragmatic to be able to make statements like: "it outperforms 75% of skilled adults on 30% of cognitive tasks."
 
 <figure markdown="span">
-![Enter image alt description](Images/e5u_Image_18.png){ loading=lazy }
+![Enter image alt description](Images/knt_Image_18.png){ loading=lazy }
 <figcaption markdown="1"><b>Figure 1.18:</b> This is the continuous view of measuring AI performance. All points on this axis can be called ANI (except for the origin).</figcaption>
 </figure>

@@ -75,15 +75,15 @@ In our next subsection, we will explore how we can concretely define and measure
 **How can we build a better measurement framework for AGI?** We need to track AI progress along two dimensions - performance (how well can it do things?) and generality (how many different things can it do?). Just like we can describe a point on a map using latitude and longitude, we can characterize AGI systems by their combined level of performance and degree of generality, as measured by benchmarks and evaluations. This framework gives us a much more granular way to track progress. This precision helps us better understand both current capabilities and likely development trajectories.
 
 <figure markdown="span">
-![Enter image alt description](Images/wSd_Image_19.png){ loading=lazy }
-<figcaption markdown="1"><b>Figure 1.19:</b> Table of performance x generality showing both levels of ANI, and levels of AGI.</figcaption>
+![Enter image alt description](Images/S3F_Image_19.png){ loading=lazy }
+<figcaption markdown="1"><b>Figure 1.19:</b> Table of performance x generality showing both levels of ANI, and levels of AGI. ([Morris et al., 2024](https://arxiv.org/abs/2311.02462))</figcaption>
 </figure>
 
 **Where do current AI systems fit in this framework?** Large language models like GPT-4 show an interesting pattern - they outperform roughly 50% of skilled adults on perhaps 15-20% of cognitive tasks (like basic writing and coding), while matching or slightly exceeding unskilled human performance on a broader range of tasks. This gives us a more precise way to track progress than simply debating whether such systems qualify as "AGI." LLMs like GPT-4 are early forms of AGI ([Bubeck et al., 2023](https://arxiv.org/abs/2303.12712)), and over time we will achieve stronger AGI as both generality and performance increase. To understand how this continuous framework relates to traditional definitions, let's examine how key historical concepts map onto our performance-generality space.

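To illustrate how placements like the one above could be operationalized, here is a small illustrative sketch; the level names loosely follow the levels-of-AGI table in Figure 1.19 ([Morris et al., 2024](https://arxiv.org/abs/2311.02462)), but the exact cutoffs, the 50%-of-tasks generality threshold, and the function names are assumptions made for this example:

```python
# Characterizing a system by two coordinates: performance (what fraction of
# skilled adults it outperforms) and generality (on what fraction of cognitive
# tasks that performance holds). Thresholds here are illustrative assumptions.

PERFORMANCE_LEVELS = [
    (0.99, "Virtuoso"),   # outperforms at least 99% of skilled adults
    (0.90, "Expert"),     # at least 90%
    (0.50, "Competent"),  # at least 50%
    (0.00, "Emerging"),   # comparable to or somewhat better than an unskilled human
]

def describe(system, pct_skilled_adults_outperformed, fraction_of_tasks):
    level = next(name for cutoff, name in PERFORMANCE_LEVELS
                 if pct_skilled_adults_outperformed >= cutoff)
    breadth = "General" if fraction_of_tasks >= 0.5 else "Narrow"  # assumed cutoff
    return (f"{system}: {level} / {breadth} "
            f"(beats {pct_skilled_adults_outperformed:.0%} of skilled adults "
            f"on {fraction_of_tasks:.0%} of tasks)")

# The same system lands at different points depending on the task slice measured,
# using the rough numbers quoted above for a GPT-4-like model.
print(describe("GPT-4-like LLM, writing/coding slice", 0.50, 0.17))
print(describe("GPT-4-like LLM, broad range of tasks", 0.05, 0.90))
```

Plotting many such (performance, generality) points over time is what traces out trajectories like those sketched in Figure 1.20.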

 <figure markdown="span">
-![Enter image alt description](Images/0hx_Image_20.png){ loading=lazy }
-<figcaption markdown="1"><b>Figure 1.20:</b> The two-dimensional view of performance x generality. The different curves are meant to represent the different paths we can take to ASI. Every single point on the path corresponds to a different level of AGI. The specific development trajectory is hard to forecast. This will be discussed in the section on forecasting and takeoff.</figcaption>
+![Enter image alt description](Images/jjS_Image_20.png){ loading=lazy }
+<figcaption markdown="1"><b>Figure 1.20:</b> The two-dimensional view of performance x generality. The different colored curves are meant to represent the different paths we can take to ASI. Every single point on the path corresponds to a different level of AGI. The specific development trajectory is hard to forecast. This will be discussed in the section on forecasting and takeoff.</figcaption>
 </figure>
 
 !!! info "Definition: Transformative AI (TAI) ([Karnofsky, 2016](https://www.openphilanthropy.org/research/some-background-on-our-views-regarding-advanced-artificial-intelligence/))"
