description: Consistent human oversight in critical AI systems.
featured:
  class: c
  element: '<action>Human In The Loop</action>'
tags:
  - Human In The Loop
  - Practice
practice:
  mitigates:
    - tag: Loss Of Human Control
      reason: "Maintaining consistent human oversight in critical AI systems, ensuring that final decisions or interventions rest with human operators rather than the AI."
---

<PracticeIntro details={frontMatter} />

### Human-in-the-Loop Controls

- Maintaining consistent human oversight in critical AI systems, ensuring that final decisions or interventions rest with human operators rather than the AI.
- AI may suggest diagnoses or treatments, but a certified professional reviews and confirms before enacting them. In the NHS Grampian example above, the AI augments human decision-making with a third opinion rather than replacing human judgement altogether (yet).
- Some proposals mandate that human operators confirm critical actions (e.g., missile launches), preventing AI from unilaterally making life-or-death decisions. This might work in scenarios where response time isn't a critical factor (a minimal sketch of such an approval gate appears below).

- **Efficacy:** Medium – Reduces risk by limiting autonomy on high-stakes tasks; however, humans may become complacent or fail to intervene effectively if they over-trust the AI.
- **Ease of Implementation:** Moderate – Policy, regulatory standards, and user training are needed to embed human oversight effectively.
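
Below is a minimal sketch of what such an approval gate could look like. It assumes a hypothetical `ProposedAction` type and placeholder `execute` and `human_approves` functions; none of these names come from the systems mentioned above, and a real deployment would route approvals through audited tooling rather than a console prompt.

```python
# Minimal, illustrative human-in-the-loop approval gate.
# All names here are hypothetical placeholders, not a real system's API.

from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    high_stakes: bool  # e.g. administering treatment, launching anything


def human_approves(action: ProposedAction) -> bool:
    """Block until a human operator explicitly confirms or rejects."""
    answer = input(f"Approve '{action.description}'? [y/N] ").strip().lower()
    return answer == "y"


def execute(action: ProposedAction) -> None:
    print(f"Executing: {action.description}")


def run_with_oversight(action: ProposedAction) -> None:
    # The AI may only act unilaterally on low-stakes actions;
    # anything high-stakes requires an explicit human decision first.
    if action.high_stakes and not human_approves(action):
        print(f"Rejected by operator: {action.description}")
        return
    execute(action)


if __name__ == "__main__":
    run_with_oversight(ProposedAction("suggest follow-up scan", high_stakes=False))
    run_with_oversight(ProposedAction("administer treatment X", high_stakes=True))
```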
description: Fail-safe systems capable of shutting down or isolating AI processes if they exhibit dangerous behaviours.
featured:
  class: c
  element: '<action>Kill Switch Mechanism</action>'
tags:
  - Kill Switch
  - Practice
practice:
  mitigates:
    - tag: Loss Of Human Control
      reason: "An explicit interruption capability can avert catastrophic errors or runaway behaviours."
---

<PracticeIntro details={frontMatter} />

### Kill-Switch Mechanisms

- **Examples:**
- **Google DeepMind’s ‘Big Red Button’ concept (2016):** proposed as a method to interrupt a reinforcement-learning AI without it learning to resist interruption.
- **Hardware Interrupts in Robotics:** Physical or software-based emergency stops that immediately terminate AI operation.

- **Efficacy:** High – An explicit interruption capability can avert catastrophic errors or runaway behaviours, though in practice it is more likely to be triggered once an error has already started, in order to prevent further harm.
- **Ease of Implementation:** Medium – Requires robust design and consistent testing to avoid workarounds by advanced AI.
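
As a rough sketch of the software half of such a mechanism, the example below runs an agent loop that re-checks a shared stop flag before every action and can be interrupted out-of-band via operating-system signals. The agent and its steps are hypothetical placeholders; a real kill switch would also need hardware-level stops and safeguards the AI cannot route around.

```python
# Illustrative software kill switch for an agent loop (hypothetical agent).
import signal
import threading
import time

stop_event = threading.Event()


def handle_interrupt(signum, frame):
    # Out-of-band signal, e.g. an operator pressing the "big red button".
    stop_event.set()


signal.signal(signal.SIGINT, handle_interrupt)
signal.signal(signal.SIGTERM, handle_interrupt)


def agent_step(step: int) -> None:
    """Placeholder for one unit of autonomous work."""
    print(f"agent step {step}")
    time.sleep(0.5)


def run_agent() -> None:
    step = 0
    while not stop_event.is_set():
        # The stop flag is checked *before* every action, so the agent
        # cannot begin a new action once the kill switch has fired.
        agent_step(step)
        step += 1
    print("Kill switch triggered: agent halted and isolated.")


if __name__ == "__main__":
    run_agent()
```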
      reason: "Limits the spread of potentially rogue AI copies."
---

### Replication Control

- Replication control becomes relevant when an AI system can duplicate itself—or be duplicated—beyond the reach of any central authority (analogous to a computer virus, though with potentially far greater autonomy and adaptability).
- An organization or person builds a very capable AI with some misaligned objectives. If they distribute its model or code openly, it is effectively “in the wild.”
- Could controls be put in place to prevent this from happening? TODO: figure this out.

- **Efficacy:** Medium – Limits the spread of potentially rogue AI copies.
- **Ease of Implementation:** Low – In open-source communities or decentralized systems, controlling replication requires broad consensus and technical enforcement measures.
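
One possible technical enforcement measure, offered only as a hedged sketch rather than an answer to the TODO above, is a runtime that refuses to load model artifacts whose hash is not on an approved registry. The file paths, registry contents, and function names below are hypothetical.

```python
# Illustrative artifact allowlist check before loading model weights.
# The registry contents are placeholders; in practice the registry would be
# signed and maintained by a trusted authority.
import hashlib
from pathlib import Path

APPROVED_SHA256 = {
    "<sha256-of-an-approved-release-goes-here>",
}


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_weights(path: Path) -> bytes:
    # Refuse to load anything that is not on the approved registry.
    if sha256_of(path) not in APPROVED_SHA256:
        raise PermissionError(f"Refusing to load unapproved model artifact: {path}")
    return path.read_bytes()
```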
description: AI systems operating autonomously with minimal human oversight can lead to scenarios where we cannot override or re-align them with human values.
featured:
  class: c
  element: |
    '<risk class="process" /><description style="text-align: center">Loss of
    Human Control</description>'
tags:
  - AI Threats
  - Loss Of Human Control
sidebar_position: 3
tweet: yes
part_of: AI Threats
---

<AIThreatIntro fm={frontMatter} />

AI systems that act without robust human oversight can evolve in ways that defy our attempts at control or correction. In the short term, engineers have to wrestle with new approaches to defining acceptable behaviour (see Amodei et al.): even a goal as simple as cleaning an environment is hard to pin down ("clean" doesn't mean devoid of any furniture, for example). How do you allow the AI to learn and improve without enabling "reward hacking", where it finds ways to game the reward function (à la Goodhart's law)?
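
As a toy illustration, with entirely hypothetical state and action names, the sketch below shows how a proxy reward of "no visible dirt" cannot tell genuine cleaning apart from hiding the dirt:

```python
# Toy reward-hacking example: the specified (proxy) reward only penalises
# *visible* dirt, so hiding dirt under the sofa scores exactly as well as
# actually cleaning it. All state and action names are hypothetical.

def proxy_reward(state: dict) -> int:
    # The reward we *specified*: visible dirt is bad.
    return -state["visible_dirt"]

def true_objective(state: dict) -> int:
    # What we *meant*: all dirt is bad, visible or hidden.
    return -(state["visible_dirt"] + state["hidden_dirt"])

def clean(state: dict) -> dict:
    return {"visible_dirt": 0, "hidden_dirt": state["hidden_dirt"]}

def hide_under_sofa(state: dict) -> dict:
    return {"visible_dirt": 0,
            "hidden_dirt": state["hidden_dirt"] + state["visible_dirt"]}

start = {"visible_dirt": 5, "hidden_dirt": 0}

for name, action in [("clean", clean), ("hide_under_sofa", hide_under_sofa)]:
    end = action(start)
    print(f"{name:15s} proxy={proxy_reward(end):3d} true={true_objective(end):3d}")

# Both actions score 0 on the proxy reward, but hiding scores -5 on the true
# objective; optimising the proxy alone rewards the hack equally
# (Goodhart's law in miniature).
```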
The problem is that human oversight is _expensive_: we want the minimum level of oversight that still leaves us confident things won't go wrong.
- **[Boeing 737 MAX MCAS Issue (2018–2019)](https://mashable.com/article/boeing-737-max-aggressive-risky-ai):** Although not purely an AI system, automated flight software repeatedly overrode pilot inputs, contributing to two tragic crashes—illustrating how over-reliance on opaque automation can lead to disastrous outcomes. This was caused by systemic failures at Boeing, driven by a cost-cutting culture and short-term focus on shareholder returns.

- **Healthcare Diagnostic Tools:** Systems that recommend or even autonomously administer treatments based on patient data can outpace human doctors’ ability to review every decision, making interventions more difficult if the AI fails. [NHS Grampian: breast cancer detection.](https://ukstories.microsoft.com/features/nhs-grampian-is-working-with-kheiron-medical-technologies-university-of-aberdeen-and-microsoft-to-support-breast-cancer-detection/)

AI systems designed to influence behaviour at scale could (and do) undermine democracy, free will, and individual autonomy.

## Sources

- **Nazi Propaganda** ([United States Holocaust Memorial Museum](https://encyclopedia.ushmm.org/content/en/article/nazi-propaganda)): Examines how the Nazi regime harnessed mass media—including radio broadcasts, film, and print—to shape public opinion, consolidate power, and foment anti-Semitic attitudes during World War II. (Fake content isn't a new problem.)