diff --git a/src/SUMMARY.md b/src/SUMMARY.md index dd6f25139ead..120e5870970d 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -486,16 +486,81 @@ - [Welcome](unsafe-deep-dive/welcome.md) - [Setup](unsafe-deep-dive/setup.md) -- [Motivations](unsafe-deep-dive/motivations.md) - - [Interoperability](unsafe-deep-dive/motivations/interop.md) - - [Data Structures](unsafe-deep-dive/motivations/data-structures.md) - - [Performance](unsafe-deep-dive/motivations/performance.md) -- [Foundations](unsafe-deep-dive/foundations.md) - - [What is unsafe?](unsafe-deep-dive/foundations/what-is-unsafe.md) - - [When is unsafe used?](unsafe-deep-dive/foundations/when-is-unsafe-used.md) - - [Data structures are safe](unsafe-deep-dive/foundations/data-structures-are-safe.md) - - [Actions might not be](unsafe-deep-dive/foundations/actions-might-not-be.md) - - [Less powerful than it seems](unsafe-deep-dive/foundations/less-powerful.md) +- [Introduction](unsafe-deep-dive/introduction.md) + - [Defining Unsafe Rust](unsafe-deep-dive/introduction/definition.md) + - [Purpose of the unsafe keyword](unsafe-deep-dive/introduction/purpose.md) + - [Two roles of the unsafe keyword](unsafe-deep-dive/introduction/two-roles.md) + - [Warm Up Examples](unsafe-deep-dive/introduction/warm-up.md) + - [Using an unsafe block](unsafe-deep-dive/introduction/warm-up/unsafe-block.md) + - [Defining an unsafe function](unsafe-deep-dive/introduction/warm-up/unsafe-fn.md) + - [Implementing an unsafe trait](unsafe-deep-dive/introduction/warm-up/unsafe-impl.md) + - [Defining an unsafe trait](unsafe-deep-dive/introduction/warm-up/unsafe-trait.md) + - [Characteristics of Unsafe Rust](unsafe-deep-dive/introduction/characteristics-of-unsafe-rust.md) + - [Dangerous](unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/dangerous.md) + - [Sometimes necessary](unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-necessary.md) + - [Sometimes 
useful](unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-useful.md) + - [Responsibility shift](unsafe-deep-dive/introduction/responsibility-shift.md) + - [Stronger development workflow required](unsafe-deep-dive/introduction/impact-on-workflow.md) + - [Example: may_overflow](unsafe-deep-dive/introduction/may_overflow.md) +- [Safety Preconditions](unsafe-deep-dive/safety-preconditions.md) + - [Common Preconditions](unsafe-deep-dive/safety-preconditions/common-preconditions.md) + - [Getter example](unsafe-deep-dive/safety-preconditions/getter.md) + - [Semantic preconditions](unsafe-deep-dive/safety-preconditions/semantic-preconditions.md) + - [Example: u8 to bool](unsafe-deep-dive/safety-preconditions/u8-to-bool.md) + - [Determining preconditions](unsafe-deep-dive/safety-preconditions/determining.md) + - [Example: references](unsafe-deep-dive/safety-preconditions/references.md) + - [Defining your own preconditions](unsafe-deep-dive/safety-preconditions/defining.md) + - [Example: ASCII Type](unsafe-deep-dive/safety-preconditions/ascii.md) +- [Rules of the game](unsafe-deep-dive/rules-of-the-game.md) + - [Rust is sound](unsafe-deep-dive/rules-of-the-game/rust-is-sound.md) + - [Copying memory](unsafe-deep-dive/rules-of-the-game/copying-memory.md) + - [Safe Rust](unsafe-deep-dive/rules-of-the-game/copying-memory/safe.md) + - [Encapsulated Unsafe Rust](unsafe-deep-dive/rules-of-the-game/copying-memory/encapsulated-unsafe.md) + - [Exposed Unsafe Rust](unsafe-deep-dive/rules-of-the-game/copying-memory/exposed-unsafe.md) + - [Documented safety preconditions](unsafe-deep-dive/rules-of-the-game/copying-memory/documented-safety-preconditions.md) + - [Crying Wolf](unsafe-deep-dive/rules-of-the-game/copying-memory/crying-wolf.md) + - [3 shapes of sound Rust](unsafe-deep-dive/rules-of-the-game/3-shapes-of-sound-rust.md) + - [Soundness Proof](unsafe-deep-dive/rules-of-the-game/soundness-proof.md) + - 
[Soundness](unsafe-deep-dive/rules-of-the-game/soundness-proof/soundness.md) + - [Corollary](unsafe-deep-dive/rules-of-the-game/soundness-proof/corollary.md) + - [Unsoundness](unsafe-deep-dive/rules-of-the-game/soundness-proof/unsoundness.md) +- [Memory Lifecycle](unsafe-deep-dive/memory-lifecycle.md) +- [Initialization](unsafe-deep-dive/initialization.md) + - [MaybeUninit](unsafe-deep-dive/initialization/maybeuninit.md) + - [Arrays of uninit](unsafe-deep-dive/initialization/maybeuninit/arrays.md) + - [MaybeUninit::zeroed()](unsafe-deep-dive/initialization/maybeuninit/zeroed-method.md) + - [ptr::write vs assignment](unsafe-deep-dive/initialization/maybeuninit/write-vs-assignment.md) + - [How to initialize memory](unsafe-deep-dive/initialization/how-to-initialize-memory.md) + - [Partial initialization](unsafe-deep-dive/initialization/partial-initialization.md) +- [Pinning](unsafe-deep-dive/pinning.md) + - [What pinning is](unsafe-deep-dive/pinning/what-pinning-is.md) + - [What a move is](unsafe-deep-dive/pinning/what-a-move-is.md) + - [Definition of Pin](unsafe-deep-dive/pinning/definition-of-pin.md) + - [Why it's difficult](unsafe-deep-dive/pinning/why-difficult.md) + - [`Unpin` trait](unsafe-deep-dive/pinning/unpin-trait.md) + - [`PhantomPinned`](unsafe-deep-dive/pinning/phantompinned.md) + - [Self-Referential Buffer Example](unsafe-deep-dive/pinning/self-referential-buffer.md) + - [C++ implementation](unsafe-deep-dive/pinning/self-referential-buffer/cpp.md) + - [Modeled in Rust](unsafe-deep-dive/pinning/self-referential-buffer/rust.md) + - [With a raw pointer](unsafe-deep-dive/pinning/self-referential-buffer/rust-raw-pointers.md) + - [With an integer offset](unsafe-deep-dive/pinning/self-referential-buffer/rust-offset.md) + - [With `Pin`](unsafe-deep-dive/pinning/self-referential-buffer/rust-pin.md) + - [`Pin` and `Drop`](unsafe-deep-dive/pinning/pin-and-drop.md) + - [Worked Example](unsafe-deep-dive/pinning/drop-and-not-unpin-worked-example.md) +- 
[FFI](unsafe-deep-dive/ffi.md) + - [Language Interop](unsafe-deep-dive/ffi/language-interop.md) + - [Strategies](unsafe-deep-dive/ffi/strategies.md) + - [Consideration: Type Safety](unsafe-deep-dive/ffi/type-safety.md) + - [Language differences](unsafe-deep-dive/ffi/language-differences.md) + - [Different representations](unsafe-deep-dive/ffi/language-differences/representations.md) + - [Different semantics](unsafe-deep-dive/ffi/language-differences/semantics.md) + - [Rust ↔ C](unsafe-deep-dive/ffi/language-differences/rust-and-c.md) + - [C++ ↔ C](unsafe-deep-dive/ffi/language-differences/cpp-and-c.md) + - [Rust ↔ C++](unsafe-deep-dive/ffi/language-differences/rust-and-cpp.md) + - [`abs(3)`](unsafe-deep-dive/ffi/abs.md) + - [`rand(3)`](unsafe-deep-dive/ffi/rand.md) + - [Exercise: C library](unsafe-deep-dive/ffi/c-library-example.md) + - [Exercise: C++ library](unsafe-deep-dive/ffi/cpp-library-example.md) --- diff --git a/src/unsafe-deep-dive/case-studies.md b/src/unsafe-deep-dive/case-studies.md new file mode 100644 index 000000000000..3be3183624b8 --- /dev/null +++ b/src/unsafe-deep-dive/case-studies.md @@ -0,0 +1,9 @@ +# Case Studies + +Uses of `unsafe` in production that may be useful for further study: + +| Case study | Topics | +| ------------------------------- | -------------------------------------- | +| tokio's [Intrusive Linked List] | UnsafeCell, PhantomData, PhantomPinned | + +[Intrusive Linked List]: case-studies/intrusive-linked-list.md diff --git a/src/unsafe-deep-dive/case-studies/intrusive-linked-list.md b/src/unsafe-deep-dive/case-studies/intrusive-linked-list.md new file mode 100644 index 000000000000..8bde9c2380b6 --- /dev/null +++ b/src/unsafe-deep-dive/case-studies/intrusive-linked-list.md @@ -0,0 +1,106 @@ +# Tokio's Intrusive Linked List + +> Current as of tokio v1.48.0 + +The Tokio project maintains an [intrusive linked list implementation][ill] that +demonstrates many use cases of `unsafe` and a number of types and traits from +Rust's
unsafe ecosystem, including `cell::UnsafeCell`, `mem::ManuallyDrop`, +[pinning](../pinning/what-pinning-is.md), and unsafe traits. + +A linked list is a difficult data structure to implement in Rust because it +relies heavily on nodes keeping stable memory addresses. This doesn't happen +naturally: a Rust value may change its memory address every time it is moved. + +## Introductory Walkthrough + +The public API is provided by the `LinkedList<L, T>` type, which contains fields +for the start and the end of the list. `Option<NonNull<T>>` can be read as an +`Option<*mut T>`, with the assurance that a `Some` value never holds a null +pointer. + +`NonNull<T>` is "_covariant_ over `T`", which means that `NonNull<T>` inherits +the [variance] relationships from `T`. + +```rust,ignore +use core::marker::PhantomData; + +// ... + +/// An intrusive linked list. +/// +/// Currently, the list is not emptied on drop. It is the caller's +/// responsibility to ensure the list is empty before dropping it. +pub(crate) struct LinkedList<L, T> { + /// Linked list head + head: Option<NonNull<T>>, + + /// Linked list tail + tail: Option<NonNull<T>>, + + /// Node type marker. + _marker: PhantomData<*const L>, +} +``` + +Because it holds raw pointers, `LinkedList` is neither `Send` nor `Sync` by +default. It opts in only when its targets are `Send` or `Sync`: + +```rust,ignore +unsafe impl<L: Link> Send for LinkedList<L, L::Target> where L::Target: Send {} +unsafe impl<L: Link> Sync for LinkedList<L, L::Target> where L::Target: Sync {} +``` + +The `Link` trait used in those trait bounds is defined next. `Link` is an +unsafe trait that manages the relationships between nodes when the list needs to +pass references externally to callers. + +Here's the trait's definition. The most significant method is `pointers()`, +which returns a pointer to the `Pointers` struct embedded in each node. +`Pointers` holds the links to the previous and next nodes, and it is marked +`!Unpin` through a `PhantomPinned` field.
+  +```rust,ignore +pub unsafe trait Link { + type Handle; + + type Target; + + fn as_raw(handle: &Self::Handle) -> NonNull<Self::Target>; + + unsafe fn from_raw(ptr: NonNull<Self::Target>) -> Self::Handle; + + /// # Safety + /// + /// The resulting pointer should have the same tag in the stacked-borrows + /// stack as the argument. In particular, the method may not create an + /// intermediate reference in the process of creating the resulting raw + /// pointer. + unsafe fn pointers( + target: NonNull<Self::Target>, + ) -> NonNull<Pointers<Self::Target>>; +} +``` + +`Pointers` is where the magic happens: + +```rust,ignore +pub(crate) struct Pointers<T> { + inner: UnsafeCell<PointersInner<T>>, +} + +struct PointersInner<T> { + prev: Option<NonNull<T>>, + next: Option<NonNull<T>>, + + /// This type is !Unpin due to the heuristic from: + /// + _pin: PhantomPinned, +} +``` + +## Remarks + +Understanding the whole implementation will take some time, but it's a rewarding +experience. The code demonstrates composing many parts of unsafe Rust's +ecosystem into a workable, high-performance data structure. Enjoy exploring! + +[ill]: https://docs.rs/tokio/1.48.0/src/tokio/util/linked_list.rs.html +[variance]: https://doc.rust-lang.org/reference/subtyping.html diff --git a/src/unsafe-deep-dive/ffi.md b/src/unsafe-deep-dive/ffi.md new file mode 100644 index 000000000000..34117bf5751f --- /dev/null +++ b/src/unsafe-deep-dive/ffi.md @@ -0,0 +1 @@ +# FFI diff --git a/src/unsafe-deep-dive/ffi/README.md b/src/unsafe-deep-dive/ffi/README.md new file mode 100644 index 000000000000..3fe7435c3697 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/README.md @@ -0,0 +1,7 @@ +This segment of the class is about Rust's foreign function interface (FFI).
+ +outline: + +Start by wrapping a simple C function + +progress into more complex cases which involve pointers and uninitialized memory diff --git a/src/unsafe-deep-dive/ffi/abs.md b/src/unsafe-deep-dive/ffi/abs.md new file mode 100644 index 000000000000..344d0bc72b82 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/abs.md @@ -0,0 +1,111 @@ +--- +minutes: 15 +--- + +# Wrapping `abs(3)` + +```rust,editable +fn abs(x: i32) -> i32; + +fn main() { + let x = -42; + let abs_x = abs(x); + println!("{x}, {abs_x}"); +} +``` + +
+  +In this slide, we’re establishing a pattern for writing wrappers: + +- Find the external definition of a function’s signature. +- Write a matching function in Rust within an `extern` block. +- Confirm which safety invariants need to be upheld. +- Decide whether it’s possible to mark the function as safe. + +Note that this doesn’t work _yet_. + +Add the extern block: + +```rust +extern "C" { + fn abs(x: i32) -> i32; +} +``` + +Explain that many POSIX functions are available to Rust because programs that +use the standard library link against the C standard library (libc) by default, +which brings its symbols into the program’s scope. + +Show `man 3 abs` in the terminal or [a webpage][abs]. + +Explain that our function signature must match its definition: +`int abs(int j);`. + +Update the code block to use the C types. + +```rust +use std::ffi::c_int; + +extern "C" { + fn abs(x: c_int) -> c_int; +} +``` + +Discuss rationale: using `ffi::c_int` increases the portability of our code. +When the standard library is compiled for the target platform, the platform can +determine the widths. The C standard only requires `int` to be at least 16 bits +wide, so on some targets `c_int` is an `i16` rather than the much more common +`i32`. + +(Optional) Show the [documentation for c_int][c_int] to reveal that it is a type +alias for `i32`. + +Attempt to compile to trigger the “error: extern blocks must be unsafe” error +message (Rust 2024 edition). + +Add the unsafe keyword to the block: + +```rust +use std::ffi::c_int; + +unsafe extern "C" { + fn abs(x: c_int) -> c_int; +} +``` + +Check that learners understand the significance of this change. We are required +to uphold type safety and other safety preconditions. + +Re-compile. + +Add the `safe` keyword to the `abs` function: + +```rust +use std::ffi::c_int; + +unsafe extern "C" { + safe fn abs(x: c_int) -> c_int; +} +``` + +Explain that `safe fn` marks `abs` as safe to call without an `unsafe` block.
+ +Completed program for reference: + +```rust +use std::ffi::c_int; + +unsafe extern "C" { + safe fn abs(x: c_int) -> c_int; +} + +fn main() { + let x = -42; + let abs_x = abs(x); + println!("{x}, {abs_x}"); +} +``` + +[abs]: https://www.man7.org/linux/man-pages/man3/abs.3.html +[c_int]: https://doc.rust-lang.org/std/ffi/type.c_int.html + +
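When deciding whether `abs` can be marked `safe`, one edge case is worth raising: `man 3 abs` notes that taking the absolute value of the most negative integer is not defined, and the C standard treats an unrepresentable result as undefined behavior. A more defensive sketch keeps the foreign function `unsafe` and moves the check into a safe Rust wrapper. The wrapper name `checked_abs` is our own choice (mirroring `i32::checked_abs`), not part of libc.

```rust
use std::ffi::c_int;

unsafe extern "C" {
    // Left unsafe on purpose: callers must not pass c_int::MIN.
    fn abs(x: c_int) -> c_int;
}

/// Returns `None` instead of calling `abs` with a value whose absolute
/// value does not fit in a `c_int`.
fn checked_abs(x: c_int) -> Option<c_int> {
    if x == c_int::MIN {
        return None;
    }
    // SAFETY: x != c_int::MIN, so |x| is representable as a c_int.
    Some(unsafe { abs(x) })
}

fn main() {
    println!("{:?}", checked_abs(-42)); // Some(42)
    println!("{:?}", checked_abs(c_int::MIN)); // None
}
```

In real code `i32::checked_abs` does this without FFI; the point of the sketch is where the precondition check lives.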
diff --git a/src/unsafe-deep-dive/ffi/c-library-example.md b/src/unsafe-deep-dive/ffi/c-library-example.md new file mode 100644 index 000000000000..874fd52d41d0 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/c-library-example.md @@ -0,0 +1,139 @@ +# C Library Example + +```c +#ifndef TEXT_ANALYSIS_H +#define TEXT_ANALYSIS_H + +#include <stdbool.h> +#include <stddef.h> + +typedef struct TextAnalyst TextAnalyst; + +typedef struct { + const char* start; + size_t length; + size_t index; +} Token; + +typedef enum { + TA_OK = 0, + TA_ERR_NULL_POINTER, + TA_ERR_OUT_OF_MEMORY, + TA_ERR_OTHER, +} TAError; + +/* Return `false` to indicate that no token was found. */ +typedef bool (*Tokenizer)(Token* token, void* extra_context); + + +typedef bool (*TokenCallback)(void* user_context, Token* token, void* result); + +/* TextAnalyst constructor */ +TextAnalyst* ta_new(void); + +/* TextAnalyst destructor */ +void ta_free(TextAnalyst* ta); + +/* Resets state to clear the current document */ +void ta_reset(TextAnalyst* ta); + +/* Use custom tokenizer (defaults to whitespace) */ +void ta_set_tokenizer(TextAnalyst* ta, Tokenizer* func); + +TAError ta_set_text(TextAnalyst* ta, const char* text, size_t len, bool make_copy); + +/* Apply `callback` to each token */ +size_t ta_foreach_token(const TextAnalyst* ta, const TokenCallback* callback, void* user_context); + +/* Get human-readable error message */ +const char* ta_error_string(TAError error); + +#endif /* TEXT_ANALYSIS_H */ +```  +
+  +C libraries often hide their implementation details behind opaque pointers: +either a `void*` or a pointer to a struct that is declared but never defined. + +Consider this header file of a natural language processing library that hides +the `TextAnalyst` type this way. + +This can be emulated in Rust with a type similar to this: + +```rust +#[repr(C)] +pub struct TextAnalyst { + _private: [u8; 0], +} +``` + +Exercise: Ask learners to wrap this library. + +_Suggested Solution_ + +```rust +// ffi.rs +use std::ffi::{c_char, c_void}; + +#[repr(C)] +pub struct TextAnalyst { + _private: [u8; 0], +} + +#[repr(C)] +#[derive(Debug, Clone, Copy)] +pub struct Token { + pub start: *const c_char, + pub length: usize, + pub index: usize, +} + +#[repr(C)] +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum TAError { + Ok = 0, + NullPointer = 1, + OutOfMemory = 2, + Other = 3, +} + +pub type Tokenizer = Option< + unsafe extern "C" fn(token: *mut Token, extra_context: *mut c_void) -> bool, +>; + +pub type TokenCallback = Option< + unsafe extern "C" fn( + user_context: *mut c_void, + token: *mut Token, + result: *mut c_void, + ) -> bool, +>; + +unsafe extern "C" { + pub fn ta_new() -> *mut TextAnalyst; + + pub fn ta_free(ta: *mut TextAnalyst); + + pub fn ta_reset(ta: *mut TextAnalyst); + + pub fn ta_set_tokenizer(ta: *mut TextAnalyst, func: *const Tokenizer); + + pub fn ta_set_text( + ta: *mut TextAnalyst, + text: *const c_char, + len: usize, + make_copy: bool, + ) -> TAError; + + pub fn ta_foreach_token( + ta: *const TextAnalyst, + callback: *const TokenCallback, + user_context: *mut c_void, + ) -> usize; + + pub fn ta_error_string(error: TAError) -> *const c_char; +} +``` +  +
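One detail worth pointing out while learners build the safe layer: the suggested solution reads the return value of `ta_set_text` directly as the `#[repr(C)]` enum `TAError`. If the C side ever returned a value other than 0 to 3, that would be undefined behavior, because a Rust enum must always hold one of its declared variants. A defensive sketch, assuming the C enum is passed as a plain `int` on the target ABI (typical, but not guaranteed), declares the raw return type as `c_int` and converts it in safe code. The helper name `check` is hypothetical.

```rust
use std::ffi::c_int;

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TAError {
    Ok = 0,
    NullPointer = 1,
    OutOfMemory = 2,
    Other = 3,
}

/// Hypothetical helper: converts a raw status code from the C library into a
/// `Result`. Unknown codes map to `TAError::Other` instead of causing UB.
pub fn check(code: c_int) -> Result<(), TAError> {
    match code {
        0 => Ok(()),
        1 => Err(TAError::NullPointer),
        2 => Err(TAError::OutOfMemory),
        _ => Err(TAError::Other),
    }
}

fn main() {
    assert_eq!(check(0), Ok(()));
    assert_eq!(check(2), Err(TAError::OutOfMemory));
    // An out-of-range code is handled safely rather than transmuted.
    assert_eq!(check(42), Err(TAError::Other));
}
```

With the raw declaration changed to return `c_int`, a safe wrapper method can pass the result through `check` and use `?` to propagate errors.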
diff --git a/src/unsafe-deep-dive/ffi/cpp-library-example.md b/src/unsafe-deep-dive/ffi/cpp-library-example.md new file mode 100644 index 000000000000..d55c92c2e8db --- /dev/null +++ b/src/unsafe-deep-dive/ffi/cpp-library-example.md @@ -0,0 +1,121 @@ +--- +minutes: 30 +--- + +# Example: String interning library + +C++ Header: interner.hpp + +```cpp +#ifndef INTERNER_HPP +#define INTERNER_HPP + +#include <string> +#include <unordered_set> + +class StringInterner { + std::unordered_set<std::string> strings; + +public: + // Returns pointer to interned string (valid for lifetime of interner) + const char* intern(const char* s) { + auto [it, _] = strings.emplace(s); + return it->c_str(); + } + + size_t count() const { + return strings.size(); + } +}; + +#endif +``` + +C header file: interner.h + +```c +// interner.h (C API for FFI) +#ifndef INTERNER_H +#define INTERNER_H + +#include <stddef.h> + +#ifdef __cplusplus +extern "C" { +#endif + +typedef struct StringInterner StringInterner; + +StringInterner* interner_new(void); +void interner_free(StringInterner* interner); +const char* interner_intern(StringInterner* interner, const char* s); +size_t interner_count(const StringInterner* interner); + +#ifdef __cplusplus +} +#endif + +#endif /* INTERNER_H */ +``` + +C++ implementation (interner.cpp) + +```cpp +#include "interner.hpp" +#include "interner.h" + +extern "C" { + +StringInterner* interner_new(void) { + return new StringInterner(); +} + +void interner_free(StringInterner* interner) { + delete interner; +} + +const char* interner_intern(StringInterner* interner, const char* s) { + return interner->intern(s); +} + +size_t interner_count(const StringInterner* interner) { + return interner->count(); +} + +} +```  +
+  +This is a larger example: ask learners to write a wrapper for the string +interner. You will need to guide them in creating an opaque type, either by +explaining the code below directly or by asking them to research the pattern. + +_Suggested Solution_ + +```rust +use std::ffi::c_char; +use std::marker::{PhantomData, PhantomPinned}; + +#[repr(C)] +pub struct StringInternerRaw { + _opaque: [u8; 0], + // Makes the type !Send, !Sync, and !Unpin, since Rust knows nothing about it + _pin: PhantomData<(*mut u8, PhantomPinned)>, +} + +unsafe extern "C" { + fn interner_new() -> *mut StringInternerRaw; + + fn interner_free(interner: *mut StringInternerRaw); + + fn interner_intern( + interner: *mut StringInternerRaw, + s: *const c_char, + ) -> *const c_char; + + fn interner_count(interner: *const StringInternerRaw) -> usize; +} +``` + +Once the raw wrapper is written, ask learners to create a safe wrapper. +  +
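One possible shape for the safe wrapper, as a sketch: a `StringInterner` type that owns the raw pointer, frees it in `Drop`, and returns `&CStr` values borrowed from `&self`, matching the C++ comment that interned strings are valid for the lifetime of the interner. So that the block runs without building the C++ code, the four `interner_*` functions are replaced by a hypothetical pure-Rust stand-in module named `ffi` with the same signatures; it is not the real library, and its `StringInternerRaw` is an ordinary struct rather than an opaque type.

```rust
use std::ffi::{c_char, CStr, CString};
use std::ptr::NonNull;

/// Hypothetical stand-in for the C++ library, written in Rust so this sketch
/// runs on its own. In the exercise, these would be the `unsafe extern "C"`
/// declarations from the suggested solution.
mod ffi {
    use super::*;

    pub struct StringInternerRaw {
        // Strings leaked with `CString::into_raw`; freed in `interner_free`.
        strings: Vec<*mut c_char>,
    }

    pub unsafe fn interner_new() -> *mut StringInternerRaw {
        Box::into_raw(Box::new(StringInternerRaw { strings: Vec::new() }))
    }

    pub unsafe fn interner_free(interner: *mut StringInternerRaw) {
        let interner = unsafe { Box::from_raw(interner) };
        for &p in &interner.strings {
            drop(unsafe { CString::from_raw(p) });
        }
    }

    pub unsafe fn interner_intern(
        interner: *mut StringInternerRaw,
        s: *const c_char,
    ) -> *const c_char {
        let interner = unsafe { &mut *interner };
        let s = unsafe { CStr::from_ptr(s) };
        // Linear search keeps the stand-in short; the C++ version hashes.
        for &p in &interner.strings {
            if unsafe { CStr::from_ptr(p) } == s {
                return p;
            }
        }
        let p = s.to_owned().into_raw();
        interner.strings.push(p);
        p
    }

    pub unsafe fn interner_count(interner: *const StringInternerRaw) -> usize {
        unsafe { (*interner).strings.len() }
    }
}

/// Owns the interner and frees it exactly once, in `Drop`.
/// `NonNull` makes this type `!Send` and `!Sync`, so calls cannot race.
pub struct StringInterner {
    raw: NonNull<ffi::StringInternerRaw>,
}

impl StringInterner {
    pub fn new() -> Self {
        // SAFETY: `interner_new` has no preconditions.
        let raw = unsafe { ffi::interner_new() };
        Self { raw: NonNull::new(raw).expect("interner_new returned null") }
    }

    /// The returned `&CStr` borrows `self`, so it cannot outlive the interner.
    pub fn intern(&self, s: &CStr) -> &CStr {
        // SAFETY: `raw` is live, `s` is null-terminated, and the returned
        // pointer stays valid until `interner_free` runs in `drop`.
        unsafe { CStr::from_ptr(ffi::interner_intern(self.raw.as_ptr(), s.as_ptr())) }
    }

    pub fn count(&self) -> usize {
        // SAFETY: `raw` is live.
        unsafe { ffi::interner_count(self.raw.as_ptr()) }
    }
}

impl Drop for StringInterner {
    fn drop(&mut self) {
        // SAFETY: `raw` came from `interner_new` and is freed only here.
        unsafe { ffi::interner_free(self.raw.as_ptr()) }
    }
}

fn main() {
    let interner = StringInterner::new();
    let a = interner.intern(c"hello");
    let b = interner.intern(c"hello");
    assert_eq!(a.as_ptr(), b.as_ptr()); // both point at the same storage
    interner.intern(c"world");
    assert_eq!(interner.count(), 2);
    println!("{a:?}");
}
```

Taking `&self` in `intern` is a design choice: it lets callers hold several interned strings at once, and it relies on `NonNull` keeping the wrapper `!Sync`, so two threads can never call into the non-thread-safe C++ object concurrently.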
diff --git a/src/unsafe-deep-dive/ffi/language-differences.md b/src/unsafe-deep-dive/ffi/language-differences.md new file mode 100644 index 000000000000..9f00764c7a67 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/language-differences.md @@ -0,0 +1,24 @@ +--- +minutes: 5 +--- + +# Language differences + +```bob +╭────────────╮ ╭───╮ ╭───╮ ╭────────────╮ +│ │ │ │ │ │ │ │ +│ │ <-----> │ │ <~~~~~~~> │ │ <------> │ │ +│ │ │ │ │ │ │ │ +╰────────────╯ ╰───╯ ╰───╯ ╰────────────╯ + Rust C C "C++" +``` + +
+ +Using C as a lowest common denominator means that lots of the richness available +to Rust and C++ is lost. + +Each translation has the potential for semantic loss, runtime overhead, and +subtle bugs. + +
diff --git a/src/unsafe-deep-dive/ffi/language-differences/cpp-and-c.md b/src/unsafe-deep-dive/ffi/language-differences/cpp-and-c.md new file mode 100644 index 000000000000..ba8a2ad4383e --- /dev/null +++ b/src/unsafe-deep-dive/ffi/language-differences/cpp-and-c.md @@ -0,0 +1,34 @@ +--- +minutes: 3 +--- + +# C++ ↔ C + +| Concern | C | C++ | +| ----------------- | -------------- | ------------------------------------------------- | +| **Overloading** | Manual/ad-hoc | Automatic | +| **Exceptions** | - | Stack unwinding | +| **Destructors** | Manual cleanup | Automatic via destructors (RAII) | +| **Non-POD types** | - | Objects with constructors, vtables, virtual bases | +| **Templates** | - | Compile-time code generation | + +
+  +C++ includes a number of features that don’t exist in C, and each of them has +an FFI impact: + +Overloading: C linkage has no name mangling, so overloaded functions cannot be +exported as-is; each overload needs its own distinct C name. + +Exceptions: must be caught at the FFI boundary and converted into error codes, +because an exception unwinding out of an `extern "C"` function into Rust is +undefined behavior, and C callers cannot handle it either. + +Destructors: C callers won't run destructors; must expose explicit `*_destroy()` +functions. + +Non-POD types: cannot be passed by value across the FFI boundary; expose them +through opaque pointers instead. + +Templates: cannot be exposed directly; must instantiate explicitly and wrap +each specialization. +  +
diff --git a/src/unsafe-deep-dive/ffi/language-differences/representations.md b/src/unsafe-deep-dive/ffi/language-differences/representations.md new file mode 100644 index 000000000000..ecf4852912c7 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/language-differences/representations.md @@ -0,0 +1,46 @@ +# Different representations + +```rust,editable +fn main() { + let c_repr = b"Hello, C\0"; + let cc_repr = (b"Hello, C++\0", 10u32); + let rust_repr = (b"Hello, Rust", 11); +} +``` + +
+  +Each language has its own opinion about how to implement things, which can lead +to confusion and bugs. Consider three ways to represent text. + +Show how to convert the raw representations to a Rust string slice: + +```rust +// C representation to Rust +unsafe { + // `CStr::from_ptr` expects `*const c_char`, which is `i8` or `u8` + // depending on the platform + let ptr = c_repr.as_ptr().cast::<std::ffi::c_char>(); + let c: &str = std::ffi::CStr::from_ptr(ptr).to_str().unwrap(); + println!("{c}"); +}; + +// C++ representation to Rust +unsafe { + let ptr = cc_repr.0.as_ptr(); + // `from_raw_parts` takes the length as `usize` + let bytes = std::slice::from_raw_parts(ptr, cc_repr.1 as usize); + let cc: &str = std::str::from_utf8_unchecked(bytes); + println!("{cc}"); +}; + +// Rust representation (bytes) to string slice +unsafe { + let ptr = rust_repr.0.as_ptr(); + let bytes = std::slice::from_raw_parts(ptr, rust_repr.1); + let rust: &str = std::str::from_utf8_unchecked(bytes); + println!("{rust}"); +}; +``` + +Aside: Rust also has a C string literal, `c"..."`, which has type `&CStr` and +appends a null byte at the end: `c"Rust".to_bytes_with_nul() == b"Rust\0"`. +  +
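Going the other direction, from Rust to the C representation, `std::ffi::CString` appends the terminating null byte and refuses strings that contain an interior null byte, which C code would otherwise silently truncate. A small sketch:

```rust
use std::ffi::{CStr, CString};

fn main() {
    // Rust -> C: CString adds the trailing null byte.
    let owned = CString::new("Hello, C").expect("no interior null bytes");
    assert_eq!(owned.as_bytes_with_nul(), b"Hello, C\0");

    // A `*const c_char` suitable for passing to C code.
    let ptr = owned.as_ptr();

    // SAFETY: `ptr` points into `owned`, which is still alive and null-terminated.
    let back: &CStr = unsafe { CStr::from_ptr(ptr) };
    assert_eq!(back.to_str(), Ok("Hello, C"));

    // An interior null byte would truncate the string on the C side,
    // so CString::new reports it as an error instead.
    let err = CString::new("Hello\0C").unwrap_err();
    assert_eq!(err.nul_position(), 5);
}
```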
diff --git a/src/unsafe-deep-dive/ffi/language-differences/rust-and-c.md b/src/unsafe-deep-dive/ffi/language-differences/rust-and-c.md new file mode 100644 index 000000000000..6babeb829444 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/language-differences/rust-and-c.md @@ -0,0 +1,35 @@ +--- +minutes: 3 +--- + +# Rust ↔ C + +| Concern | Rust | C | +| --------------- | ------------------------------------- | --------------------------------------------------- | +| **Errors** | `Result`, `Option` | Magic return values, out-parameters, global `errno` | +| **Strings** | `&str`/`String` (UTF-8, length-known) | Null-terminated `char*`, encoding undefined | +| **Nullability** | Explicit via `Option` | Any pointer may be null | +| **Ownership** | Affine types, lifetimes | Conventions | +| **Callbacks** | `Fn`/`FnMut`/`FnOnce` closures | Function pointer + `void* userdata` | +| **Panics** | Stack unwinding (or abort) | Abort | + +
+  +Errors: Must convert `Result` to abide by C conventions; it is easy to forget to +check errors on the C side. + +Strings: Conversion cost; interior null bytes in Rust strings cause truncation on +the C side; UTF-8 validation on ingress. + +Nullability: Every pointer from C must be checked to create an +`Option<NonNull<T>>`, implying unsafe blocks or runtime cost. + +Ownership: Must document and enforce object lifetimes manually. + +Callbacks: Must decompose closures into a fn pointer plus a context pointer; the +lifetime of the context is managed manually. + +Panics: A panic must not unwind across the FFI boundary. This used to be +undefined behavior; since Rust 1.81, a panic escaping an `extern "C"` function +aborts the process instead. Catch panics at the boundary with `catch_unwind`. +  +
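The _Panics_ point can be sketched as an exported function that wraps its body in `std::panic::catch_unwind` and reports a panic as an error code, in the magic-return-value style from the table. The name `rust_checked_div` and its convention (return 0 on success with the quotient in an out-parameter, -1 on failure) are hypothetical. A real export would also need `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition), left out here so the sketch runs as an ordinary program.

```rust
use std::ffi::c_int;
use std::panic;

/// Hypothetical function meant to be called from C.
/// Returns 0 and writes the quotient to `*out` on success; returns -1 on
/// failure, including when the division panics.
///
/// # Safety
/// `out` must be null or valid for writing a `c_int`.
pub unsafe extern "C" fn rust_checked_div(a: c_int, b: c_int, out: *mut c_int) -> c_int {
    if out.is_null() {
        return -1;
    }
    // Stop a panic from unwinding into the C caller.
    let result = panic::catch_unwind(|| {
        // Division by zero (or c_int::MIN / -1) panics in Rust.
        a / b
    });
    match result {
        Ok(q) => {
            // SAFETY: `out` is non-null, and the caller promises it is writable.
            unsafe { *out = q };
            0
        }
        Err(_) => -1,
    }
}

fn main() {
    let mut q: c_int = 0;
    assert_eq!(unsafe { rust_checked_div(7, 2, &mut q) }, 0);
    assert_eq!(q, 3);
    // The panic message is still printed by the default hook, but no
    // unwinding crosses the `extern "C"` boundary.
    assert_eq!(unsafe { rust_checked_div(1, 0, &mut q) }, -1);
}
```

If the crate is built with `panic = "abort"`, there is nothing to catch: the process aborts at the panic.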
diff --git a/src/unsafe-deep-dive/ffi/language-differences/rust-and-cpp.md b/src/unsafe-deep-dive/ffi/language-differences/rust-and-cpp.md new file mode 100644 index 000000000000..8e93438133fa --- /dev/null +++ b/src/unsafe-deep-dive/ffi/language-differences/rust-and-cpp.md @@ -0,0 +1,44 @@ +--- +minutes: 3 +--- + +# Rust ↔ C++ + +| Concern | Rust | C++ | +| -------------------------- | ---------------------------------------- | --------------------------------------------------------------- | +| **Trivial relocatability** | All moves are `memcpy` | Self-referential types, move constructors can have side effects | +| **Destruction safety** | `Drop::drop()` on original location only | Destructor may run on moved-from objects | +| **Exception safety** | Panics (abort or unwind) | Exceptions (unwind) | +| **ABI stability** | Explicitly unstable | Vendor-specific | + +
+  +Even if it were possible to avoid interop via C, there are still some areas of +the languages that impact FFI: + +_Trivial relocatability_ + +Cannot safely move C++ objects on the Rust side; must pin them or keep them on +the C++ heap. + +In Rust, object movement, which occurs during assignment or by being passed by +value, always copies values bit by bit. + +C++ allows users to define their own semantics by allowing them to overload the +assignment operator and create move and copy constructors. + +This impacts interop because self-referential types become natural in +high-performance C++. Custom constructors can uphold safety invariants even when +the object moves its position in memory. + +Objects with the same semantics are impossible to define in Rust. + +_Destruction safety_ + +Moved-from C++ object semantics don't map; must prevent Rust from "moving" C++ +types. + +_Exception safety_ + +Neither can cross into the other safely; both must be caught at the boundary. +  +
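A small sketch of the _Trivial relocatability_ point: the hypothetical `SelfRef` type stores a pointer to its own buffer. Moving it into a `Box` copies its bytes to the heap, but the stored pointer still refers to the old location, because Rust runs no user code on a move. The sketch only compares addresses and never dereferences the stale pointer.

```rust
/// Hypothetical self-referential type: `ptr` is meant to point at `data`.
struct SelfRef {
    data: [u8; 16],
    ptr: *const u8,
}

impl SelfRef {
    fn new() -> Self {
        SelfRef { data: [0; 16], ptr: std::ptr::null() }
    }

    /// Establish the self-reference at the value's current address.
    fn init(&mut self) {
        self.ptr = self.data.as_ptr();
    }

    /// True if `ptr` still points at this value's own buffer.
    fn is_consistent(&self) -> bool {
        std::ptr::eq(self.ptr, self.data.as_ptr())
    }
}

fn main() {
    let mut value = SelfRef::new();
    value.init();
    assert!(value.is_consistent());

    // Move to the heap: a bitwise copy, with no hook to fix up `ptr`.
    let moved = Box::new(value);
    assert!(!moved.is_consistent());
    println!("data at {:p}, ptr still {:p}", moved.data.as_ptr(), moved.ptr);
}
```

A C++ move constructor could update `ptr` during the move; Rust has no equivalent hook, which is why such C++ objects must stay pinned or on the C++ heap.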
diff --git a/src/unsafe-deep-dive/ffi/language-differences/semantics.md b/src/unsafe-deep-dive/ffi/language-differences/semantics.md new file mode 100644 index 000000000000..5fc6f67fc75b --- /dev/null +++ b/src/unsafe-deep-dive/ffi/language-differences/semantics.md @@ -0,0 +1,50 @@ +--- +minutes: 15 +--- + +# Different semantics + +```rust,editable +use std::ffi::{CStr, c_char}; +use std::time::{SystemTime, SystemTimeError, UNIX_EPOCH}; + +unsafe extern "C" { + /// Create a formatted time based on timestamp `t`. + fn ctime(t: *const libc::time_t) -> *const c_char; +} + +fn now_formatted() -> Result<String, SystemTimeError> { + let now = SystemTime::now().duration_since(UNIX_EPOCH)?; + let seconds = now.as_secs() as libc::time_t; + + // SAFETY: `&seconds` points to a valid `time_t` for the whole call + let ptr = unsafe { ctime(&seconds) }; + + // ctime returns a null pointer on error + assert!(!ptr.is_null(), "ctime failed"); + + // SAFETY: `ptr` is non-null and points to a null-terminated string in + // ctime's static buffer. This program is single-threaded, so nothing + // overwrites the buffer before it is copied below. + let ptr = unsafe { CStr::from_ptr(ptr) }; + + // ctime produces ASCII text, so this conversion should not fail + let fmt = ptr.to_str().unwrap(); + + Ok(fmt.trim_end().to_string()) +} + +fn main() { + let t = now_formatted(); + println!("{t:?}"); +} +```  +
+  +Some constructs that other languages allow cannot be expressed in the Rust +language. + +The `ctime` function writes its result into an internal buffer that is shared +between calls, so each call overwrites the previous result. This cannot be +represented with Rust’s lifetimes: + +- `'static` does not apply: the buffer is never freed, but its contents are +  overwritten by the next call +- `'a` does not apply, as the buffer outlives each call +  +
diff --git a/src/unsafe-deep-dive/ffi/language-interop.md b/src/unsafe-deep-dive/ffi/language-interop.md new file mode 100644 index 000000000000..a20e797a87e4 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/language-interop.md @@ -0,0 +1,32 @@ +# Language Interop + +Ideal scenario: + +```bob +╭────────────╮ ╭────────────╮ +│ │ │ │ +│ │ <--------------------------------------> │ │ +│ │ │ │ +╰────────────╯ ╰────────────╯ + Rust "C++" +``` + +
+  +This section of the course covers interaction between Rust and external +languages via Rust's foreign function interface (FFI), with a special focus on +interoperability with C++. + +Ideally, users of Rust and the external language (in this case C++) could call +each other’s methods directly. + +This ideal scenario is very difficult to achieve: + +- Different languages have different semantics, and mapping between them +  implies trade-offs. +- Neither Rust nor C++ offers ABI stability[^1], making it difficult to build +  from a stable foundation. + +[^1]: Some C++ compiler vendors provide support for ABI stability within their + toolchain. +  +
diff --git a/src/unsafe-deep-dive/ffi/rand.md b/src/unsafe-deep-dive/ffi/rand.md new file mode 100644 index 000000000000..db2d6b35fdb2 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/rand.md @@ -0,0 +1,48 @@ +--- +minutes: 15 +--- + +# Wrapping `srand(3)` and `rand(3)` + +```rust,editable +use libc::{rand, srand}; + +// unsafe extern "C" { +// /// Seed the rng +// fn srand(seed: std::ffi::c_uint); + +// fn rand() -> std::ffi::c_int; +// } + +fn main() { + unsafe { srand(12345) }; + + let a = unsafe { rand() as i32 }; + println!("{a:?}"); +} +``` + +
+  +This slide demonstrates that it is very easy for wrappers to trigger undefined +behavior if they are written incorrectly. We’ll see how easy it is to trigger +type safety problems. + +Explain that the `rand` and `srand` functions are provided by the C standard +library (libc). + +Explain that the functions are exported by the libc crate, but we can also write +an FFI wrapper for them manually. + +Show calling the functions exported by the libc crate. + +The code links because the C standard library is linked to Rust programs by +default. + +Explain that Rust will trust you if you use the wrong type(s). + +Uncomment the `extern` block, remove the `use libc::{rand, srand};` line, and +modify `fn rand() -> std::ffi::c_int;` to return `char`. + +Avoiding type safety issues is one reason to use tools that generate wrappers, +such as bindgen, rather than writing them by hand. +  +
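For comparison, here is a hand-written version of the declarations, with types taken from `man 3 rand`: `void srand(unsigned int seed);` and `int rand(void);`. Both functions stay `unsafe`: besides trusting the signatures, POSIX does not require `rand` to be thread-safe, so the calling code has to rule out concurrent use. This is a sketch, not the libc crate's definitions.

```rust
use std::ffi::{c_int, c_uint};

unsafe extern "C" {
    /// void srand(unsigned int seed);
    fn srand(seed: c_uint);
    /// int rand(void);
    fn rand() -> c_int;
}

fn main() {
    // SAFETY: the signatures above match the C declarations, and this
    // program is single-threaded, so nothing else touches rand's state.
    let first = unsafe {
        srand(12345);
        rand()
    };

    // Reseeding with the same value replays the same sequence.
    // SAFETY: as above.
    let again = unsafe {
        srand(12345);
        rand()
    };

    assert_eq!(first, again);
    assert!(first >= 0); // rand returns a value in 0..=RAND_MAX
    println!("{first}");
}
```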
diff --git a/src/unsafe-deep-dive/ffi/strategies.md b/src/unsafe-deep-dive/ffi/strategies.md new file mode 100644 index 000000000000..b517b26be297 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/strategies.md @@ -0,0 +1,64 @@ +--- +minutes: 5 +--- + +# Strategies of interop + + + +Sharing data structures and symbols directly is very difficult: + +```bob +╭────────────╮ ╭────────────╮ +│ │ │ │ +│ │ <--------------------------------------> │ │ +│ │ │ │ +╰────────────╯ ╰────────────╯ + Rust "C++" +``` + +FFI through the C ABI is much more feasible: + +```bob +╭────────────╮ ╭───╮ ╭───╮ ╭────────────╮ +│ │ │ │ │ │ │ │ +│ │ <-----> │ │ <~~~~~~~> │ │ <------> │ │ +│ │ │ │ │ │ │ │ +╰────────────╯ ╰───╯ ╰───╯ ╰────────────╯ + Rust C C "C++" +``` + +Other strategies: + +- Distributed system (RPC) +- Custom ABI (e.g. WebAssembly Interface Types) +  +
+  +_High-fidelity interop_ + +The ideal scenario is currently experimental. + +Two projects exploring this area are [crubit](https://github.com/google/crubit) +and [Zngur](https://hkalbasi.github.io/zngur/). The first provides glue code on +each side so that compatible types work seamlessly across the language +boundary. The second relies on dynamic dispatch and imports C++ objects into +Rust as trait objects. + +_Low-fidelity interop_ works through a C API. + +The typical strategy for interop is to use the C language as the interface. C is +a lossy codec, and this strategy typically results in unergonomic code on both +sides. + +_Other strategies_ are less viable in a zero-cost environment. + +_Distributed systems_ impose runtime costs. + +They incur significant overhead, as calling a method in a foreign library incurs +a round trip of serialization/transport/deserialization. Generally speaking, a +transparent RPC is not a good idea: there’s a network in the middle. + +_Custom ABIs_, such as WebAssembly, require a runtime or significant +implementation cost. +  +
diff --git a/src/unsafe-deep-dive/ffi/type-safety.md b/src/unsafe-deep-dive/ffi/type-safety.md new file mode 100644 index 000000000000..910f799a4244 --- /dev/null +++ b/src/unsafe-deep-dive/ffi/type-safety.md @@ -0,0 +1 @@ +# Consideration: Type Safety diff --git a/src/unsafe-deep-dive/foundations.md b/src/unsafe-deep-dive/foundations.md deleted file mode 100644 index b81d9de12c1e..000000000000 --- a/src/unsafe-deep-dive/foundations.md +++ /dev/null @@ -1,5 +0,0 @@ -# Foundations - -Some fundamental concepts and terms. - -{{%segment outline}} diff --git a/src/unsafe-deep-dive/foundations/actions-might-not-be.md b/src/unsafe-deep-dive/foundations/actions-might-not-be.md deleted file mode 100644 index fd9f60d790e6..000000000000 --- a/src/unsafe-deep-dive/foundations/actions-might-not-be.md +++ /dev/null @@ -1,19 +0,0 @@ ---- -minutes: 2 ---- - -# ... but actions on them might not be - -```rust -fn main() { - let n: i64 = 12345; - let safe = &n as *const _; - println!("{safe:p}"); -} -``` - -
- -Modify the example to de-reference `safe` without an `unsafe` block. - -
diff --git a/src/unsafe-deep-dive/foundations/data-structures-are-safe.md b/src/unsafe-deep-dive/foundations/data-structures-are-safe.md deleted file mode 100644 index e298edaa09df..000000000000 --- a/src/unsafe-deep-dive/foundations/data-structures-are-safe.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -minutes: 2 ---- - -# Data structures are safe ... - -Data structures are inert. They cannot do any harm by themselves. - -Safe Rust code can create raw pointers: - -```rust -fn main() { - let n: i64 = 12345; - let safe = &raw const n; - println!("{safe:p}"); -} -``` - -
- -Consider a raw pointer to an integer, i.e., the value `safe` is the raw pointer -type `*const i64`. Raw pointers can be out-of-bounds, misaligned, or be null. -But the unsafe keyword is not required when creating them. - -
diff --git a/src/unsafe-deep-dive/foundations/less-powerful.md b/src/unsafe-deep-dive/foundations/less-powerful.md deleted file mode 100644 index cc2d795b8d09..000000000000 --- a/src/unsafe-deep-dive/foundations/less-powerful.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -minutes: 10 ---- - -# Less powerful than it seems - -The `unsafe` keyword does not allow you to break Rust. - -```rust,ignore -use std::mem::transmute; - -let orig = b"RUST"; -let n: i32 = unsafe { transmute(orig) }; - -println!("{n}") -``` - -
- -## Suggested outline - -- Request that someone explains what `std::mem::transmute` does -- Discuss why it doesn't compile -- Fix the code - -## Expected compiler output - -```ignore - Compiling playground v0.0.1 (/playground) -error[E0512]: cannot transmute between types of different sizes, or dependently-sized types - --> src/main.rs:5:27 - | -5 | let n: i32 = unsafe { transmute(orig) }; - | ^^^^^^^^^ - | - = note: source type: `&[u8; 4]` (64 bits) - = note: target type: `i32` (32 bits) -``` - -## Suggested change - -```diff -- let n: i32 = unsafe { transmute(orig) }; -+ let n: i64 = unsafe { transmute(orig) }; -``` - -## Notes on less familiar Rust - -- the `b` prefix on a string literal marks it as byte slice (`&[u8]`) rather - than a string slice (`&str`) - -
diff --git a/src/unsafe-deep-dive/foundations/what-is-unsafe.md b/src/unsafe-deep-dive/foundations/what-is-unsafe.md deleted file mode 100644 index 8af083ac3fac..000000000000 --- a/src/unsafe-deep-dive/foundations/what-is-unsafe.md +++ /dev/null @@ -1,98 +0,0 @@ ---- -minutes: 6 ---- - -# What is “unsafety”? - -Unsafe Rust is a superset of Safe Rust. - -Let's create a list of things that are enabled by the `unsafe` keyword. - -
- -## Definitions from authoritative docs: - -From the [unsafe keyword's documentation](): - -> Code or interfaces whose memory safety cannot be verified by the type system. -> -> ... -> -> Here are the abilities Unsafe Rust has in addition to Safe Rust: -> -> - Dereference raw pointers -> - Implement unsafe traits -> - Call unsafe functions -> - Mutate statics (including external ones) -> - Access fields of unions - -From the [reference](https://doc.rust-lang.org/reference/unsafety.html) - -> The following language level features cannot be used in the safe subset of -> Rust: -> -> - Dereferencing a raw pointer. -> - Reading or writing a mutable or external static variable. -> - Accessing a field of a union, other than to assign to it. -> - Calling an unsafe function (including an intrinsic or foreign function). -> - Calling a safe function marked with a target_feature from a function that -> does not have a target_feature attribute enabling the same features (see -> attributes.codegen.target_feature.safety-restrictions). -> - Implementing an unsafe trait. -> - Declaring an extern block. -> - Applying an unsafe attribute to an item. - -## Group exercise - -> You may have a group of learners who are not familiar with each other yet. -> This is a way for you to gather some data about their confidence levels and -> the psychological safety that they're feeling. - -### Part 1: Informal definition - -> Use this to gauge the confidence level of the group. If they are uncertain, -> then tailor the next section to be more directed. - -Ask the class: **By raising your hand, indicate if you would feel comfortable -defining unsafe?** - -If anyone's feeling confident, allow them to try to explain. - -### Part 2: Evidence gathering - -Ask the class to spend 3-5 minutes. - -- Find a use of the unsafe keyword. What contract/invariant/pre-condition is - being established or satisfied? 
-- Write down terms that need to be defined (unsafe, memory safety, soundness, - undefined behavior) - -### Part 3: Write a working definition - -### Part 4: Remarks - -Mention that we'll be reviewing our definition at the end of the day. - -## Note: Avoid detailed discussion about precise semantics of memory safety - -It's possible that the group will slide into a discussion about the precise -semantics of what memory safety actually is and how define pointer validity. -This isn't a productive line of discussion. It can undermine confidence in less -experienced learners. - -Perhaps refer people who wish to discuss this to the discussion within the -official [documentation for pointer types] (excerpt below) as a place for -further research. - -> Many functions in [this module] take raw pointers as arguments and read from -> or write to them. For this to be safe, these pointers must be _valid_ for the -> given access. -> -> ... -> -> The precise rules for validity are not determined yet. - -[this module]: https://doc.rust-lang.org/std/ptr/index.html -[documentation for pointer types]: https://doc.rust-lang.org/std/ptr/index.html#safety - -
diff --git a/src/unsafe-deep-dive/foundations/when-is-unsafe-used.md b/src/unsafe-deep-dive/foundations/when-is-unsafe-used.md deleted file mode 100644 index 955c17beb593..000000000000 --- a/src/unsafe-deep-dive/foundations/when-is-unsafe-used.md +++ /dev/null @@ -1,48 +0,0 @@ ---- -minutes: 2 ---- - -# When is unsafe used? - -The unsafe keyword indicates that the programmer is responsible for upholding -Rust's safety guarantees. - -The keyword has two roles: - -- define pre-conditions that must be satisfied -- assert to the compiler (= promise) that those defined pre-conditions are - satisfied - -## Further references - -- [The unsafe keyword chapter of the Rust Reference](https://doc.rust-lang.org/reference/unsafe-keyword.html) - -
- -Places where pre-conditions can be defined (Role 1) - -- [unsafe functions] (`unsafe fn foo() { ... }`). Example: `get_unchecked` - method on slices, which requires callers to verify that the index is - in-bounds. -- unsafe traits (`unsafe trait`). Examples: [`Send`] and [`Sync`] marker traits - in the standard library. - -Places where pre-conditions must be satisfied (Role 2) - -- unsafe blocks (`unafe { ... }`) -- implementing unsafe traits (`unsafe impl`) -- access external items (`unsafe extern`) -- adding - [unsafe attributes](https://doc.rust-lang.org/reference/attributes.html) o an - item. Examples: [`export_name`], [`link_section`] and [`no_mangle`]. Usage: - `#[unsafe(no_mangle)]` - -[unsafe functions]: https://doc.rust-lang.org/reference/unsafe-keyword.html#unsafe-functions-unsafe-fn -[unsafe traits]: https://doc.rust-lang.org/reference/unsafe-keyword.html#unsafe-traits-unsafe-trait -[`export_name`]: https://doc.rust-lang.org/reference/abi.html#the-export_name-attribute -[`link_section`]: https://doc.rust-lang.org/reference/abi.html#the-link_section-attribute -[`no_mangle`]: https://doc.rust-lang.org/reference/abi.html#the-no_mangle-attribute -[`Send`]: https://doc.rust-lang.org/std/marker/trait.Send.html -[`Sync`]: https://doc.rust-lang.org/std/marker/trait.Sync.html - -
diff --git a/src/unsafe-deep-dive/initialization.md b/src/unsafe-deep-dive/initialization.md new file mode 100644 index 000000000000..7597c9752195 --- /dev/null +++ b/src/unsafe-deep-dive/initialization.md @@ -0,0 +1 @@ +# Initialization diff --git a/src/unsafe-deep-dive/initialization/how-to-initialize-memory.md b/src/unsafe-deep-dive/initialization/how-to-initialize-memory.md new file mode 100644 index 000000000000..8a51fb5d7f19 --- /dev/null +++ b/src/unsafe-deep-dive/initialization/how-to-initialize-memory.md @@ -0,0 +1,43 @@ +--- +minutes: 8 +--- + +# How to Initialize Memory + +Steps: + +1. Create `MaybeUninit` +2. Write a value to it +3. Notify Rust that the memory is initialized + +```rust,editable +use std::mem::MaybeUninit; + +fn main() { + // Step 1: create MaybeUninit + let mut uninit = MaybeUninit::uninit(); + + // Step 2: write a valid value to the memory + uninit.write(1); + + // Step 3: inform the type system that the memory location is valid + let init = unsafe { uninit.assume_init() }; + + println!("{init}"); +} +``` + +
+ +“To work with uninitialized memory, follow this general workflow: create, write, +confirm. + +“1. Create a `MaybeUninit<T>`. The `::uninit()` constructor is the most general +purpose one, but there are others which perform a write as well.” + +“2. Write a value of `T`. Notice that this is available from safe Rust. Staying in +safe Rust is useful: the value you write must be valid, and safe code guarantees that it is. + +“3. Confirm initialization with the `.assume_init()` method.” + +
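A minimal sketch of the three steps, using `u64` as an arbitrary example type. The second function shows `MaybeUninit::new`, one of the constructors that performs the write itself; both function names are illustrative only.

```rust
use std::mem::MaybeUninit;

/// The create, write, confirm workflow spelled out.
fn create_write_confirm(value: u64) -> u64 {
    // 1. Create: reserve space for a `u64` without initializing it.
    let mut slot = MaybeUninit::<u64>::uninit();
    // 2. Write: safe, because any `u64` value is valid.
    slot.write(value);
    // 3. Confirm: tell the type system that the memory is initialized.
    // SAFETY: `slot` was fully initialized by `write` above.
    unsafe { slot.assume_init() }
}

/// A constructor that performs the write for us.
fn constructor_that_writes(value: u64) -> u64 {
    let slot = MaybeUninit::new(value);
    // SAFETY: `MaybeUninit::new` always holds an initialized value.
    unsafe { slot.assume_init() }
}
```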
diff --git a/src/unsafe-deep-dive/initialization/maybeuninit-and-arrays.md b/src/unsafe-deep-dive/initialization/maybeuninit-and-arrays.md new file mode 100644 index 000000000000..836e326003dc --- /dev/null +++ b/src/unsafe-deep-dive/initialization/maybeuninit-and-arrays.md @@ -0,0 +1,17 @@ +# MaybeUninit and arrays + +```rust +use std::mem::MaybeUninit; + +fn main() { + let mut buf = [const { MaybeUninit::<u8>::uninit() }; 2048]; +} +``` + +
+ +Uninitialized memory often arrives as a pointer that we. + +Use `ptr::write` to initialize. + +
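A sketch of initializing memory that arrives as a raw pointer, assuming the pointer refers to a byte buffer. The helpers `fill_raw` and `init_from_pointer` are hypothetical. `ptr::write` stores a value without reading or dropping whatever was there before, which is what uninitialized memory requires.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical helper: copies `src` into uninitialized memory at `dst`.
///
/// # Safety
///
/// `dst` must be valid for writes of `src.len()` bytes.
unsafe fn fill_raw(dst: *mut u8, src: &[u8]) {
    for (i, byte) in src.iter().enumerate() {
        // SAFETY: the caller guarantees that `dst.add(i)` is in bounds and
        // writable. `ptr::write` does not read or drop the old contents.
        unsafe { ptr::write(dst.add(i), *byte) };
    }
}

fn init_from_pointer() -> [u8; 4] {
    let mut buf = [const { MaybeUninit::<u8>::uninit() }; 4];
    // `MaybeUninit<u8>` has the same layout as `u8`, so this cast is valid.
    let dst = buf.as_mut_ptr().cast::<u8>();
    // SAFETY: `buf` has room for exactly 4 bytes.
    unsafe { fill_raw(dst, b"RUST") };
    // SAFETY: all 4 elements were written by `fill_raw`.
    buf.map(|byte| unsafe { byte.assume_init() })
}
```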
diff --git a/src/unsafe-deep-dive/initialization/maybeuninit.md b/src/unsafe-deep-dive/initialization/maybeuninit.md new file mode 100644 index 000000000000..f8d72a61505a --- /dev/null +++ b/src/unsafe-deep-dive/initialization/maybeuninit.md @@ -0,0 +1,32 @@ +--- +minutes: 8 +--- + +# MaybeUninit + +`MaybeUninit` allows Rust to refer to uninitialized memory. + +```rust,editable +use std::mem::MaybeUninit; + +fn main() { + let uninit = MaybeUninit::<&i32>::uninit(); + println!("{uninit:?}"); +} +``` + +
+ +“Safe Rust is unable to refer to data that’s potentially uninitialized.” + +“Yet, all data arrives at the program as uninitialized.” + +“Therefore, we need some bridge in the type system to allow memory to +transition. `MaybeUninit` is that type.” + +It is very similar to the `Option<T>` type, although its semantics are very +different. Its equivalent of `None` is the uninitialized state, but nothing +records which state a `MaybeUninit<T>` is in. + +Reading from memory that may be uninitialized is extremely dangerous. + +
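A small sketch that makes the contrast with `Option<T>` concrete. `Option<u64>` needs extra space for a tag that records whether a value is present (its exact size is platform-dependent), while `MaybeUninit<u64>` is guaranteed to be exactly as large as `u64`, because nothing records whether it has been initialized. The function name `sizes` is illustrative.

```rust
use std::mem::{MaybeUninit, size_of};

/// Returns the sizes of `u64`, `MaybeUninit<u64>` and `Option<u64>`.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<u64>(),
        // No tag: the "uninitialized" state is not tracked at runtime.
        size_of::<MaybeUninit<u64>>(),
        // A tag is stored alongside the value, so it can be checked.
        size_of::<Option<u64>>(),
    )
}
```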
diff --git a/src/unsafe-deep-dive/initialization/maybeuninit/arrays.md b/src/unsafe-deep-dive/initialization/maybeuninit/arrays.md new file mode 100644 index 000000000000..6b7101cf3b56 --- /dev/null +++ b/src/unsafe-deep-dive/initialization/maybeuninit/arrays.md @@ -0,0 +1,20 @@ +# Arrays of uninit + +```rust +use std::mem::{MaybeUninit, size_of, size_of_val}; + +fn main() { + let stack = [const { MaybeUninit::<u8>::uninit() }; 2048]; + assert_eq!(size_of_val(&stack), size_of::<[u8; 2048]>()); + + let heap = Box::<[u8; 2048]>::new_uninit(); + assert_eq!(size_of_val(&*heap), size_of::<[u8; 2048]>()); +} +``` + +
+ +Rust supports some shorthand forms to create arrays of uninit. + +
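A sketch collecting three shorthand forms, generic over the length `N`. The function names are hypothetical; the inline `const` block needs Rust 1.79 or later and `Box::new_uninit` needs Rust 1.82 or later. The second form is the older idiom: an array of `MaybeUninit<T>` does not itself require initialization, so asserting that it is initialized is sound.

```rust
use std::mem::MaybeUninit;

/// Shorthand 1: an inline `const` block repeated `N` times.
fn uninit_array<const N: usize>() -> [MaybeUninit<u8>; N] {
    [const { MaybeUninit::uninit() }; N]
}

/// Shorthand 2: the older idiom, still found in many codebases.
fn uninit_array_legacy<const N: usize>() -> [MaybeUninit<u8>; N] {
    // SAFETY: `[MaybeUninit<u8>; N]` does not require initialization,
    // because each element is itself allowed to be uninitialized.
    unsafe { MaybeUninit::uninit().assume_init() }
}

/// Shorthand 3: uninitialized memory on the heap.
fn uninit_boxed<const N: usize>() -> Box<MaybeUninit<[u8; N]>> {
    Box::<[u8; N]>::new_uninit()
}
```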
diff --git a/src/unsafe-deep-dive/initialization/maybeuninit/write-vs-assignment.md b/src/unsafe-deep-dive/initialization/maybeuninit/write-vs-assignment.md new file mode 100644 index 000000000000..3371fc135c8a --- /dev/null +++ b/src/unsafe-deep-dive/initialization/maybeuninit/write-vs-assignment.md @@ -0,0 +1,32 @@ +# MaybeUninit.write() vs assignment + +```rust +use std::mem::MaybeUninit; + +fn main() { + let mut buf = [const { MaybeUninit::<u8>::uninit() }; 2048]; + + let external_data = b"Hello, Rust!"; + + for (dest, src) in buf.iter_mut().zip(external_data) { + // Assignment: *dest = MaybeUninit::new(*src); + + // Or write the value in place: + dest.write(*src); + } + + // Only the first `external_data.len()` elements of `buf` are initialized. +} +``` + +
+ +“When writing data with `MaybeUninit::write()`, the old value is not dropped.” + +“`MaybeUninit<T>` never calls the destructor of its contents, because the +compiler cannot know whether the value has been properly initialized.” + +“This is different from assigning to a place of type `T`, which drops the old +value before moving in the new one. Calling `write()` on a `MaybeUninit<T>` that +already holds a value therefore leaks that value. Assigning a whole +`MaybeUninit<T>`, as in the commented-out line, does not drop the contents either.” + +
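A sketch that makes the difference observable, using `Rc::strong_count` as a stand-in for a destructor (`Rc<()>` is an arbitrary choice, and both function names are illustrative). The first function deliberately leaks one clone; the second shows that plain assignment drops the previous value.

```rust
use std::mem::MaybeUninit;
use std::rc::Rc;

/// `write()` overwrites without dropping: the first clone is leaked.
fn count_after_two_writes() -> (usize, usize) {
    let shared = Rc::new(());
    let mut slot = MaybeUninit::<Rc<()>>::uninit();

    slot.write(Rc::clone(&shared));
    slot.write(Rc::clone(&shared)); // the previous clone is never dropped
    let after_writes = Rc::strong_count(&shared); // original + 2 clones

    // SAFETY: `slot` holds the initialized `Rc<()>` from the second write.
    unsafe { slot.assume_init_drop() };
    let after_drop = Rc::strong_count(&shared); // the leaked clone remains

    (after_writes, after_drop)
}

/// Assigning to an initialized `T` drops the previous value first.
fn count_after_reassignment() -> usize {
    let shared = Rc::new(());
    let mut place = Rc::clone(&shared);
    assert_eq!(Rc::strong_count(&place), 2);
    place = Rc::clone(&shared); // the old clone is dropped here
    Rc::strong_count(&place)
}
```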
diff --git a/src/unsafe-deep-dive/initialization/maybeuninit/zeroed-method.md b/src/unsafe-deep-dive/initialization/maybeuninit/zeroed-method.md new file mode 100644 index 000000000000..ae0d408ead24 --- /dev/null +++ b/src/unsafe-deep-dive/initialization/maybeuninit/zeroed-method.md @@ -0,0 +1,30 @@ +# MaybeUninit::zeroed() + +```rust,editable +use std::mem::{MaybeUninit, transmute}; + +fn main() { + let mut x = [const { MaybeUninit::<u32>::zeroed() }; 10]; + + x[6].write(7); + + // SAFETY: Every element was zeroed or written to, and the all-zero + // bit pattern is a valid `u32`. + let x: [u32; 10] = unsafe { transmute(x) }; + println!("{x:?}") +} +``` + +
+ +“`MaybeUninit::zeroed()` is an alternative constructor to +`MaybeUninit::uninit()`. It instructs the compiler to fill the bits of `T` with +zeros.” + +Q: “Although the memory has been written to, the type remains `MaybeUninit<T>`. +Can anyone think of why?” + +A: Some types require their values to be non-zero or non-null. The classic case +is references, but there are other places where this applies. Consider the +`NonZeroUsize` integer type and others in its family. + +
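A sketch of the answer, using `NonZeroU32` as the example type (the function names are illustrative). All-zero bits are a valid `u32`, and the standard library guarantees that all-zero bits represent `None` for `Option<NonZeroU32>`, so both calls below are sound. The same call for `NonZeroU32` itself would not be.

```rust
use std::mem::MaybeUninit;
use std::num::NonZeroU32;

fn zeroed_u32() -> u32 {
    // SAFETY: the all-zero bit pattern is a valid `u32`.
    unsafe { MaybeUninit::<u32>::zeroed().assume_init() }
}

fn zeroed_option_nonzero() -> Option<NonZeroU32> {
    // SAFETY: `Option<NonZeroU32>` uses zero to represent `None`, so the
    // all-zero bit pattern is valid.
    unsafe { MaybeUninit::<Option<NonZeroU32>>::zeroed().assume_init() }
}

// By contrast, `MaybeUninit::<NonZeroU32>::zeroed().assume_init()` would be
// undefined behavior: zero is not a valid `NonZeroU32`.
```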
diff --git a/src/unsafe-deep-dive/initialization/partial-initialization.md b/src/unsafe-deep-dive/initialization/partial-initialization.md new file mode 100644 index 000000000000..f44d85288edd --- /dev/null +++ b/src/unsafe-deep-dive/initialization/partial-initialization.md @@ -0,0 +1,48 @@ +# Partial Initialization + +```rust +use std::mem::MaybeUninit; + +fn main() { + // let mut buf = [0u8; 2048]; + let mut buf = [const { MaybeUninit::<u8>::uninit() }; 2048]; + + let external_data = b"Hello, Rust!"; + let len = external_data.len(); + + for (dest, src) in buf.iter_mut().zip(external_data) { + dest.write(*src); + } + + // SAFETY: We initialized exactly `len` bytes of `buf` with UTF-8 text + let text: &str = unsafe { + let ptr: *const u8 = buf.as_ptr().cast::<u8>(); + let init: &[u8] = std::slice::from_raw_parts(ptr, len); + std::str::from_utf8_unchecked(init) + }; + + println!("{text}"); +} +``` + +
+ +This code simulates receiving data from some external source. + +When reading bytes from an external source into a buffer, you typically don't +know how many bytes you'll receive. Using `MaybeUninit` lets you allocate the +buffer once without paying for a redundant initialization pass. + +If we were to create the array with the standard syntax (`buf = [0u8; 2048]`), +the whole buffer would be filled with zeros. `MaybeUninit<u8>` tells the +compiler to reserve space but not to touch the memory yet. + +Q: Which part of the code snippet is performing a similar role to +`.assume_init()`? A: The pointer cast and the implicit read. + +We cannot call `assume_init()` on the whole array. That would be unsound because +most elements remain uninitialized. Instead, we cast the pointer from +`*const MaybeUninit<u8>` to `*const u8` and build a slice covering only the +initialized portion. + +
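The same pattern can be wrapped in a small helper so that the unsafe code lives in one place. This is a sketch; `fill_prefix` is a hypothetical name. It copies as much input as fits and returns only the initialized prefix. Validating the result with `std::str::from_utf8` instead of `from_utf8_unchecked` removes one of the two unsafe operations from the slide.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: copies as much of `src` as fits into `buf` and
/// returns the initialized prefix.
fn fill_prefix<'a>(buf: &'a mut [MaybeUninit<u8>], src: &[u8]) -> &'a [u8] {
    let len = src.len().min(buf.len());
    for (dest, byte) in buf[..len].iter_mut().zip(src) {
        dest.write(*byte);
    }
    // SAFETY: the first `len` elements were initialized by the loop above,
    // and `MaybeUninit<u8>` has the same layout as `u8`.
    unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<u8>(), len) }
}
```

A usage example with a buffer shorter than the input:

```rust
let mut buf = [const { std::mem::MaybeUninit::<u8>::uninit() }; 8];
let init = fill_prefix(&mut buf, b"Hello, Rust!");
// Checked conversion: returns an error instead of trusting the bytes.
let text = std::str::from_utf8(init).unwrap();
```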
diff --git a/src/unsafe-deep-dive/introduction.md b/src/unsafe-deep-dive/introduction.md new file mode 100644 index 000000000000..0499fa01d783 --- /dev/null +++ b/src/unsafe-deep-dive/introduction.md @@ -0,0 +1,8 @@ +# Introduction + +We'll start our course by creating a shared understanding of what Unsafe Rust is +and what the `unsafe` keyword does. + +## Outline + +{{%segment outline}} diff --git a/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust.md b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust.md new file mode 100644 index 000000000000..9c29271f0e86 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust.md @@ -0,0 +1,5 @@ +# Characteristics of unsafe + +- [Dangerous](characteristics-of-unsafe-rust/dangerous.md) +- [Sometimes necessary](characteristics-of-unsafe-rust/sometimes-necessary.md) +- [Sometimes useful](characteristics-of-unsafe-rust/sometimes-useful.md) diff --git a/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/dangerous.md b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/dangerous.md new file mode 100644 index 000000000000..f86f7381391c --- /dev/null +++ b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/dangerous.md @@ -0,0 +1,25 @@ +--- +minutes: 2 +--- + +# Unsafe is dangerous + +> “Use-after-free (UAF), integer overflows, and out of bounds (OOB) reads/writes +> comprise 90% of vulnerabilities with OOB being the most common.” +> +> --- **Jeff Vander Stoep and Chong Zhang**, Google. +> "[Queue the Hardening Enhancements][blog]" + +[blog]: https://security.googleblog.com/2019/05/queue-hardening-enhancements.html + +
+ +“The software industry has gathered lots of evidence that unsafe code is +difficult to write correctly and creates very serious problems.” + +“The issues in this list are eliminated by Rust. The unsafe keyword lets them +back into your source code.” + +“Be careful.” + +
diff --git a/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-necessary.md b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-necessary.md new file mode 100644 index 000000000000..264d03d2d7fd --- /dev/null +++ b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-necessary.md @@ -0,0 +1,48 @@ +--- +minutes: 5 +--- + +# Unsafe is sometimes necessary + +The Rust compiler can only enforce its rules for code that it has compiled. + +```rust,editable +fn main() { + let pid = unsafe { libc::getpid() }; + println!("{pid}"); +} +``` + +
+ +“There are some activities that _require_ unsafe. + +“The Rust compiler cannot verify that external functions comply with Rust's +memory guarantees. Therefore, invoking external functions requires an unsafe +block.” + +Optional: + +“Working with the external environment often involves sharing memory. The +interface that computers provide is a memory address (a pointer).” + +“Here's an example that asks the Linux kernel to write to memory that we +control: + +```rust +fn main() { + let mut buf = [0u8; 8]; + let ptr = buf.as_mut_ptr() as *mut libc::c_void; + + let status = unsafe { libc::getrandom(ptr, buf.len(), 0) }; + if status > 0 { + println!("{buf:?}"); + } +} +``` + +“This FFI call reaches into the operating system to fill our buffer (`buf`). As +well as calling an external function, we must mark the boundary as `unsafe` +because the compiler cannot verify how the OS touches that memory.” + +
diff --git a/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-useful.md b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-useful.md new file mode 100644 index 000000000000..8a2c2c707929 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/characteristics-of-unsafe-rust/sometimes-useful.md @@ -0,0 +1,48 @@ +--- +minutes: 5 +--- + +# Unsafe is sometimes useful + +Your code can go faster! + +```rust,editable +fn iter_sum(xs: &[u64]) -> u64 { + xs.iter().sum() +} + +fn fast_sum(xs: &[u64]) -> u64 { + let mut acc = 0; + let mut i = 0; + unsafe { + while i < xs.len() { + acc += *xs.get_unchecked(i); + i += 1; + } + } + acc +} + +fn main() { + let data: Vec<_> = (0..1_000_000).collect(); + + let baseline = iter_sum(&data); + let unchecked = fast_sum(&data); + + assert_eq!(baseline, unchecked); +} +``` + +
+ +Code using `unsafe` _might_ be faster. + +`fast_sum()` skips bounds checks. However, benchmarking is necessary to +validate performance claims. For cases like this, Rust's iterators can usually +elide bounds checks anyway. + +Optional: [show identical generated assembly][godbolt] for the two functions. + +[godbolt]: https://rust.godbolt.org/z/d48v1Y5aj + +
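A rough timing sketch using only the standard library (`std::time::Instant` and `std::hint::black_box`). The helper `time_it` is hypothetical, and a single measurement like this is noisy: treat it as a classroom illustration, not a benchmark. Build with `--release` before comparing `iter_sum` and `fast_sum`, for example with `time_it(fast_sum, &data)`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` on `data` once and returns its result with the elapsed time.
/// `black_box` discourages the optimizer from precomputing the result.
fn time_it<F: Fn(&[u64]) -> u64>(f: F, data: &[u64]) -> (u64, Duration) {
    let start = Instant::now();
    let result = black_box(f(black_box(data)));
    (result, start.elapsed())
}
```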
diff --git a/src/unsafe-deep-dive/introduction/defining-unsafe-rust.md b/src/unsafe-deep-dive/introduction/defining-unsafe-rust.md new file mode 100644 index 000000000000..b4519d8f905c --- /dev/null +++ b/src/unsafe-deep-dive/introduction/defining-unsafe-rust.md @@ -0,0 +1,43 @@ +--- +minutes: 5 +--- + +# Defining Unsafe Rust + + + +```bob +╭───────────────────────────────────────────────────────────╮ +│╭─────────────────────────────────────────────────────────╮│ +││ ││ +││ Safe ││ +││ Rust ││ +││ ││ +││ ││ +│╰─────────╮ ││ +│ │ ││ +│ Unsafe │ ││ +│ Rust │ ││ +│ ╰───────────────────────────────────────────────╯│ +╰───────────────────────────────────────────────────────────╯ +``` + +
+ +“Unsafe Rust is a superset of Safe Rust.” + +“Unsafe Rust adds extra capabilities, such as allowing you to dereference raw +pointers and call functions that can break Rust’s safety guarantees if called +incorrectly.” + +“These extra capabilities are referred to as _unsafe operations_.” + +“Unsafe operations provide the foundation that the Rust standard library is +built on. For example, without the ability to dereference a raw pointer, it +would be impossible to implement `Vec` or `Box`.” + +“The compiler will still assist you while writing Unsafe Rust. Borrow checking +and type safety still apply. Unsafe operations have their own rules, which we’ll +learn about in this class.” + +
diff --git a/src/unsafe-deep-dive/introduction/definition.md b/src/unsafe-deep-dive/introduction/definition.md new file mode 100644 index 000000000000..308e2c8cc421 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/definition.md @@ -0,0 +1,60 @@ +--- +minutes: 5 +--- + +# Defining Unsafe Rust + + + +```bob +╭───────────────────────────────────────────────────────────╮ +│╭─────────────────────────────────────────────────────────╮│ +││ ││ +││ Safe ││ +││ Rust ││ +││ ││ +││ ││ +│╰─────────╮ ││ +│ │ ││ +│ Unsafe │ ││ +│ Rust │ ││ +│ ╰───────────────────────────────────────────────╯│ +╰───────────────────────────────────────────────────────────╯ +``` + +
+ +“Unsafe Rust is a superset of Safe Rust.” + +“Unsafe Rust adds extra capabilities, such as allowing you to dereference raw +pointers and call functions that can break Rust’s safety guarantees if called +incorrectly.” + +“These extra capabilities are referred to as _unsafe operations_.” + +“Unsafe operations provide the foundation that the Rust standard library is +built on. For example, without the ability to dereference a raw pointer, it +would be impossible to implement `Vec` or `Box`.” + +“The compiler will still assist you while writing Unsafe Rust. Borrow checking +and type safety still apply. Unsafe operations have their own rules, which we’ll +learn about in this class.” + +The unsafe operations from the [Rust Reference] (avoid spending too much time on this list): + +> The following language level features cannot be used in the safe subset of +> Rust: +> +> - Dereferencing a raw pointer. +> - Reading or writing a mutable or unsafe external static variable. +> - Accessing a field of a union, other than to assign to it. +> - Calling an `unsafe` function. +> - Calling a safe function marked with a `target_feature` from a function +> that does not have a `target_feature` attribute enabling the same features. +> - Implementing an unsafe trait. +> - Declaring an extern block. +> - Applying an unsafe attribute to an item. + +[Rust Reference]: https://doc.rust-lang.org/reference/unsafety.html + +
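A sketch of three operations from the list, each inside its own `unsafe` block with a safety comment. The names `CALLS`, `Bits` and the three functions are hypothetical, and `&raw const` assumes Rust 1.82 or later. Creating the raw pointer and writing a union field are safe; only the highlighted operations need `unsafe`.

```rust
static mut CALLS: u32 = 0;

union Bits {
    int: u32,
    float: f32,
}

/// Dereferencing a raw pointer.
fn deref_raw_pointer() -> i64 {
    let n: i64 = 12345;
    let ptr = &raw const n; // creating a raw pointer is safe
    // SAFETY: `ptr` points to `n`, which is initialized and still alive.
    unsafe { *ptr }
}

/// Reading and writing a mutable static.
fn bump_static() -> u32 {
    // SAFETY: this sketch is single-threaded, and no references to `CALLS`
    // are created.
    unsafe {
        CALLS += 1;
        CALLS
    }
}

/// Reading a union field.
fn read_union_field() -> u32 {
    let bits = Bits { float: 1.0 }; // writing a union field is safe
    // SAFETY: every bit pattern of an `f32` is also a valid `u32`.
    unsafe { bits.int }
}
```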
diff --git a/src/unsafe-deep-dive/introduction/impact-on-workflow.md b/src/unsafe-deep-dive/introduction/impact-on-workflow.md new file mode 100644 index 000000000000..7f48138a126a --- /dev/null +++ b/src/unsafe-deep-dive/introduction/impact-on-workflow.md @@ -0,0 +1,33 @@ +--- +minutes: 5 +--- + +# Impact on workflow + +While writing code + +- Verify that you understand the preconditions of any `unsafe` functions/traits +- Check that the preconditions are satisfied +- Document your reasoning in safety comments + +Enhanced code review + +- Self-review → peer reviewer → unsafe Rust expert (when needed) +- Escalate to a person who is comfortable with your code and reasoning + +
+ +“The unsafe keyword places more responsibility on the programmer, therefore it +requires a stronger development workflow. + +“This class assumes a specific software development workflow where code review +is mandatory, and where the author and primary reviewer have access to an unsafe +Rust expert.” + +“The author and primary reviewer will verify simple unsafe Rust code themselves, +and punt to an unsafe expert when necessary.” + +“There are only a few unsafe Rust experts, and they are very busy, so we need to +optimally use their time.” + +
diff --git a/src/unsafe-deep-dive/introduction/may_overflow.md b/src/unsafe-deep-dive/introduction/may_overflow.md new file mode 100644 index 000000000000..566368637e42 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/may_overflow.md @@ -0,0 +1,51 @@ +--- +minutes: 10 +--- + +# Example: may_overflow function + +```rust,editable +/// Adds 2^31 - 1 to negative numbers. +unsafe fn may_overflow(a: i32) -> i32 { + a + i32::MAX +} + +fn main() { + let x = unsafe { may_overflow(123) }; + println!("{x}"); +} +``` + +
+ +“The `unsafe` keyword may have a subtly different meaning than what some people +assume.” + +“The code author believes that the code is correct. In principle, the code is +safe.” + +“In this toy example, the `may_overflow` function is only intended to be called +with negative numbers. + +Ask learners if they can explain why `may_overflow` requires the unsafe keyword. + +“In case you’re unsure what the problem is, let’s pause briefly to explain. An +`i32` only has 31 bits available for positive numbers. When an operation +produces a result that requires more than 31 bits, the result cannot be +represented. In languages such as C, signed overflow is undefined behavior, and +it’s not just a numerical problem. Compilers optimize code on the basis that +undefined behavior is impossible. This causes code paths to be deleted, producing +erratic runtime behavior while also introducing security vulnerabilities. + +Compile and run the code in debug mode, producing a panic. Then switch the +playground to `--release` mode: the addition silently wraps around to a negative +number. Rust defines this wrapping behavior, but unchecked arithmetic such as +`i32::unchecked_add` would turn the same overflow into undefined behavior. + +“This code can be used correctly, however, improper usage is highly dangerous.” + +“And it's impossible for the compiler to verify that the usage is correct.” + +This is what we mean when we say that the `unsafe` keyword marks the location +where responsibility for memory safety shifts from the compiler to the +programmer. + +
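A sketch of how the function could document its precondition. It swaps the `+` operator for `i32::unchecked_add` (Rust 1.79 or later), for which overflow really is undefined behavior, and adds a hypothetical safe alternative, `checked_add_max`, based on `checked_add`.

```rust
/// Adds `i32::MAX` to `a`.
///
/// # Safety
///
/// `a` must be zero or negative, so that the sum fits in an `i32`.
unsafe fn may_overflow(a: i32) -> i32 {
    // SAFETY: the caller guarantees `a <= 0`, so the addition cannot
    // overflow. With `unchecked_add`, overflow would be undefined behavior.
    unsafe { a.unchecked_add(i32::MAX) }
}

/// Hypothetical safe alternative: report overflow instead of relying on
/// the caller.
fn checked_add_max(a: i32) -> Option<i32> {
    a.checked_add(i32::MAX)
}
```

For comparison, `123i32.wrapping_add(i32::MAX)` reproduces what the slide's `+` does in a release build: it wraps to `-2_147_483_526`.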
diff --git a/src/unsafe-deep-dive/introduction/purpose.md b/src/unsafe-deep-dive/introduction/purpose.md new file mode 100644 index 000000000000..6ca89529d790 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/purpose.md @@ -0,0 +1,26 @@ +--- +minutes: 5 +--- + +# Why the unsafe keyword exists + +- Rust ensures safety +- But there are limits to what the compiler can do +- The unsafe keyword allows programmers to assume responsibility for Rust’s + rules + +
+ +“A fundamental goal of Rust is to ensure memory safety.” + +“But, there are limits. Some safety considerations cannot be expressed in a +programming language. Even if they could be, there are limits to what the Rust +compiler can control.” + +“The `unsafe` keyword shifts the burden of upholding Rust’s rules from the +compiler to the programmer.” + +“When you see the `unsafe` keyword, you are seeing responsibility shift from the +compiler to the programmer. + +
diff --git a/src/unsafe-deep-dive/introduction/responsibility-shift.md b/src/unsafe-deep-dive/introduction/responsibility-shift.md new file mode 100644 index 000000000000..59da5ff973c0 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/responsibility-shift.md @@ -0,0 +1,32 @@ +--- +minutes: 3 +--- + +# Unsafe keyword shifts responsibility + +| | Is Memory Safe? | Responsibility for Memory Safety | +| :---------- | :-------------: | :------------------------------- | +| Safe Rust | Yes | Compiler | +| Unsafe Rust | Yes | Programmer | + +
+ +Who has responsibility for memory safety? + +- Safe Rust → compiler +- Unsafe Rust → programmer + +“While writing safe Rust, you cannot create memory safety problems. The compiler +will ensure that a program with mistakes will not build.” + +“The `unsafe` keyword shifts responsibility for maintaining memory safety from +the compiler to programmers. It signals that there are preconditions that must +be satisfied. + +“To uphold that responsibility, programmers must ensure that they understand +what the preconditions are and that their code will always satisfy them. + +“Throughout this course, we'll use the term _safety preconditions_ to describe +this situation.” + +
diff --git a/src/unsafe-deep-dive/introduction/two-roles.md b/src/unsafe-deep-dive/introduction/two-roles.md new file mode 100644 index 000000000000..4c3771e1fb9e --- /dev/null +++ b/src/unsafe-deep-dive/introduction/two-roles.md @@ -0,0 +1,58 @@ +--- +minutes: 5 +--- + +# The unsafe keyword has two roles + +1. _Creating_ APIs with safety considerations + + - unsafe functions: `unsafe fn get_unchecked(&self) { ... }` + - unsafe traits: `unsafe trait Send {}` + +2. _Using_ APIs with safety considerations + + - invoking built-in unsafe operators: `unsafe { *ptr }` + - calling unsafe functions: `unsafe { x.get_unchecked() }` + - implementing unsafe traits: `unsafe impl Send for Counter {}` + +
+ +Two roles: + +1. **Creating** APIs with safety considerations and defining what needs to be + considered +2. **Using** APIs with safety considerations and confirming that the + consideration has been made + +### Creating APIs with safety considerations + +“First, the unsafe keyword enables you to create APIs that can break Rust’s +safety guarantees. Specifically, you need to use the unsafe keyword when +defining unsafe functions and unsafe traits. + +“When used in this role, you’re informing users of your API that they need to be +careful.” + +“The creator of the API should communicate what care needs to be taken. Unsafe +APIs are not complete without documentation about safety requirements. Callers +need to know that they have satisfied any requirements, and that’s impossible if +they’re not written down.” + +### Using APIs with safety considerations + +“The unsafe keyword adopts its other role, using APIs, when it appears before a +block (`unsafe { ... }`) or an implementation (`unsafe impl`). + +“When used in this role, the unsafe keyword means that the author has been +careful. They have verified that the code is safe and are providing an assurance +to others.” + +“Unsafe blocks are most common. They allow you to invoke unsafe functions that +have been defined using the first role. + +“Unsafe blocks also allow you to perform operations which the compiler knows are +unsafe, such as dereferencing a raw pointer.” + +“You might also see the unsafe keyword being used to implement unsafe traits. + +
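Both roles can appear in a few lines. In this sketch, `Zeroable` is a hypothetical trait (not part of the standard library) whose precondition is that the all-zero bit pattern is a valid value. `unsafe trait` defines the precondition (role 1), `unsafe impl` and the `unsafe` block assert that it holds (role 2), and the trait bound lets `zeroed` itself be a safe function.

```rust
/// Role 1: defining a precondition.
///
/// # Safety
///
/// Implementors must guarantee that the all-zero bit pattern is a valid
/// value of the type.
unsafe trait Zeroable {}

// Role 2: asserting that the precondition holds for these types.
// SAFETY: zero is a valid value for every integer type.
unsafe impl Zeroable for u32 {}
// SAFETY: as above.
unsafe impl Zeroable for i64 {}

/// A safe function: the trait bound carries the precondition.
fn zeroed<T: Zeroable>() -> T {
    // SAFETY: `T: Zeroable` guarantees that all-zero bits are a valid `T`.
    unsafe { std::mem::zeroed() }
}
```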
diff --git a/src/unsafe-deep-dive/introduction/warm-up.md b/src/unsafe-deep-dive/introduction/warm-up.md new file mode 100644 index 000000000000..c36687bc5e9c --- /dev/null +++ b/src/unsafe-deep-dive/introduction/warm-up.md @@ -0,0 +1,13 @@ +# Warm up examples + +Examples to demonstrate: + +- using an [unsafe block] (`unsafe { ... }`) +- defining an [unsafe function] (`unsafe fn`) +- [implementing] an unsafe trait (`unsafe impl { ... }`) +- defining an [unsafe trait] (`unsafe trait`) + +[unsafe block]: warm-up/unsafe-block.md +[unsafe function]: warm-up/unsafe-fn.md +[implementing]: warm-up/unsafe-impl.md +[unsafe trait]: warm-up/unsafe-trait.md diff --git a/src/unsafe-deep-dive/introduction/warm-up/unsafe-block.md b/src/unsafe-deep-dive/introduction/warm-up/unsafe-block.md new file mode 100644 index 000000000000..0958567679e6 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/warm-up/unsafe-block.md @@ -0,0 +1,51 @@ +--- +minutes: 8 +--- + +# Using an unsafe block + +```rust,editable +fn main() { + let numbers = vec![0, 1, 2, 3, 4]; + let i = numbers.len() / 2; + + let x = *numbers.get_unchecked(i); + assert_eq!(i, x); +} +``` + +
+
+Walk through the code. Confirm that the audience is familiar with the
+dereference operator.
+
+Attempt to compile the code, trigger the compiler error.
+
+Add the unsafe block:
+
+```rust,ignore
+let x = unsafe { *numbers.get_unchecked(i) };
+```
+
+Prompt audience for a code review. Guide learners towards adding a safety
+comment.
+
+Add the safety comment:
+
+```rust
+// SAFETY: `i` is `numbers.len() / 2`, which is within `0..numbers.len()`
+// because `numbers` is not empty.
+```
+
+_Suggested Solution_
+
+```rust
+fn main() {
+    let numbers = vec![0, 1, 2, 3, 4];
+    let i = numbers.len() / 2;
+
+    // SAFETY: `i` is `numbers.len() / 2`, which is within `0..numbers.len()`
+    // because `numbers` is not empty.
+    let x = unsafe { *numbers.get_unchecked(i) };
+    assert_eq!(i, x);
+}
+```
+
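The same pattern can be wrapped in a safe function. A minimal sketch, assuming a hypothetical helper `middle()` that is not part of the slide: an explicit emptiness check establishes the precondition, so the `SAFETY:` comment can point to something the code actually verified.

```rust
/// Returns the middle element of `numbers`, or `None` if it is empty.
fn middle(numbers: &[usize]) -> Option<usize> {
    if numbers.is_empty() {
        return None;
    }
    let i = numbers.len() / 2;
    // SAFETY: `numbers` is non-empty, so `numbers.len() / 2 < numbers.len()`.
    Some(unsafe { *numbers.get_unchecked(i) })
}

fn main() {
    let numbers = vec![0, 1, 2, 3, 4];
    assert_eq!(middle(&numbers), Some(2));
    assert_eq!(middle(&[]), None);
}
```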
diff --git a/src/unsafe-deep-dive/introduction/warm-up/unsafe-fn.md b/src/unsafe-deep-dive/introduction/warm-up/unsafe-fn.md
new file mode 100644
index 000000000000..e7b86f0aecc3
--- /dev/null
+++ b/src/unsafe-deep-dive/introduction/warm-up/unsafe-fn.md
@@ -0,0 +1,62 @@
+---
+minutes: 8
+---
+
+# Defining an unsafe function
+
+```rust,editable
+/// Convert a nullable pointer to a reference.
+///
+/// Returns `None` when `ptr` is null, otherwise wraps the reference in `Some`.
+fn ptr_to_ref<'a, T>(ptr: *mut T) -> Option<&'a mut T> {
+    if ptr.is_null() {
+        None
+    } else {
+        // SAFETY: `ptr` is non-null
+        unsafe { Some(&mut *ptr) }
+    }
+}
+```
+
+
+“This looks as though it’s safe code, however it actually requires an unsafe
+block.”
+
+Highlight the dereference operation, i.e. `*ptr` within the unsafe block.
+
+“Callers must ensure that `ptr` is either null or can be converted to a
+reference.
+
+“It may be counter-intuitive, but many pointers cannot be converted to
+references.
+
+“Among other issues, a pointer could be created that points to some arbitrary
+bits rather than a valid value. That’s not something that Rust allows and
+something that this function needs to protect itself against.
+
+“So we, as API designers, have two paths. We can either try to assume
+responsibility for guarding against invalid inputs, or we can shift that
+responsibility to the caller with the unsafe keyword.”
+
+“The first path is a difficult one. We’re accepting a generic type `T`, which
+could be any type that implements `Sized`. That’s a lot of types!
+
+“Therefore, the second path makes more sense.”
+
+_Extra content (time permitting)_
+
+“By the way, if you’re interested in the details of pointers and the rules for
+converting them to references, the standard library has a lot of useful
+documentation. You should also look into the source code of many of the methods
+on the primitive pointer type.
+
+“For example, the `ptr_to_ref` function on this slide actually exists in the
+standard library as the `as_mut` method on pointers.”
+
+Open the documentation for [std::pointer.as_mut] and highlight the Safety
+section.
+
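One way the slide's function could look once responsibility is shifted to the caller. This is a sketch, not the standard library's implementation; the `# Safety` wording paraphrases the kind of requirements documented for `pointer::as_mut` and is an assumption, not a quotation.

```rust
/// Convert a nullable pointer to a mutable reference.
///
/// Returns `None` when `ptr` is null, otherwise wraps the reference in `Some`.
///
/// # Safety
///
/// If `ptr` is non-null, it must be properly aligned and point to a valid,
/// initialized `T`, and no other reference to that `T` may be used while the
/// returned reference (with lifetime `'a`) is alive.
unsafe fn ptr_to_ref<'a, T>(ptr: *mut T) -> Option<&'a mut T> {
    if ptr.is_null() {
        None
    } else {
        // SAFETY: `ptr` is non-null, and the caller guarantees the remaining
        // requirements listed above.
        unsafe { Some(&mut *ptr) }
    }
}

fn main() {
    let mut value = 41;
    let ptr: *mut i32 = &mut value;

    // SAFETY: `ptr` comes from a live, exclusive borrow of `value`, and
    // `value` is not accessed while `r` is in use.
    if let Some(r) = unsafe { ptr_to_ref(ptr) } {
        *r += 1;
    }
    assert_eq!(value, 42);

    // SAFETY: a null pointer is always accepted and yields `None`.
    let none: Option<&mut i32> = unsafe { ptr_to_ref(std::ptr::null_mut()) };
    assert!(none.is_none());
}
```

Marking the function `unsafe` means every caller now writes its own `SAFETY:` comment, which is exactly where the knowledge about the pointer's origin lives.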
+ +[std::pointer.as_mut]: https://doc.rust-lang.org/std/primitive.pointer.html#method.as_mut diff --git a/src/unsafe-deep-dive/introduction/warm-up/unsafe-impl.md b/src/unsafe-deep-dive/introduction/warm-up/unsafe-impl.md new file mode 100644 index 000000000000..db5dfc238c18 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/warm-up/unsafe-impl.md @@ -0,0 +1,47 @@ +--- +minutes: 4 +--- + +# Implementing an unsafe trait + +```rust,editable +pub struct LogicalClock { + inner: std::sync::Arc, +} + +// ... + +impl Send for LogicalClock {} +impl Sync for LogicalClock {} +``` + +
+
+“Before we take a look at the code, we should double check that everyone knows
+what a trait is. Is anyone able to explain traits for the rest of the class?
+
+- “Traits are often described as a way to create shared behavior. Thinking about
+  traits as shared behavior focuses on the syntax of methods and their
+  signatures.
+- “There’s also a deeper way to think of traits: as sets of requirements. This
+  emphasizes the shared semantics of the implementing types.
+
+“Can anyone explain what the `Send` and `Sync` traits are?”
+
+- If no
+  - “Send and Sync relate to concurrency. There are many details, but broadly
+    speaking, Send types are able to be sent between threads by value. Sync
+    types are able to be shared between threads by reference.
+  - There are many rules to follow to ensure that it’s safe to share data across
+    thread boundaries. Those rules cannot be checked by the compiler, and
+    therefore implementing Send and Sync manually requires the unsafe keyword.
+  - `Arc<T>` implements Send and Sync when `T` is Send and Sync, as the atomic
+    types are. Therefore it’s safe for our clock to implement them as well.
+  - It may be useful to point out that the word atomic comes from the Ancient
+    Greek for “indivisible” or “uncuttable”, rather than the contemporary
+    English sense of “tiny particle”.
+
+“Now let’s define our own unsafe trait.”
+
+Transition to next slide.
+
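The compiler only asks for a manual `unsafe impl` when it cannot derive `Send` itself, typically because a type contains a raw pointer. A sketch with a hypothetical `OwnedPtr` type (not the slide's `LogicalClock`): it uniquely owns a heap allocation through a raw pointer, much like `Box<u64>`, so moving it to another thread is sound, and the `SAFETY:` comment on the `unsafe impl` records why.

```rust
use std::thread;

/// Uniquely owns a heap-allocated `u64` through a raw pointer.
struct OwnedPtr(*mut u64);

impl OwnedPtr {
    fn new(value: u64) -> Self {
        OwnedPtr(Box::into_raw(Box::new(value)))
    }

    fn get(&self) -> u64 {
        // SAFETY: `self.0` came from `Box::into_raw` and is freed only in `drop`.
        unsafe { *self.0 }
    }
}

impl Drop for OwnedPtr {
    fn drop(&mut self) {
        // SAFETY: `self.0` came from `Box::into_raw` and is freed exactly once.
        drop(unsafe { Box::from_raw(self.0) });
    }
}

// SAFETY: `OwnedPtr` is the only owner of its allocation (like `Box<u64>`),
// and `u64` is `Send`, so transferring it to another thread is sound.
unsafe impl Send for OwnedPtr {}

fn main() {
    let owned = OwnedPtr::new(7);
    let handle = thread::spawn(move || owned.get());
    assert_eq!(handle.join().unwrap(), 7);
}
```

Without the `unsafe impl`, `thread::spawn` rejects the closure because `*mut u64` is not `Send`.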
diff --git a/src/unsafe-deep-dive/introduction/warm-up/unsafe-trait.md b/src/unsafe-deep-dive/introduction/warm-up/unsafe-trait.md new file mode 100644 index 000000000000..e823a1b3c485 --- /dev/null +++ b/src/unsafe-deep-dive/introduction/warm-up/unsafe-trait.md @@ -0,0 +1,26 @@ +--- +minutes: 5 +--- + +# Defining an unsafe trait + +```rust,editable +/// Indicates that the type uses 32 bits of memory. +pub trait Size32 {} +``` + +
+ +“Now let’s define our own unsafe trait.” + +Add the unsafe keyword and compile the code. + +“If the requirements of the trait are semantic, then your trait may not need any +methods at all. The documentation is essential, however.” + +“Traits without methods are called marker traits. When implementing them for +types, you are adding information to the type system. You have now given the +compiler the ability to talk about types that meet the requirements described in +the documentation.” + +
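A sketch of where the slide could end up. Assumptions: the trait's documented requirement is strengthened to "exactly 4 bytes with no padding", and a hypothetical `to_bits` function relies on that requirement. Because the trait is `unsafe`, every implementation must be written as `unsafe impl`, which is where the implementer vouches for the documented requirement.

```rust
/// Indicates that the type uses 32 bits of memory.
///
/// # Safety
///
/// Implementing types must be exactly 4 bytes in size, and every one of
/// those bytes must be initialized (no padding).
pub unsafe trait Size32 {}

// SAFETY: `u32`, `i32` and `f32` are each 4 bytes with no padding.
unsafe impl Size32 for u32 {}
unsafe impl Size32 for i32 {}
unsafe impl Size32 for f32 {}

/// Reinterprets any `Size32` value as its raw bits.
pub fn to_bits<T: Size32 + Copy>(value: T) -> u32 {
    // SAFETY: `T: Size32` guarantees 4 fully initialized bytes, and every
    // 32-bit pattern is a valid `u32`.
    unsafe { std::mem::transmute_copy(&value) }
}

fn main() {
    assert_eq!(to_bits(1u32), 1);
    assert_eq!(to_bits(-1i32), u32::MAX);
    assert_eq!(to_bits(1.0f32), 1.0f32.to_bits());
}
```

The marker trait has no methods; all of its value is in the documented contract that `to_bits` can rely on.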
diff --git a/src/unsafe-deep-dive/memory-lifecycle.md b/src/unsafe-deep-dive/memory-lifecycle.md
new file mode 100644
index 000000000000..3615628408b7
--- /dev/null
+++ b/src/unsafe-deep-dive/memory-lifecycle.md
@@ -0,0 +1,24 @@
+# Memory Lifecycle
+
+Memory moves through different phases as objects (values) are created and
+destroyed.
+
+| Memory State | Readable from Safe Rust? |
+| ------------ | ------------------------ |
+| Available    | No                       |
+| Allocated    | No                       |
+| Initialized  | Yes                      |
+
+
+“This section discusses what happens as memory from the operating system
+becomes a valid variable in the program.
+
+“When memory is available, the operating system has provided our program with
+it.
+
+“When memory is allocated, it is reserved for values to be written to it. We
+call this uninitialized memory.
+
+“When memory is initialized, it is safe to read from.”
+
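The three states can be traced with the standard allocator API. A minimal sketch with a hypothetical `lifecycle` function: `std::alloc::alloc` turns available memory into allocated (uninitialized) memory, a write initializes it, and only then is reading it sound.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Walks one `u64` through the lifecycle: available, allocated,
/// initialized, then back to available.
fn lifecycle(value: u64) -> u64 {
    let layout = Layout::new::<u64>();

    // Available -> Allocated: reserve memory for one `u64`. Its contents are
    // uninitialized, so reading it at this point would be undefined behavior.
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc(layout) } as *mut u64;
    if ptr.is_null() {
        handle_alloc_error(layout);
    }

    // Allocated -> Initialized: write before any read.
    // SAFETY: `ptr` is non-null, aligned for `u64` and valid for writes.
    unsafe { ptr.write(value) };

    // Initialized: reading is now sound.
    // SAFETY: `ptr` points to an initialized `u64`.
    let read_back = unsafe { ptr.read() };

    // Back to available: hand the memory back to the allocator.
    // SAFETY: `ptr` was returned by `alloc` with this same `layout`.
    unsafe { dealloc(ptr as *mut u8, layout) };

    read_back
}

fn main() {
    assert_eq!(lifecycle(42), 42);
}
```

Safe Rust never exposes the middle state; `Box::new`, `Vec::push` and friends perform the allocate-then-initialize steps together.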
diff --git a/src/unsafe-deep-dive/motivations.md b/src/unsafe-deep-dive/motivations.md
index c4acb81994d1..30df07082d6a 100644
--- a/src/unsafe-deep-dive/motivations.md
+++ b/src/unsafe-deep-dive/motivations.md
@@ -2,7 +2,7 @@
 minutes: 1
 ---
 
-# Motivations
+# Why learn unsafe Rust?
 
 We know that writing code without the guarantees that Rust provides ...
 
@@ -14,6 +14,8 @@ We know that writing code without the guarantees that Rust provides ...
 
 ... so why is `unsafe` part of the language?
 
+## Outline
+
 {{%segment outline}}
 
diff --git a/src/unsafe-deep-dive/motivations/data-structures.md b/src/unsafe-deep-dive/motivations/data-structures.md deleted file mode 100644 index 0e1b3a84697d..000000000000 --- a/src/unsafe-deep-dive/motivations/data-structures.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -minutes: 5 ---- - -# Data Structures - -Some families of data structures are impossible to create in safe Rust. - -- graphs -- bit twiddling -- self-referential types -- intrusive data structures - -
- -Graphs: General-purpose graphs cannot be created as they may need to represent -cycles. Cycles are impossible for the type system to reason about. - -Bit twiddling: Overloading bits with multiple meanings. Examples include using -the NaN bits in `f64` for some other purpose or the higher-order bits of -pointers on `x86_64` platforms. This is somewhat common when writing language -interpreters to keep representations within the word size the target platform. - -Self-referential types are too hard for the borrow checker to verify. - -Intrusive data structures: store structural metadata (like pointers to other -elements) inside the elements themselves, which requires careful handling of -aliasing. - -
diff --git a/src/unsafe-deep-dive/motivations/interop.md b/src/unsafe-deep-dive/motivations/interop.md deleted file mode 100644 index 85505b21fb35..000000000000 --- a/src/unsafe-deep-dive/motivations/interop.md +++ /dev/null @@ -1,243 +0,0 @@ ---- -minutes: 5 ---- - -> TODO: Refactor this content into multiple slides as this slide is intended as -> an introduction to the motivations only, rather than to be an elaborate -> discussion of the whole problem. - -# Interoperability - -Language interoperability allows you to: - -- Call functions written in other languages from Rust -- Write functions in Rust that are callable from other languages - -However, this requires unsafe. - -```rust,editable,ignore -unsafe extern "C" { - safe fn random() -> libc::c_long; -} - -fn main() { - let a = random() as i64; - println!("{a:?}"); -} -``` - -
- -The Rust compiler can't enforce any safety guarantees for programs that it -hasn't compiled, so it delegates that responsibility to you through the unsafe -keyword. - -The code example we're seeing shows how to call the random function provided by -libc within Rust. libc is available to scripts in the Rust Playground. - -This uses Rust's _foreign function interface_. - -This isn't the only style of interoperability, however it is the method that's -needed if you want to work between Rust and some other language in a zero cost -way. Another important strategy is message passing. - -Message passing avoids unsafe, but serialization, allocation, data transfer and -parsing all take energy and time. - -## Answers to questions - -- _Where does "random" come from?_\ - libc is dynamically linked to Rust programs by default, allowing our code to - rely on its symbols, including `random`, being available to our program. -- _What is the "safe" keyword?_\ - It allows callers to call the function without needing to wrap that call in - `unsafe`. The [`safe` function qualifier][safe] was introduced in the 2024 - edition of Rust and can only be used within `extern` blocks. It was introduced - because `unsafe` became a mandatory qualifier for `extern` blocks in that - edition. -- _What is the [`std::ffi::c_long`] type?_\ - According to the C standard, an integer that's at least 32 bits wide. On - today's systems, It's an `i32` on Windows and an `i64` on Linux. - -[`std::ffi::c_long`]: https://doc.rust-lang.org/std/ffi/type.c_long.html -[safe]: https://doc.rust-lang.org/stable/edition-guide/rust-2024/unsafe-extern.html - -## Consideration: type safety - -Modify the code example to remove the need for type casting later. Discuss the -potential UB - long's width is defined by the target. 
- -```rust -unsafe extern "C" { - safe fn random() -> i64; -} - -fn main() { - let a = random(); - println!("{a:?}"); -} -``` - -> Changes from the original: -> -> ```diff -> unsafe extern "C" { -> - safe fn random() -> libc::c_long; -> + safe fn random() -> i64; -> } -> -> fn main() { -> - let a = random() as i64; -> + let a = random(); -> println!("{a:?}"); -> } -> ``` - -It's also possible to completely ignore the intended type and create undefined -behavior in multiple ways. The code below produces output most of the time, but -generally results in a stack overflow. It may also produce illegal `char` -values. Although `char` is represented in 4 bytes (32 bits), -[not all bit patterns are permitted as a `char`][char]. - -Stress that the Rust compiler will trust that the wrapper is telling the truth. - -[char]: https://doc.rust-lang.org/std/primitive.char.html#validity-and-layout - - - -```rust,ignore -unsafe extern "C" { - safe fn random() -> [char; 2]; -} - -fn main() { - let a = random(); - println!("{a:?}"); -} -``` - -> Changes from the original: -> -> ```diff -> unsafe extern "C" { -> - safe fn random() -> libc::c_long; -> + safe fn random() -> [char; 2]; -> } -> -> fn main() { -> - let a = random() as i64; -> - println!("{a}"); -> + let a = random(); -> + println!("{a:?}"); -> } -> ``` - -> Attempting to print a `[char; 2]` from randomly generated input will often -> produce strange output, including: -> -> ```ignore -> thread 'main' panicked at library/std/src/io/stdio.rs:1165:9: -> failed printing to stdout: Bad address (os error 14) -> ``` -> -> ```ignore -> thread 'main' has overflowed its stack -> fatal runtime error: stack overflow, aborting -> ``` - -Mention that type safety is generally not a large concern in practice. Tools -that produce wrappers automatically, i.e. bindgen, are excellent at reading -header files and producing values of the correct type. 
- -## Consideration: Ownership and lifetime management - -While libc's `random` function doesn't use pointers, many do. This creates many -more possibilities for unsoundness. - -- both sides might attempt to free the memory (double free) -- both sides can attempt to write to the data - -For example, some C libraries expose functions that write to static buffers that -are re-used between calls. - - - - - -```rust,ignore -use std::ffi::{CStr, c_char}; -use std::time::{SystemTime, UNIX_EPOCH}; - -unsafe extern "C" { - /// Create a formatted time based on time `t`, including trailing newline. - /// Read `man 3 ctime` details. - fn ctime(t: *const libc::time_t) -> *const c_char; -} - -unsafe fn format_timestamp<'a>(t: u64) -> &'a str { - let t = t as libc::time_t; - - unsafe { - let fmt_ptr = ctime(&t); - CStr::from_ptr(fmt_ptr).to_str().unwrap() - } -} - -fn main() { - let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap(); - - let now = now.as_secs(); - let now_fmt = unsafe { format_timestamp(now) }; - print!("now (1): {}", now_fmt); - - let future = now + 60; - let future_fmt = unsafe { format_timestamp(future) }; - print!("future: {}", future_fmt); - - print!("now (2): {}", now_fmt); -} -``` - -> _Aside:_ Lifetimes in the `format_timestamp()` function -> -> Neither `'a`, nor `'static`, correctly describe the lifetime of the string -> that's returned. Rust treats it as an immutable reference, but subsequent -> calls to `ctime` will overwrite the static buffer that the string occupies. - -## Consideration: Representation mismatch - -Different programming languages have made different design decisions and this -can create impedance mismatches between different domains. - -Consider string handling. C++ defines `std::string`, which has an incompatible -memory layout with Rust's `String` type. `String` also requires text to be -encoded as UTF-8, whereas `std::string` does not. In C, text is represented by a -null-terminated sequence of bytes (`char*`). 
- -```rust -fn main() { - let c_repr = b"Hello, C\0"; - let rust_repr = (b"Hello, Rust", 11); - - let c: &str = unsafe { - let ptr = c_repr.as_ptr() as *const i8; - std::ffi::CStr::from_ptr(ptr).to_str().unwrap() - }; - println!("{c}"); - - let rust: &str = unsafe { - let ptr = rust_repr.0.as_ptr(); - let bytes = std::slice::from_raw_parts(ptr, rust_repr.1); - std::str::from_utf8_unchecked(bytes) - }; - println!("{rust}"); -} -``` - -
diff --git a/src/unsafe-deep-dive/motivations/performance.md b/src/unsafe-deep-dive/motivations/performance.md deleted file mode 100644 index 0b32e8600afa..000000000000 --- a/src/unsafe-deep-dive/motivations/performance.md +++ /dev/null @@ -1,10 +0,0 @@ ---- -minutes: 5 ---- - -# Performance - -> TODO: Stub for now - -It's easy to think of performance as the main reason for unsafe, but high -performance code makes up the minority of unsafe blocks. diff --git a/src/unsafe-deep-dive/pinning.md b/src/unsafe-deep-dive/pinning.md new file mode 100644 index 000000000000..768bc818311b --- /dev/null +++ b/src/unsafe-deep-dive/pinning.md @@ -0,0 +1,34 @@ +# Pinning + +This segment of the course covers: + +- What "pinning" is +- Why it is necessary +- How Rust implements it +- How it interacts with unsafe and FFI + +## Outline + +{{%segment outline}} + +
+
+"Pinning, or holding a value's memory address in a fixed location, is one of
+the more challenging concepts in Rust."
+
+"Pinning is most often seen within async code, e.g.
+[`poll(self: Pin<&mut Self>)`][poll], but it has wider applicability."
+
+Some data structures are difficult or impossible to write without the unsafe
+keyword, including self-referential structs and intrusive data structures.
+
+FFI with C++ is a prominent use case that's related to this. Rust must assume
+that any C++ object might be a self-referential data structure, and therefore
+must not move it.
+
+"To understand this conflict in more detail, we'll first need to make sure that
+we have a strong understanding of Rust's move semantics."
+
+
+[poll]: https://doc.rust-lang.org/std/future/trait.Future.html#tymethod.poll
diff --git a/src/unsafe-deep-dive/pinning/README.md b/src/unsafe-deep-dive/pinning/README.md
new file mode 100644
index 000000000000..384124da357b
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/README.md
@@ -0,0 +1,19 @@
+# pinning
+
+> **Important Note**
+>
+> Do not add this section to the project's SUMMARY.md yet. Once CLs/PRs to
+> accept all the new segments for the Unsafe Deep Dive have been included in the
+> repository, an update to SUMMARY.md will be made.
+
+## About
+
+This segment explains pinning, Rust's `Pin` type, and concepts that relate
+to FFI rather than its async use case. Treatment of the `Unpin` trait and the
+`PhantomPinned` type is provided.
+
+## Status
+
+Provisional/beta.
+
+## Outline
diff --git a/src/unsafe-deep-dive/pinning/definition-of-pin.md b/src/unsafe-deep-dive/pinning/definition-of-pin.md
new file mode 100644
index 000000000000..0a46acb76e05
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/definition-of-pin.md
@@ -0,0 +1,37 @@
+---
+minutes: 5
+---
+
+# Definition of Pin
+
+```rust,ignore
+#[repr(transparent)]
+pub struct Pin<Ptr> {
+    pointer: Ptr,
+}
+
+impl<Ptr: Deref<Target: Unpin>> Pin<Ptr> {
+    pub fn new(pointer: Ptr) -> Pin<Ptr> { ... }
+}
+
+impl<Ptr: Deref> Pin<Ptr> {
+    pub unsafe fn new_unchecked(pointer: Ptr) -> Pin<Ptr> { ... }
+}
+```
+
+
+`Pin` is a minimal wrapper around a _pointer type_, which is defined as a type
+that implements `Deref`.
+
+However, `Pin::new()` only accepts types that dereference into a target that
+implements `Unpin` (`Deref<Target: Unpin>`). This allows `Pin` to rely on the
+type system to enforce its guarantees.
+
+Types that do not implement `Unpin`, i.e. types that require pinning, must be
+pinned through an API that guarantees they will not move, such as `Box::pin()`
+or `std::pin::pin!`, or via the unsafe `Pin::new_unchecked()`.
+
+Aside: Unlike other `new()`/`new_unchecked()` method pairs, `new` does not do
+any runtime checking. The check is a zero-cost compile-time check.
+
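A small sketch contrasting the constructors. `Pin::new` compiles only because `i32` is `Unpin`; the hypothetical `NotUnpin` type contains `PhantomPinned`, so it is pinned either with `Box::pin` (safe, because the heap allocation never moves) or with `Pin::new_unchecked`, where the caller must uphold the guarantee.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct NotUnpin {
    value: i32,
    _pin: PhantomPinned,
}

fn main() {
    // `i32: Unpin`, so the safe constructor is available.
    let mut x = 5;
    let pinned: Pin<&mut i32> = Pin::new(&mut x);
    assert_eq!(*pinned, 5);

    // `Pin::new(&mut NotUnpin { .. })` would not compile: `NotUnpin: !Unpin`.
    // `Box::pin` is a safe alternative because the heap allocation never moves.
    let boxed: Pin<Box<NotUnpin>> = Box::pin(NotUnpin { value: 1, _pin: PhantomPinned });
    assert_eq!(boxed.value, 1);

    // The unsafe constructor shifts the responsibility to us.
    let mut local = NotUnpin { value: 2, _pin: PhantomPinned };
    // SAFETY: `local` is never moved or accessed directly after this point;
    // it is only used through `pinned_local` until it is dropped.
    let pinned_local: Pin<&mut NotUnpin> = unsafe { Pin::new_unchecked(&mut local) };
    assert_eq!(pinned_local.value, 2);
}
```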
diff --git a/src/unsafe-deep-dive/pinning/drop-and-not-unpin-worked-example.md b/src/unsafe-deep-dive/pinning/drop-and-not-unpin-worked-example.md
new file mode 100644
index 000000000000..d41a89868139
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/drop-and-not-unpin-worked-example.md
@@ -0,0 +1,112 @@
+---
+minutes: 5
+---
+
+# Worked example: Implementing `Drop` for `!Unpin` types
+
+```rust,editable,ignore
+use std::cell::RefCell;
+use std::marker::PhantomPinned;
+use std::mem;
+use std::pin::Pin;
+
+thread_local! {
+    static BATCH_FOR_PROCESSING: RefCell<Vec<String>> = RefCell::new(Vec::new());
+}
+
+#[derive(Debug)]
+struct CustomString(String);
+
+#[derive(Debug)]
+struct SelfRef {
+    data: CustomString,
+    ptr: *const CustomString,
+    _pin: PhantomPinned,
+}
+
+impl SelfRef {
+    fn new(data: &str) -> Pin<Box<Self>> {
+        let mut boxed = Box::pin(SelfRef {
+            data: CustomString(data.to_owned()),
+            ptr: std::ptr::null(),
+            _pin: PhantomPinned,
+        });
+
+        let ptr: *const CustomString = &boxed.data;
+        unsafe {
+            Pin::get_unchecked_mut(Pin::as_mut(&mut boxed)).ptr = ptr;
+        }
+        boxed
+    }
+}
+
+impl Drop for SelfRef {
+    fn drop(&mut self) {
+        // SAFETY: Safe because we are reading bytes from a String
+        let payload = unsafe { std::ptr::read(&self.data) };
+        BATCH_FOR_PROCESSING.with(|log| log.borrow_mut().push(payload.0));
+    }
+}
+
+fn main() {
+    let pinned = SelfRef::new("Rust 🦀");
+    drop(pinned);
+
+    BATCH_FOR_PROCESSING.with(|batch| {
+        println!("Batch: {:?}", batch.borrow());
+    });
+}
+```
+
+
+This example uses the `Drop` trait to add data for some post-processing, such as
+telemetry or logging.
+
+**The Safety comment is incorrect.** `ptr::read` creates a bitwise copy, so
+`payload` and `self.data` now own the same heap buffer. After `drop()` returns,
+`self.data` is dropped automatically, freeing memory that the `String` stored in
+the batch still points to. Printing the batch is a use-after-free, and dropping
+the batch later is a double free.
+
+Ask the class to fix the code.
+
+**Suggestion 0: Redesign**
+
+Redesign the post-processing system to work without `Drop`.
+
+**Suggestion 1: Clone**
+
+Using `.clone()` is an obvious first choice, but it allocates memory.
+
+```rust,ignore
+impl Drop for SelfRef {
+    fn drop(&mut self) {
+        let payload = self.data.0.clone();
+        BATCH_FOR_PROCESSING.with(|log| log.borrow_mut().push(payload));
+    }
+}
+```
+
+**Suggestion 2: ManuallyDrop**
+
+Wrapping `CustomString` in `ManuallyDrop` prevents the second, automatic drop
+of the field after the `Drop` impl has run.
+
+```rust,ignore
+struct SelfRef {
+    data: ManuallyDrop<CustomString>,
+    ptr: *const CustomString,
+    _pin: PhantomPinned,
+}
+
+// ...
+
+impl Drop for SelfRef {
+    fn drop(&mut self) {
+        // SAFETY: `self.data` is not used again after this point, and
+        // `ManuallyDrop` prevents it from being dropped a second time.
+        let payload = unsafe { ManuallyDrop::take(&mut self.data) };
+        BATCH_FOR_PROCESSING.with(|log| log.borrow_mut().push(payload.0));
+    }
+}
+```
+
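A self-contained version of Suggestion 2 that can be run in the playground. It assumes the rest of the worked example stays as shown; the `Debug` derive on `SelfRef` is dropped and the constructor is trimmed to what the fix needs.

```rust
use std::cell::RefCell;
use std::marker::PhantomPinned;
use std::mem::ManuallyDrop;
use std::pin::Pin;

thread_local! {
    static BATCH_FOR_PROCESSING: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

#[derive(Debug)]
struct CustomString(String);

struct SelfRef {
    data: ManuallyDrop<CustomString>,
    ptr: *const CustomString,
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(data: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(SelfRef {
            data: ManuallyDrop::new(CustomString(data.to_owned())),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });
        let ptr: *const CustomString = &*boxed.data;
        // SAFETY: we only write the `ptr` field; the value is never moved.
        unsafe { Pin::as_mut(&mut boxed).get_unchecked_mut().ptr = ptr };
        boxed
    }
}

impl Drop for SelfRef {
    fn drop(&mut self) {
        // SAFETY: `self.data` is never used again after this call, and
        // `ManuallyDrop` stops it from being dropped a second time.
        let payload = unsafe { ManuallyDrop::take(&mut self.data) };
        BATCH_FOR_PROCESSING.with(|log| log.borrow_mut().push(payload.0));
    }
}

fn main() {
    drop(SelfRef::new("Rust 🦀"));
    BATCH_FOR_PROCESSING.with(|batch| {
        assert_eq!(*batch.borrow(), vec!["Rust 🦀".to_string()]);
    });
}
```

Ownership of the `String` now moves cleanly into the batch, so exactly one owner frees it.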
diff --git a/src/unsafe-deep-dive/pinning/phantompinned.md b/src/unsafe-deep-dive/pinning/phantompinned.md
new file mode 100644
index 000000000000..728d76eda1a4
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/phantompinned.md
@@ -0,0 +1,36 @@
+---
+minutes: 5
+---
+
+# PhantomPinned
+
+## Definition
+
+```rust,ignore
+pub struct PhantomPinned;
+
+impl !Unpin for PhantomPinned {}
+```
+
+## Usage
+
+```rust,editable
+pub struct DynamicBuffer {
+    data: Vec<u8>,
+    cursor: std::ptr::NonNull<u8>,
+    _pin: std::marker::PhantomPinned,
+}
+```
+
+ +`PhantomPinned` is a marker type. + +If a type contains a `PhantomPinned`, it will not implement `Unpin` by default. + +This has the effect of enforcing pinning when `DynamicBuffer` is wrapped by +`Pin`. + +
+
diff --git a/src/unsafe-deep-dive/pinning/pin-and-drop.md b/src/unsafe-deep-dive/pinning/pin-and-drop.md
new file mode 100644
index 000000000000..51de7e6da9e8
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/pin-and-drop.md
@@ -0,0 +1,95 @@
+---
+minutes: 10
+---
+
+# Pin&lt;Ptr&gt; and Drop
+
+A key challenge with pinned, `!Unpin` types is implementing the `Drop` trait.
+The `drop` method takes `&mut self`, which would allow the value to be moved
+(for example with `mem::replace`). However, pinned values must not be moved.
+
+## An Incorrect `Drop` Implementation
+
+It's easy to accidentally move a value inside `drop`. Operations like assignment,
+`ptr::read`, and `mem::replace` can silently break the pinning guarantee.
+
+```rust,editable
+struct SelfRef {
+    data: String,
+    ptr: *const String,
+}
+
+impl Drop for SelfRef {
+    fn drop(&mut self) {
+        // BAD: `ptr::read` makes a bitwise copy of `self.data`, so two
+        // `String`s now own the same heap buffer. `_dupe` is dropped at the
+        // end of this function and `self.data` is dropped again afterwards:
+        // a double free!
+        let _dupe = unsafe { std::ptr::read(&self.data) };
+    }
+}
+```
+
+`!Unpin` types can make it difficult to safely implement `Drop`. Implementations
+must not move pinned values.
+
+Pinned types make guarantees about memory stability. Operations like `ptr::read`
+and `mem::replace` can silently break these guarantees by moving or duplicating
+data, invalidating internal pointers without the type system's knowledge.
+
+In this `drop()` method, `_dupe` is a bitwise copy of `self.data`. `_dupe` is
+dropped at the end of the method, and `self.data` is dropped again immediately
+afterwards as part of dropping `self`. This double drop (a double free of the
+`String`'s heap buffer) is undefined behavior.
+
+## A Correct `Drop` Implementation
+
+To implement `Drop` correctly for a `!Unpin` type, you must ensure that the
+value is not moved. A common pattern is to create a helper function that operates
+on `Pin<&mut T>`.
+
+```rust,editable
+use std::{marker::PhantomPinned, pin::Pin};
+
+struct SelfRef {
+    data: String,
+    ptr: *const String,
+    _pin: PhantomPinned,
+}
+
+impl SelfRef {
+    fn new(data: impl Into<String>) -> Pin<Box<Self>> {
+        let mut this = Box::pin(SelfRef {
+            data: data.into(),
+            ptr: std::ptr::null(),
+            _pin: PhantomPinned,
+        });
+        let ptr: *const String = &this.data;
+        // SAFETY: We only write to the `ptr` field and never move the value
+        // out of the `Pin`.
+        unsafe {
+            Pin::as_mut(&mut this).get_unchecked_mut().ptr = ptr;
+        }
+        this
+    }
+
+    // This function can only be called on a pinned `SelfRef`.
+    fn drop_pinned(self: Pin<&mut SelfRef>) {
+        // `self` is pinned, so we must not move out of it.
+        println!("dropping {}", self.data);
+    }
+}
+
+impl Drop for SelfRef {
+    fn drop(&mut self) {
+        // SAFETY: `drop` is the last time the value is used, so `self` will
+        // not be moved again. This makes it sound to pin it with
+        // `new_unchecked`.
+        unsafe {
+            SelfRef::drop_pinned(Pin::new_unchecked(self));
+        }
+    }
+}
+
+fn main() {
+    let _pinned = SelfRef::new("Hello, ");
+} // `Drop` runs without moving the pinned value
+```
diff --git a/src/unsafe-deep-dive/pinning/self-referential-buffer.md b/src/unsafe-deep-dive/pinning/self-referential-buffer.md
new file mode 100644
index 000000000000..29a507502304
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/self-referential-buffer.md
@@ -0,0 +1,26 @@
+---
+minutes: 5
+---
+
+# Self-Referential Buffer Example
+
+A "self-referential buffer" is a type that has a reference to one of its own
+fields:
+
+```rust,ignore
+pub struct SelfReferentialBuffer {
+    data: [u8; 1024],
+    cursor: *mut u8,
+}
+```
+
+This kind of structure is not typical in Rust, because moves are plain bitwise
+copies: there's no hook to update the cursor's address when an instance of
+`SelfReferentialBuffer` moves.
+
+However, this kind of setup is more natural in languages that provide garbage
+collection, and in C++, which allows users to define their own behavior during
+moves and copies.
+
+## Outline
+
+{{%segment outline}}
diff --git a/src/unsafe-deep-dive/pinning/self-referential-buffer/cpp.md b/src/unsafe-deep-dive/pinning/self-referential-buffer/cpp.md
new file mode 100644
index 000000000000..5c98661d9d44
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/self-referential-buffer/cpp.md
@@ -0,0 +1,36 @@
+---
+minutes: 15
+---
+
+# Modeled in C++
+
+```cpp,editable,ignore
+#include <cstddef>
+#include <cstring>
+
+class SelfReferentialBuffer {
+    std::byte data[1024];
+    std::byte* cursor = data;
+
+public:
+    SelfReferentialBuffer(SelfReferentialBuffer&& other)
+        : cursor{data + (other.cursor - other.data)}
+    {
+        std::memcpy(data, other.data, 1024);
+    }
+};
+```
+
+Investigate on [Compiler Explorer](https://godbolt.org/z/ascME6aje)
+
+
+The `SelfReferentialBuffer` contains two members: `data`, a kilobyte of memory,
+and `cursor`, a pointer into `data`.
+
+Its move constructor keeps the cursor's offset but points it into the new
+object's `data`, so `cursor` follows the buffer to its new memory address.
+
+This type can't be expressed easily in Rust.
+
+
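A quick way to see why a stored pointer goes stale in Rust: moving a value (here, into a `Box`) changes the address of its fields, and no user code runs during the move that could fix up a cursor. A sketch using a cut-down, hypothetical `Buffer` type; it only compares addresses and never dereferences the stale pointer.

```rust
struct Buffer {
    data: [u8; 16],
    cursor: *const u8,
}

fn main() {
    let mut buffer = Buffer { data: [0; 16], cursor: std::ptr::null() };
    buffer.cursor = buffer.data.as_ptr();
    let before = buffer.data.as_ptr();

    // Moving the buffer to the heap copies its bytes to a new address...
    let moved = Box::new(buffer);
    let after = moved.data.as_ptr();

    // ...but `cursor` still holds the old stack address.
    assert_ne!(before, after);
    assert_eq!(moved.cursor, before);
    assert_ne!(moved.cursor, after);
}
```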
diff --git a/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-offset.md b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-offset.md new file mode 100644 index 000000000000..c9c2347991d8 --- /dev/null +++ b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-offset.md @@ -0,0 +1,39 @@ +--- +minutes: 5 +--- + +# With Offset + +```rust,editable +#[derive(Debug)] +pub struct SelfReferentialBuffer { + data: [u8; 1024], + position: usize, +} + +impl SelfReferentialBuffer { + pub fn new() -> Self { + SelfReferentialBuffer { data: [0; 1024], position: 0 } + } + + pub fn read(&self, n_bytes: usize) -> &[u8] { + let available = self.data.len().saturating_sub(self.position); + let len = n_bytes.min(available); + &self.data[self.position..self.position + len] + } + + pub fn write(&mut self, bytes: &[u8]) { + let available = self.data.len().saturating_sub(self.position); + let len = bytes.len().min(available); + self.data[self.position..self.position + len].copy_from_slice(&bytes[..len]); + self.position += len; + } +} +``` + +
+ +In Rust, it's more idiomatic to use an offset variable and to create references +on-demand. + +
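Because the offset is relative to `data`, the value can be moved freely. A minimal sketch with a smaller, hypothetical `OffsetBuffer` (16 bytes, plus a `written()` accessor that is not part of the slide) shows a write, a move into a `Box`, and a second write that lands in the right place.

```rust
pub struct OffsetBuffer {
    data: [u8; 16],
    position: usize,
}

impl OffsetBuffer {
    pub fn new() -> Self {
        OffsetBuffer { data: [0; 16], position: 0 }
    }

    /// Copies as many of `bytes` as fit, then advances `position`.
    pub fn write(&mut self, bytes: &[u8]) {
        let available = self.data.len() - self.position;
        let len = bytes.len().min(available);
        self.data[self.position..self.position + len].copy_from_slice(&bytes[..len]);
        self.position += len;
    }

    /// The bytes written so far.
    pub fn written(&self) -> &[u8] {
        &self.data[..self.position]
    }
}

fn main() {
    let mut buffer = OffsetBuffer::new();
    buffer.write(b"hello");

    // Moving is harmless: `position` is an index, not an address.
    let mut moved = Box::new(buffer);
    moved.write(b" world");
    assert_eq!(moved.written(), b"hello world");
}
```

The trade-off is that every access goes through slice indexing (with its bounds checks) instead of a stored pointer.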
diff --git a/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-pin.md b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-pin.md
new file mode 100644
index 000000000000..94e49039efbc
--- /dev/null
+++ b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-pin.md
@@ -0,0 +1,85 @@
+---
+minutes: 10
+---
+
+# With Pin&lt;Ptr&gt;
+
+Pinning allows Rust programmers to create a type which is much more similar to
+the C++ class.
+
+```rust,editable
+use std::marker::PhantomPinned;
+use std::pin::Pin;
+
+/// A self-referential buffer that cannot be moved.
+#[derive(Debug)]
+pub struct SelfReferentialBuffer {
+    data: [u8; 1024],
+    cursor: *mut u8,
+    _pin: PhantomPinned,
+}
+
+impl SelfReferentialBuffer {
+    pub fn new() -> Pin<Box<Self>> {
+        let buffer = SelfReferentialBuffer {
+            data: [0; 1024],
+            cursor: std::ptr::null_mut(),
+            _pin: PhantomPinned,
+        };
+        let mut pinned = Box::pin(buffer);
+
+        // SAFETY: We only write to `cursor` and never move the buffer out of
+        // the `Pin`.
+        unsafe {
+            let mut_ref = Pin::get_unchecked_mut(pinned.as_mut());
+            mut_ref.cursor = mut_ref.data.as_mut_ptr();
+        }
+
+        pinned
+    }
+
+    pub fn read(&self, n_bytes: usize) -> &[u8] {
+        // SAFETY: `cursor` was derived from `self.data` and the buffer is
+        // pinned, so both pointers belong to the same allocation. The
+        // assertion double-checks that `cursor` is in range.
+        unsafe {
+            let start = self.data.as_ptr();
+            let end = start.add(self.data.len());
+            let cursor = self.cursor as *const u8;
+
+            assert!((start..=end).contains(&cursor), "cursor is out of bounds");
+
+            let offset = cursor.offset_from(start) as usize;
+            let available = self.data.len().saturating_sub(offset);
+            let len = n_bytes.min(available);
+
+            &self.data[offset..offset + len]
+        }
+    }
+
+    pub fn write(mut self: Pin<&mut Self>, bytes: &[u8]) {
+        // SAFETY: `this` is only used to update fields in place; the buffer is
+        // never moved out of the `Pin`.
+        let this = unsafe { self.as_mut().get_unchecked_mut() };
+        // SAFETY: `cursor` points into `this.data` (asserted below), and at
+        // most `available` bytes are copied, so the write stays in bounds.
+        unsafe {
+            let start = this.data.as_mut_ptr();
+            let end = start.add(1024);
+
+            assert!((start..=end).contains(&this.cursor), "cursor is out of bounds");
+            let available = end.offset_from(this.cursor) as usize;
+            let len = bytes.len().min(available);
+
+            std::ptr::copy_nonoverlapping(bytes.as_ptr(), this.cursor, len);
+            this.cursor = this.cursor.add(len);
+        }
+    }
+}
+```
+
+
+Note that the function signatures have now changed. For example, `::new()`
+returns `Pin<Box<Self>>` rather than `Self`. This incurs a heap allocation:
+`Box::pin()` moves the buffer to the heap, where its address stays fixed even
+when the `Pin<Box<Self>>` handle itself is moved.
+
+In `::new()`, we use `Pin::get_unchecked_mut()` to get a mutable reference to the
+buffer *after* it has been pinned. This is `unsafe` because a `&mut` reference
+could be used to move the buffer (for example with `mem::swap`), so we must
+promise not to move the `SelfReferentialBuffer` through it. The safety contract
+of `Pin` is that once a value is pinned, its memory location is fixed until it
+is dropped.
+
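A reduced, runnable sketch of the same idea, assuming a hypothetical 16-byte `PinnedBuffer` with a `written()` accessor that the slide does not have. For simplicity it recomputes `cursor` from `data` on each write. Moving the `Pin<Box<_>>` handle moves only the pointer, so `cursor` stays valid across the move.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

pub struct PinnedBuffer {
    data: [u8; 16],
    cursor: *mut u8,
    _pin: PhantomPinned,
}

impl PinnedBuffer {
    pub fn new() -> Pin<Box<Self>> {
        let mut pinned = Box::pin(PinnedBuffer {
            data: [0; 16],
            cursor: std::ptr::null_mut(),
            _pin: PhantomPinned,
        });
        // SAFETY: we only write `cursor`; the buffer is never moved out of the `Pin`.
        let this = unsafe { pinned.as_mut().get_unchecked_mut() };
        this.cursor = this.data.as_mut_ptr();
        pinned
    }

    pub fn write(self: Pin<&mut Self>, bytes: &[u8]) {
        // SAFETY: we update fields in place and never move the buffer.
        let this = unsafe { self.get_unchecked_mut() };
        let used = this.used();
        let len = bytes.len().min(this.data.len() - used);
        this.data[used..used + len].copy_from_slice(&bytes[..len]);
        // SAFETY: `used + len <= data.len()`, so the result stays in bounds
        // (or one past the end).
        this.cursor = unsafe { this.data.as_mut_ptr().add(used + len) };
    }

    /// Number of bytes written so far, derived from `cursor`.
    fn used(&self) -> usize {
        // SAFETY: `cursor` was derived from `data` and the buffer never moves,
        // so both pointers are within the same allocation.
        unsafe { self.cursor.offset_from(self.data.as_ptr()) as usize }
    }

    pub fn written(&self) -> &[u8] {
        &self.data[..self.used()]
    }
}

fn main() {
    let mut buffer = PinnedBuffer::new();
    buffer.as_mut().write(b"hello");

    // Moving the `Pin<Box<_>>` moves the pointer, not the buffer on the heap.
    let mut moved = buffer;
    moved.as_mut().write(b" world");
    assert_eq!(moved.written(), b"hello world");
}
```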
diff --git a/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-raw-pointers.md b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-raw-pointers.md new file mode 100644 index 000000000000..5438b6964f57 --- /dev/null +++ b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust-raw-pointers.md @@ -0,0 +1,75 @@ +--- +minutes: 5 +--- + +# With a raw pointer + +```rust,editable +#[derive(Debug)] +pub struct SelfReferentialBuffer { + data: [u8; 1024], + cursor: *mut u8, +} + +impl SelfReferentialBuffer { + pub fn new() -> Self { + let mut buffer = + SelfReferentialBuffer { data: [0; 1024], cursor: std::ptr::null_mut() }; + + buffer.update_cursor(); + buffer + } + + // Danger: must be called after every move + pub fn update_cursor(&mut self) { + self.cursor = self.data.as_mut_ptr(); + } + + pub fn read(&self, n_bytes: usize) -> &[u8] { + unsafe { + let start = self.data.as_ptr(); + let end = start.add(1024); + let cursor = self.cursor as *const u8; + + assert!((start..=end).contains(&cursor), "cursor is out of bounds"); + + let available = end.offset_from(cursor) as usize; + let len = n_bytes.min(available); + std::slice::from_raw_parts(cursor, len) + } + } + + pub fn write(&mut self, bytes: &[u8]) { + unsafe { + let start = self.data.as_mut_ptr(); + let end = start.add(1024); + + assert!((start..=end).contains(&self.cursor), "cursor is out of bounds"); + let available = end.offset_from(self.cursor) as usize; + let len = bytes.len().min(available); + + std::ptr::copy_nonoverlapping(bytes.as_ptr(), self.cursor, len); + self.cursor = self.cursor.add(len); + } + } +} +``` + +
+ +Avoid spending too much time here. + +Talking points: + +- Emphasize that `unsafe` appears frequently. This is a hint that another design + may be more appropriate. +- `unsafe` blocks lack safety comments. Therefore, this code is unsound. +- `unsafe` blocks are too broad. Good practice uses smaller `unsafe` blocks with + specific behavior, specific preconditions and specific safety comments. + +Questions: + +Q: Should the `read()` and `write()` methods be marked as unsafe?\ +A: Yes, because `self.cursor` may dangle after any move of the buffer +(including the move out of `new()`) until `update_cursor()` is called again. + +
diff --git a/src/unsafe-deep-dive/pinning/self-referential-buffer/rust.md b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust.md new file mode 100644 index 000000000000..18004032cebf --- /dev/null +++ b/src/unsafe-deep-dive/pinning/self-referential-buffer/rust.md @@ -0,0 +1,48 @@ +--- +minutes: 10 +--- + +# Modeled in Rust + +```rust,ignore +/// Raw pointers +pub struct SelfReferentialBuffer { + data: [u8; 1024], + cursor: *mut u8, +} + +/// Integer offsets +pub struct SelfReferentialBuffer { + data: [u8; 1024], + cursor: usize, +} + +/// Pinning +pub struct SelfReferentialBuffer { + data: [u8; 1024], + cursor: *mut u8, + _pin: std::marker::PhantomPinned, +} +``` + +## Original C++ class definition for reference + +```cpp,ignore +class SelfReferentialBuffer { + char data[1024]; + char* cursor; +}; +``` + +
+ +The next few slides show three approaches to creating a Rust type with the same +semantics as the original C++. + +- Using raw pointers: matches C++ very closely, but using the resulting type is + extremely hazardous +- Storing integer offsets: more natural in Rust, but references need to be + created manually +- Pinning: allows raw pointers with fewer `unsafe` blocks + +
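A sketch of the integer-offset approach, assuming the same 1024-byte buffer and read/write semantics as the raw-pointer version (writes are truncated to the space left, reads start at the cursor). The `position()` method is a hypothetical addition for inspection. No `unsafe` is needed, because references are created on demand by slicing `data`.

```rust
#[derive(Debug)]
pub struct SelfReferentialBuffer {
    data: [u8; 1024],
    // An index into `data` rather than a pointer, so moves cannot invalidate it.
    cursor: usize,
}

impl SelfReferentialBuffer {
    pub fn new() -> Self {
        SelfReferentialBuffer { data: [0; 1024], cursor: 0 }
    }

    /// Hypothetical helper: current offset of the cursor into `data`.
    pub fn position(&self) -> usize {
        self.cursor
    }

    /// Returns up to `n_bytes` bytes, starting at the cursor.
    pub fn read(&self, n_bytes: usize) -> &[u8] {
        let end = self.cursor.saturating_add(n_bytes).min(self.data.len());
        // The reference is created on demand from the offset.
        &self.data[self.cursor..end]
    }

    /// Copies as many bytes as fit, then advances the cursor past them.
    pub fn write(&mut self, bytes: &[u8]) {
        let len = bytes.len().min(self.data.len() - self.cursor);
        self.data[self.cursor..self.cursor + len].copy_from_slice(&bytes[..len]);
        self.cursor += len;
    }
}

fn main() {
    let mut buffer = SelfReferentialBuffer::new();
    buffer.write(b"RUST");

    // A move is still a bitwise copy, but an offset stays meaningful.
    let moved = buffer;
    assert_eq!(moved.position(), 4);
    assert_eq!(moved.read(2), [0u8; 2]);
}
```

The trade-off mentioned above is visible here: every access goes through an index, and bounds are checked by slicing instead of by hand.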
diff --git a/src/unsafe-deep-dive/pinning/unpin-trait.md b/src/unsafe-deep-dive/pinning/unpin-trait.md new file mode 100644 index 000000000000..18fd1896df16 --- /dev/null +++ b/src/unsafe-deep-dive/pinning/unpin-trait.md @@ -0,0 +1,30 @@ +# Unpin trait + +- The `Unpin` trait allows types to move freely, even when they're wrapped by a `Pin` +- Most types implement `Unpin`, because it is an "`auto trait`" +- `auto trait` behavior can be changed: + - `!Unpin` types must never move once pinned + - Types containing a `PhantomPinned` field do not implement `Unpin` by default + +
+ +Explain that when a type implements `Unpin`, `Pin` places no restrictions on +it. The value is free to move. + +Explain that almost all types implement `Unpin`, because the compiler +implements it automatically. + +Types implementing `Unpin` are saying: 'I promise I have no self-references, so +moving me is always safe.' + +Ask: What types might be `!Unpin`? + +- Compiler-generated futures +- Types containing a `PhantomPinned` field +- Some types wrapping C++ objects + +`!Unpin` types cannot be moved once pinned. + +
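A short sketch of the difference between `Unpin` and `!Unpin` types. `Plain`, `Pinned` and `assert_unpin` are hypothetical names used only for illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Compiles only for types that implement `Unpin`.
fn assert_unpin<T: Unpin>() {}

/// An ordinary type: the compiler implements `Unpin` for it automatically.
struct Plain(u32);

/// Contains `PhantomPinned`, so it does not implement `Unpin`.
struct Pinned {
    value: u32,
    _pin: PhantomPinned,
}

fn main() {
    assert_unpin::<Plain>();
    // assert_unpin::<Pinned>(); // does not compile: `Pinned` is `!Unpin`

    // For an `Unpin` type, `Pin` adds no restrictions: `Pin::new` is safe,
    // and `get_mut` returns a plain `&mut`, so the value can be moved out.
    let mut plain = Plain(1);
    let old = std::mem::replace(Pin::new(&mut plain).get_mut(), Plain(2));
    assert_eq!(old.0, 1);
    assert_eq!(plain.0, 2);

    // For a `!Unpin` type, safe code can still read through the `Pin`,
    // but `Pin::new` and `get_mut` are not available.
    let pinned: Pin<Box<Pinned>> = Box::pin(Pinned { value: 3, _pin: PhantomPinned });
    assert_eq!(pinned.value, 3);
}
```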
+ +[`PhantomPinned`]: https://doc.rust-lang.org/std/marker/struct.PhantomPinned.html diff --git a/src/unsafe-deep-dive/pinning/what-a-move-is.md b/src/unsafe-deep-dive/pinning/what-a-move-is.md new file mode 100644 index 000000000000..329515563d87 --- /dev/null +++ b/src/unsafe-deep-dive/pinning/what-a-move-is.md @@ -0,0 +1,66 @@ +# What a move is in Rust + +Always a bitwise copy, even for types that do not implement `Copy`: + +```rust +#[derive(Debug, Default)] +pub struct DynamicBuffer { + data: Vec<u8>, + position: usize, +} + +pub fn move_and_inspect(x: DynamicBuffer) { println!("{x:?}"); } + +pub fn main() { + let a = DynamicBuffer::default(); + let mut b = a; + b.data.push(b'R'); + b.data.push(b'U'); + b.data.push(b'S'); + b.data.push(b'T'); + move_and_inspect(b); +} +``` + +Generated [LLVM IR] for calling `move_and_inspect()`: + +```llvm +call void @llvm.memcpy.p0.p0.i64(ptr align 8 %_12, ptr align 8 %b, i64 32, i1 false) +invoke void @move_and_inspect(ptr align 8 %_12) +``` + +- `memcpy` from variable `%b` to `%_12` +- Call to `move_and_inspect` with `%_12` (the copy) + +
+ +Note that `DynamicBuffer` does not implement `Copy`. + +Implication: a value's memory address is not stable. + +To show movement as a bitwise copy, either [open the code in the playground][LLVM IR] +and look at the LLVM IR, or open [the Compiler Explorer]. + +Optional for those who prefer assembly output: + +The Compiler Explorer is useful for discussing the generated assembly. Focus on +the assembly output for the `main` function on lines 128-136 (should be +highlighted in pink). + +Relevant generated output for the call to `move_and_inspect`: + +```assembly +mov rax, qword ptr [rsp + 16] +mov qword ptr [rsp + 48], rax +mov rax, qword ptr [rsp + 24] +mov qword ptr [rsp + 56], rax +movups xmm0, xmmword ptr [rsp] +movaps xmmword ptr [rsp + 32], xmm0 +lea rdi, [rsp + 32] +call qword ptr [rip + move_and_inspect@GOTPCREL] +``` + +
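A sketch of the consequence, reusing the `DynamicBuffer` type from the slide: the move copies the struct's own bytes, including the `Vec`'s pointer, capacity and length, so the heap allocation that the `Vec` points to does not move. Only the owner's address changes.

```rust
#[derive(Debug, Default)]
pub struct DynamicBuffer {
    data: Vec<u8>,
    position: usize,
}

fn main() {
    let mut a = DynamicBuffer::default();
    a.data.extend_from_slice(b"RUST");

    // Address of the heap allocation owned by the `Vec`.
    let heap_before = a.data.as_ptr();

    // The move copies the struct's bytes (the Vec's pointer, capacity and
    // length, plus `position`) to a new location...
    let b = a;

    // ...so the heap allocation itself is untouched: only the owner moved.
    assert_eq!(b.data.as_ptr(), heap_before);
    assert_eq!(b.position, 0);
    println!("{b:?}");
}
```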
+ +[LLVM IR]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=6f587283e8e0ec02f1ea8e871fc9ac72 +[The Compiler Explorer]: https://rust.godbolt.org/z/6o6nP7do4 diff --git a/src/unsafe-deep-dive/pinning/what-pinning-is.md b/src/unsafe-deep-dive/pinning/what-pinning-is.md new file mode 100644 index 000000000000..3dfe7df157d9 --- /dev/null +++ b/src/unsafe-deep-dive/pinning/what-pinning-is.md @@ -0,0 +1,43 @@ +--- +minutes: 5 +--- +# What pinning is + +- A pinned value cannot change its memory address (move) +- The pointed-to value cannot be moved by safe code + +`Pin` makes use of the ownership system to control how the pinned value is +accessed. Rather than changing the language, Rust's ownership system is used to +enforce pinning. `Pin` owns its contents and nothing in its safe API triggers a +move. + +This is explained in + +
+ +Conceptually, pinning prevents the default movement behavior. + +This appears to be a change in the language itself. + +However, the `Pin` wrapper doesn't actually change anything fundamental about +the language. + +`Pin` doesn't expose safe APIs that would allow a move. Thus, safe code cannot +move (bitwise copy) the pinned value. + +Unsafe APIs allow library authors to wrap types that do not implement `Unpin`, +but they must uphold the same guarantees. + +The documentation of `Pin` uses the term "pointer types". + +The term "pointer type" is much broader than the pointer primitive type in +the language. + +A "pointer type" is any type that implements `Deref`, such as `&mut T` or +`Box<T>`. The safe constructor `Pin::new()` additionally requires the target +to implement `Unpin`. + +Rust style note: This `Unpin` requirement is enforced through trait bounds on the +`::new()` constructor, rather than on the type itself. + +
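A sketch of the two most common "pointer types" used with `Pin`, assuming a hypothetical `!Unpin` type `Tracked`. `Box::pin` and `std::pin::pin!` are standard library APIs; the commented-out line shows a call that is rejected because `Pin::new()` requires `Unpin`.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// Hypothetical `!Unpin` type (because of the `PhantomPinned` field).
struct Tracked {
    id: u32,
    _pin: PhantomPinned,
}

/// Reading through a `Pin` is always allowed, via `Deref`.
fn describe(t: Pin<&Tracked>) -> u32 {
    t.id
}

fn main() {
    // `Pin<Box<T>>`: the value is pinned on the heap.
    let boxed: Pin<Box<Tracked>> = Box::pin(Tracked { id: 1, _pin: PhantomPinned });
    assert_eq!(describe(boxed.as_ref()), 1);

    // `Pin<&mut T>`: `pin!` moves the value into a hidden temporary that safe
    // code cannot name afterwards, so it cannot be moved again.
    let stack: Pin<&mut Tracked> = pin!(Tracked { id: 2, _pin: PhantomPinned });
    assert_eq!(describe(stack.as_ref()), 2);

    // `Pin::new` requires `Target: Unpin`, so this line would not compile:
    // let _ = Pin::new(&mut Tracked { id: 3, _pin: PhantomPinned });

    // `i32: Unpin`, so a plain `&mut i32` can be pinned safely.
    let mut n = 4;
    let pinned_int: Pin<&mut i32> = Pin::new(&mut n);
    assert_eq!(*pinned_int, 4);
}
```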
diff --git a/src/unsafe-deep-dive/pinning/why-difficult.md b/src/unsafe-deep-dive/pinning/why-difficult.md new file mode 100644 index 000000000000..974c75cd5fb0 --- /dev/null +++ b/src/unsafe-deep-dive/pinning/why-difficult.md @@ -0,0 +1,24 @@ +# Why Pin is difficult to use + +- `Pin` is "just" a type defined in the standard library +- This satisfied the needs of its original audience, the creators of async + runtimes, without needing to extend the core language +- That audience could accept some of its ergonomic downsides, as users of + `async` would rarely interact with `Pin` directly + +
+ +"You might wonder why Pin is so awkward to use. The answer is largely +historical." + +"`Pin` offered a simpler implementation for the Rust project than +alternatives". + +"Pin was designed primarily for the ~100 people in the world who write async +runtimes. The Rust team chose a simpler (for the compiler) but less ergonomic +design." + +"More user-friendly proposals existed but were rejected as too complex for the +primary audience, who could handle the complexity." + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game.md b/src/unsafe-deep-dive/rules-of-the-game.md new file mode 100644 index 000000000000..18ae4d183786 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game.md @@ -0,0 +1,21 @@ +# Rules of the game + +
+ +“We've seen many examples of code that has problems in the class, but we lack +consistent terminology + +“The goal of the next section is to introduce some terms that describe many of +the concepts that we have been thinking about. + +- undefined behavior +- sound +- unsound + +“Given that many safety preconditions are semantic rather than syntactic, it's +important to use a shared vocabulary. That way we can agree on semantics. + +“The overarching goal is to develop a mental framework of what soundness is and +ensure that Rust code that contains unsafe remains sound.” + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/3-shapes-of-sound-rust.md b/src/unsafe-deep-dive/rules-of-the-game/3-shapes-of-sound-rust.md new file mode 100644 index 000000000000..4e8166414a0c --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/3-shapes-of-sound-rust.md @@ -0,0 +1,25 @@ +--- +minutes: 5 +--- + +# 3 shapes of sound Rust + +- Functions written only in Safe Rust +- Functions that contain `unsafe` blocks which are impossible to misuse +- Unsafe functions that have their safety preconditions documented + +
+ +- We want to write sound code. +- Sound code can only have the following shapes: + - safe functions that contain no unsafe blocks + - safe functions that completely encapsulate unsafe blocks, meaning that the + caller does not need to know about them + - unsafe functions that contain unsafe blocks but don't encapsulate them, and + pass the proof burden to their caller +- Burden of proof + - safe functions with only Safe Rust -> compiler + - safe functions with unsafe blocks -> function author + - unsafe functions -> function caller + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/copying-memory.md b/src/unsafe-deep-dive/rules-of-the-game/copying-memory.md new file mode 100644 index 000000000000..d5cd7774d146 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/copying-memory.md @@ -0,0 +1,21 @@ +--- +minutes: 3 +--- + +# Copying memory - Introduction + +```rust,ignore +/// Reads bytes from `source` and writes them to `dest` +pub fn copy(dest: &mut [u8], source: &[u8]) { ... } +``` + +
+ +“Here is our initial function prototype.” + +“`copy` accepts two slices as arguments. `dest` (destination) is mutable, +whereas `source` is not.” + +“Let's see the shapes of sound Rust code.” + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/copying-memory/crying-wolf.md b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/crying-wolf.md new file mode 100644 index 000000000000..eed69bff8ecb --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/crying-wolf.md @@ -0,0 +1,31 @@ +--- +minutes: 5 +--- + +# Crying Wolf + +```rust,editable +pub unsafe fn copy(dest: &mut [u8], source: &[u8]) { + for (dest, src) in dest.iter_mut().zip(source) { + *dest = *src; + } +} + +fn main() { + let a = &[114, 117, 115, 116]; + let b = &mut [82, 85, 83, 84]; + + println!("{}", String::from_utf8_lossy(b)); + unsafe { copy(b, a) }; + println!("{}", String::from_utf8_lossy(b)); +} +``` + +
+ +“It is also possible to create so-called crying wolf functions. + +“These are functions which are tagged as unsafe, but which have no safety +preconditions that programmers need to check. + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/copying-memory/documented-safety-preconditions.md b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/documented-safety-preconditions.md new file mode 100644 index 000000000000..e48a275fc595 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/documented-safety-preconditions.md @@ -0,0 +1,60 @@ +--- +minutes: 5 +--- + +# Documented safety preconditions + +```rust,editable +/// ... +/// +/// # Safety +/// +/// This function can easily trigger undefined behavior. Ensure that: +/// +/// - `source` pointer is non-null and non-dangling +/// - `source` data ends with a null byte within its memory allocation +/// - `source` data is not freed (its lifetime invariants are preserved) +/// - `source` data contains fewer than `isize::MAX` bytes +pub unsafe fn copy(dest: &mut [u8], source: *const u8) { + let source = { + let mut len = 0; + + let mut end = source; + // SAFETY: Caller has provided a non-null, non-dangling pointer + while unsafe { *end != 0 } { + len += 1; + // SAFETY: Caller has provided data with length < isize::MAX + end = unsafe { end.add(1) }; + } + + // SAFETY: Caller maintains lifetime and aliasing requirements + unsafe { std::slice::from_raw_parts(source, len + 1) } + }; + + for (dest, src) in dest.iter_mut().zip(source) { + *dest = *src; + } +} + +fn main() { + let source: [u8; 5] = [114, 117, 115, 116, 0]; + let a = source.as_ptr(); + let b = &mut [82, 85, 83, 84, 0]; + + println!("{}", String::from_utf8_lossy(b)); + // SAFETY: `a` points to a live, null-terminated array of 5 bytes + unsafe { copy(b, a) }; + println!("{}", String::from_utf8_lossy(b)); +} +``` + +
+ +Changes from the previous iteration: + +- `copy` marked as unsafe +- Safety preconditions are documented +- Inline safety comments added + +An unsafe function is sound when both its safety preconditions and its internal +unsafe blocks are documented. + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/copying-memory/encapsulated-unsafe.md b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/encapsulated-unsafe.md new file mode 100644 index 000000000000..4948dc534489 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/encapsulated-unsafe.md @@ -0,0 +1,53 @@ +--- +minutes: 5 +--- + +# Encapsulated Unsafe Rust + +```rust,editable +pub fn copy(dest: &mut [u8], source: &[u8]) { + let len = dest.len().min(source.len()); + let mut i = 0; + while i < len { + // SAFETY: `i < len <= source.len()`, so `i` is in-bounds for `source` + let new = unsafe { source.get_unchecked(i) }; + + // SAFETY: `i < len <= dest.len()`, so `i` is in-bounds for `dest` + let old = unsafe { dest.get_unchecked_mut(i) }; + + *old = *new; + i += 1; + } +} + +fn main() { + let a = &[114, 117, 115, 116]; + let b = &mut [82, 85, 83, 84]; + + println!("{}", String::from_utf8_lossy(b)); + copy(b, a); + println!("{}", String::from_utf8_lossy(b)); +} +``` + +
+ +“Here we have a safe function that encapsulates unsafe blocks that are used +internally. + +“This implementation avoids iterators. Instead, the implementor is accessing +memory manually.” + +“Is this correct?” “Are there any problems?” + +“Who has responsibility for ensuring that correctness? The author of the +function. + +“A Safe Rust function that contains unsafe blocks remains sound if it’s +impossible for an input to cause memory safety issues. + +
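For comparison, a Safe Rust sketch with the same signature (not part of the original slide): the slicing expressions perform the bounds checks, and `copy_from_slice`, which panics if the two lengths differ, copies the bytes.

```rust
/// Same behavior as the manual loop: copies `min(dest.len(), source.len())` bytes.
pub fn copy(dest: &mut [u8], source: &[u8]) {
    let len = dest.len().min(source.len());
    // Both sub-slices are exactly `len` long, as `copy_from_slice` requires.
    dest[..len].copy_from_slice(&source[..len]);
}

fn main() {
    let a = &[114, 117, 115, 116];
    let b = &mut [82, 85, 83, 84];

    println!("{}", String::from_utf8_lossy(b));
    copy(b, a);
    println!("{}", String::from_utf8_lossy(b));
}
```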
diff --git a/src/unsafe-deep-dive/rules-of-the-game/copying-memory/exposed-unsafe.md b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/exposed-unsafe.md new file mode 100644 index 000000000000..762283fdf302 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/exposed-unsafe.md @@ -0,0 +1,82 @@ +--- +minutes: 5 +--- + +# Exposed Unsafe Rust + +```rust,editable +pub fn copy(dest: &mut [u8], source: *const u8) { + let source = { + let mut len = 0; + + let mut end = source; + while unsafe { *end != 0 } { + len += 1; + end = unsafe { end.add(1) }; + } + + unsafe { std::slice::from_raw_parts(source, len + 1) } + }; + + for (dest, src) in dest.iter_mut().zip(source) { + *dest = *src; + } +} + +fn main() { + let a = [114, 117, 115, 116].as_ptr(); + let b = &mut [82, 85, 83, 84, 0]; + + println!("{}", String::from_utf8_lossy(b)); + copy(b, a); + println!("{}", String::from_utf8_lossy(b)); +} +``` + +
+ +“The functionality of copying bytes from one place to the next remains the same. + +“However, we need to manually create a slice. To do that, we first need to find +the end of the data. + +“As we’re working with text, we’ll use the C convention of a null-terminated +string. + +Compile the code. See that the output remains the same. + +“An unsound function can still work correctly for some inputs. Passing tests do +not mean that you have a sound function.” + +“Can anyone spot any issues?” + +- Readability: difficult to quickly scan code +- `source` pointer might be null +- `source` pointer might be dangling, i.e. point to freed or uninitialized + memory +- `source` might not be null-terminated + +“Assuming that we cannot change the function signature, what improvements could +we make to the code to address these issues?” + +- Null pointer: Add null check with early return + (`if source.is_null() { return; }`) +- Readability: Use a well-tested library rather than implementing “find first + null byte” ourselves + +“However, some safety requirements are impossible to check for defensively, for +example:” + +- dangling pointer +- no null termination byte + +“How can we make this function sound?” + +- Either + - Change the type of the `source` input argument to something that has a known + length, e.g. use a slice like the previous example. +- Or + - Mark the function as unsafe + - Document the safety preconditions + +
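A sketch combining two of the suggestions above, assuming the `*const u8` signature is kept: the hand-written search for the null byte is replaced by `std::ffi::CStr::from_ptr`, and the function is marked `unsafe` with its preconditions documented. The `# Safety` text paraphrases the safety section of `CStr::from_ptr`, which remains the authoritative list.

```rust
use std::ffi::{c_char, CStr};

/// Copies a null-terminated string (including its null byte) into `dest`.
///
/// # Safety
///
/// `source` must meet the preconditions of `CStr::from_ptr`: it is non-null,
/// valid for reads up to and including a null byte within one allocation,
/// and that memory is not mutated during the call.
pub unsafe fn copy(dest: &mut [u8], source: *const u8) {
    // SAFETY: the caller upholds the preconditions of `CStr::from_ptr`.
    let source = unsafe { CStr::from_ptr(source.cast::<c_char>()) };

    // `to_bytes_with_nul` keeps the trailing null byte, like `len + 1` did.
    for (dest, src) in dest.iter_mut().zip(source.to_bytes_with_nul()) {
        *dest = *src;
    }
}

fn main() {
    let a = b"rust\0".as_ptr();
    let b = &mut [82, 85, 83, 84, 0];

    println!("{}", String::from_utf8_lossy(b));
    // SAFETY: `a` points to a static, null-terminated byte string.
    unsafe { copy(b, a) };
    println!("{}", String::from_utf8_lossy(b));
}
```

The dangling-pointer and missing-terminator cases still cannot be checked at runtime; they are now the caller's documented responsibility.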
diff --git a/src/unsafe-deep-dive/rules-of-the-game/copying-memory/safe.md b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/safe.md new file mode 100644 index 000000000000..73ef4fe7bbe2 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/copying-memory/safe.md @@ -0,0 +1,54 @@ +--- +minutes: 5 +--- + +# Safe Rust + +```rust,editable +pub fn copy(dest: &mut [u8], source: &[u8]) { + for (dest, src) in dest.iter_mut().zip(source) { + *dest = *src; + } +} + +fn main() { + let a = &[114, 117, 115, 116]; + let b = &mut [82, 85, 83, 84]; + + println!("{}", String::from_utf8_lossy(b)); + copy(b, a); + println!("{}", String::from_utf8_lossy(b)); +} +``` + +
+ +“The implementation only uses Safe Rust.” + +What can we learn from this? + +“It is impossible for `copy` to trigger memory safety issues when implemented in +Safe Rust. This is true for all possible input arguments.” + +“For example, by using Rust’s iterators, we can ensure that we’ll never trigger +errors relating to handling pointers directly, such as needing null pointer or +bounds checks.” + +Ask: “Can you think of any others?” + +- No aliasing issues +- Dangling pointers are impossible +- Alignment will be correct +- Cannot accidentally read from uninitialized memory + +“We can say that the `copy` function is _sound_ because Rust ensures that all of +the safety preconditions are satisfied.” + +“From the point of view of the programmer, as this function is implemented in +safe Rust, we can think of it as having no safety preconditions.” + +“This does not mean that `copy` will always do what the caller might want. If +there is insufficient space available in the `dest` slice, then the remaining +data will not be copied across.” + +
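A hypothetical variant of the safe `copy` (not from the original slides) that returns the number of bytes copied, so a caller can at least detect truncation. The function is sound either way; this only addresses whether it does what the caller wants.

```rust
/// Hypothetical variant: returns how many bytes were actually copied.
pub fn copy(dest: &mut [u8], source: &[u8]) -> usize {
    let mut copied = 0;
    for (dest, src) in dest.iter_mut().zip(source) {
        *dest = *src;
        copied += 1;
    }
    copied
}

fn main() {
    let source = b"rust";
    let mut small = [0u8; 2];

    let copied = copy(&mut small, source);
    // Still sound, but only the first two bytes fit.
    assert_eq!(copied, 2);
    assert_eq!(&small, b"ru");
    if copied < source.len() {
        println!("truncated: copied {copied} of {} bytes", source.len());
    }
}
```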
diff --git a/src/unsafe-deep-dive/rules-of-the-game/rust-is-sound.md b/src/unsafe-deep-dive/rules-of-the-game/rust-is-sound.md new file mode 100644 index 000000000000..956492d8a707 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/rust-is-sound.md @@ -0,0 +1,31 @@ +--- +minutes: 5 +--- + +# Rust is sound + +- Soundness is fundamental to Rust +- Soundness ≈ impossible to cause memory safety problems +- Sound functions have common “shapes” + +
+ +“A fundamental principle of Rust code is that it is sound. + +“We’ll create a formal definition of the term soundness shortly. In the +meantime, think of sound code as code that cannot trigger memory safety +problems. + +“Sound code is made up of _sound functions_ and _sound operations_. + +“A sound function is a function where none of its possible inputs could provoke +memory safety problems. + +Sound functions have common shapes. + +Those shapes are what we’ll look at now. + +“We’ll start with one that’s implemented in Safe Rust, and then see what could +happen when we introduce `unsafe` to different parts. + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/soundness-proof.md b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof.md new file mode 100644 index 000000000000..321eba722740 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof.md @@ -0,0 +1 @@ +# Soundness Proof diff --git a/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/corollary.md b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/corollary.md new file mode 100644 index 000000000000..b0ed860b74f3 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/corollary.md @@ -0,0 +1,33 @@ +--- +minutes: 2 +--- + +# Soundness Proof (Part 2) + +A sound function is one that can't trigger UB if its safety preconditions are +satisfied. + +Corollary: All functions implemented in pure safe Rust are sound. + +Proof: + +- Safe Rust code has no safety preconditions. + +- Therefore, callers of functions implemented in pure safe Rust always trivially + satisfy the empty set of preconditions. + +- Safe Rust code can't trigger UB. + +QED. + +
+ +- Read the corollary. + +- Explain the proof. + +- Translate into informal terms: all safe Rust code is nice. It does not have + safety preconditions that the programmer has to think of, always plays by the + rules, and never triggers UB. + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/soundness.md b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/soundness.md new file mode 100644 index 000000000000..b449213b5441 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/soundness.md @@ -0,0 +1,21 @@ +--- +minutes: 2 +--- + +# Soundness + +A sound function is one that can't trigger UB if its safety preconditions are +satisfied. + +
+ +- Read the definition of sound functions. + +- Remind the students that the programmer who implements the caller is + responsible for satisfying the safety preconditions; the compiler does not help. + +- Translate into informal terms. Soundness means that the function is nice and + plays by the rules. It documents its safety preconditions, and when the caller + satisfies them, the function behaves well (no UB). + +
diff --git a/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/unsoundness.md b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/unsoundness.md new file mode 100644 index 000000000000..0cb739104a18 --- /dev/null +++ b/src/unsafe-deep-dive/rules-of-the-game/soundness-proof/unsoundness.md @@ -0,0 +1,27 @@ +--- +minutes: 2 +--- + +# Unsoundness + +A sound function is one that can't trigger UB if its safety preconditions are +satisfied. + +An unsound function can trigger UB even if you satisfy the documented safety +preconditions. + +Unsound code is _bad_. + +
+ +- Read the definition of unsound functions. + +- Translate into informal terms: unsound code is not nice. No, that's an + understatement. Unsound code is BAD. Even if you play by the documented rules, + unsound code can still trigger UB! + +- We don't want any unsound code in our repositories. + +- Finding unsound code is the **primary** goal of code review. + +
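A small sketch contrasting the definitions, using hypothetical functions that read the first byte of a slice. `first_unsound` is a safe function whose `unsafe` block can be reached with an empty slice, so it is unsound even though it happens to work for the input used in `main`.

```rust
/// UNSOUND: a safe function whose `unsafe` block relies on an unchecked
/// assumption. Calling it with an empty slice is undefined behavior, even
/// though the caller wrote only safe code.
pub fn first_unsound(bytes: &[u8]) -> u8 {
    // No SAFETY comment can justify this: `bytes` may be empty.
    unsafe { *bytes.get_unchecked(0) }
}

/// Sound: the check makes the `unsafe` block impossible to misuse.
pub fn first_sound(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    // SAFETY: `bytes` is non-empty, so index 0 is in bounds.
    Some(unsafe { *bytes.get_unchecked(0) })
}

/// Also sound: the precondition is documented and moved to the caller.
///
/// # Safety
///
/// `bytes` must not be empty.
pub unsafe fn first_unchecked(bytes: &[u8]) -> u8 {
    // SAFETY: the caller guarantees that `bytes` is non-empty.
    unsafe { *bytes.get_unchecked(0) }
}

fn main() {
    // Works for this input, which is exactly why tests cannot prove soundness.
    assert_eq!(first_unsound(b"rust"), b'r');

    assert_eq!(first_sound(b"rust"), Some(b'r'));
    assert_eq!(first_sound(b""), None);

    // SAFETY: the byte string literal is non-empty.
    assert_eq!(unsafe { first_unchecked(b"rust") }, b'r');
}
```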
diff --git a/src/unsafe-deep-dive/safety-preconditions.md b/src/unsafe-deep-dive/safety-preconditions.md new file mode 100644 index 000000000000..3a856812de01 --- /dev/null +++ b/src/unsafe-deep-dive/safety-preconditions.md @@ -0,0 +1,18 @@ +# Safety Preconditions + +Safety preconditions are conditions on an action that must be satisfied before +that action will be safe. + +
+ +“Safety preconditions are conditions on code that must be satisfied to maintain +Rust's safety guarantees. + +“You're likely to see a strong affinity between safety preconditions and the +rules of Safe Rust.” + +Q: Can you list any? + +(Fuller list in the next slide) + +
diff --git a/src/unsafe-deep-dive/safety-preconditions/ascii.md b/src/unsafe-deep-dive/safety-preconditions/ascii.md new file mode 100644 index 000000000000..ea6a72997f3c --- /dev/null +++ b/src/unsafe-deep-dive/safety-preconditions/ascii.md @@ -0,0 +1,45 @@ +--- +minutes: 5 +--- + +# Example: ASCII Type + +```rust,editable +/// Text that is guaranteed to be encoded within 7-bit ASCII. +pub struct Ascii<'a>(&'a mut [u8]); + +impl<'a> Ascii<'a> { + fn new(bytes: &'a mut [u8]) -> Option<Self> { + bytes.iter().all(|&b| b.is_ascii()).then_some(Ascii(bytes)) + } + + /// Creates a new `Ascii` from a byte slice without checking for ASCII + /// validity. + /// + /// # Safety + /// + /// Providing non-ASCII bytes results in undefined behavior. + unsafe fn new_unchecked(bytes: &'a mut [u8]) -> Self { + Ascii(bytes) + } +} +``` + +
+ +"The `Ascii` type is a minimal wrapper around a byte slice. Internally, they +share the same representation. However, `Ascii` requires that the high bit of +every byte is not used." + +Optional: Extend the example to mention that it's possible to use +`debug_assert!` to check the preconditions in debug builds (including tests) +without impacting release builds. + +```rust,ignore +unsafe fn new_unchecked(bytes: &'a mut [u8]) -> Self { + debug_assert!(bytes.iter().all(|&b| b.is_ascii())); + Ascii(bytes) +} +``` + +
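A sketch showing why a user-defined precondition matters, assuming a hypothetical `as_str()` method that is not on the original slide: once every constructor promises ASCII-only bytes, other `unsafe` code (here `std::str::from_utf8_unchecked`) may rely on that promise. This version uses `<[u8]>::is_ascii()` in place of the iterator check and makes the methods `pub`.

```rust
/// Text that is guaranteed to be encoded within 7-bit ASCII.
pub struct Ascii<'a>(&'a mut [u8]);

impl<'a> Ascii<'a> {
    pub fn new(bytes: &'a mut [u8]) -> Option<Self> {
        if bytes.is_ascii() { Some(Ascii(bytes)) } else { None }
    }

    /// # Safety
    ///
    /// Every byte of `bytes` must be ASCII (high bit unset).
    pub unsafe fn new_unchecked(bytes: &'a mut [u8]) -> Self {
        // Checked in debug builds only; release builds trust the caller.
        debug_assert!(bytes.is_ascii());
        Ascii(bytes)
    }

    /// Hypothetical method that relies on the type's invariant.
    pub fn as_str(&self) -> &str {
        // SAFETY: every constructor guarantees that all bytes are ASCII,
        // and ASCII is a subset of UTF-8.
        unsafe { std::str::from_utf8_unchecked(&*self.0) }
    }
}

fn main() {
    let mut text = *b"rust";
    let ascii = Ascii::new(&mut text).expect("all bytes are ASCII");
    assert_eq!(ascii.as_str(), "rust");

    // The UTF-8 encoding of the crab emoji (U+1F980) is not ASCII.
    let mut not_ascii: [u8; 4] = [0xF0, 0x9F, 0xA6, 0x80];
    assert!(Ascii::new(&mut not_ascii).is_none());
}
```

If `new_unchecked` were called with non-ASCII bytes, `as_str` could hand out a `&str` that is not valid UTF-8, which is why the precondition is a safety precondition rather than a mere convention.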
diff --git a/src/unsafe-deep-dive/safety-preconditions/common-preconditions.md b/src/unsafe-deep-dive/safety-preconditions/common-preconditions.md new file mode 100644 index 000000000000..672bb0224db4 --- /dev/null +++ b/src/unsafe-deep-dive/safety-preconditions/common-preconditions.md @@ -0,0 +1,77 @@ +--- +minutes: 6 +--- + +# Common safety preconditions + +- Aliasing and Mutability +- Alignment +- Array access is in-bound +- Initialization +- Lifetimes +- Pointer provenance +- Validity +- Memory + +
+ +Avoid spending too much time explaining every precondition: we will be working +through the details during the course. The intent is to show that there are +several. + +"An incomplete list, but these are a few of the major safety preconditions to +get us thinking." + +- Validity. Values must be valid values of the type that they represent. Rust's + references may not be null. Creating a null one with `unsafe` causes + undefined behavior. +- Alignment. References to values must be well-aligned, which means their + address must be a multiple of the type's alignment. +- Aliasing. All Rust code must uphold Rust's borrowing rules. If you are + manually creating mutable references (`&mut T`) from pointers, then you may + only create one at a time for any given memory location. +- Initialization. All instances of Rust types must be fully initialized. To + create a value from raw memory, we need to make sure that we've written a + valid value to every field. +- Pointer provenance. The origin of a pointer is important. Casting a `usize` to + a raw pointer only recovers provenance that was previously exposed, so the + resulting pointer may not be valid to dereference. +- Lifetimes. References must not outlive their referent. + +Some conditions are even more subtle than they first seem. + +Consider "in-bounds array access". Reading from the memory location, i.e. +dereferencing, is not required to break the program. Creating an out-of-bounds +pointer with `offset` already breaks the compiler's assumptions, leading to +erratic behavior. + +Rust tells LLVM to use its `getelementptr inbounds` assumption. That assumption +will cause later optimization passes within the compiler to misbehave (because +they assume that out-of-bounds memory access cannot occur). + +Optional: open [the playground][1], which shows the code below. Explain that +this is essentially a C function written in Rust syntax that gets items from an +array. Generate the LLVM IR with the **Show LLVM IR** button. Highlight +`getelementptr inbounds i32, ptr %array, i64 %offset`. 
+ +```rust,ignore +#[unsafe(no_mangle)] +pub unsafe fn get(array: *const i32, offset: isize) -> i32 { + unsafe { *array.offset(offset) } +} +``` + +Expected output (the line to highlight starts with `%_3`): + +```llvm +define noundef i32 @get(ptr noundef readonly captures(none) %array, i64 noundef %offset) unnamed_addr #0 { +start: + %_3 = getelementptr inbounds i32, ptr %array, i64 %offset + %_0 = load i32, ptr %_3, align 4, !noundef !3 + ret i32 %_0 +} +``` + +[1]: https://play.rust-lang.org/?version=stable&mode=release&edition=2024&gist=4116c4de01c863cac918f193448210b1 + +Bounds: Creating an out-of-bounds pointer (beyond the "one-past-the-end" rule) +is UB, even without dereferencing, due to LLVM's inbounds assumptions. + +
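A sketch of the distinction, assuming fixed-size arrays for simplicity: `pointer::add` has the same in-bounds requirement as `offset` (the result must stay within the allocation or one past its end), whereas `wrapping_add` and `wrapping_sub` may compute out-of-bounds addresses; only dereferencing such a pointer would be UB. The function names are hypothetical.

```rust
/// Hypothetical helper: reads `array[1]` after a detour far out of bounds.
fn read_after_detour(array: &[i32; 3]) -> i32 {
    let ptr = array.as_ptr();
    // Out of bounds, but `wrapping_add` makes merely computing it fine.
    let far = ptr.wrapping_add(100);
    // Back in bounds: `far - 99 == ptr + 1`.
    let back = far.wrapping_sub(99);
    // SAFETY: `back` is `ptr + 1`, in bounds of the 3-element array.
    unsafe { *back }
}

/// Hypothetical helper: distance from the start to one past the end.
fn one_past_end(array: &[i32; 3]) -> isize {
    let ptr = array.as_ptr();
    // SAFETY: offset 3 on a 3-element array is exactly one past the end,
    // which `add` permits. `ptr.add(4)` would be UB even if never read.
    let end = unsafe { ptr.add(array.len()) };
    // SAFETY: both pointers are derived from the same array.
    unsafe { end.offset_from(ptr) }
}

fn main() {
    let array = [10, 20, 30];
    assert_eq!(read_after_detour(&array), 20);
    assert_eq!(one_past_end(&array), 3);
}
```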
diff --git a/src/unsafe-deep-dive/safety-preconditions/defining.md b/src/unsafe-deep-dive/safety-preconditions/defining.md new file mode 100644 index 000000000000..7beee82a700a --- /dev/null +++ b/src/unsafe-deep-dive/safety-preconditions/defining.md @@ -0,0 +1,8 @@ +--- +minutes: 2 +--- + +# Defining your own preconditions + +- User-defined types are entitled to have their own safety preconditions +- Include documentation so that they can later be determined and satisfied diff --git a/src/unsafe-deep-dive/safety-preconditions/determining.md b/src/unsafe-deep-dive/safety-preconditions/determining.md new file mode 100644 index 000000000000..ec4fae7a5a38 --- /dev/null +++ b/src/unsafe-deep-dive/safety-preconditions/determining.md @@ -0,0 +1,48 @@ +# Determining Preconditions + +Where do you find the safety preconditions? + +```rust,editable +fn main() { + let b: *mut i32 = std::ptr::null_mut(); + println!("{:?}", b.as_mut()); +} +``` + +
+ +Attempt to compile the program to trigger the compiler error ("error\[E0133\]: +call to unsafe function ..."). + +Ask: “Where would you look if you wanted to know the preconditions for a +function? Here we need to understand when it's safe to convert from a raw +pointer to a mutable reference.” + +Locations to look: + +- A function's API documentation, especially its safety section +- The source code and its internal safety comments +- Module documentation +- Rust Reference + +Consult [the documentation] for the `as_mut` method. + +Highlight Safety section. + +> **Safety** +> +> When calling this method, you have to ensure that either the pointer is null +> or the pointer is convertible to a reference. + +Click the "convertible to a reference" hyperlink, which leads to the "Pointer to +reference conversion" section. + +Track down the rules for converting a pointer to a reference, i.e. when it is +"_dereferenceable_". + +Consider the implications of this excerpt (Rust 1.90.0) "You must enforce Rust’s +aliasing rules. The exact aliasing rules are not decided yet, ..." + +
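A sketch of what can and cannot be checked at runtime, using a hypothetical wrapper `to_mut` around `as_mut`. Alignment can be tested (`pointer::is_aligned` requires Rust 1.79 or later); dereferenceability and aliasing cannot, so they remain documented preconditions for the caller.

```rust
/// Hypothetical wrapper around `<*mut i32>::as_mut`.
///
/// # Safety
///
/// If `ptr` is non-null, it must be convertible to a reference: dereferenceable,
/// aligned, pointing to an initialized `i32`, and not aliased by any other
/// reference for as long as the returned reference is used.
unsafe fn to_mut<'a>(ptr: *mut i32) -> Option<&'a mut i32> {
    // Checkable at runtime: alignment (a null pointer also counts as aligned).
    debug_assert!(ptr.is_aligned());
    // Not checkable: whether `ptr` is dangling, or aliased elsewhere.
    // SAFETY: `as_mut` accepts null; otherwise the caller upholds the rest.
    unsafe { ptr.as_mut() }
}

fn main() {
    let b: *mut i32 = std::ptr::null_mut();
    // SAFETY: `b` is null, which `as_mut` maps to `None`.
    assert!(unsafe { to_mut(b) }.is_none());

    let mut value = 7;
    // SAFETY: the pointer comes from a live, exclusive borrow of `value`.
    if let Some(r) = unsafe { to_mut(&mut value) } {
        *r += 1;
    }
    assert_eq!(value, 8);
}
```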
+ +[the documentation]: https://doc.rust-lang.org/std/primitive.pointer.html#method.as_mut diff --git a/src/unsafe-deep-dive/safety-preconditions/getter.md b/src/unsafe-deep-dive/safety-preconditions/getter.md new file mode 100644 index 000000000000..a879fe75fa7e --- /dev/null +++ b/src/unsafe-deep-dive/safety-preconditions/getter.md @@ -0,0 +1,46 @@ +# Getter example + +```rust,editable +/// Return the element at `index` from `arr` +unsafe fn get(arr: *const i32, index: usize) -> i32 { + unsafe { *arr.add(index) } +} +``` + +
+ +“Safety preconditions are conditions on code that must be satisfied to maintain +Rust's safety guarantees. + +“You're likely to see a strong affinity between safety preconditions and the +rules of Safe Rust.” + +Ask: “What are the safety preconditions of `get`?” + +- The pointer `arr` is non-null, well-aligned and refers to an array of `i32` +- `index` is in-bounds + +Add safety comments: + +```rust,ignore +/// Return the element at `index` from `arr` +/// +/// # Safety +/// +/// - `arr` is non-null, correctly aligned and points to an array of `i32` +/// - `index` is in-bounds for the array +unsafe fn get(arr: *const i32, index: usize) -> i32 { + // SAFETY: Caller guarantees that `arr` is valid and `index` is in-bounds + unsafe { *arr.add(index) } +} +``` + +Optional: Runtime checks can be added in debug builds to provide some extra +robustness. + +```rust,ignore +debug_assert!(!arr.is_null()); +debug_assert_eq!(arr as usize % std::mem::align_of::<i32>(), 0); +``` + +
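A sketch of the caller's side, assuming the documented `get` from the notes above with the optional debug checks folded in: the caller discharges each precondition in its own `SAFETY` comment.

```rust
/// Return the element at `index` from `arr`
///
/// # Safety
///
/// - `arr` is non-null, correctly aligned and points to an array of `i32`
/// - `index` is in-bounds for the array
unsafe fn get(arr: *const i32, index: usize) -> i32 {
    // Cheap sanity checks, compiled out of release builds.
    debug_assert!(!arr.is_null());
    debug_assert_eq!(arr as usize % std::mem::align_of::<i32>(), 0);
    // SAFETY: the caller guarantees that `arr` is valid and `index` is in-bounds.
    unsafe { *arr.add(index) }
}

fn main() {
    let numbers = [10, 20, 30];
    // SAFETY: `numbers.as_ptr()` is non-null and aligned for `i32`, and
    // index 1 is in bounds of the 3-element array.
    let second = unsafe { get(numbers.as_ptr(), 1) };
    assert_eq!(second, 20);
}
```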
diff --git a/src/unsafe-deep-dive/safety-preconditions/references.md b/src/unsafe-deep-dive/safety-preconditions/references.md new file mode 100644 index 000000000000..da6eef3b8df7 --- /dev/null +++ b/src/unsafe-deep-dive/safety-preconditions/references.md @@ -0,0 +1,102 @@ +--- +minutes: 10 +--- + +# Example: references + +```rust,editable +fn main() { + let mut boxed = Box::new(123); + let a: *mut i32 = &mut *boxed as *mut i32; + let b: *mut i32 = std::ptr::null_mut(); + + println!("{:?}", *a); + println!("{:?}", b.as_mut()); +} +``` + +Confirm understanding of the syntax + +`Box<i32>` type is a pointer to an integer on the heap that is owned by the +box. + +`*mut i32` type is a so-called raw pointer to an integer that the compiler does +not know the ownership of. Programmers need to ensure the rules are enforced +without assistance from the compiler. + +A reference, i.e. `&mut i32`, means borrowed. + + - a raw pointer does not provide ownership info to Rust: + - a pointer can be semantically owning the data, or semantically borrowing, + but that information only exists in the programmer's mind + +- `&mut *boxed as *mut _` expression: + - `*boxed` is ... + - `&mut *boxed` is ... + - finally, `as *mut i32` casts the reference to a pointer. + +Confirm understanding of ownership + +- Step through code + - (Line 3) Creates raw pointer to the `123` by de-referencing the box, + creating a new reference and casting the new reference as a pointer + - (Line 4) Creates raw pointer with a NULL value + - (Line 7) Converts the raw pointer to an Option with `.as_mut()` + +- Highlight that pointers are nullable in Rust (unlike references). + +- Compile to reveal the error messages + +- Discuss + - (Line 6) `println!("{:?}", *a);` + - Prefix star dereferences a raw pointer + - It is an explicit operation, whereas regular references have implicit + dereferencing most of the time thanks to the Deref trait. This is referred + to as "auto-deref".
+ - Dereferencing a raw pointer is an unsafe operation + - Requires an unsafe block + - (Line 7) `println!("{:?}", b.as_mut());` + - `as_mut()` is an unsafe function. + - Calling unsafe function requires an unsafe block + +- Demonstrate: Fix the code (add unsafe blocks) and compile again to show the + working program + +- Demonstrate: Replace `as *mut i32` with `as *mut _`, show that it compiles. + + - We can partially omit the target type in the cast. The Rust compiler knows + that the source of the cast is a `&mut i32`. This reference type can only be + converted to one pointer type, `*mut i32`. + +- Add safety comments + - We said that the unsafe code marks the responsibility shift from the + compiler to the programmer. + - How do we convey that we thought about our unusual responsibilities while + writing unsafe code? Safety comments. + - Safety comments explain why unsafe code is correct. + - Without a safety comment, unsafe code is not safe. + +- Discuss: Whether to use one large unsafe block or two smaller ones + - Possibility of using a single unsafe block rather than multiple + - Using more allows safety comments as specific as possible + +[ptr-as_mut]: https://doc.rust-lang.org/stable/std/primitive.pointer.html#method.as_mut + +_Suggested Solution_ + +```rust +fn main() { + let mut boxed = Box::new(123); + let a: *mut i32 = &mut *boxed as *mut i32; + let b: *mut i32 = std::ptr::null_mut(); + + // SAFETY: `a` is a non-null pointer to i32, it is initialized and still + // allocated. + println!("{:?}", unsafe { *a }); + + // SAFETY: `b` is a null pointer, which `as_mut()` converts to `None`. + println!("{:?}", unsafe { b.as_mut() }); +} +``` + +
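For comparison with the suggested solution, here is a sketch of the single-block alternative raised in the "one large unsafe block or two smaller ones" discussion. It behaves the same way, but a single safety comment now has to justify two different operations, which is the trade-off the discussion is meant to surface.

```rust
fn main() {
    let mut boxed = Box::new(123);
    let a: *mut i32 = &mut *boxed as *mut i32;
    let b: *mut i32 = std::ptr::null_mut();

    // SAFETY: `a` is non-null, aligned and points to an initialized `i32`
    // owned by `boxed`, which is still alive. `b` is a null pointer, which
    // `as_mut()` converts to `None` without dereferencing it.
    unsafe {
        println!("{:?}", *a); // prints "123"
        println!("{:?}", b.as_mut()); // prints "None"
    }
}
```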
diff --git a/src/unsafe-deep-dive/safety-preconditions/semantic-preconditions.md b/src/unsafe-deep-dive/safety-preconditions/semantic-preconditions.md
new file mode 100644
index 000000000000..44c72bf0b239
--- /dev/null
+++ b/src/unsafe-deep-dive/safety-preconditions/semantic-preconditions.md
@@ -0,0 +1 @@
+# Semantic preconditions
diff --git a/src/unsafe-deep-dive/safety-preconditions/u8-to-bool.md b/src/unsafe-deep-dive/safety-preconditions/u8-to-bool.md
new file mode 100644
index 000000000000..a9f7287dce9d
--- /dev/null
+++ b/src/unsafe-deep-dive/safety-preconditions/u8-to-bool.md
@@ -0,0 +1 @@
+# Example: u8 to bool
diff --git a/src/unsafe-deep-dive/welcome.md b/src/unsafe-deep-dive/welcome.md
index 2291281c1bbd..6d3c7853b2e7 100644
--- a/src/unsafe-deep-dive/welcome.md
+++ b/src/unsafe-deep-dive/welcome.md
@@ -1,11 +1,9 @@
 ---
-course: Unsafe
-session: Day 1 Morning
-target_minutes: 300
+course: Unsafe Deep Dive
+session: Unsafe Deep Dive
+target_minutes: 600
 ---
 
-# Welcome to Unsafe Rust
-
 > IMPORTANT: THIS MODULE IS IN AN EARLY STAGE OF DEVELOPMENT
 >
 > Please do not consider this module of Comprehensive Rust to be complete. With
@@ -17,30 +15,47 @@ target_minutes: 300
 
 [GitHub issue tracker]: https://github.com/google/comprehensive-rust/issues
 
-The `unsafe` keyword is easy to type, but hard to master. When used
-appropriately, it forms a useful and indeed essential part of the Rust
-programming language.
+# Welcome to the Unsafe Rust Deep Dive
+
+This deep dive aims to enable you to work productively with Unsafe Rust.
+
+We’ll work on three areas:
+
+- establishing a mental model of Unsafe Rust
+- practicing reading & writing Unsafe Rust
+- practicing code review for Unsafe Rust
+
+<details>
+
+The goal of this class is to teach you enough Unsafe Rust for you to be able to
+review easy cases yourself, and distinguish difficult cases that need to be
+reviewed by more experienced Unsafe Rust engineers.
 
-By the end of this deep dive, you'll know how to work with `unsafe` code, review
-others' changes that include the `unsafe` keyword, and produce your own.
+- Establishing a mental model of Unsafe Rust
+  - what the `unsafe` keyword means
+  - a shared vocabulary for talking about safety
+  - a mental model of how memory works
+  - common patterns
+  - expectations for code that uses `unsafe`
 
-What you'll learn:
+- Practice working with unsafe
+  - reading and writing both code and documentation
+  - using unsafe APIs
+  - designing and implementing them
 
-- What the terms undefined behavior, soundness, and safety mean
-- Why the `unsafe` keyword exists in the Rust language
-- How to write your own code using `unsafe` safely
-- How to review `unsafe` code
+- Review code
+  - the confidence to self-review easy cases
+  - the knowledge to detect difficult cases
 
-## Links to other sections of the course
+“We'll be using a spiral model of teaching. This means that we revisit the same
+topic multiple times with increasing depth.”
 
-The `unsafe` keyword has treatment in:
+A round of introductions is useful, particularly if the class participants don't
+know each other well. Ask everyone to introduce themselves, noting down any
+particular goals for the class.
 
-- _Rust Fundamentals_, the main module of Comprehensive Rust, includes a session
-  on [Unsafe Rust] in its last day.
-- _Rust in Chromium_ discusses how to [interoperate with C++]. Consult that
-  material if you are looking into FFI.
-- _Bare Metal Rust_ uses unsafe heavily to interact with the underlying host,
-  among other things.
+- Who are you?
+- What are you working on?
+- What are your goals for this class?
 
-[interoperate with C++]: ../chromium/interoperability-with-cpp.md
-[Unsafe Rust]: ../unsafe-rust.html
+
+</details>