@oconnor663
Contributor

@oconnor663 oconnor663 commented Dec 4, 2025

This PR cribs a lot of @ibraheemdev's work on #20732.

This depends on a couple of upstream changes, and the first commit is a temporary/dummy commit that pins git hashes for these:

Those pins (at least the second one, which overwrites a published version number) aren't intended to land, and I'll need to fix up this PR once / assuming I can get the upstream changes shipped. Update: This is done.

@oconnor663 oconnor663 changed the title add SyntheticTypedDictType and implement normalized and is_equivalent_to [ty] add SyntheticTypedDictType and implement normalized and is_equivalent_to Dec 4, 2025
@oconnor663 oconnor663 added the ty Multi-file analysis & type inference label Dec 4, 2025
@astral-sh-bot

astral-sh-bot bot commented Dec 4, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@astral-sh-bot

astral-sh-bot bot commented Dec 4, 2025

mypy_primer results

Changes were detected when running on open source projects
hydra-zen (https://github.com/mit-ll-responsible-ai/hydra-zen)
+ src/hydra_zen/structured_configs/_implementations.py:2982:60: error[invalid-argument-type] Argument to function `make_dataclass` is incorrect: Expected `dict[str, Any] | None`, found `bool`
+ src/hydra_zen/structured_configs/_implementations.py:2982:60: error[invalid-argument-type] Argument to function `make_dataclass` is incorrect: Expected `bool`, found `str | None`
+ src/hydra_zen/structured_configs/_implementations.py:2982:60: error[invalid-argument-type] Argument to function `make_dataclass` is incorrect: Expected `bool`, found `dict[str, Any] | None`
+ src/hydra_zen/structured_configs/_implementations.py:3342:52: error[invalid-argument-type] Argument to function `make_dataclass` is incorrect: Expected `dict[str, Any] | None`, found `bool`
+ src/hydra_zen/structured_configs/_implementations.py:3342:52: error[invalid-argument-type] Argument to function `make_dataclass` is incorrect: Expected `bool`, found `str | None`
+ src/hydra_zen/structured_configs/_implementations.py:3342:52: error[invalid-argument-type] Argument to function `make_dataclass` is incorrect: Expected `bool`, found `dict[str, Any] | None`
- Found 540 diagnostics
+ Found 546 diagnostics

scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/_logging.py:153:13: warning[unsupported-base] Unsupported class base with type `<class 'Mapping[str, Style]'> | <class 'Mapping[str, Divergent]'>`
- src/scikit_build_core/build/wheel.py:98:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 43 diagnostics
+ Found 41 diagnostics

pydantic (https://github.com/pydantic/pydantic)
- pydantic/_internal/_schema_gather.py:126:28: error[invalid-key] Unknown key "steps" for TypedDict `DataclassSchema` - did you mean "type"?
+ pydantic/_internal/_schema_gather.py:126:28: error[invalid-key] Unknown key "steps" for TypedDict `DataclassSchema` - did you mean "slots"?
- pydantic/fields.py:943:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:943:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:983:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:983:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1026:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1026:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1066:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1066:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1109:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1109:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1148:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1148:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1188:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1188:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1567:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`
+ pydantic/fields.py:1567:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, Divergent], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`

Memory usage changes were detected when running on open source projects
sphinx (https://github.com/sphinx-doc/sphinx)
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`

prefect (https://github.com/PrefectHQ/prefect)
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`

            TypedDictType::Class(_) => {
                let synthesized =
                    SynthesizedTypedDictType::new(db, self.params(db), self.items(db));
                TypedDictType::Synthesized(synthesized.normalized_impl(db, visitor))
Contributor Author

Would it be better to inline SynthesizedTypedDictType::normalized_impl here instead of creating two instances here and throwing the first away?

Member

I think this is fine

Member

An alternative that might be worth considering here is to cache `synthesized` instead of `self.items()`, like this:

diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index 89ef0016f1..0a3f2f9cdd 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -71,36 +71,10 @@ impl<'db> TypedDictType<'db> {
     }
 
     pub(crate) fn items(self, db: &'db dyn Db) -> &'db TypedDictSchema<'db> {
-        #[salsa::tracked(returns(ref))]
-        fn class_based_items<'db>(db: &'db dyn Db, class: ClassType<'db>) -> TypedDictSchema<'db> {
-            let (class_literal, specialization) = class.class_literal(db);
-            class_literal
-                .fields(db, specialization, CodeGeneratorKind::TypedDict)
-                .into_iter()
-                .map(|(name, field)| {
-                    let field = match field {
-                        Field {
-                            first_declaration,
-                            declared_ty,
-                            kind:
-                                FieldKind::TypedDict {
-                                    is_required,
-                                    is_read_only,
-                                },
-                        } => TypedDictFieldBuilder::new(*declared_ty)
-                            .required(*is_required)
-                            .read_only(*is_read_only)
-                            .first_declaration(*first_declaration)
-                            .build(),
-                        _ => unreachable!("TypedDict field expected"),
-                    };
-                    (name.clone(), field)
-                })
-                .collect()
-        }
-
         match self {
-            Self::Class(defining_class) => class_based_items(db, defining_class),
+            Self::Class(defining_class) => {
+                SynthesizedTypedDictType::for_class(db, defining_class).items(db)
+            }
             Self::Synthesized(synthesized) => synthesized.items(db),
         }
     }
@@ -300,8 +274,8 @@ impl<'db> TypedDictType<'db> {
 
     pub(crate) fn normalized_impl(self, db: &'db dyn Db, visitor: &NormalizedVisitor<'db>) -> Self {
         match self {
-            TypedDictType::Class(_) => {
-                let synthesized = SynthesizedTypedDictType::new(db, self.items(db));
+            TypedDictType::Class(class) => {
+                let synthesized = SynthesizedTypedDictType::for_class(db, class);
                 TypedDictType::Synthesized(synthesized.normalized_impl(db, visitor))
             }
             TypedDictType::Synthesized(synthesized) => {
@@ -323,11 +297,13 @@ impl<'db> TypedDictType<'db> {
         // Compare the fields without requiring them to be in sorted order. Class-based `TypedDict`
         // fields are not sorted. We do sort synthetic fields in `normalized_impl`, but there will
         // soon be other sources of `SynthesizedTypedDictType` besides normalization.
-        if self.items(db).len() != other.items(db).len() {
+        let self_items = self.items(db);
+        let other_items = other.items(db);
+        if self_items.len() != other_items.len() {
             return ConstraintSet::from(false);
         }
-        let other_items = other.items(db);
-        self.items(db).iter().when_all(db, |(name, field)| {
+
+        self_items.iter().when_all(db, |(name, field)| {
             let Some(other_field) = other_items.get(name) else {
                 return ConstraintSet::from(false);
             };
@@ -744,7 +720,37 @@ pub struct SynthesizedTypedDictType<'db> {
 // The Salsa heap is tracked separately.
 impl get_size2::GetSize for SynthesizedTypedDictType<'_> {}
 
+#[salsa::tracked]
 impl<'db> SynthesizedTypedDictType<'db> {
+    #[salsa::tracked]
+    fn for_class(db: &'db dyn Db, class: ClassType<'db>) -> SynthesizedTypedDictType<'db> {
+        let (class_literal, specialization) = class.class_literal(db);
+        let items: TypedDictSchema<'db> = class_literal
+            .fields(db, specialization, CodeGeneratorKind::TypedDict)
+            .into_iter()
+            .map(|(name, field)| {
+                let field = match field {
+                    Field {
+                        first_declaration,
+                        declared_ty,
+                        kind:
+                            FieldKind::TypedDict {
+                                is_required,
+                                is_read_only,
+                            },
+                    } => TypedDictFieldBuilder::new(*declared_ty)
+                        .required(*is_required)
+                        .read_only(*is_read_only)
+                        .first_declaration(*first_declaration)
+                        .build(),
+                    _ => unreachable!("TypedDict field expected"),
+                };
+                (name.clone(), field)
+            })
+            .collect();
+        SynthesizedTypedDictType::new(db, items)
+    }
+
     pub(super) fn apply_type_mapping_impl<'a>(
         self,
         db: &'db dyn Db,
@@ -768,15 +774,20 @@ impl<'db> SynthesizedTypedDictType<'db> {
     }
 
     pub(crate) fn normalized_impl(self, db: &'db dyn Db, visitor: &NormalizedVisitor<'db>) -> Self {
+        let mut changed = false;
         let items = self
             .items(db)
             .iter()
             .map(|(name, field)| {
-                let field = field.clone().normalized_impl(db, visitor);
-                (name.clone(), field)
+                let new_field = field.clone().normalized_impl(db, visitor);
+                if !changed && &new_field != field {
+                    changed = true;
+                }
+                (name.clone(), new_field)
             })
             .collect::<TypedDictSchema<'db>>();
-        Self::new(db, items)
+
+        if changed { Self::new(db, items) } else { self }
     }
 }
 

Contributor Author

My intuition is that normal code (read: not Pydantic) is going to spend more time looking up the .items() of regular class-based TypedDicts than it spends normalizing, so caching .items() makes sense to me.
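
To make the trade-off above concrete, here is a minimal Python sketch, not the actual Rust/Salsa code. It assumes `functools.cache` as a stand-in for a tracked query, a hypothetical field table, and a toy `normalize_type` that only sorts union members. It shows the two ideas from the thread: cache the per-class field lookup, and have normalization return the original object when nothing changed.

```python
from functools import cache

# Hypothetical call counter, used only to show that the lookup is cached.
CALLS = {"class_based_items": 0}

# Hypothetical field table standing in for real class definitions.
_FIELDS = {"Movie": (("title", "str"), ("year", "int"))}


@cache
def class_based_items(class_name: str) -> tuple[tuple[str, str], ...]:
    """Stand-in for a cached per-class field lookup."""
    CALLS["class_based_items"] += 1
    return _FIELDS[class_name]


def normalize_type(ty: str) -> str:
    """Toy normalization: sort and de-duplicate the members of a union."""
    return " | ".join(sorted({part.strip() for part in ty.split("|")}))


def normalized(items: tuple[tuple[str, str], ...]) -> tuple[tuple[str, str], ...]:
    """Normalize every field type, reusing `items` if nothing changed."""
    new_items = tuple((name, normalize_type(ty)) for name, ty in items)
    return items if new_items == items else new_items


first = class_based_items("Movie")
second = class_based_items("Movie")   # served from the cache
unchanged = normalized(first)          # already normal, so the same object
changed = normalized((("x", "str | int"),))
```

Which of the two results is worth caching depends on which operation is called more often in practice, which is the point being debated in this thread.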

@astral-sh-bot

astral-sh-bot bot commented Dec 4, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@codspeed-hq

codspeed-hq bot commented Dec 4, 2025

CodSpeed Performance Report

Merging #21784 will not alter performance

Comparing synthesized_typeddict (c127766) with main (a2fb2ee)

Summary

✅ 22 untouched
⏩ 30 skipped [1]

Footnotes

  1. 30 benchmarks were skipped, so the baseline results were used instead.

@AlexWaygood
Member

> This depends on a couple of upstream changes, and the first commit is a temporary/dummy commit that pins git hashes for these:

While we wait for the upstream getsize2 change to land, you can use the implementations of heap_size we have in this repo for OrderMap and OrderSet:

/// An implementation of [`GetSize::get_heap_size`] for [`OrderSet`].
pub fn order_set_heap_size<T: GetSize, S>(set: &OrderSet<T, S>) -> usize {
    (set.capacity() * T::get_stack_size()) + set.iter().map(heap_size).sum::<usize>()
}

/// An implementation of [`GetSize::get_heap_size`] for [`OrderMap`].
pub fn order_map_heap_size<K: GetSize, V: GetSize, S>(map: &OrderMap<K, V, S>) -> usize {
    (map.capacity() * (K::get_stack_size() + V::get_stack_size()))
        + (map.iter())
            .map(|(k, v)| heap_size(k) + heap_size(v))
            .sum::<usize>()
}

But see my comment at #21784 (comment) -- I think we may actually need to use a BTreeMap for the items of a synthesized typeddict, since we need two equivalent typeddicts to normalize to synthesized typeddicts that compare equal
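
The ordering concern can be illustrated with a small Python sketch. It assumes that the insertion-ordered map under discussion compares entries in order, and uses `collections.OrderedDict` as a stand-in because its equality is also order-sensitive. Sorting the keys during normalization plays the role a `BTreeMap` would play. The schemas and the `normalize` helper are hypothetical.

```python
from collections import OrderedDict

# Two schemas with the same fields, declared in a different order.
a = OrderedDict([("name", "str"), ("year", "int")])
b = OrderedDict([("year", "int"), ("name", "str")])


def normalize(schema: OrderedDict) -> OrderedDict:
    """Rebuild the schema with its keys in sorted order."""
    return OrderedDict(sorted(schema.items()))


# Order-sensitive comparison treats the two declarations as different...
equal_before = a == b
# ...while their normalized forms compare equal.
equal_after = normalize(a) == normalize(b)
```

A map that is sorted by construction gives the second property for free, at the cost of losing the declaration order.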

@bircni

bircni commented Dec 4, 2025

Just published the new get-size version with the implementation!

@oconnor663
Contributor Author

@bircni incredible thank you!

@astral-sh-bot

astral-sh-bot bot commented Dec 4, 2025

ecosystem-analyzer results

Lint rule         Added  Removed  Changed
unsupported-base  2      0        0
redundant-cast    0      0        1
Total             2      0        1

Full report with detailed diff (timing results)

@oconnor663
Contributor Author

oconnor663 commented Dec 4, 2025

Hmm, the change I made to redundant cast warnings changed one unrelated union cast warning in the ecosystem analysis. Is this better or worse than before? Before:

warning[redundant-cast]: Value is already of type `Literal["function", "class", "method", "module"]`
  --> basic_checker.py:85:21
   |
83 |     lines = ["type", "number", "old number", "difference", "%documented", "%badname"]
84 |     for node_type in ("module", "class", "method", "function"):
85 |         node_type = cast(Literal["function", "class", "method", "module"], node_type)
   |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
86 |         new = stats.get_node_count(node_type)
87 |         old = old_stats.get_node_count(node_type) if old_stats else None
   |
info: rule `redundant-cast` is enabled by default

After:

warning[redundant-cast]: Value is already of type `Literal["module", "class", "method", "function"]`, which is equivalent to `Literal["function", "class", "method", "module"]`
  --> basic_checker.py:85:21
   |
83 |     lines = ["type", "number", "old number", "difference", "%documented", "%badname"]
84 |     for node_type in ("module", "class", "method", "function"):
85 |         node_type = cast(Literal["function", "class", "method", "module"], node_type)
   |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
86 |         new = stats.get_node_count(node_type)
87 |         old = old_stats.get_node_count(node_type) if old_stats else None
   |
info: rule `redundant-cast` is enabled by default

Is this unhelpful with unions? Like it's interesting and possibly surprising that Foo and Bar typeddicts can be equivalent, but it's boring and obvious that the same union in a different order is the same union? I could check specifically for TypedDict (and Protocol) when emitting this?
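
For readers following along, here is a small runtime sketch of the two situations being compared. It only uses the standard `typing` module and makes no claim about what any type checker reports. The `Foo`, `Bar`, and `use_bar` names are hypothetical.

```python
from typing import Literal, TypedDict, cast, get_args

# The same set of literal members, written in two different orders.
Declared = Literal["function", "class", "method", "module"]
Inferred = Literal["module", "class", "method", "function"]

same_members = set(get_args(Declared)) == set(get_args(Inferred))
same_order = get_args(Declared) == get_args(Inferred)


# Two distinct TypedDict classes with identical fields.
class Foo(TypedDict):
    name: str


class Bar(TypedDict):
    name: str


def use_bar(value: Bar) -> str:
    return value["name"]


foo: Foo = {"name": "x"}
# The cast spells a different name for a structurally identical type.
result = use_bar(cast(Bar, foo))
```

In the union case the only difference is member order, while in the TypedDict case two differently named classes describe the same shape, which is arguably the less obvious of the two.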

@oconnor663
Contributor Author

@AlexWaygood says the beartype diagnostic in the ecosystem report is flaky, so other than the "eye doctor: better or worse, better or worse" question above, the report is clean.

@carljm
Contributor

carljm commented Dec 5, 2025

I think the redundant-cast change is fine as-is, if anything a small improvement, even in the "obvious" case. I suppose we could even make it more explicit by naming both types in any case where they are equivalent-but-not-identical: "Value is already of type X which is equivalent to type Y". But I don't know that it's worth bothering to do that unless we get evidence that someone is confused by the shorter form that assumes you can read the cast type yourself.

EDIT: oops, I failed to scroll to the right and notice that the longer form I suggested is exactly what you already did! I think that's just fine, even in a more-obvious case.

@oconnor663 oconnor663 force-pushed the synthesized_typeddict branch from e197647 to 9dba965 Compare December 5, 2025 05:33
@oconnor663
Contributor Author

oconnor663 commented Dec 5, 2025

The last unaddressed issue that I know of on this PR is the Pydantic regression. I'm looking at that now. It's presumably our old friend the gigantic recursive union?

Edit: very premature 😬

@AlexWaygood
Member

I would only expect a pydantic regression on this PR if (1) we're using Type::is_equivalent_to in our codebase somewhere where we shouldn't be or (2) pydantic is making extensive use of assert_type and/or cast. It might be worth checking those things?

Member

@AlexWaygood AlexWaygood left a comment

Thanks! A lot of this looks good, but I think there's a few issues here to iron out

    }

    /// Return the meta-type of this `TypedDict` type.
    pub(super) fn to_meta_class(self, db: &'db dyn Db) -> ClassType<'db> {
Member

I don't think to_meta_class is an accurate name for what this method is doing (and... I don't think it's doing what it's trying to do correctly either 😄). A synthesized typeddict doesn't have a class, so it doesn't have a metaclass either. A class-based TypedDict does have a class, and therefore it also has a metaclass, but this method doesn't return the metaclass of a class-based typeddict, it just returns that class-based typeddict's defining class.

It looks like this method is actually being used to return a ClassType that can then be used to construct the meta-type of a TypedDict instance-type. So on that basis, we should rename this method to_meta_type, and have it return a Type rather than a ClassType.

But I also don't think this method gives the correct answer for the meta-type of a synthesized typeddict. A typeddict's meta-type is used for looking up synthesized TypedDict methods such as __getitem__ -- if you return type[TypedDictFallback] here rather than type[<synthetic typeddict>], then when we start doing member lookups on synthetic TypedDicts, we'll return the wrong results for members like __getitem__. As you noted in https://github.com/astral-sh/ruff/pull/21784/files#r2587403973, we don't ever do any member lookups on synthesized typeddicts yet, but we will if we start adding them to intersections in order to fix the TypedDict part of astral-sh/ty#1479. Giving a good answer for the meta-type of a synthesized TypedDict may involve adding a new variant to the SubclassOfInner enum in subclass_of.rs.

For now, I would recommend applying something like this patch. Then we can revisit the question of an accurate meta-type for synthesized TypedDicts when we actually start using them in more places. It's very hard to write tests for right now, when we use them in so few places :-)

diff --git a/crates/ty_python_semantic/src/types.rs b/crates/ty_python_semantic/src/types.rs
index e7459ce7fb..2ab9c3beba 100644
--- a/crates/ty_python_semantic/src/types.rs
+++ b/crates/ty_python_semantic/src/types.rs
@@ -7524,7 +7524,13 @@ impl<'db> Type<'db> {
             Type::ProtocolInstance(protocol) => protocol.to_meta_type(db),
             // `TypedDict` instances are instances of `dict` at runtime, but its important that we
             // understand a more specific meta type in order to correctly handle `__getitem__`.
-            Type::TypedDict(typed_dict) => typed_dict.to_meta_class(db).into(),
+            Type::TypedDict(typed_dict) => match typed_dict {
+                TypedDictType::Class(class) => SubclassOfType::from(db, class),
+                TypedDictType::Synthesized(_) => SubclassOfType::from(
+                    db,
+                    todo_type!("TypedDict synthesized meta-type").expect_dynamic(),
+                ),
+            },
             Type::TypeAlias(alias) => alias.value_type(db).to_meta_type(db),
             Type::NewTypeInstance(newtype) => Type::from(newtype.base_class_type(db)),
         }
diff --git a/crates/ty_python_semantic/src/types/subclass_of.rs b/crates/ty_python_semantic/src/types/subclass_of.rs
index 1045817a53..c6bb9d0378 100644
--- a/crates/ty_python_semantic/src/types/subclass_of.rs
+++ b/crates/ty_python_semantic/src/types/subclass_of.rs
@@ -8,7 +8,7 @@ use crate::types::{
     ApplyTypeMappingVisitor, BoundTypeVarInstance, ClassType, DynamicType,
     FindLegacyTypeVarsVisitor, HasRelationToVisitor, IsDisjointVisitor, KnownClass,
     MaterializationKind, MemberLookupPolicy, NormalizedVisitor, SpecialFormType, Type, TypeContext,
-    TypeMapping, TypeRelation, TypeVarBoundOrConstraints, todo_type,
+    TypeMapping, TypeRelation, TypeVarBoundOrConstraints, TypedDictType, todo_type,
 };
 use crate::{Db, FxOrderSet};
 
@@ -381,7 +381,12 @@ impl<'db> SubclassOfInner<'db> {
     pub(crate) fn try_from_instance(db: &'db dyn Db, ty: Type<'db>) -> Option<Self> {
         Some(match ty {
             Type::NominalInstance(instance) => SubclassOfInner::Class(instance.class(db)),
-            Type::TypedDict(typed_dict) => SubclassOfInner::Class(typed_dict.to_meta_class(db)),
+            Type::TypedDict(typed_dict) => match typed_dict {
+                TypedDictType::Class(class) => SubclassOfInner::Class(class),
+                TypedDictType::Synthesized(_) => SubclassOfInner::Dynamic(
+                    todo_type!("type[T] for synthesized TypedDicts").expect_dynamic(),
+                ),
+            },
             Type::TypeVar(bound_typevar) => SubclassOfInner::TypeVar(bound_typevar),
             Type::Dynamic(DynamicType::Any) => SubclassOfInner::Dynamic(DynamicType::Any),
             Type::Dynamic(DynamicType::Unknown) => SubclassOfInner::Dynamic(DynamicType::Unknown),
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index 6e4d2d5726..a7e5360a06 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -282,24 +282,6 @@ impl<'db> TypedDictType<'db> {
         }
     }
 
-    /// Return the meta-type of this `TypedDict` type.
-    pub(super) fn to_meta_class(self, db: &'db dyn Db) -> ClassType<'db> {
-        // `TypedDict` instances are instances of `dict` at runtime, but its important that we
-        // understand a more specific meta type in order to correctly handle `__getitem__`.
-        match self {
-            TypedDictType::Class(defining_class) => defining_class,
-            TypedDictType::Synthesized(_) => KnownClass::TypedDictFallback
-                .try_to_class_literal(db)
-                .map(|class| class.default_specialization(db))
-                .unwrap_or_else(|| {
-                    KnownClass::Object
-                        .try_to_class_literal(db)
-                        .map(|class| class.default_specialization(db))
-                        .expect("object class must exist")
-                }),
-        }
-    }
-
     pub(crate) fn normalized_impl(self, db: &'db dyn Db, visitor: &NormalizedVisitor<'db>) -> Self {
         match self {
             TypedDictType::Class(_) => {
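
As background for the comment in the patch that `TypedDict` instances are instances of `dict` at runtime, here is a short Python sketch with a hypothetical `Movie` class. It shows why the runtime class alone carries no per-key information, so a checker needs something more specific than `type[dict]` to resolve `__getitem__`.

```python
from typing import TypedDict


class Movie(TypedDict):
    title: str
    year: int


movie = Movie(title="Alien", year=1979)

# At runtime the value is a plain dict; the TypedDict class is not its type.
runtime_type = type(movie)

# The per-key information lives on the TypedDict class, not on `dict`.
required = Movie.__required_keys__
item = movie["title"]
```

That the lookup `movie["title"]` is a `str` is a fact the checker derives from the TypedDict schema, which is why the question of a meta-type for synthesized TypedDicts matters once member lookups on them begin.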

Contributor Author

Oof, thanks for walking me through this.

> Giving a good answer for the meta-type of a synthesized TypedDict may involve adding a new variant to the SubclassOfInner enum in subclass_of.rs.

Yes now that you mention it, that's what #20732 did.

Member

@AlexWaygood AlexWaygood left a comment

Thank you!! LGTM, though I'd still love it if we could figure out why the big pydantic regression is occurring

@sharkdp sharkdp removed their request for review December 8, 2025 09:55
@AlexWaygood
Member

I think the updates to the Cargo.toml/Cargo.lock files are probably not necessary with the latest version of this PR? They're obviously harmless, but they could probably be a standalone change at this point

@oconnor663
Contributor Author

> I think the updates to the Cargo.toml/Cargo.lock files are probably not necessary with the latest version of this PR? They're obviously harmless, but they could probably be a standalone change at this point

There are still a few places in class.rs where this PR changes IndexMap -> OrderMap, which depends on the Cargo.toml bumps. This PR no longer requires those changes after your patch, but we've also said (or at least Ibraheem said?) that we would generally prefer to switch to OrderMap, so I've split all that out into a separate PR: #21854

@oconnor663
Contributor Author

oconnor663 commented Dec 9, 2025

Some notes from digging into the performance regression today:

It's definitely our old friend CoreSchema, the gigantic TypedDict union. Not very surprising. One of the files affected in this particular PR is functional_serializers.py, which typechecks in ~0.05s on main and ~1s on this branch. I can shrink that entire file down to this while still preserving most of that gap:

from pydantic_core.core_schema import CoreSchema

def foo(core_schema: CoreSchema):
    core_schema.copy()

I'm not sure why .copy() in particular would be affected by these changes?

@carljm
Contributor

carljm commented Dec 9, 2025

Can you profile main vs this branch checking that minimized example (e.g. using samply record)? With that big a difference in runtime, I would think the more-precise location of the extra time spent should jump out (and the stacks should help clarify from where we are calling the newly-expensive function).

@MichaReiser
Member

I already shared this with @oconnor663 but for broader visibility:

  • Here's my profile https://share.firefox.dev/4pzdJ0u
  • What stands out is that we spend a significant amount of time within Eq and Hash that we didn't before. We call Eq a lot within UnionBuilder, and Hash in normalized_impl (because we intern the value).

I'm not aware of any tricks to make either of those magically go away, but some of our typing wizards may do.
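To make the Eq/Hash observation concrete, here is a minimal sketch of why interning a large structural value is not free. It is not ty's or Salsa's interner: `CountingHasher`, `FieldSet`, `Interner`, and `hashed_bytes` are hypothetical names invented for this illustration, and a `HashMap` stands in for the real interning table.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that only counts how many bytes are fed to it.
#[derive(Default)]
struct CountingHasher {
    bytes: usize,
}

impl Hasher for CountingHasher {
    fn write(&mut self, data: &[u8]) {
        self.bytes += data.len();
    }

    fn finish(&self) -> u64 {
        self.bytes as u64
    }
}

/// Stand-in for a structural type: a list of (field name, type id) pairs.
#[derive(Clone, PartialEq, Eq, Hash)]
struct FieldSet(Vec<(String, u32)>);

/// Toy interner (an assumption, not Salsa): maps each distinct value to a small id.
#[derive(Default)]
struct Interner {
    ids: HashMap<FieldSet, usize>,
}

impl Interner {
    fn intern(&mut self, value: FieldSet) -> usize {
        let next = self.ids.len();
        // Every call hashes the whole value; a hit also runs `Eq` against the stored key.
        *self.ids.entry(value).or_insert(next)
    }
}

/// Reports how many bytes a single `Hash` pass over `value` touches.
fn hashed_bytes(value: &FieldSet) -> usize {
    let mut hasher = CountingHasher::default();
    value.hash(&mut hasher);
    hasher.bytes
}
```

Interning the same large `FieldSet` twice returns the same id, but each call still hashes every field, so the cost scales with the size of the value rather than with whether it is new.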

@oconnor663
Contributor Author

@MichaReiser + @AlexWaygood, thanks for chatting up a storm with me this morning. Here's something more I've noticed staring at all our profiles. The reduced copy() example above spends ~all of its time inside of BoundMethodType::has_relation_to_impl. Two things jump out at me about that:

  • That method recurses on both the function/method type being called and on the receiver object. Given that CoreSchema shows up on both sides of that, that could be part of why we're blowing up here?
  • The function/method side of that immediately calls normalized in the Redundancy case.

Comment on lines 329 to 340
let other_items = other.items(db);
self.items(db).iter().when_all(db, |(name, field)| {
    let Some(other_field) = other_items.get(name) else {
        return ConstraintSet::from(false);
    };
    if field.flags != other_field.flags {
        return ConstraintSet::from(false);
    }
    field
        .declared_ty
        .is_equivalent_to_impl(db, other_field.declared_ty, inferable, visitor)
})
Member

@MichaReiser MichaReiser Dec 9, 2025

I don't think this will change performance much, but you could make use of the fact that you have two BTreeMaps, which should both yield their keys in the same order if they have the same fields:

Suggested change
-let other_items = other.items(db);
-self.items(db).iter().when_all(db, |(name, field)| {
-    let Some(other_field) = other_items.get(name) else {
-        return ConstraintSet::from(false);
-    };
-    if field.flags != other_field.flags {
-        return ConstraintSet::from(false);
-    }
-    field
-        .declared_ty
-        .is_equivalent_to_impl(db, other_field.declared_ty, inferable, visitor)
-})
+let mut other_items_iter = other_items.iter();
+self_items.iter().when_all(db, |(name, field)| {
+    let Some((other_name, other_field)) = other_items_iter.next() else {
+        return ConstraintSet::from(false);
+    };
+    if name != other_name || field.flags != other_field.flags {
+        return ConstraintSet::from(false);
+    }
+    field
+        .declared_ty
+        .is_equivalent_to_impl(db, other_field.declared_ty, inferable, visitor)
+})

Doing the same in has_relation_to seems a bit trickier and would require peek_if

Member

@MichaReiser MichaReiser Dec 9, 2025

This is how you could do the same in has_relation_to but it's a bit trickier:

let mut self_items_iter = self_items.iter().peekable();

for (target_item_name, target_item_field) in target_items {
    // Skip over preceding fields.
    let _ = {
        self_items_iter
            .peeking_take_while(|(name, _)| *name < target_item_name)
            .last()
    };
    let self_item_field = self_items_iter
        .peeking_next(|(name, _)| *name == target_item_name)
        .map(|(_, field)| field);

or use Itertools::merge_join_by:

for pair in a.iter().merge_join_by(b.iter(), |(k1, _), (k2, _)| k1.cmp(k2)) {
    match pair {
        EitherOrBoth::Both((k, v1), (_, v2)) => {
            // Key exists in both maps
        }
        EitherOrBoth::Left((k, v1)) => {
            // Only in `a`
        }
        EitherOrBoth::Right((k, v2)) => {
            // Only in `b`
        }
    }
}
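For readers without itertools at hand, the same single-pass walk over two sorted maps can be sketched with only the standard library, using `Peekable::next_if`. This is an illustration under stated assumptions, not the code from this PR: `covers_all_fields` is a hypothetical name, a `u32` stands in for a field's declared type, and a plain `bool` replaces ty's `ConstraintSet`. It checks that every key in `target` appears in `source` with an equal value.

```rust
use std::collections::BTreeMap;

/// Returns true if every entry of `target` is present in `source` with an equal value.
/// Both maps iterate in ascending key order, so one forward pass over each is enough.
fn covers_all_fields(source: &BTreeMap<String, u32>, target: &BTreeMap<String, u32>) -> bool {
    let mut source_iter = source.iter().peekable();

    for (target_name, target_field) in target {
        // Skip source entries that sort before the key we are looking for.
        while source_iter.next_if(|(name, _)| *name < target_name).is_some() {}

        // The next source entry must be exactly the target key, with an equal value.
        match source_iter.next_if(|(name, _)| *name == target_name) {
            Some((_, source_field)) if source_field == target_field => {}
            _ => return false,
        }
    }
    true
}
```

Because neither iterator ever rewinds, the work is linear in the combined size of the two maps, instead of one map lookup per target field.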

Contributor Author

@oconnor663 oconnor663 Dec 9, 2025

Yes (to the first diff above), and the comment here about not being in sorted order is entirely wrong now :) I think since we're doing a length check up front, we can go ahead and zip the iterators, which is a little shorter: 3e5b534
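The zip-based variant mentioned here can be sketched roughly as follows. This is not the code from commit 3e5b534: `Field`, `fields_equivalent`, the `u32` type id, and the use of `bool` in place of `ConstraintSet` are all assumptions made for illustration. The up-front length check matters because `zip` stops at the shorter iterator and would otherwise accept a map that is a strict prefix of the other.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a TypedDict field: flags plus a declared type id.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Field {
    flags: u8,
    declared_ty: u32,
}

/// Compares two sorted maps pairwise. Equal lengths plus pairwise-equal keys
/// imply that both maps have exactly the same key set.
fn fields_equivalent(a: &BTreeMap<String, Field>, b: &BTreeMap<String, Field>) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter()
        .zip(b.iter())
        .all(|((name, field), (other_name, other_field))| {
            name == other_name
                && field.flags == other_field.flags
                && field.declared_ty == other_field.declared_ty
        })
}
```

In ty the per-field comparison returns a `ConstraintSet` rather than a `bool`, so the real code combines results with `when_all` instead of `all`; the iteration pattern is the part this sketch is meant to show.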

@oconnor663
Contributor Author

  • Moving the TypedDictType special case as early as possible in UnionBuilder::push_type doesn't make a difference here.
  • Caching the conversion from class-based to synthetic typeddict saves 10% here in this most pathological case, but it doesn't close the bulk of the gap. As far as I know, I can't cache normalized_impl directly (which is what I'd really prefer to do), because of the visitor it takes.

@oconnor663
Contributor Author

oconnor663 commented Dec 10, 2025

I've pushed an EXPERIMENTAL COMMIT which I don't intend to actually land, which deletes the call to normalized that I think is at the heart of this PR's regression:

diff --git a/crates/ty_python_semantic/src/types/function.rs b/crates/ty_python_semantic/src/types/function.rs
index 421504e09b..9bf73f116d 100644
--- a/crates/ty_python_semantic/src/types/function.rs
+++ b/crates/ty_python_semantic/src/types/function.rs
@@ -1022,30 +1022,18 @@ impl<'db> FunctionType<'db> {
     pub(crate) fn has_relation_to_impl(
         self,
         db: &'db dyn Db,
         other: Self,
         inferable: InferableTypeVars<'_, 'db>,
         relation: TypeRelation<'db>,
         relation_visitor: &HasRelationToVisitor<'db>,
         disjointness_visitor: &IsDisjointVisitor<'db>,
     ) -> ConstraintSet<'db> {
-        // A function type is the subtype of itself, and not of any other function type. However,
-        // our representation of a function type includes any specialization that should be applied
-        // to the signature. Different specializations of the same function type are only subtypes
-        // of each other if they result in subtype signatures.
-        if matches!(
-            relation,
-            TypeRelation::Subtyping | TypeRelation::Redundancy | TypeRelation::SubtypingAssuming(_)
-        ) && self.normalized(db) == other.normalized(db)
-        {
-            return ConstraintSet::from(true);
-        }
-
         if self.literal(db) != other.literal(db) {
             return ConstraintSet::from(false);
         }

This does seem to fix the perf regression (and not break tests) locally. I'm curious to see what CodSpeed says. If CodSpeed is green, I will probably land this PR as-is (reverting this experimental commit of course) and fork off an issue to track this regression.

(btw I'm sure there's a more direct way to run CodSpeed on a random branch, either locally or in the cloud, so if anyone knows the non-dumb workflow for this please do let me know)

Member

@AlexWaygood AlexWaygood left a comment

The performance experiment looks like it was pretty successful to me!! It totally solved the regression and had zero impact on both our test suite and the primer report for this PR. Great job 😃

This looks ready to go now

@oconnor663 oconnor663 force-pushed the synthesized_typeddict branch from ee5e060 to fd7b929 Compare December 10, 2025 19:53
@carljm
Contributor

carljm commented Dec 10, 2025

If the "experimental" commit fixed the pydantic regression and didn't add regressions anywhere else (which is what it looks like to me in CodSpeed), then I don't see any reason to revert that change here or wait for later to re-evaluate in astral-sh/ty#1845 -- I think we should just include that change in this PR. What's the downside?

@carljm
Contributor

carljm commented Dec 10, 2025

> (btw I'm sure there's a more direct way to run CodSpeed on a random branch, either locally or in the cloud, so if anyone knows the non-dumb workflow for this please do let me know)

AFAIK you have to make a PR, but it could be a separate draft PR (possibly based on an existing PR) if you want to "hide" it more.

@AlexWaygood
Member

> If the "experimental" commit fixed the pydantic regression and didn't add regressions anywhere else (which is what it looks like to me in CodSpeed), then I don't see any reason to revert that change here or wait for later to re-evaluate in astral-sh/ty#1845 -- I think we should just include that change in this PR. What's the downside?

Wait, yeah, same comment! Why revert a successful experiment? If we don't introduce a performance regression in the first place, there's no need to create a followup issue. It seemed like that commit had zero downsides!

@oconnor663 oconnor663 enabled auto-merge (squash) December 10, 2025 20:35
@oconnor663 oconnor663 merged commit 1b44d7e into main Dec 10, 2025
41 checks passed
@oconnor663 oconnor663 deleted the synthesized_typeddict branch December 10, 2025 20:36
dcreager added a commit that referenced this pull request Dec 10, 2025
…-cycle

* origin/main:
  [ty] Support implicit type of `cls` in signatures (#21771)
  [ty] add `SyntheticTypedDictType` and implement `normalized` and `is_equivalent_to` (#21784)
  [ty] Fix disjointness checks with type-of `@final` classes (#21770)
  [ty] Fix negation upper bounds in constraint sets (#21897)
Labels

ecosystem-analyzer ty Multi-file analysis & type inference

6 participants