-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[ty] add SyntheticTypedDictType and implement normalized and is_equivalent_to
#21784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
SyntheticTypedDictType and implement normalized and is_equivalent_toSyntheticTypedDictType and implement normalized and is_equivalent_to
Diagnostic diff on typing conformance testsNo changes detected when running ty on typing conformance tests ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
| TypedDictType::Class(_) => { | ||
| let synthesized = | ||
| SynthesizedTypedDictType::new(db, self.params(db), self.items(db)); | ||
| TypedDictType::Synthesized(synthesized.normalized_impl(db, visitor)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to inline SynthesizedTypedDictType::normalized_impl here instead of creating two instances here and throwing the first away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative that might be worth here is to cache synthesized instead of self.items()` like this
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index 89ef0016f1..0a3f2f9cdd 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -71,36 +71,10 @@ impl<'db> TypedDictType<'db> {
}
pub(crate) fn items(self, db: &'db dyn Db) -> &'db TypedDictSchema<'db> {
- #[salsa::tracked(returns(ref))]
- fn class_based_items<'db>(db: &'db dyn Db, class: ClassType<'db>) -> TypedDictSchema<'db> {
- let (class_literal, specialization) = class.class_literal(db);
- class_literal
- .fields(db, specialization, CodeGeneratorKind::TypedDict)
- .into_iter()
- .map(|(name, field)| {
- let field = match field {
- Field {
- first_declaration,
- declared_ty,
- kind:
- FieldKind::TypedDict {
- is_required,
- is_read_only,
- },
- } => TypedDictFieldBuilder::new(*declared_ty)
- .required(*is_required)
- .read_only(*is_read_only)
- .first_declaration(*first_declaration)
- .build(),
- _ => unreachable!("TypedDict field expected"),
- };
- (name.clone(), field)
- })
- .collect()
- }
-
match self {
- Self::Class(defining_class) => class_based_items(db, defining_class),
+ Self::Class(defining_class) => {
+ SynthesizedTypedDictType::for_class(db, defining_class).items(db)
+ }
Self::Synthesized(synthesized) => synthesized.items(db),
}
}
@@ -300,8 +274,8 @@ impl<'db> TypedDictType<'db> {
pub(crate) fn normalized_impl(self, db: &'db dyn Db, visitor: &NormalizedVisitor<'db>) -> Self {
match self {
- TypedDictType::Class(_) => {
- let synthesized = SynthesizedTypedDictType::new(db, self.items(db));
+ TypedDictType::Class(class) => {
+ let synthesized = SynthesizedTypedDictType::for_class(db, class);
TypedDictType::Synthesized(synthesized.normalized_impl(db, visitor))
}
TypedDictType::Synthesized(synthesized) => {
@@ -323,11 +297,13 @@ impl<'db> TypedDictType<'db> {
// Compare the fields without requiring them to be in sorted order. Class-based `TypedDict`
// fields are not sorted. We do sort synthetic fields in `normalized_impl`, but there will
// soon be other sources of `SynthesizedTypedDictType` besides normalization.
- if self.items(db).len() != other.items(db).len() {
+ let self_items = self.items(db);
+ let other_items = other.items(db);
+ if self_items.len() != other_items.len() {
return ConstraintSet::from(false);
}
- let other_items = other.items(db);
- self.items(db).iter().when_all(db, |(name, field)| {
+
+ self_items.iter().when_all(db, |(name, field)| {
let Some(other_field) = other_items.get(name) else {
return ConstraintSet::from(false);
};
@@ -744,7 +720,37 @@ pub struct SynthesizedTypedDictType<'db> {
// The Salsa heap is tracked separately.
impl get_size2::GetSize for SynthesizedTypedDictType<'_> {}
+#[salsa::tracked]
impl<'db> SynthesizedTypedDictType<'db> {
+ #[salsa::tracked]
+ fn for_class(db: &'db dyn Db, class: ClassType<'db>) -> SynthesizedTypedDictType<'db> {
+ let (class_literal, specialization) = class.class_literal(db);
+ let items: TypedDictSchema<'db> = class_literal
+ .fields(db, specialization, CodeGeneratorKind::TypedDict)
+ .into_iter()
+ .map(|(name, field)| {
+ let field = match field {
+ Field {
+ first_declaration,
+ declared_ty,
+ kind:
+ FieldKind::TypedDict {
+ is_required,
+ is_read_only,
+ },
+ } => TypedDictFieldBuilder::new(*declared_ty)
+ .required(*is_required)
+ .read_only(*is_read_only)
+ .first_declaration(*first_declaration)
+ .build(),
+ _ => unreachable!("TypedDict field expected"),
+ };
+ (name.clone(), field)
+ })
+ .collect();
+ SynthesizedTypedDictType::new(db, items)
+ }
+
pub(super) fn apply_type_mapping_impl<'a>(
self,
db: &'db dyn Db,
@@ -768,15 +774,20 @@ impl<'db> SynthesizedTypedDictType<'db> {
}
pub(crate) fn normalized_impl(self, db: &'db dyn Db, visitor: &NormalizedVisitor<'db>) -> Self {
+ let mut changed = false;
let items = self
.items(db)
.iter()
.map(|(name, field)| {
- let field = field.clone().normalized_impl(db, visitor);
- (name.clone(), field)
+ let new_field = field.clone().normalized_impl(db, visitor);
+ if !changed && &new_field != field {
+ changed = true;
+ }
+ (name.clone(), new_field)
})
.collect::<TypedDictSchema<'db>>();
- Self::new(db, items)
+
+ if changed { Self::new(db, items) } else { self }
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition is that normal code (read: not Pydantic) is going to spend more time looking up the .items() of regular class-based TypedDicts than it spends normalizing, so caching .items() makes sense to me.
|
CodSpeed Performance ReportMerging #21784 will not alter performanceComparing Summary
Footnotes
|
while we wait for the upstream ruff/crates/ruff_memory_usage/src/lib.rs Lines 46 to 57 in 326025d
But see my comment at #21784 (comment) -- I think we may actually need to use a |
|
Just published the new get-size version with the implementation! |
|
@bircni incredible thank you! |
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
unsupported-base |
2 | 0 | 0 |
redundant-cast |
0 | 0 | 1 |
| Total | 2 | 0 | 1 |
|
Hmm, the change I made to redundant cast warnings changed one unrelated union cast warning in the ecosystem analysis. Is this better or worse than before? Before: After: Is this unhelpful with unions? Like it's interesting and possibly surprising that |
|
@AlexWaygood says the |
|
I think the redundant-cast change is fine as-is, if anything a small improvement, even in the "obvious" case. I suppose we could even make it more explicit by naming both types in any case where they are equivalent-but-not-identical: "Value is already of type X which is equivalent to type Y". But I don't know that it's worth bothering to do that unless we get evidence that someone is confused by the shorter form that assumes you can read the EDIT: oops, I failed to scroll to the right and notice that the longer form I suggested is exactly what you already did! I think that's just fine, even in a more-obvious case. |
e197647 to
9dba965
Compare
|
Edit: very premature 😬 |
|
I would only expect a pydantic regression on this PR if (1) we're using |
AlexWaygood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A lot of this looks good, but I think there's a few issues here to iron out
| } | ||
|
|
||
| /// Return the meta-type of this `TypedDict` type. | ||
| pub(super) fn to_meta_class(self, db: &'db dyn Db) -> ClassType<'db> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think to_meta_class is an accurate name for what this method is doing (and... I don't think it's doing what it's trying to do correctly either 😄). A synthesized typeddict doesn't have a class, so it doesn't have a metaclass either. A class-based TypedDict does have a class, and therefore it also has a metaclass, but this method doesn't return the metaclass of a class-based typeddict, it just returns that class-based typeddict's defining class.
It looks like this method is actually being used to return a ClassType that can then be used to construct the meta-type of a TypedDict instance-type. So on that basis, we should rename this method to_meta_type, and have it return a Type rather than a ClassType.
But I also don't think this method gives the correct answer for the meta-type of a synthesized typeddict. A typeddict's meta-type is used for looking up synthesized TypedDict methods such as __getitem__ -- if you return type[TypedDictFallback] here rather than type[<synthetic typeddict>], then when we start doing member lookups on synthetic TypedDicts, we'll return the wrong results for members like __getitem__. As you noted in https://github.com/astral-sh/ruff/pull/21784/files#r2587403973, we don't ever do any member lookups on synthesized typeddicts yet, but we will if we start adding them to intersections in order to fix the TypedDict part of astral-sh/ty#1479. Giving a good answer for the meta-type of a synthesized TypedDict may involve adding a new variant to the SubclassOfInner enum in subclass_of.rs.
For now, I would recommend applying something like this patch. Then we can revisit the question of an accurate meta-type for synthesized TypedDicts when we actually start using them in more places. It's very hard to write tests for right now, when we use them in so few places :-)
diff --git a/crates/ty_python_semantic/src/types.rs b/crates/ty_python_semantic/src/types.rs
index e7459ce7fb..2ab9c3beba 100644
--- a/crates/ty_python_semantic/src/types.rs
+++ b/crates/ty_python_semantic/src/types.rs
@@ -7524,7 +7524,13 @@ impl<'db> Type<'db> {
Type::ProtocolInstance(protocol) => protocol.to_meta_type(db),
// `TypedDict` instances are instances of `dict` at runtime, but its important that we
// understand a more specific meta type in order to correctly handle `__getitem__`.
- Type::TypedDict(typed_dict) => typed_dict.to_meta_class(db).into(),
+ Type::TypedDict(typed_dict) => match typed_dict {
+ TypedDictType::Class(class) => SubclassOfType::from(db, class),
+ TypedDictType::Synthesized(_) => SubclassOfType::from(
+ db,
+ todo_type!("TypedDict synthesized meta-type").expect_dynamic(),
+ ),
+ },
Type::TypeAlias(alias) => alias.value_type(db).to_meta_type(db),
Type::NewTypeInstance(newtype) => Type::from(newtype.base_class_type(db)),
}
diff --git a/crates/ty_python_semantic/src/types/subclass_of.rs b/crates/ty_python_semantic/src/types/subclass_of.rs
index 1045817a53..c6bb9d0378 100644
--- a/crates/ty_python_semantic/src/types/subclass_of.rs
+++ b/crates/ty_python_semantic/src/types/subclass_of.rs
@@ -8,7 +8,7 @@ use crate::types::{
ApplyTypeMappingVisitor, BoundTypeVarInstance, ClassType, DynamicType,
FindLegacyTypeVarsVisitor, HasRelationToVisitor, IsDisjointVisitor, KnownClass,
MaterializationKind, MemberLookupPolicy, NormalizedVisitor, SpecialFormType, Type, TypeContext,
- TypeMapping, TypeRelation, TypeVarBoundOrConstraints, todo_type,
+ TypeMapping, TypeRelation, TypeVarBoundOrConstraints, TypedDictType, todo_type,
};
use crate::{Db, FxOrderSet};
@@ -381,7 +381,12 @@ impl<'db> SubclassOfInner<'db> {
pub(crate) fn try_from_instance(db: &'db dyn Db, ty: Type<'db>) -> Option<Self> {
Some(match ty {
Type::NominalInstance(instance) => SubclassOfInner::Class(instance.class(db)),
- Type::TypedDict(typed_dict) => SubclassOfInner::Class(typed_dict.to_meta_class(db)),
+ Type::TypedDict(typed_dict) => match typed_dict {
+ TypedDictType::Class(class) => SubclassOfInner::Class(class),
+ TypedDictType::Synthesized(_) => SubclassOfInner::Dynamic(
+ todo_type!("type[T] for synthesized TypedDicts").expect_dynamic(),
+ ),
+ },
Type::TypeVar(bound_typevar) => SubclassOfInner::TypeVar(bound_typevar),
Type::Dynamic(DynamicType::Any) => SubclassOfInner::Dynamic(DynamicType::Any),
Type::Dynamic(DynamicType::Unknown) => SubclassOfInner::Dynamic(DynamicType::Unknown),
diff --git a/crates/ty_python_semantic/src/types/typed_dict.rs b/crates/ty_python_semantic/src/types/typed_dict.rs
index 6e4d2d5726..a7e5360a06 100644
--- a/crates/ty_python_semantic/src/types/typed_dict.rs
+++ b/crates/ty_python_semantic/src/types/typed_dict.rs
@@ -282,24 +282,6 @@ impl<'db> TypedDictType<'db> {
}
}
- /// Return the meta-type of this `TypedDict` type.
- pub(super) fn to_meta_class(self, db: &'db dyn Db) -> ClassType<'db> {
- // `TypedDict` instances are instances of `dict` at runtime, but its important that we
- // understand a more specific meta type in order to correctly handle `__getitem__`.
- match self {
- TypedDictType::Class(defining_class) => defining_class,
- TypedDictType::Synthesized(_) => KnownClass::TypedDictFallback
- .try_to_class_literal(db)
- .map(|class| class.default_specialization(db))
- .unwrap_or_else(|| {
- KnownClass::Object
- .try_to_class_literal(db)
- .map(|class| class.default_specialization(db))
- .expect("object class must exist")
- }),
- }
- }
-
pub(crate) fn normalized_impl(self, db: &'db dyn Db, visitor: &NormalizedVisitor<'db>) -> Self {
match self {
TypedDictType::Class(_) => {There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, thanks for walking me through this.
Giving a good answer for the meta-type of a synthesized TypedDict may involve adding a new variant to the SubclassOfInner enum in subclass_of.rs.
Yes now that you mention it, that's what #20732 did.
AlexWaygood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!! LGTM, though I'd still love it if we could figure out why the big pydantic regression is occurring
| TypedDictType::Class(_) => { | ||
| let synthesized = | ||
| SynthesizedTypedDictType::new(db, self.params(db), self.items(db)); | ||
| TypedDictType::Synthesized(synthesized.normalized_impl(db, visitor)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine
|
I think the updates to the Cargo.toml/Cargo.lock files are probably not necessary with the latest version of this PR? They're obviously harmless, but they could probably be a standalone change at this point |
There are still a few places in |
|
Some notes from digging into the performance regression today: It's definitely our old friend from pydantic_core.core_schema import CoreSchema
def foo(core_schema: CoreSchema):
core_schema.copy()I'm not sure why |
|
Can you profile main vs this branch checking that minimized example (e.g. using |
|
I already shared this with @oconnor663 but for broader visibility:
I'm not aware of any tricks to make either of those magically go away, but some of our typing wizards may do. |
|
@MichaReiser + @AlexWaygood, thanks for chatting up a storm with me this morning. Here's something more I've noticed staring at all our profiles. The reduced
|
| let other_items = other.items(db); | ||
| self.items(db).iter().when_all(db, |(name, field)| { | ||
| let Some(other_field) = other_items.get(name) else { | ||
| return ConstraintSet::from(false); | ||
| }; | ||
| if field.flags != other_field.flags { | ||
| return ConstraintSet::from(false); | ||
| } | ||
| field | ||
| .declared_ty | ||
| .is_equivalent_to_impl(db, other_field.declared_ty, inferable, visitor) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will change performance much but you could make use of the fact that you have two BTreeMap where both should return the keys in the same order if they have the same fields:
| let other_items = other.items(db); | |
| self.items(db).iter().when_all(db, |(name, field)| { | |
| let Some(other_field) = other_items.get(name) else { | |
| return ConstraintSet::from(false); | |
| }; | |
| if field.flags != other_field.flags { | |
| return ConstraintSet::from(false); | |
| } | |
| field | |
| .declared_ty | |
| .is_equivalent_to_impl(db, other_field.declared_ty, inferable, visitor) | |
| }) | |
| let mut other_items_iter = other_items.iter(); | |
| self_items.iter().when_all(db, |(name, field)| { | |
| let Some((other_name, other_field)) = other_items_iter.next() else { | |
| return ConstraintSet::from(false); | |
| }; | |
| if name != other_name || field.flags != other_field.flags { | |
| return ConstraintSet::from(false); | |
| } | |
| field | |
| .declared_ty | |
| .is_equivalent_to_impl(db, other_field.declared_ty, inferable, visitor) | |
| }) |
Doing the same in has_relation_to seems a bit trickier and would require peek_if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how you could do the same in has_relation_to but it's a bit trickier:
let mut self_items_iter = self_items.iter().peekable();
for (target_item_name, target_item_field) in target_items {
// Skip over preceeding fields.
let _ = {
self_items_iter
.peeking_take_while(|(name, _)| *name < target_item_name)
.last()
};
let self_item_field = self_items_iter
.peeking_next(|(name, _)| *name == target_item_name)
.map(|(_, field)| field);or use Itertools::merge_by
for pair in a.iter().merge_join_by(b.iter(), |(k1, _), (k2, _)| k1.cmp(k2)) {
match pair {
EitherOrBoth::Both((k, v1), (_, v2)) => {
// Key exists in both maps
}
EitherOrBoth::Left((k, v1)) => {
// Only in `a`
}
EitherOrBoth::Right((k, v2)) => {
// Only in `b`
}
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes (to the first diff above), and the comment here about not being in sorted order is entirely wrong now :) I think since we're doing a length check up front, we can go ahead and zip the iterators, which is a little shorter: 3e5b534
|
|
I've pushed an diff --git a/crates/ty_python_semantic/src/types/function.rs b/crates/ty_python_semantic/src/types/function.rs
index 421504e09b..9bf73f116d 100644
--- a/crates/ty_python_semantic/src/types/function.rs
+++ b/crates/ty_python_semantic/src/types/function.rs
@@ -1022,30 +1022,18 @@ impl<'db> FunctionType<'db> {
pub(crate) fn has_relation_to_impl(
self,
db: &'db dyn Db,
other: Self,
inferable: InferableTypeVars<'_, 'db>,
relation: TypeRelation<'db>,
relation_visitor: &HasRelationToVisitor<'db>,
disjointness_visitor: &IsDisjointVisitor<'db>,
) -> ConstraintSet<'db> {
- // A function type is the subtype of itself, and not of any other function type. However,
- // our representation of a function type includes any specialization that should be applied
- // to the signature. Different specializations of the same function type are only subtypes
- // of each other if they result in subtype signatures.
- if matches!(
- relation,
- TypeRelation::Subtyping | TypeRelation::Redundancy | TypeRelation::SubtypingAssuming(_)
- ) && self.normalized(db) == other.normalized(db)
- {
- return ConstraintSet::from(true);
- }
-
if self.literal(db) != other.literal(db) {
return ConstraintSet::from(false);
}This does seem to fix the perf regression (and not break tests) locally. I'm curious to see what CodSpeed says. If CodSpeed is green, I will probably land this PR as-is (reverting this experimental commit of course) and fork off an issue to track this regression. (btw I'm sure there's a more direct way to run CodSpeed on a random branch, either locally or in the cloud, so if anyone knows the non-dumb workflow for this please do let me know) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance experiment looks like it was pretty successful to me!! It totally solved the regression and had zero impact on both our test suite and the primer report for this PR. Great job 😃
This looks ready to go now
ee5e060 to
fd7b929
Compare
|
If the "experimental" commit fixed the pydantic regression and didn't add regressions anywhere else (which is what it looks like to me in CodSpeed), then I don't see any reason to revert that change here or wait for later to re-evaluate in astral-sh/ty#1845 -- I think we should just include that change in this PR. What's the downside? |
AFAIK you have to make a PR, but it could be a separate draft PR (possibly based on an existing PR) if you want to "hide" it more. |
Wait, yeah, same comment! Why revert a successful experiment? If we don't introduce a performance regression in the first place, there's no need to create a followup issue. It seemed like that commit had zero downsides! |
…-cycle * origin/main: [ty] Support implicit type of `cls` in signatures (#21771) [ty] add `SyntheticTypedDictType` and implement `normalized` and `is_equivalent_to` (#21784) [ty] Fix disjointness checks with type-of `@final` classes (#21770) [ty] Fix negation upper bounds in constraint sets (#21897)
This PR cribs a lot of @ibraheemdev's work on #20732.
This depends on a couple of upstream changes, and the first commit is a temporary/dummy commit that pins git hashes for these:
UpdateforOrderMapandOrderSetsalsa-rs/salsa#1033Those pins (at least the second one, which overwrites a published version number) aren't intended to land, and
I'll need to fix up this PR once / assuming I can get the upstream changes shipped.Update: This is done.