20 changes: 15 additions & 5 deletions proto/substrait/algebra.proto
@@ -388,6 +388,8 @@ message ConsistentPartitionWindowRel {
// sorts, and bounds, since those must be consistent across the various functions in this rel. Refer
// to the `WindowFunction` message for a description of these fields.
message WindowRelFunction {
// References a function_anchor defined in this plan.
// 0 is a valid anchor/reference.
uint32 function_reference = 1;

repeated FunctionArgument arguments = 9;
@@ -788,7 +790,8 @@ message ComparisonJoinKey {
// A custom comparison behavior is used. This can happen, for example, when using
// collations, where we might want to do something like a case-insensitive comparison.
//
// This must be a binary function with a boolean return type
// This must be a binary function with a boolean return type.
// 0 is a valid anchor/reference.
uint32 custom_function_reference = 2;
}
}
@@ -1046,6 +1049,7 @@ message Expression {
// optionally points to a type_variation_anchor defined in this plan.
// Applies to all members of union other than the Typed null (which should
// directly declare the type variation).
// 0 is a valid anchor/reference.
uint32 type_variation_reference = 51;

message VarChar {
@@ -1125,9 +1129,11 @@ message Expression {
message UserDefined {
oneof type_anchor_type {
// points to a type_anchor defined in this plan
// 0 is a valid anchor/reference.
uint32 type_reference = 1;

// points to a type_alias_anchor defined in this plan.
// 0 is a valid anchor/reference.
uint32 type_alias_reference = 5;
}

@@ -1152,6 +1158,7 @@ message Expression {

// Optionally points to a type_variation_anchor defined in this plan for
// the returned nested type.
// 0 is a valid anchor/reference.
uint32 type_variation_reference = 2;

oneof nested_type {
@@ -1189,8 +1196,8 @@ message Expression {
// A scalar function call.
message ScalarFunction {
// Points to a function_anchor defined in this plan, which must refer
// to a scalar function in the associated YAML file. Required; avoid
// using anchor/reference zero.
// to a scalar function in the associated YAML file.
// Required; 0 is a valid anchor/reference.
Member Author

This is the only one I am not sure about. I find the language here a bit confusing. Is it saying two separate things:

  1. this field is required, and
  2. you must not use anchor/reference 0?

Or is it saying "it is required not to use anchor/reference 0"?

I think it is the former, in which case there is nothing special about ScalarFunction.function_reference that prevents it from being 0.

Member

I can confirm that the intent of this is:

  1. It is required that this field be set.
  2. 0 is a valid anchor / reference

uint32 function_reference = 1;

// The arguments to be bound to the function. This must have exactly the
@@ -1649,6 +1656,7 @@ message DynamicParameter {
Type type = 1;

// The surrogate key used within a plan to reference a specific parameter binding.
// 0 is a valid anchor/reference.
uint32 parameter_reference = 2;
}

@@ -1658,6 +1666,8 @@ message SortField {

oneof sort_kind {
SortDirection direction = 2;
// References a function_anchor defined in this plan.
// 0 is a valid anchor/reference.
uint32 comparison_function_reference = 3;
}
enum SortDirection {
@@ -1703,8 +1713,8 @@ enum AggregationPhase {
// An aggregate function.
message AggregateFunction {
// Points to a function_anchor defined in this plan, which must refer
// to an aggregate function in the associated YAML file. Required; 0 is
// considered to be a valid anchor/reference.
// to an aggregate function in the associated YAML file.
// Required; 0 is a valid anchor/reference.
uint32 function_reference = 1;

// The arguments to be bound to the function. This must have exactly the
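
The comments added in this file all make the same point: a reference of 0 is as meaningful as any other value. The Python sketch below illustrates what that implies for a plan consumer. Both helpers and the plain-set representation of declared anchors are hypothetical and not part of any Substrait library; the only claim is that validity should be decided by membership in the declared anchors, never by truthiness.

```python
def validate_function_reference(reference: int, declared_anchors: set[int]) -> bool:
    """Return True when `reference` matches a declared function_anchor.

    Hypothetical helper: `declared_anchors` stands in for the function_anchor
    values collected from a plan's extension declarations.
    """
    # Membership test, so 0 is handled like any other anchor.
    return reference in declared_anchors


def naive_validate(reference: int, declared_anchors: set[int]) -> bool:
    """Buggy variant kept for contrast: it treats 0 as "missing"."""
    # `not reference` is True for 0, so anchor 0 is wrongly rejected.
    if not reference:
        return False
    return reference in declared_anchors


declared = {0, 1, 2}
print(validate_function_reference(0, declared))  # True
print(naive_validate(0, declared))               # False
```

The second helper shows the kind of check that the old "avoid using anchor/reference zero" wording may have been guarding against.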
17 changes: 14 additions & 3 deletions proto/substrait/extensions/extensions.proto
@@ -15,6 +15,7 @@ message SimpleExtensionURI {
option deprecated = true;
// A surrogate key used in the context of a single plan used to reference the
// URI associated with an extension.
// 0 is a valid anchor/reference.
uint32 extension_uri_anchor = 1;

// The URI where this extension YAML can be retrieved. This is the "namespace"
@@ -25,6 +26,7 @@ message SimpleExtensionURI {
message SimpleExtensionURN {
// A surrogate key used in the context of a single plan used to reference the
// URN associated with an extension.
// 0 is a valid anchor/reference.
uint32 extension_urn_anchor = 1;

// The extension URN that uniquely identifies this extension. This must follow the
@@ -45,15 +47,18 @@ message SimpleExtensionDeclaration {
message ExtensionType {
// references the extension_uri_anchor defined for a specific extension URI.
// this is now deprecated and extension_urn_reference should be used instead.
// 0 is a valid anchor/reference.
uint32 extension_uri_reference = 1 [deprecated = true];

// references the extension_urn_anchor defined for a specific extension URN.
// If both extension_urn_reference and extension_uri_reference are present,
// extension_urn_reference takes precedence.
// 0 is a valid anchor/reference.
uint32 extension_urn_reference = 4;

// A surrogate key used in the context of a single plan to reference a
// specific extension type
// specific extension type.
// 0 is a valid anchor/reference.
uint32 type_anchor = 2;

// the name of the type in the defined extension YAML.
@@ -63,15 +68,18 @@ message SimpleExtensionDeclaration {
message ExtensionTypeVariation {
// references the extension_uri_anchor defined for a specific extension URI.
// this is now deprecated and extension_urn_reference should be used instead.
// 0 is a valid anchor/reference.
uint32 extension_uri_reference = 1 [deprecated = true];

// references the extension_urn_anchor defined for a specific extension URN.
// If both extension_urn_reference and extension_uri_reference are present,
// extension_urn_reference takes precedence.
// 0 is a valid anchor/reference.
uint32 extension_urn_reference = 4;

// A surrogate key used in the context of a single plan to reference a
// specific type variation
// specific type variation.
// 0 is a valid anchor/reference.
uint32 type_variation_anchor = 2;

// the name of the type in the defined extension YAML.
@@ -81,15 +89,18 @@ message SimpleExtensionDeclaration {
message ExtensionFunction {
// references the extension_uri_anchor defined for a specific extension URI.
// this is now deprecated and extension_urn_reference should be used instead.
// 0 is a valid anchor/reference.
uint32 extension_uri_reference = 1 [deprecated = true];

// references the extension_urn_anchor defined for a specific extension URN.
// If both extension_urn_reference and extension_uri_reference are present,
// extension_urn_reference takes precedence.
// 0 is a valid anchor/reference.
uint32 extension_urn_reference = 4;

// A surrogate key used in the context of a single plan to reference a
// specific function
// specific function.
// 0 is a valid anchor/reference.
uint32 function_anchor = 2;

// A function signature
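
For producers, the anchor comments in this file mean that numbering can simply start at 0. The sketch below is one possible way to hand out anchors while building declarations. `AnchorAllocator` and its method name are invented for illustration, the `(URN, name)` key is an assumption about how a producer might deduplicate entities, and the URN and function names are made-up values.

```python
class AnchorAllocator:
    """Hands out surrogate anchors for one kind of extension entity.

    Hypothetical helper, not a Substrait API. Anchors start at 0 and grow by
    one per new entity; asking again for the same key returns the same anchor.
    """

    def __init__(self) -> None:
        self._anchors: dict[tuple[str, str], int] = {}

    def anchor_for(self, urn: str, name: str) -> int:
        key = (urn, name)
        if key not in self._anchors:
            # The current size of the mapping is the next unused anchor,
            # so the first entity receives anchor 0.
            self._anchors[key] = len(self._anchors)
        return self._anchors[key]


functions = AnchorAllocator()
# The URN and function names below are made-up example values.
add_anchor = functions.anchor_for("extension:example:arithmetic", "add:i32_i32")
sub_anchor = functions.anchor_for("extension:example:arithmetic", "subtract:i32_i32")
print(add_anchor, sub_anchor)  # 0 1
```

Using one allocator per entity kind (types, type variations, functions) keeps each anchor space independent, which is a design choice of this sketch rather than something the diff prescribes.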
1 change: 1 addition & 0 deletions proto/substrait/plan.proto
@@ -102,6 +102,7 @@ message Version {
// Represents a binding for a dynamic parameter.
message DynamicParameterBinding {
// The parameter anchor that identifies the dynamic parameter reference.
// 0 is a valid anchor/reference.
uint32 parameter_anchor = 1;

// The literal value assigned to the parameter at runtime.
6 changes: 6 additions & 0 deletions proto/substrait/type.proto
@@ -10,6 +10,9 @@ option go_package = "github.com/substrait-io/substrait-protobuf/go/substraitpb";
option java_multiple_files = true;
option java_package = "io.substrait.proto";

// Note: type_variation_reference fields within Type messages reference a
// type_variation_anchor defined in the plan's extension declarations. The value
// 0 represents the system-preferred variation and is a valid reference value.
Member Author

I wasn't sure where a good place to put this was, considering type_variation_reference shows up a ton of times.

message Type {
oneof kind {
Boolean bool = 1;
@@ -224,6 +227,8 @@ message Type {
}

message UserDefined {
// References a type_anchor defined in the plan's extension declarations.
// 0 is a valid anchor/reference.
uint32 type_reference = 1;
uint32 type_variation_reference = 2;
Nullability nullability = 3;
@@ -257,6 +262,7 @@ message Type {
message TypeAlias {
// A surrogate key used in the context of a single plan to reference a
// specific type alias.
// 0 is a valid anchor/reference.
uint32 type_alias_anchor = 1;

// A concrete type to be aliased.
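
The alias rules touched by this PR in `type_aliases.md` (anchors start at 0, an alias cannot directly be another alias, and type parameters may reference other aliases only without circular dependencies) can be checked mechanically. The following Python sketch shows one way to do that. The `AliasRef` and `Concrete` classes are a hypothetical stand-in for the protobuf `Type` message, and the "all type parameters must be specified" rule is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class AliasRef:
    """Reference to another alias by anchor (hypothetical model)."""
    anchor: int


@dataclass
class Concrete:
    """A concrete type with optional type parameters (hypothetical model)."""
    name: str
    params: list = field(default_factory=list)  # Concrete or AliasRef items


def referenced_anchors(node) -> set[int]:
    """Collect every alias anchor mentioned anywhere inside `node`."""
    if isinstance(node, AliasRef):
        return {node.anchor}
    found: set[int] = set()
    for param in node.params:
        found |= referenced_anchors(param)
    return found


def validate_aliases(aliases: dict[int, object]) -> list[str]:
    """Return a list of rule violations; an empty list means valid."""
    errors = []
    for anchor, target in aliases.items():
        if anchor < 0:
            errors.append(f"alias {anchor}: anchor must be non-negative")
        if isinstance(target, AliasRef):
            errors.append(f"alias {anchor}: cannot directly be another alias")

    # Depth-first search over alias -> alias edges to find cycles.
    state: dict[int, str] = {}

    def visit(anchor: int) -> None:
        state[anchor] = "visiting"
        for dep in sorted(referenced_anchors(aliases[anchor])):
            if dep not in aliases:
                errors.append(f"alias {anchor}: unknown alias {dep}")
            elif state.get(dep) == "visiting":
                errors.append(f"alias {anchor}: circular dependency via alias {dep}")
            elif dep not in state:
                visit(dep)
        state[anchor] = "done"

    for anchor in aliases:
        if anchor not in state:
            visit(anchor)
    return errors


valid = {
    0: Concrete("struct", [Concrete("i32"), Concrete("string")]),
    1: Concrete("list", [AliasRef(0)]),  # referencing alias 0 is fine
}
print(validate_aliases(valid))  # []
```

Note that alias 0 is an ordinary dictionary key here, so nothing in the check depends on anchors being non-zero.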
2 changes: 1 addition & 1 deletion site/docs/extensions/index.md
@@ -22,7 +22,7 @@ This extension URN uses the format `extension:<OWNER>:<ID>`, where:

The YAML file is constructed according to the [YAML Schema](https://github.com/substrait-io/substrait/blob/main/text/simple_extensions_schema.yaml). Each definition in the file corresponds to the YAML-based serialization of the relevant data structure. If a user only wants to extend one of these types of objects (e.g. types), a developer does not have to provide definitions for the other extension points.

A Substrait plan can reference one or more YAML files via their extension URN. In the places where these entities are referenced, they will be referenced using an extension URN + name reference. The name scheme per type works as follows:
A Substrait plan can reference one or more YAML files via their extension URN. In the places where these entities are referenced, they will be referenced using an extension URN + name reference. Each extension entity (type, type variation, or function) is assigned an anchor value, which is a non-negative integer starting from 0. The anchor value 0 is valid and can be used to reference extension entities. The name scheme per type works as follows:

| Category | Naming scheme |
| ------------------ | ------------------------------------------------------------ |
2 changes: 2 additions & 0 deletions site/docs/serialization/binary_serialization.md
@@ -36,6 +36,8 @@ Extension URNs and declarations are encapsulated in the top level of the plan. E

Once the YAML file extension URN anchor is defined, the anchor will be referenced by zero or more `SimpleExtensionDefinition`s. For each simple extension definition, an anchor is defined for that specific extension entity. This anchor is then referenced to within lower-level primitives (functions, etc.) to reference that specific extension. Message properties are named `*_anchor` where the anchor is defined and `*_reference` when referencing the anchor. For example `function_anchor` and `function_reference`.

Anchor values are non-negative integers starting from 0. A value of 0 is valid and can be used to reference an extension entity.
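
The anchor/reference chain described above can be sketched in Python. This is an illustration under assumptions, not the Substrait reference implementation: plain dictionaries stand in for the protobuf messages, `resolve_function` is a hypothetical helper, and the URN and function names are made-up values.

```python
# Plain dicts stand in for the protobuf messages; all values are made up.
extension_urns = [
    {"extension_urn_anchor": 0, "urn": "extension:example:arithmetic"},
]
extension_functions = [
    {"extension_urn_reference": 0, "function_anchor": 0, "name": "add:i32_i32"},
    {"extension_urn_reference": 0, "function_anchor": 1, "name": "subtract:i32_i32"},
]


def resolve_function(function_reference: int) -> tuple[str, str]:
    """Follow function_reference -> function_anchor -> extension URN."""
    urns = {u["extension_urn_anchor"]: u["urn"] for u in extension_urns}
    functions = {f["function_anchor"]: f for f in extension_functions}
    # A KeyError here means the reference was never declared.
    # Reference 0 is looked up like any other key.
    declaration = functions[function_reference]
    return urns[declaration["extension_urn_reference"]], declaration["name"]


print(resolve_function(0))  # ('extension:example:arithmetic', 'add:i32_i32')
```

Both lookups go through a dictionary keyed by anchor, so a reference of 0 resolves in exactly the same way as any other value.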

=== "Simple Extension Declaration"

```proto
1 change: 1 addition & 0 deletions site/docs/types/type_aliases.md
@@ -10,6 +10,7 @@ Type aliases allow a plan to declare a type once and reference it multiple times

A type alias is a mapping from an anchor to a concrete Substrait type. A valid type alias is described below.

* Anchors are non-negative integers starting from 0, meaning 0 is a valid anchor value.
* All type parameters must be specified.
* Cannot directly be another alias.
* Type parameters can reference other aliased types as long as no circular dependencies are introduced.
2 changes: 2 additions & 0 deletions site/docs/types/type_classes.md
@@ -50,6 +50,8 @@ Compound type classes are type classes that need to be configured by means of a

User-defined type classes are defined as part of [simple extensions](../extensions/index.md#simple-extensions). An extension can declare an arbitrary number of user-defined extension types. Once a type has been declared, it can be used in function declarations.

User-defined types are referenced in a plan using a `type_reference` anchor value that corresponds to a `type_anchor` defined in the plan's extension declarations. The anchor value is a non-negative integer starting from 0, meaning 0 is a valid anchor value.

For example, the following declares a type named `point` (namespaced to the associated YAML file) and two scalar functions that operate on it.

```yaml