diff --git a/site/docs/extensions/index.md b/site/docs/extensions/index.md index 3e307024e..051fabca3 100644 --- a/site/docs/extensions/index.md +++ b/site/docs/extensions/index.md @@ -126,15 +126,88 @@ The `any[\d]` types (i.e. `any1`, `any2`, ..., `any9`) impose an additional rest ## Advanced Extensions -Less common extensions can be extended using customization support at the serialization level. This includes the following kinds of extensions: - -| Extension Type | Description | -| ------------------------------------ | ------------------------------------------------------------ | -| Relation Modification (semantic) | Extensions to an existing relation that will alter the semantics of that relation. These kinds of extensions require that any plan consumer understand the extension to be able to manipulate or execute that operator. Ignoring these extensions will result in an incorrect interpretation of the plan. An example extension might be creating a customized version of Aggregate that can optionally apply a filter before aggregating the data.

Note: Semantic-changing extensions shouldn't change the core characteristics of the underlying relation. For example, they should *not* change the default direct output field ordering, change the number of fields output or change the behavior of physical property characteristics. If one needs to change one of these behaviors, one should define a new relation as described below. | -| Relation Modification (optimization) | Extensions to an existing relation that can improve the efficiency of a plan consumer but don't fundamentally change the behavior of the operation. An example might be an estimated amount of memory the relation is expected to use or a particular algorithmic pattern that is perceived to be optimal. | -| New Relations | Creates an entirely new kind of relation. It is the most flexible way to extend Substrait but also make the Substrait plan the least interoperable. In most cases it is better to use a semantic changing relation as oppposed to a new relation as it means existing code patterns can easily be extended to work with the additional properties. | -| New Read Types | Defines a new subcategory of read that can be used in a ReadRel. One of Substrait is to provide a fairly extensive set of read patterns within the project as opposed to requiring people to define new types externally. As such, we suggest that you first talk with the Substrait community to determine whether you read type can be incorporated directly in the core specification. | -| New Write Types | Similar to a read type but for writes. As with reads, the community recommends that interested extenders first discuss with the community about developing new write types in the community before using the extension mechanisms. | -| Plan Extensions | Semantic and/or optimization based additions at the plan level. 
| - -Because extension mechanisms are different for each serialization format, please refer to the corresponding serialization sections to understand how these extensions are defined in more detail. +Advanced extensions provide a way to embed custom functionality that goes beyond the standard YAML-based simple extensions. Unlike simple extensions, advanced extensions allow arbitrary, custom schemas. In the Protocol Buffers implementation, the `google.protobuf.Any` type is used to embed arbitrary extension data directly into Substrait messages. + +### How Advanced Extensions Work + +Advanced extensions come in several main forms, discussed below: + +1. Embedded extensions: These use the `AdvancedExtension` message for adding custom data to existing Substrait messages +2. Custom read/write types: For defining new ways to read from or write to data sources +3. Custom relation types: For defining entirely new relational operations + +### Embedded Extensions via `AdvancedExtension` + +The simplest forms of advanced extensions use the `AdvancedExtension` message, which contains two types of extensions: + +```proto +%%% proto.extensions.AdvancedExtension %%% +``` + +!!! note "Enhancements vs Optimizations" + + * Use **optimizations** for performance hints that don't change semantics and can be safely ignored. + * Use **enhancements** for semantic changes that must be understood by consumers or the plan cannot be executed correctly. + +#### Optimizations + +- Provide hints to improve performance but don't change the meaning of operations +- Can be safely ignored by consumers that don't understand them +- Multiple optimizations can be attached to a single message +- Examples: memory usage hints, preferred algorithms, caching strategies + +#### Enhancements + +- Modify the semantic behavior of operations +- Must be understood by consumers or the plan cannot be executed correctly +- Only one enhancement per message +- Examples: specialized join conditions (e.g. 
fuzzy matching, geospatial) + +!!! note "Enhancement Constraints" + + Semantic-changing extensions shouldn't change the core characteristics of the underlying relation. For example, they should *not* change the default direct output field ordering, change the number of fields output or change the behavior of physical property characteristics. If one needs to change one of these behaviors, one should define a new relation as described below. + +#### Where `AdvancedExtension` Messages Can Be Used + +The `AdvancedExtension` message can be attached to various parts of a Substrait plan: + +| Location | Usage | +| --------------------------------- | ------------------------------------------- | +| **`Plan`** | Global extensions affecting the entire plan | +| **`RelCommon`** | Extensions for any relational operator | +| **Relations** (e.g. `ProjectRel`) | Extensions for a specific relation type | +| **Hints** | Extensions within optimization hints | +| **`ReadRel.NamedTable`** | Custom metadata to named table references | +| **`ReadRel.LocalFiles`** | Custom metadata to local file sources | +| **`WriteRel.NamedObjectWrite`** | Custom metadata to write targets | +| **`DdlRel.NamedObjectWrite`** | Custom metadata to DDL targets | + +### Custom Read and Write Types + +The second form of advanced extensions allows you to define extension data sources and destinations: + +| Extension Type | Description | Examples | +| ------------------------------ | ------------------------------------ | ---------------------------- | +| **`ReadRel.ExtensionTable`** | Define new table source types | APIs, specialized formats | +| **`WriteRel.ExtensionObject`** | Define new write destination types | APIs, specialized formats | +| **`DdlRel.ExtensionObject`** | Define new DDL destination types | Catalogs, schema registries | + +!!! 
note "Consider Core Specification First" + + Before implementing custom read/write types as extensions, consider [checking with the Substrait community](https://substrait.io/community/#get-in-touch). If your scenario turns out to be common enough, it may be more appropriate to add it directly to the specification rather than as an extension. + +### Custom Relations + +The third form of advanced extensions provides entirely new relational operations via dedicated extension relation types. These allow you to define custom relations while maintaining proper integration with the type system: + +| Relation Type | Description | Examples | +| ---------------------- | ----------------------------------------------- | -------- | +| **`ExtensionLeafRel`** | Custom relations with no inputs | Custom table sources | +| **`ExtensionSingleRel`** | Custom relations with one input | Custom transforms | +| **`ExtensionMultiRel`** | Custom relations with multiple inputs | Custom joins | + +These extension relations are first-class relation types in Substrait and can be used anywhere a standard relation would be used. + +#### When to Use What + +Custom relations are the most flexible but least interoperable option. In most cases it is better to use enhancements to existing relations rather than defining new custom relations, as it means existing code patterns can easily be extended to work with the additional properties. diff --git a/site/docs/relations/physical_relations.md b/site/docs/relations/physical_relations.md index 066442f9b..e32b43f6e 100644 --- a/site/docs/relations/physical_relations.md +++ b/site/docs/relations/physical_relations.md @@ -2,111 +2,103 @@ There is no true distinction between logical and physical operations in Substrait. By convention, certain operations are classified as physical, but all operations can be potentially used in any kind of plan. 
A particular set of transformations or target operators may (by convention) be considered the "physical plan" but this is a characteristic of the system consuming substrait as opposed to a definition within Substrait. - - ## Hash Equijoin Operator The hash equijoin join operator will build a hash table out of one input (default `right`) based on a set of join keys. It will then probe that hash table for the other input (default `left`), finding matches. -| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 2 | -| Outputs | 1 | -| Property Maintenance | Distribution is maintained. Orderedness is eliminated. | -| Input Order | Same as the [Join](logical_relations.md#join-operation) operator. | +| Signature | Value | +| -------------------- | ----------------------------------------------------------------- | +| Inputs | 2 | +| Outputs | 1 | +| Property Maintenance | Distribution is maintained. Orderedness is eliminated. | +| Input Order | Same as the [Join](logical_relations.md#join-operation) operator. | | Direct Output Order | Same as the [Join](logical_relations.md#join-operation) operator. | - ### Hash Equijoin Properties -| Property | Description | Required | -|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------| -| Left Input | A relational input. | Required | -| Right Input | A relational input. | Required | -| Build Input | Specifies which input is the `Build`. | Optional, defaults to build `Right`, probe `Left`. | -| Left Keys | References to the fields to join on in the left input. | Required | -| Right Keys | References to the fields to join on in the right input. 
| Required | -| Post Join Predicate | An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. | Optional, defaults true. | -| Join Type | One of the join types defined in the Join operator. | Required | - +| Property | Description | Required | +| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- | +| Left Input | A relational input. | Required | +| Right Input | A relational input. | Required | +| Build Input | Specifies which input is the `Build`. | Optional, defaults to build `Right`, probe `Left`. | +| Left Keys | References to the fields to join on in the left input. | Required | +| Right Keys | References to the fields to join on in the right input. | Required | +| Post Join Predicate | An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. | Optional, defaults true. | +| Join Type | One of the join types defined in the Join operator. | Required | ## NLJ (Nested Loop Join) Operator The nested loop join operator does a join by holding the entire right input and then iterating over it using the left input, evaluating the join expression on the Cartesian product of all rows, only outputting rows where the expression is true. Will also include non-matching rows in the OUTER, LEFT and RIGHT operations per the join type requirements. 
-| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 2 | -| Outputs | 1 | -| Property Maintenance | Distribution is maintained. Orderedness is eliminated. | -| Input Order | Same as the [Join](logical_relations.md#join-operation) operator. | +| Signature | Value | +| -------------------- | ----------------------------------------------------------------- | +| Inputs | 2 | +| Outputs | 1 | +| Property Maintenance | Distribution is maintained. Orderedness is eliminated. | +| Input Order | Same as the [Join](logical_relations.md#join-operation) operator. | | Direct Output Order | Same as the [Join](logical_relations.md#join-operation) operator. | ### NLJ Properties -| Property | Description | Required | -| --------------- | ------------------------------------------------------------ | ---------------------------------------------- | -| Left Input | A relational input. | Required | -| Right Input | A relational input. | Required | +| Property | Description | Required | +| --------------- | --------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- | +| Left Input | A relational input. | Required | +| Right Input | A relational input. | Required | | Join Expression | A boolean condition that describes whether each record from the left set "match" the record from the right set. | Optional. Defaults to true (a Cartesian join). | -| Join Type | One of the join types defined in the Join operator. | Required | - - +| Join Type | One of the join types defined in the Join operator. | Required | ## Merge Equijoin Operator The merge equijoin does a join by taking advantage of two sets that are sorted on the join keys. This allows the join operation to be done in a streaming fashion. 
-| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 2 | -| Outputs | 1 | -| Property Maintenance | Distribution is maintained. Orderedness is eliminated. | -| Input Order | Same as the [Join](logical_relations.md#join-operation) operator. | +| Signature | Value | +| -------------------- | ----------------------------------------------------------------- | +| Inputs | 2 | +| Outputs | 1 | +| Property Maintenance | Distribution is maintained. Orderedness is eliminated. | +| Input Order | Same as the [Join](logical_relations.md#join-operation) operator. | | Direct Output Order | Same as the [Join](logical_relations.md#join-operation) operator. | ### Merge Join Properties -| Property | Description | Required | -|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------| -| Left Input | A relational input. | Required | -| Right Input | A relational input. | Required | -| Left Keys | References to the fields to join on in the left input. | Required | -| Right Keys | References to the fields to join on in the right input. | Reauired | -| Post Join Predicate | An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. | Optional, defaults true. | -| Join Type | One of the join types defined in the Join operator. 
| Required | +| Property | Description | Required | +| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ | +| Left Input | A relational input. | Required | +| Right Input | A relational input. | Required | +| Left Keys | References to the fields to join on in the left input. | Required | +| Right Keys | References to the fields to join on in the right input. | Required | +| Post Join Predicate | An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. | Optional, defaults true. | +| Join Type | One of the join types defined in the Join operator. | Required | ## Exchange Operator The exchange operator will redistribute data based on an exchange type definition. Applying this operation will lead to an output that presents the desired distribution. -| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 1 | -| Outputs | 1 | +| Signature | Value | +| -------------------- | ------------------------------------------------------------------------------ | +| Inputs | 1 | +| Outputs | 1 | | Property Maintenance | Orderedness is maintained. Distribution is overwritten based on configuration. | -| Direct Output Order | Order of the input. | +| Direct Output Order | Order of the input. | ### Exchange Types -|Type|Description| -|--|--| -|Scatter|Distribute data using a system defined hashing function that considers one or more fields. For the same type of fields and same ordering of values, the same partition target should be identified for different ExchangeRels| -|Single Bucket|Define an expression that provides a single `i32` bucket number. 
Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition.| -|Multi Bucket|Define an expression that provides a `List` of bucket numbers. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. The records should be sent to all bucket numbers provided by the expression.| -|Broadcast|Send all records to all partitions.| -|Round Robin|Send records to each target in sequence. Can follow either exact or approximate behavior. Approximate will attempt to balance the number of records sent to each destination but may not exactly distribute evenly and may send batches of records to each target before moving to the next.| +| Type | Description | +| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Scatter | Distribute data using a system defined hashing function that considers one or more fields. For the same type of fields and same ordering of values, the same partition target should be identified for different ExchangeRels | +| Single Bucket | Define an expression that provides a single `i32` bucket number. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. | +| Multi Bucket | Define an expression that provides a `List` of bucket numbers. Optionally define whether the expression will only return values within the valid number of partition counts. 
If not, the system should modulo the return value to determine a target partition. The records should be sent to all bucket numbers provided by the expression. | +| Broadcast | Send all records to all partitions. | +| Round Robin | Send records to each target in sequence. Can follow either exact or approximate behavior. Approximate will attempt to balance the number of records sent to each destination but may not exactly distribute evenly and may send batches of records to each target before moving to the next. | ### Exchange Properties -| Property | Description | Required | -| ------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| Input | The relational input. | Required. | -| Distribution Type | One of the distribution types defined above. | Required. | -| Partition Count | The number of partitions targeted for output. | Optional. If not defined, implementation system should decide the number of partitions. Note that when not defined, single or multi bucket expressions should not be constrained to count. | -| Expression Mapping | Describes a relationship between each partition ID and the destination that partition should be sent to. | Optional. A partition may be sent to 0..N locations. Value can either be a URI or arbitrary value. | - - +| Property | Description | Required | +| ------------------ | -------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| Input | The relational input. | Required. | +| Distribution Type | One of the distribution types defined above. | Required. | +| Partition Count | The number of partitions targeted for output. | Optional. If not defined, implementation system should decide the number of partitions. 
Note that when not defined, single or multi bucket expressions should not be constrained to count. | +| Expression Mapping | Describes a relationship between each partition ID and the destination that partition should be sent to. | Optional. A partition may be sent to 0..N locations. Value can either be a URI or arbitrary value. | ## Merging Capture @@ -121,22 +113,20 @@ A receiving operation that will merge multiple ordered streams to maintain order ### Merging Capture Properties -| Property | Description | Required | -| -------- | ------------------------------------------------------------ | --------------------------- | +| Property | Description | Required | +| -------- | ------------------------------------------------------------------------------------------------------------------------------- | --------------------------- | | Blocking | Whether the merging should block incoming data. Blocking should be used carefully, based on whether a deadlock can be produced. | Optional, defaults to false | - - ## Simple Capture A receiving operation that will merge multiple streams in an arbitrary order. -| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 1 | -| Outputs | 1 | +| Signature | Value | +| -------------------- | --------------------------------------------------------------------- | +| Inputs | 1 | +| Outputs | 1 | | Property Maintenance | Orderness is empty after this operation. Distribution are maintained. | -| Direct Output Order | Order of the input. | +| Direct Output Order | Order of the input. | ### Naive Capture Properties @@ -144,77 +134,72 @@ A receiving operation that will merge multiple streams in an arbitrary order. | -------- | --------------------- | -------- | | Input | The relational input. | Required | - - ## Top-N Operation The top-N operator reorders a dataset based on one or more identified sort fields as well as a sorting function. 
Rather than sort the entire dataset, the top-N will only maintain the total number of records required to ensure a limited output. A top-n is a combination of a logical sort and logical fetch operations. -| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 1 | -| Outputs | 1 | +| Signature | Value | +| -------------------- | ------------------------------------------------------------------------------------------------------------------------ | +| Inputs | 1 | +| Outputs | 1 | | Property Maintenance | Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit. | -| Direct Output Order | The field order of the input. | +| Direct Output Order | The field order of the input. | ### Top-N Properties -| Property | Description | Required | -| ----------- | ------------------------------------------------------------ | ------------------------ | -| Input | The relational input. | Required | +| Property | Description | Required | +| ----------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------ | +| Input | The relational input. | Required | | Sort Fields | List of one or more fields to sort by. Uses the same properties as the [orderedness](basics.md#orderedness) property. | One sort field required | -| Offset | A positive integer. Declares the offset for retrieval of records. | Optional, defaults to 0. | -| Count | A positive integer. Declares the number of records that should be returned. | Required | - - +| Offset | A positive integer. Declares the offset for retrieval of records. | Optional, defaults to 0. | +| Count | A positive integer. Declares the number of records that should be returned. | Required | ## Hash Aggregate Operation The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples. 
-| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 1 | -| Outputs | 1 | +| Signature | Value | +| -------------------- | --------------------------------------------------------------------------------------------------------------- | +| Inputs | 1 | +| Outputs | 1 | | Property Maintenance | Maintains distribution if all distribution fields are contained in every grouping set. No orderness guaranteed. | -| Direct Output Order | Same as defined by [Aggregate](logical_relations.md#aggregate-operation) operation. | +| Direct Output Order | Same as defined by [Aggregate](logical_relations.md#aggregate-operation) operation. | ### Hash Aggregate Properties -| Property | Description | Required | -| ---------------- | ------------------------------------------------------------ | --------------------------------------- | -| Input | The relational input. | Required | -| Grouping Sets | One or more grouping sets. | Optional, required if no measures. | -| Per Grouping Set | A list of expression grouping that the aggregation measured should be calculated for. | Optional, defaults to 0. | +| Property | Description | Required | +| ---------------- | ------------------------------------------------------------------------------------------------------------------- | --------------------------------------- | +| Input | The relational input. | Required | +| Grouping Sets | One or more grouping sets. | Optional, required if no measures. | +| Per Grouping Set | A list of expression grouping that the aggregation measured should be calculated for. | Optional, defaults to 0. | | Measures | A list of one or more aggregate expressions. Implementations may or may not support aggregate ordering expressions. | Optional, required if no grouping sets. 
| - - ## Streaming Aggregate Operation The streaming aggregate operation leverages data ordered by the grouping expressions to calculate data each grouping set tuple-by-tuple in streaming fashion. All grouping sets and orderings requested on each aggregate must be compatible to allow multiple grouping sets or aggregate orderings. -| Signature | Value | -| -------------------- | ------------------------------------------------------------ | -| Inputs | 1 | -| Outputs | 1 | +| Signature | Value | +| -------------------- | ---------------------------------------------------------------------------------------------------------------- | +| Inputs | 1 | +| Outputs | 1 | | Property Maintenance | Maintains distribution if all distribution fields are contained in every grouping set. Maintains input ordering. | -| Direct Output Order | Same as defined by [Aggregate](logical_relations.md#aggregate-operation) operation. | +| Direct Output Order | Same as defined by [Aggregate](logical_relations.md#aggregate-operation) operation. | ### Streaming Aggregate Properties -| Property | Description | Required | -| ---------------- | ------------------------------------------------------------ | --------------------------------------- | -| Input | The relational input. | Required | -| Grouping Sets | One or more grouping sets. If multiple grouping sets are declared, sets must all be compatible with the input sortedness. | Optional, required if no measures. | -| Per Grouping Set | A list of expression grouping that the aggregation measured should be calculated for. | Optional, defaults to 0. | +| Property | Description | Required | +| ---------------- | ----------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- | +| Input | The relational input. | Required | +| Grouping Sets | One or more grouping sets. 
If multiple grouping sets are declared, sets must all be compatible with the input sortedness. | Optional, required if no measures. |
+| Per Grouping Set | A list of expression grouping that the aggregation measured should be calculated for. | Optional, defaults to 0. |
 | Measures | A list of one or more aggregate expressions. Aggregate expressions ordering requirements must be compatible with expected ordering. | Optional, required if no grouping sets. |
 ## Consistent Partition Window Operation
+
 A consistent partition window operation is a special type of project operation where every function is a window function and all of the window functions share the same sorting and partitioning. This allows for the sort and partition to be calculated once and shared between the various function evaluations.
 | Signature | Value |
-| -------------------- |----------------------------------------------------------------------|
+| -------------------- | -------------------------------------------------------------------- |
 | Inputs | 1 |
 | Outputs | 1 |
 | Property Maintenance | Maintains distribution and ordering. |
@@ -222,47 +207,46 @@ A consistent partition window operation is a special type of project operation w
 ### Window Properties
-| Property | Description | Required |
-| ------------------ | ------------------------------- | ---------------------- |
-| Input | The relational input. | Required |
+| Property | Description | Required |
+| ---------------- | ----------------------------- | ---------------------- |
+| Input | The relational input. | Required |
 | Window Functions | One or more window functions. | At least one required. |
-
 ## Expand Operation
-The expand operation creates duplicates of input records based on the Expand Fields. Each Expand Field can be a Switching Field or an expression. Switching Fields are described below. If an Expand Field is an expression then its value is consistent across all duplicate rows.
+The expand operation creates duplicates of input records based on the Expand Fields. Each Expand Field can be a Switching Field or an expression. Switching Fields are described below. If an Expand Field is an expression then its value is consistent across all duplicate rows.
-| Signature | Value |
-| -------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Inputs | 1 |
-| Outputs | 1 |
-| Property Maintenance | Distribution is maintained if all the distribution fields are consistent fields with direct references. Ordering can only be maintained down to the level of consistent fields that are kept.|
-| Direct Output Order | The expand fields followed by an i32 column describing the index of the duplicate that the row is derived from. |
+| Signature | Value |
+| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Inputs | 1 |
+| Outputs | 1 |
+| Property Maintenance | Distribution is maintained if all the distribution fields are consistent fields with direct references. Ordering can only be maintained down to the level of consistent fields that are kept. |
+| Direct Output Order | The expand fields followed by an i32 column describing the index of the duplicate that the row is derived from. |
 ### Expand Properties
-| Property | Description | Required |
-| --------- |--------------------------------------| -------- |
-| Input | The relational input. | Required |
-| Direct Fields | Expressions describing the output fields. These refer to the schema of the input. Each Direct Field must be an expression or a Switching Field | Required |
+| Property | Description | Required |
+| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
+| Input | The relational input. | Required |
+| Direct Fields | Expressions describing the output fields. These refer to the schema of the input. Each Direct Field must be an expression or a Switching Field | Required |
 ### Switching Field Properties
-A switching field is a field whose value is different in each duplicated row. All switching fields in an Expand Operation must have the same number of duplicates.
+A switching field is a field whose value is different in each duplicated row. All switching fields in an Expand Operation must have the same number of duplicates.
-| Property | Description | Required |
-| --------- |--------------------------------------| -------- |
-| Duplicates | List of one or more expressions. The output will contain a row for each expression. | Required |
+| Property | Description | Required |
+| ---------- | ----------------------------------------------------------------------------------- | -------- |
+| Duplicates | List of one or more expressions. The output will contain a row for each expression. | Required |
 ## Hashing Window Operation
 A window aggregate operation that will build hash tables for each distinct partition expression.
-| Signature | Value |
-| -------------------- | ------------------------------------------------------------ |
-| Inputs | 1 |
-| Outputs | 1 |
-| Property Maintenance | Maintains distribution. Eliminates ordering. |
+| Signature | Value |
+| -------------------- | -------------------------------------------------------------------- |
+| Inputs | 1 |
+| Outputs | 1 |
+| Property Maintenance | Maintains distribution. Eliminates ordering. |
 | Direct Output Order | Same as Project operator (input followed by each window expression). |
 ### Hashing Window Properties
@@ -272,23 +256,20 @@ A window aggregate operation that will build hash tables for each distinct parti
 | Input | The relational input. | Required |
 | Window Expressions | One or more window expressions. | At least one required. |
-
-
 ## Streaming Window Operation
 A window aggregate operation that relies on a partition/ordering sorted input.
-| Signature | Value |
-| -------------------- | ------------------------------------------------------------ |
-| Inputs | 1 |
-| Outputs | 1 |
-| Property Maintenance | Maintains distribution. Eliminates ordering. |
+| Signature | Value |
+| -------------------- | -------------------------------------------------------------------- |
+| Inputs | 1 |
+| Outputs | 1 |
+| Property Maintenance | Maintains distribution. Eliminates ordering. |
 | Direct Output Order | Same as Project operator (input followed by each window expression). |
 ### Streaming Window Properties
-| Property | Description | Required |
-| ------------------ | ------------------------------------------------------------ | ---------------------- |
-| Input | The relational input. | Required |
+| Property | Description | Required |
+| ------------------ | --------------------------------------------------------------------------------- | ---------------------- |
+| Input | The relational input. | Required |
 | Window Expressions | One or more window expressions. Must be supported by the sortedness of the input. | At least one required. |
-
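
Reviewer note on the Expand Operation text in this diff: the row-duplication semantics may be easier to follow with a small model. The sketch below is illustrative only and is not part of Substrait or any of its libraries. `SwitchingField`, `expand`, and the use of plain Python callables as expressions are hypothetical names chosen for this example. It also assumes that an expand with no switching fields emits each input row exactly once, which the documented text does not state.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Tuple, Union

Row = Tuple[object, ...]
Expr = Callable[[Row], object]  # hypothetical stand-in for a Substrait expression


@dataclass
class SwitchingField:
    """Hypothetical model of a switching field: one expression per duplicate."""
    duplicates: Sequence[Expr]


Field = Union[Expr, SwitchingField]


def expand(rows: Iterable[Row], fields: Sequence[Field]) -> Iterator[Row]:
    """Emit one output row per duplicate for every input row.

    Output layout follows the documented direct output order: the expand
    fields, followed by the index of the duplicate the row is derived from.
    """
    counts = {len(f.duplicates) for f in fields if isinstance(f, SwitchingField)}
    if len(counts) > 1:
        raise ValueError("all switching fields must have the same number of duplicates")
    if 0 in counts:
        raise ValueError("a switching field needs one or more duplicates")
    # Assumption: with no switching fields, each input row is emitted once.
    n = counts.pop() if counts else 1

    for row in rows:
        for i in range(n):
            values = [
                f.duplicates[i](row) if isinstance(f, SwitchingField) else f(row)
                for f in fields
            ]
            yield (*values, i)


# Usage: keep column 0 as a consistent field, switch between columns 1 and 2.
example = list(
    expand(
        [(1, 10, 20)],
        [lambda r: r[0], SwitchingField([lambda r: r[1], lambda r: r[2]])],
    )
)
```

In this model the consistent field (column 0) has the same value in both output rows, while the switching field takes a different value in each duplicate, matching the description in the changed text.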