Update dependency io.openlineage:openlineage-java to v1.40.1 #3002
Open: renovate wants to merge 1 commit into main from renovate/openlineageversion
Conversation
❌ Deploy Preview for peppy-sprite-186812 failed.
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##               main    #3002   +/-   ##
=========================================
  Coverage     81.18%   81.18%
  Complexity     1506     1506
=========================================
  Files           268      268
  Lines          7356     7356
  Branches        325      325
=========================================
  Hits           5972     5972
  Misses         1226     1226
  Partials        158      158
=========================================

☔ View full report in Codecov by Sentry.
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR contains the following updates:
io.openlineage:openlineage-java: 1.23.0 -> 1.40.1

Release Notes
OpenLineage/OpenLineage (io.openlineage:openlineage-java)
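For reference, this bump is a one-line change in the consuming build file. A hedged sketch of the Gradle declaration, assuming the dependency is declared directly as a coordinate rather than pulled in via a BOM or version catalog (the actual declaration in this repository may differ):

```groovy
dependencies {
    // Renovate updates this coordinate from 1.23.0 to 1.40.1
    implementation("io.openlineage:openlineage-java:1.40.1")
}
```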
v1.40.1
Fixed
- #4135 @mobuchowski: Fixes breaking change in version 1.40.0.
v1.40.0
Added
- #4109 @jakub-moravec: Add a standardized batch API endpoint to OpenLineage specification for handling multiple events in a single request.
- #4116 @mobuchowski: Add ordinal_position field to track the position of fields in schema (1-indexed).
- #4112 @kacpermuda: Introduce JobDependenciesRunFacet to track dependencies between jobs.
- #4103 @jakub-moravec: Add support for temporary datasets to enable job-to-job lineage tracking.
- #4075 @luke-hoffman1: Add fallback configuration for BigQuery project ID in Metastore integration.
- #4123 @kacpermuda: Include examples in Python generated classes for better documentation.
- #4077 @dolfinus: Add support for parsing jTDS JDBC URL format in Java client.
- #4066 @tnazarew: Add ParentRunFacet to Hive integration for tracking parent-child run relationships.
- #4097 @tnazarew: Add support for tracking LOAD and IMPORT operations in Hive.
- #4085 @tnazarew: Add support for tracking EXPORT operations in Hive.
- #4079 @tnazarew: Add START event emission support to Hive integration.
Fixed
- #4121 @usamakunwar: Fix Spark dataset facet builders for input datasets.
- #4114 @kchledowski: Fix job name trimming logic in Spark integration.
- #4113 @pawel-big-lebowski: Fix putAll operation failing on immutable maps.
- #4108 @pawel-big-lebowski: Fix multiple issues with RDD job handling in Spark.
- #4102 @kchledowski: Fix JDBC dbtable parsing to support any FROM clauses.
- #4083 @pawel-big-lebowski: Fix Spark connector configuration for Databricks environments.
- #4099 @mobuchowski: Catch NoClassDefFoundError when buggy implementations exist on classpath.
- #4104 @mobuchowski: Fix Snowflake identifier parsing to handle quoted identifiers correctly.
- #4105 @mobuchowski: Strip quotes from Snowflake account names for proper handling.
- #4092 @fm100: Fix facet property names from snake_case to camelCase for consistency.
- #4111 @kacpermuda: Fix Python client facet generator after moving to UV build system.
- #4093 @antonlin1: Fix retry configuration default merge with user-defined config in HTTP transports.
- #4084 @mandalbalmukund: Upgrade commons-lang3 version to fix CVE security vulnerability.
- #4126 @dolfinus: Ensure START and STOP events share the same runId in Hive integration.
v1.39.0
Added
- #3996 @pawel-big-lebowski: Add configurable dataset name normalization with support for date patterns, key-value pairs, and S3 location detection to enable proper dataset subsetting.
- #4057 @kchledowski: Add missing input symlink facets for Databricks Unity Catalog tables.
Changed
- #4058 @kchledowski: Refactor column-level lineage dependency collector tests for better organization and maintainability.
Fixed
- #4069 @fm100: Fix typo in IcebergCommitReportOutputDatasetFacet property name.
- #4061 @pawel-big-lebowski: Fix dataset name trimming for column-level lineage inputs.
- #4062 @kacpermuda: Remove unnecessary numpy import from Python client.
Removed
- #3844 @kacpermuda: Remove Dagster integration from the repository.
v1.38.0
Added
- #4008 @pawel-big-lebowski: Add subset dataset facets to OpenLineage specification for representing dataset relationships.
- #3978 @heron--: Allow attaching dataset quality information outside of InputDatasetFacet.
- #4018 @tnazarew: Add support for Spark structured streaming microbatch source write operations.
- #4016 @ddebowczyk92: Add catalog properties support to Spark integration for better catalog metadata tracking.
- #4039 @ddebowczyk92: Enhance BigQuery integration with GCP project ID and location in catalog properties.
- #3972 @kchledowski: Add support for tracking COALESCE transformations in Spark jobs.
- #3982 @ddebowczyk92: Add catalog facet support for vanilla Hive table operations.
- #4013 @pawel-big-lebowski: Output statistics now available in complete events for better observability.
- #3977 @pawel-big-lebowski: Add output statistics tracking for Spark RDD-based jobs.
- #4050 @pawel-big-lebowski: Improve generated model classes with proper equals and hashCode implementations.
- #4022 @mobuchowski: Add support for capturing dbt tags in OpenLineage events.
- #4017 @mobuchowski: Add dbt Cloud account ID tracking to dbt run facets.
- #3987 @mobuchowski: Enhance DbtRunRunFacet with additional metadata for better observability.
- #4006 @ddebowczyk92: Add native Google Cloud Platform Lineage transport for Python client.
- #3983 @JDarDagran: Add fsspec filesystem support to FileTransport for broader filesystem compatibility.
- #3980 @kacpermuda: Automatically add OpenLineage client version as default tag in events.
- #3986 @gabrysiaolsz: Add GCP Cloud Composer environment metadata facets to Airflow integration.
Changed
- #4055 @mobuchowski: Use dbt model aliases when generating dataset names for more accurate lineage.
- #4029 @EugeneYushin: Serialize OpenLineage events to JSON format for improved debug logging.
- #4030 @EugeneYushin: Properly respect user-overridden application names in event emission.
- #4003 @kchledowski: Refactor column-level lineage expression dependency collector for better maintainability.
- #3994 @JDarDagran: Enhance logging for Iceberg input statistics collection.
- #3985 @pawel-big-lebowski: Optimize S3 operations by limiting external getFileStatus calls for large object sets.
- #3964 @kchledowski: Refactor TransformationInfo into shared Java client for cross-integration reuse.
- #4026 @dolfinus: Enhance logging capabilities in asynchronous HTTP transport.
- #4000 @JDarDagran: Support Python type aliases in client code generation.
- #3997 @JDarDagran: Improve code generation to properly handle nearly identical class definitions.
- #4014 @dolfinus: Fail fast with clear errors when custom token providers fail to load.
- #4015 @dolfinus: Improve error visibility by not silencing import errors in transport factory.
- #3968 @kacpermuda: Update import paths to use versioned facet and event modules.
- #4012 @JDarDagran: Improve thread pool management in Java client utilities.
- #3965 @JDarDagran: Migrate from pre-commit to prek for pre-commit hook management.
Fixed
- #4053 @jsjasonseba: Fix incorrect Glue catalog detection due to always attempting ARN resolution.
- #4052 @kchledowski: Fix column-level lineage failures on Spark runtimes without spark-hive package.
- #4031 @kchledowski: Fix missing input datasets and column-level lineage for CreateDataSourceTableAsSelect and CreateHiveTableAsSelect commands.
- #4044 @EugeneYushin: Fix BigQuery intermediate job filtering by using bucket configuration.
- #4002 @MaciejGajewski: Add additional exception handling for TypeNotPresentException in Spark 3.0.2.
- #4034 @JDarDagran: Correct license field specification in Python package metadata.
- #4045 @kacpermuda: Support both naming conventions for API key configuration parameter.
- #4037 @EugeneYushin: Fix build issue causing empty sources JAR files to be generated.
v1.37.0
Added
- #3950 @mobuchowski: Add Datadog transport with intelligent routing between sync/async transports based on configurable rules. Supports wildcard matching and provides seamless integration with Datadog's observability platform.
- #3860 @orthoxerox: Add support for WriteDelta and WriteIcebergDelta logical plan nodes in Spark integration.
- #3933 @mobuchowski: Add configuration option to override dbt job names in OpenLineage events.
- #3923 @kyungryun: Improve JSON serialization performance with Jackson Blackbird module.
Changed
- #3904 @pawel-big-lebowski: Drop support for Spark 2.x versions. Minimum supported version is now Spark 3.x.
- #3956 @dolfinus: Optimize HTTP transport performance by adjusting gzip compression level.
- #3925 @SalvadorRomo: Extend streaming integration tests to support Spark 4.0.
Fixed
- #3946 @pawel-big-lebowski: Limit memory consumption, provide limits for the amount of dependencies processed (1M) and input fields returned in the facet (100K). Turns on dataset lineage by default.
- #3949 @ddebowczyk92: Add limits to prevent performance issues with large schemas in column-level lineage processing.
- #3934 @pawel-big-lebowski: Fix context factory implementation for Spark 4.0 compatibility.
- #3930 @yunchipang: Fix LogicalRelation constructor to maintain compatibility with Spark 4.0.
- #3947 @ddebowczyk92: Fix parsing of vendor configurations in Spark OpenLineage configuration.
- #3953 @jroachgolf84: Fix namespace handling in dbt external query facets.
- #3943 @JDarDagran: Fix configuration handling for user-supplied tags in Python client.
v1.36.0
Added
- #3877 @pawel-big-lebowski: Fix failing tests for Spark 4.0. Make delta integration tests pass with Delta 4.0 on Spark 4.
- #3914 @pawel-big-lebowski: Extend DebugFacet with additional information on Spark's driver memory configuration and current memory usage.
- #3921 @pawel-big-lebowski: Add support for AlterTableCommand dataset building in Spark 4.0.
- #3890 @jroachgolf84: Add query ID tracking to dbt integration.
- #3918 @mobuchowski: Capture query IDs from dbt structured logs for better traceability.
- #3816 @ddebowczyk92: Formalize dataset naming conventions in Python client implementation.
Changed
- #3907 @pawel-big-lebowski: Bump tested Spark versions.
- #3851 @dolfinus: Ensure proper cleanup of OpenLineageClient when Spark application ends.
- #3895 @dolfinus: Replace f-string usage in logging calls with proper logging formatting.
- #3899 @Shadi: Update protobuf dependency to maintain compatibility with newer library versions.
- #3869 @mobuchowski: Add documentation explaining compatibility testing processes.
Fixed
- #3902 @pawel-big-lebowski: Merge SqlExecutionRDDVisitor and LogicalRDDVisitor classes to avoid memory leak.
- #3909 @pawel-big-lebowski: Refactor Iceberg handler implementation for better maintainability.
- #3908 @pawel-big-lebowski: Add retry logic for handling empty row exceptions.
- #3911 @pawel-big-lebowski: Fix Spark version configuration in Databricks test environment.
- #3915 @fetta: Fix kafka-upsert connector to properly identify kafka topics.
- #3905 @kacpermuda: Improve test performance by implementing fail-fast behavior and reduced timeouts.
- #3916 @mobuchowski: Improve telemetry collection and fix performance issues with file reading.
- #3894 @mobuchowski: Fix dbt version compatibility issues.
- #3889 @pawel-big-lebowski: Fix filename handling to work correctly on Windows systems.
- #3897 @dolfinus: Adjust logging level for transport aliasing messages.
- #3901 @kacpermuda: Improve code documentation and add additional test coverage.
v1.35.0
Added
- #3848 @dolfinus: Add spark_applicationDetails facet to all OpenLineage events emitted by the Spark integration.
- #3850 @ddebowczyk92: Adds support for additional facets in Spark integration.
- #3880 @pawel-big-lebowski: Add spark.openlineage.disabled entry to disable OpenLineage integration through Spark config parameters.
- #3779 @pawel-big-lebowski: Add extra timeout options to emit incomplete OpenLineage events in case of timeout when building facets. See buildDatasetsTimePercentage and facetsBuildingTimePercentage in docs for more details.
- #3812 @mobuchowski: Adds high-performance asynchronous HTTP transport with event ordering guarantees, configurable concurrency, and comprehensive error handling. Features START-before-completion event ordering, bounded queues, and real-time statistics.
- #3764 @dolfinus: Adds DbtRun facet for tracking dbt run information.
- #3829 @kacpermuda: Adds configuration options for CompositeTransport to control behavior and ordering.
- #3789 @dolfinus: Adds jobType facet to Hive integration.
- #3863 @dolfinus: Adds dialect field to SqlJobFacet for Hive integration.
- #3819 @mobuchowski: Adds dialect field to SqlJobFacet specification.
- #3826 @ddebowczyk92: Formalizes job naming conventions in the specification.
- #3775 @ddebowczyk92: Formalizes dataset naming conventions in the specification.
Changed
- #3858 @dolfinus: Updates Spark integration to use Hive as the default catalog implementation for Iceberg tables.
- #3856 @ddebowczyk92: Improves memory management in Spark integration by replacing weak hash-map implementation.
- #3811 @pawel-big-lebowski: Updates Spark integration to support the latest Databricks runtime.
- #3881 @dolfinus: Removes wait_for_completion() method from Python transport interface.
- #3843 @dolfinus: Improves performance by reusing HTTP sessions in synchronous transport.
- #3857 @dolfinus: Adds proper cleanup methods for Datazone and Kinesis transports.
- #3855 @dolfinus: Adds cleanup method for TransformTransport.
- #3838 @dolfinus: Adds proper cleanup and completion methods for Kafka transport.
- #3841 @dolfinus: Improves reliability of CompositeTransport cleanup process.
- #3839 @dolfinus: Adds test coverage for OpenLineageClient cleanup methods.
- #3817 @mobuchowski: Adds meaningful names to threads used in Java client for better debugging.
- #3854 @dolfinus: Ensures proper cleanup of OpenLineageClient in Flink 1.x integration.
- #3799 @pan-siekierski: Fixes configuration loading issue in Flink Event Emitter.
- #3796 @dolfinus: Makes invocation_id field optional in dbt integration.
- #3836 @mobuchowski: Improves error handling for missing dbt nodes.
- #3800 @dolfinus: Adds docker-compose setup for local Hive integration testing.
- #3849 @dolfinus: Ensures all pending events are sent after DAG completion in Airflow integration.
Fixed
- #3793 @mobuchowski: Improves log file handling in dbt integration.
- #3859 @kacpermuda: Replaces deprecated dbt configurations with current alternatives.
- #3874 @ddebowczyk92: Fixes database naming issue (.db suffix in database/namespace location name) in BigQuery Metastore catalog implementation.
- #3835 @ddebowczyk92: Fixes missing catalog facet in output datasets for CTAS queries on Iceberg tables.
- #3871 @pawel-big-lebowski: Fixes Delta merge operation handling with column-level lineage.
- #3861 @pawel-big-lebowski: Adds support for Delta Lake version 3.3.2.
- #3832 @pawel-big-lebowski: Fixes version handling when dataset can be properly identified.
- #3853 @kacpermuda: Adds error handling when CompositeTransport fails to emit any events.
- #3825 @mobuchowski: Fixes IntelliJ project reload issues.
- #3806 @mobuchowski: Fixes code formatting issues in Hive integration.
- #3830 @mobuchowski: Fixes missing field in SQL facet test case.
- #3887 @kacpermuda: Fixes logging issue when using CompositeTransport with multiple transports.
- #3814 @mobuchowski: Fixes formatting warnings in Rust code.
v1.34.0
Added
- #3555 @tnazarew with @ddebowczyk92, @jphalip: Added OpenLineage Hive integration.
- #3691 @pawel-big-lebowski: Support lineage extraction from UnionRdd and NewHadoopRDD, which makes dynamic frames docker based test passing.
- #3781 @dolfinus: Adds hive_query facet for Hive integration.
- #3777 @dolfinus: Adds job sql facet for Hive integration.
- #3786 @dolfinus: Adds hive session facet for Hive integration.
- #3717 @tnazarew: Add new symlink type representing physical location of dataset.
- #3715 @pawel-big-lebowski: Automatically turn on debug facet in case of spark connector anomalies detected.
- #3760 @ddebowczyk92: Adds support for BigQuery Metastore catalog in Spark integration.
- #3738 @dolfinus: Adds DbtRun facet for tracking dbt run information.
- #3739 @dolfinus: Adds initial support for ClickHouse in dbt integration.
- #3725 @dolfinus: Adds processing_engine facet for dbt integration.
- #3744 @dolfinus: Adds facet containing Flink job ID information.
- #3726 @dolfinus: Adds processing_engine facet for Flink integration.
- #3763 @pawel-big-lebowski: Adds column-level lineage support for JDBC queries for Spark with single input table.
- #3748 @dolfinus: Adds contentType field to documentation facet specification.
Changed
- #3669 @kacpermuda: Drops support for Airflow versions below 2.5.0.
- #3731 @dolfinus: Uses adapter's rows_affected for output statistics instead of custom calculation.
- #3713 @dolfinus: Refactors dbt facets organization by moving them from processor module.
- #3754 @dolfinus: Improves performance of UUID generation in Java client.
- #3709 @dolfinus: Increases randomness in static UUID generation.
- #3766 @mvitale: Adds logging when YAML configuration loading fails.
- #3751 @ddebowczyk92: Updates Spark 4 dependency to final 4.0.0 release.
- #3785 @ddebowczyk92: Disables generation of module metadata files in Spark integration.
- #3776 @kacpermuda: Modernizes Python code to use newer attrs API.
- #3680 @mobuchowski: Removes the native proxy implementation.
Fixed
- #3773 @dolfinus: Fixes issue where table path was missing in InsertIntoHadoopFsRelationCommand.
- #3722 @pawel-big-lebowski: Filters out temporary inner jobs in BigQuery indirect mode.
- #3749 @mobuchowski: Prevents errors when job completion occurs without corresponding start event.
- #3724 @dolfinus: Improves error visibility for OpenLineage configuration parsing in Flink.
- #3728 @JDarDagran: Ensures original events remain immutable during transport transformations.
- #3762 @ngorchakova: Corrects visibility modifier for GCP transport configuration mode.
v1.33.0
Added
- #3697 @kacpermuda: Introduces the TransformTransport class for event transformations.
- #3659 @mobuchowski: Adds specification of the CatalogDatasetFacet.
- #3695 @mobuchowski: Implements CatalogDatasetFacet for Spark integration.
- #3685 @shinabel: Adds support for publishing events to Amazon DataZone via dedicated transport.
- #3696 @dolfinus: Ensures dbt start events have unique run identifiers when calling id generation methods multiple times.
- #3706 @kacpermuda: Allows modifications of the parent run facet through JobNamespaceReplaceTransformer.
Changed
- #3700 @dolfinus: Prevents empty facets from being emitted by dbt integration.
- #3686 @luke-hoffman1: Broadens exception handling to address missing column-level lineage issues.
Fixed
- #3707 @dolfinus: Ensures database field is optional and handled gracefully.
- #3688 @dolfinus: Improves exception visibility during Iceberg table retrieval.
- #3681 @pawel-big-lebowski: Correctly handles JDBC dataset naming conventions in Flink 2 integration.
v1.32.1
Added
- #3650 @pawel-big-lebowski: This PR adds support for schema facets in Avro datasets.
- #3672 @dolfinus: Provides utility method for generating static UUIDs.
Fixed
- #3682 @MassyB: Forwards dbt's process return code correctly.
- #3683 @MassyB: Implements log rotation handling to avoid log file overflow.
- #3676 @pawel-big-lebowski: Correctly resolves dataset namespaces in Flink2.
- #3667 @luke-hoffman1: Adds condition and unit test to correctly extract Spark Job Name suffix.
- #3673 @ddebowczyk92: Prevents exceptions when Option.None is encountered.
- #3663 @ddebowczyk92: Ensures nodes aren't revisited, avoiding duplicate InputDataset creation.
v1.32.0
Added
- #3652 @HuangZhenQiu
Changed
- #3648 @mobuchowski
- #3637 @dolfinus
Configuration
📅 Schedule: Branch creation - "every 3 months on the first day of the month" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.
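The schedule and automerge behavior described above would correspond to Renovate configuration roughly like the following. This is a hedged sketch only: the repository's actual renovate.json is not shown in this PR, and the packageRules grouping is an assumption.

```json
{
  "packageRules": [
    {
      "matchPackageNames": ["io.openlineage:openlineage-java"],
      "schedule": ["every 3 months on the first day of the month"],
      "automerge": false,
      "branchName": "renovate/openlineageversion"
    }
  ]
}
```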