Skip to content

Conversation

@poikilotherm
Copy link
Contributor

@poikilotherm poikilotherm commented Nov 24, 2025

What this PR does / why we need it:

Which issue(s) this PR closes:

Special notes for your reviewer:
To be done.

Suggestions on how to test this:
Execute Migration DB Tests with mvn verify -PskipUnitTests -Dit.groups=migration
Edit: This PR absolutely needs to be tested with a real deployment on a real prod. database. Among other things, please test and confirm that any configured controlled vocabularies are still working, on the off chance that the :CVocConf setting is going to be treated differently. (L.A.)

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope

Is there a release notes update needed for this change?:
Nope

Additional documentation:
Introducing proper migration test here for the first time. Will need discussion and documentation. Docs may be beyond scope of this PR.
Happy to talk about the idea during tech hours.

…nge upstream)

Upgraded Testcontainers to `2.0.2` and updated related artifact names for consistency (`testcontainers-junit-jupiter`, `testcontainers-postgresql`, etc.). Bumped `testcontainers-keycloak` to `4.0.0` which is compatible with TC2.
…guration

Before, the Postgres Server version was only present in the container profile. Now moving it into the parent POM and aligning with our documented settings allows reuse in tests, too.

Using the caps of Maven Failsafe to expose the server version in JUnit Tests will allow to use it with Testcontainers, deploying a Postgres Container for migration tests as documented in our guides.
Expanded integration test tags in `pom.xml` to include `migration` and added `DBunit` as a test-scoped dependency to support database migration testing. Updated `Tags.java` to reflect the new tag, to be used in tests that are migration tests (so they can be easily excluded or run as necessary).
…gration tests

Added a reusable PostgresContainer to optimize resource usage in database migration tests. Ensures a single container instance is shared across all migration test cases, with automatic cleanup on JVM shutdown.
…ration`

Introduced comprehensive tests for `V6_8_0_1__SettingsDataMigration` using Testcontainers and DBUnit. Includes scenarios for migrating settings to new formats, handling null and invalid values, and verifying JSON transformations.

This is a reproducer for an issue discovered after merging PR #11654 by @landreev. See also #11654 (comment)
The migration script `V6.8.0.1.sql` failed with a `PSQLException` stating "there is no unique or exclusion constraint matching the ON CONFLICT specification".

This occurred because the script used a partial index syntax for the conflict target:
`ON CONFLICT (name) WHERE lang IS NULL`

However, PostgreSQL's `ON CONFLICT` inference requires the target to syntactically match an existing unique index or constraint. The actual index on the `setting` table is a functional index defined as:
`CREATE UNIQUE INDEX unique_settings ON setting (name, (COALESCE(lang, '')))`

While `WHERE lang IS NULL` and `COALESCE(lang, '')` logically handle nulls similarly for uniqueness in this context, Postgres does not treat them as interchangeable for arbiter inference.

This commit updates the migration script to explicitly target the functional index expression:
`ON CONFLICT (name, (coalesce(lang, '')))`

References:
- PostgreSQL 16 Documentation on INSERT: https://www.postgresql.org/docs/16/sql-insert.html#SQL-ON-CONFLICT
@pdurbin pdurbin added this to the 6.9 milestone Nov 25, 2025
@pdurbin pdurbin moved this to In Progress 💻 in IQSS Dataverse Project Nov 25, 2025
…ak client is autoclosed

Removing the try-with-resources avoids autoclosing the resource, which triggers a logout. As we reuse the token in multiple tests, we don't want that.
@coveralls
Copy link

coveralls commented Nov 26, 2025

Coverage Status

coverage: 24.038% (+0.001%) from 24.037%
when pulling 403153d on 11996-fix-settings
into b8f5c1e on develop.

@github-actions

This comment has been minimized.

…res container in migration tests

Extracted DBUnit helper methods to a new `DBUnitHelper` class for reusability and cleaner test code. Updated `V6_8_0_1__SettingsDataMigrationIT` to utilize these helpers and the shared Postgres container for better maintainability and reduced redundancy in test setup.
… simplify access methods #11996

Replaced hard-coded workflow keys with structured enum-based keys in `TriggerType`. Updated `WorkflowServiceBean` and `SettingsServiceBean` to use consistent key resolution methods, improving readability and maintainability. Updated related database migration script to align with new key naming schema.

This adds the missing keys after introduction of naming restrictions in #11654. The config keys for the default workflows will no longer be cleansed from the database during deployment.
@github-actions

This comment has been minimized.

Expanded documentation to include new Settings API options for managing workflows. Added references for `WorkflowsAdminIpWhitelist`, `PrePublishDatasetWorkflowId`, and `PostPublishDatasetWorkflowId` with usage examples. Enhanced developer guide with a new "Administration" section for workflow-related settings.
…ings

Enhanced `V6_8_0_1__SettingsDataMigrationIT` to include tests for new workflow settings (`PrePublishDatasetWorkflowId` and `PostPublishDatasetWorkflowId`). Updated expected data assertions and display names for better clarity and completeness.
@github-actions

This comment has been minimized.

@poikilotherm poikilotherm marked this pull request as ready for review November 28, 2025 11:22
@landreev
Copy link
Contributor

landreev commented Dec 3, 2025

One last, small thing - is this a desired behavior, or an inconsistency?
If I have a JSON-structured setting - for ex., the :CVocConf or the new :TabularIngestSizeLimit, calling the settings API on that setting directly now outputs it the same way as before, i.e. as an escaped text string:

curl "http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit" | jq .
{
  "status": "OK",
  "data": {
    "message": "{\"CSV\": 6000000, \"dta\": 50000000, \"Rdata\": 1000000, \"default\": 100000000}"
  }
}

However, if I run the API to output ALL of my settings, the value is shown as parsed json, i.e.:

curl "http://localhost:8080/api/admin/settings/" | jq .
{
  "status": "OK",
  "data": {
      ...
      ":SystemEmail": "[email protected]",
      ":TabularIngestSizeLimit": {
        "CSV": 6000000,
        "dta": 50000000,
        "Rdata": 1000000,
        "default": 100000000
    },
    ...
}

(this is new as of this PR)

@poikilotherm
Copy link
Contributor Author

No, this isn't new. It had been discussed before in the original PR and the decision was to maintain backward compatibility. Existing scripts might rely on the single setting being presented as escaped JSON.

I personally would favor using proper JSON (so its the same going in and out), but maybe that's just me.

@pdurbin
Copy link
Member

pdurbin commented Dec 3, 2025

I'm ok with proper JSON as long as we document the backward-incompatibility.

@landreev
Copy link
Contributor

landreev commented Dec 3, 2025

No, this isn't new. ...

Meaning, it is new in 6.9, but not in this PR? Yeah, I'm seeing it in the develop branch.

... the decision was to maintain backward compatibility.

As in, the behavior of /api/admin/settings/:SpecificSetting was left unchanged for backward compatibility?
I can read the discussion in the original pr when I have a sec. It does feel a little strange that the behavior is different between these 2 versions of the same API. ... but probably not the biggest problem in our lives.

@qqmyers
Copy link
Member

qqmyers commented Dec 3, 2025

+1 for proper JSON - as a new issue/PR

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. This has a fix for the migrate script issue and includes the missing workflow-related settings needed for 6.9, along with a new migration script test/testing facility discussed/reviewed yesterday in Tech Hour.

@github-project-automation github-project-automation bot moved this from In Review 🔎 to Ready for QA ⏩ in IQSS Dataverse Project Dec 3, 2025
@qqmyers qqmyers removed their assignment Dec 3, 2025
@landreev
Copy link
Contributor

landreev commented Dec 3, 2025

+1 for proper JSON - as a new issue/PR

As a counter-argument, the current (escaped) output of /api/admin/settings/:SpecificSetting is better suited for a round trip via curl -X PUT -d "..." ... on the command line.
I don't feel strongly about this either way.

@github-actions

This comment has been minimized.

@cmbz cmbz added the FY26 Sprint 12 FY26 Sprint 12 (2025-12-03 - 2025-12-17) label Dec 3, 2025
@poikilotherm
Copy link
Contributor Author

So what's the consensus for this? (Please note: I don't have much time to look into this today and tomorrow off)

@landreev
Copy link
Contributor

landreev commented Dec 4, 2025

@poikilotherm

So what's the consensus for this? (Please note: I don't have much time to look into this today and tomorrow off)

Note that @qqmyers approved this PR yesterday. I thought that was the consensus, that it was ready to go as is. The discussion of potential changes to json formatting was in the context of something to consider in the future. Per "as a new..." bit:

+1 for proper JSON - as a new issue/PR

It still needs to be QAed and merged. Hopefully somebody can do that quickly today. I am less crazy busy today, so if nobody beats me to it, I will do that.

@poikilotherm
Copy link
Contributor Author

Sounds good to me @landreev ! Thx for the clarification.

@landreev
Copy link
Contributor

landreev commented Dec 4, 2025

The deployment works on our prod. db.
All the obsolete settings are properly dropped from the database.
The ingest size limits setting is properly migrated, as {"xlsx": 1, "Rdata": 1048576, "default": 157286400}
However, the actual functionality that this setting provides is obviously broken.
This is the file uploads page on qa.dataverse.org:
Screenshot 2025-12-04 at 2 39 10 PM

vs. how it's supposed to look, in prod.:

Screenshot 2025-12-04 at 2 39 34 PM

This is an important piece of functionality, definitely for us at HDV and likely for many other prod. instances.

@landreev
Copy link
Contributor

landreev commented Dec 4, 2025

From server.log:

[2025-12-04T19:56:37.501+0000] [Payara 6.2025.3] [WARNING] [] [edu.harvard.iq.dataverse.util.SystemConfig] [tid: _ThreadID=125 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1764878197501] [levelValue: 900] [[
  Disabling all tabular ingest completely until fixed!]]

@landreev landreev moved this from Ready for QA ⏩ to QA ✅ in IQSS Dataverse Project Dec 4, 2025
@landreev landreev self-assigned this Dec 4, 2025
@landreev landreev moved this from QA ✅ to In Progress 💻 in IQSS Dataverse Project Dec 4, 2025
@landreev landreev removed their assignment Dec 4, 2025
@pdurbin
Copy link
Member

pdurbin commented Dec 4, 2025

The timing is unfortunate since we'd like to start the release process for 6.9.

My understanding is that this show-stopping bug is in #11654, which has already been merged. I tried the "revert" button on that PR but got this message: "Sorry, this pull request couldn’t be reverted automatically. It may have already been reverted, or the content may have changed since it was merged."

We're also discussing in Slack: https://iqss.slack.com/archives/C03R1E7T4KA/p1764878723804329

@landreev
Copy link
Contributor

landreev commented Dec 4, 2025

This is another error message from server.log btw:

  Could not convert :TabularIngestSizeLimit to long from JSON integer. You must provide the long number as string (use quotes) for format xlsx

Is that what this is, does it expect this in the json: "xlsx": "1" instead of "xlsx": 1?

…ve validation and logging

Enhanced `getTabularIngestSizeLimits` to accept numeric values in addition to strings for size limits. Improved validation to handle invalid types, numbers, or decimal values. Updated related tests and configuration documentation.

This was done because the limitation to long numbers as string was artificial. Users can choose which way they like best. Also, the data migration uses numbers, so this lead to errors.
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to allow numeric values in JSON looks fine, unit tests pass.

@github-project-automation github-project-automation bot moved this from In Progress 💻 to Ready for QA ⏩ in IQSS Dataverse Project Dec 4, 2025
…tSizeLimits

Refactored `getTabularIngestSizeLimits` to utilize try-with-resources for safer JSON parsing. Enhanced logging with lambdas and improved iteration over JSON entries.
@github-actions

This comment has been minimized.

1 similar comment
@github-actions
Copy link

github-actions bot commented Dec 4, 2025

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11996-fix-settings
ghcr.io/gdcc/configbaker:11996-fix-settings

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

FY26 Sprint 11 FY26 Sprint 11 (2025-11-20 - 2025-12-03) FY26 Sprint 12 FY26 Sprint 12 (2025-12-03 - 2025-12-17) Size: 10 A percentage of a sprint. 7 hours. Type: Bug a defect

Projects

Status: Ready for QA ⏩

Development

Successfully merging this pull request may close these issues.

Pre/Post Publication workflows broken as of #11654

7 participants