@vouidaskis
Contributor

@vouidaskis vouidaskis commented Apr 3, 2025

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

Upgrade the CSV import functionality so the script can load complex files. This adds correct import of parent/child relations, correct handling of None values and empty nodes, and better searching of concepts. It also fixes the case where names containing "," would break the import, reuses an existing identifier if one exists, and fixes tile ordering when multiple tiles exist with the same nodegroup.
Added functionality to download a CSV template for each model. The template contains a column for each node.
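
As a rough illustration of the two points above (a header-only template with one column per node, and names containing "," surviving a round trip), here is a minimal sketch using only Python's standard `csv` module. The node names and the helpers `build_template` and `roundtrip_row` are hypothetical and are not taken from this PR's code.

```python
import csv
import io

# Hypothetical node names; a real template would take these from the model's nodes.
node_names = ["ResourceID", "Name", "Name Type", "Description"]


def build_template(columns):
    """Return CSV text containing only a header row, one column per node."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(columns)
    return buffer.getvalue()


def roundtrip_row(row):
    """Write one row with the csv module and read it back."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)  # fields containing "," are quoted automatically
    buffer.seek(0)
    return next(csv.reader(buffer))


template = build_template(node_names)
row = roundtrip_row(["1", "Smith, John", "Primary", ""])
```

Because the `csv` module handles the quoting, a name such as "Smith, John" stays in one column, which a plain `line.split(",")` would not guarantee.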

Issues Solved

Closes #

Checklist

  • I targeted one of these branches:
    • dev/8.0.x (under development): features, bugfixes not covered below
    • dev/7.6.x (main support): regressions, crashing bugs, security issues, major bugs in new features
    • dev/6.2.x (extended support): major security issues, data loss issues
  • I added a changelog in arches/releases
  • I submitted a PR to arches-docs (if appropriate)
  • Unit tests pass locally with my changes
  • I added tests that prove my fix is effective or that my feature works
  • My test fails on the target branch


return terms

def transform_value_for_tile(self, value, **kwargs):
Member

I've actually been working on this in a branch off dev/7.6.x to enable more flexibility of acceptable value formats (e.g. a string pointing to a resourceinstance.legacyid, etc.)

def transform_value_for_tile(self, value, **kwargs):
        # kwargs config looks like this:
        # {
        #     "graphs": [
        #         {
        #             "name": "Person or Group",
        #             "graphid": "ccbd1537-ac5e-11e6-84a5-026d961c88e6",
        #             "relationshipConcept": "6f26aa04-52af-4b17-a656-674c547ade2a",
        #             "relationshipCollection": "00000000-0000-0000-0000-000000000005",
        #             "useOntologyRelationship": False,
        #             "inverseRelationshipConcept": "6f26aa04-52af-4b17-a656-674c547ade2a"
        #         }
        #     ],
        #     "searchDsl": "",
        #     "searchString": ""
        # }
        from arches.app.search.search_engine_factory import SearchEngineFactory

        relatable_graphs = kwargs.get("graphs", [])
        default_values_lookup = dict()
        for graph in relatable_graphs:
            if graph.get("useOntologyRelationship", False) or not graph.get(
                "relationshipConcept", None
            ):
                default_values_lookup[graph["graphid"]] = {
                    "ontologyProperty": "",
                    "inverseOntologyProperty": "",
                }
            else:
                default_values_lookup[graph["graphid"]] = {
                    "ontologyProperty": graph["relationshipConcept"],
                    "inverseOntologyProperty": graph["inverseRelationshipConcept"],
                }

        def build_resource_instance_object(hit):
            return {
                "resourceId": hit["_id"],
                "ontologyProperty": (
                    default_values_lookup[hit["_source"]["graph_id"]][
                        "ontologyProperty"
                    ]
                ),
                "inverseOntologyProperty": (
                    default_values_lookup[hit["_source"]["graph_id"]][
                        "inverseOntologyProperty"
                    ]
                ),
                "resourceXresourceId": str(uuid.uuid4()),
            }

        subtypes_dict = {
            "uuid": uuid.UUID,
            "dict": dict,
            "str": str,
            "int": int,
            "float": float,
        }

        if isinstance(value, str):
            # Try each parser in turn; the first one that succeeds wins.
            for test_method in [uuid.UUID, json.loads, ast.literal_eval]:
                try:
                    converted_value = test_method(value)
                    break
                except Exception:
                    converted_value = False

            if converted_value is False and value != "":
                converted_value = value.split(",")  # is a string, likely legacyid
                converted_value = [
                    val.strip() for val in converted_value if val.strip()
                ]
                try:
                    converted_value = [uuid.UUID(val) for val in converted_value]
                except ValueError:
                    pass
            elif converted_value is False:
                logger.warning("ResourceInstanceDataType: value is empty")
                return []
        else:
            converted_value = value

        value_type = None
        if not isinstance(converted_value, list):
            converted_value = [converted_value]
        if not converted_value:
            # e.g. "[]" or ",": nothing to look up, and indexing [0] would fail
            logger.warning("ResourceInstanceDataType: value is empty")
            return []
        for value_subtype_label, value_subtype_class in subtypes_dict.items():
            if isinstance(converted_value[0], value_subtype_class):
                value_type = value_subtype_label
                break

        se = SearchEngineFactory().create()
        query = Query(se)
        query.include("graph_id")
        boolquery = Bool()
        transformed_value = []

        match value_type:
            case "uuid":
                results = query.search(
                    index=RESOURCES_INDEX, id=[str(val) for val in converted_value]
                )
                for hit in results["docs"]:
                    transformed_value.append(build_resource_instance_object(hit))

            case "dict":  # assume data correctly parsed via json.loads or ast.literal_eval
                for val in converted_value:
                    try:
                        uuid.UUID(val["resourceId"])
                    except Exception:
                        continue  # skip entries without a valid resourceId
                    transformed_value.append(val)
            case _:  # default case (handles str/legacyid and any other types)
                if value_type != "str":
                    converted_value = [str(val) for val in converted_value]
                boolquery.must(
                    Terms(field="legacyid.keyword", terms=converted_value)
                )  # exact match on keyword
                query.add_query(boolquery)
                results = query.search(index=RESOURCES_INDEX)
                logger.debug(f"{len(results['hits']['hits'])} hits")
                for hit in results["hits"]["hits"]:
                    transformed_value.append(build_resource_instance_object(hit))

        if len(transformed_value) == 0:
            logger.warning(
                f"ResourceInstanceDataType: no resources found for {converted_value}"
            )
            return
        return transformed_value
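
To make the accepted value formats easier to follow, the parsing cascade in the snippet above can be sketched as a standalone function with no Arches or search-engine dependencies. This is only an approximation for illustration: `classify_value` is a hypothetical name, it performs no index lookup, and it reports only which branch (`uuid`, `dict`, `str`, or other) a raw value would reach.

```python
import ast
import json
import uuid


def classify_value(value):
    """Parse a raw import value and report which lookup branch it would take."""
    if isinstance(value, str):
        converted = False
        # Same order as the snippet: UUID first, then JSON, then a Python literal.
        for parser in (uuid.UUID, json.loads, ast.literal_eval):
            try:
                converted = parser(value)
                break
            except Exception:
                converted = False
        if converted is False:
            # Fall back to a comma-separated list of identifiers (e.g. legacyids).
            converted = [part.strip() for part in value.split(",") if part.strip()]
            try:
                converted = [uuid.UUID(part) for part in converted]
            except ValueError:
                pass  # not UUIDs, keep them as plain strings
    else:
        converted = value

    if not isinstance(converted, list):
        converted = [converted]
    if not converted:
        return "empty", converted

    # The first element decides the branch, as in the snippet above.
    for label, cls in (("uuid", uuid.UUID), ("dict", dict), ("str", str)):
        if isinstance(converted[0], cls):
            return label, converted
    return "other", converted
```

For example, a single UUID string is classified as `uuid`, a JSON list of objects as `dict`, and a string such as `"site-001, site-002"` falls through to the `str` branch, which in the snippet is matched against `legacyid.keyword`.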

@chiatt chiatt self-requested a review April 4, 2025 22:04
Member

@chiatt chiatt left a comment

Because this adds new functionality it needs to target version 8. Can you change the target to 8.0.x?

@vouidaskis vouidaskis requested a review from a team as a code owner April 17, 2025 10:19
@vouidaskis vouidaskis changed the base branch from dev/7.6.x to dev/8.1.x April 17, 2025 11:50
@jacobtylerwalls jacobtylerwalls removed the request for review from a team June 4, 2025 12:04