@eugeniavkim commented on Jul 3, 2025

Description

To make harm categories easier to search and to group under alias names, this PR refactors our previous string harm categories into a class HarmCategory. Categories that are not included in the list are mapped to an OTHER category. The class does not yet cover every harm category, but contributors can continue to add harm areas that they would like to probe for and score.
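A minimal sketch of the idea described above, not the PR's actual implementation. It assumes a string-backed enum (here the `str, Enum` mix-in, which behaves much like `StrEnum` and also runs on Python versions older than 3.11). The `parse` and `alias_map` names and the member values are hypothetical; only REPRESENTATIONAL and OTHER are named in this conversation.

```python
from enum import Enum


class HarmCategory(str, Enum):
    # Hypothetical member values; the real class defines more categories.
    REPRESENTATIONAL = "representational"
    OTHER = "other"

    @classmethod
    def alias_map(cls) -> dict:
        # Hypothetical helper: several legacy strings share one category.
        return {
            "bias": cls.REPRESENTATIONAL,
            "sexism": cls.REPRESENTATIONAL,
        }

    @classmethod
    def parse(cls, value: str) -> "HarmCategory":
        """Map a legacy string to a member, falling back to OTHER."""
        key = value.strip().lower()
        try:
            # Exact match on a member's own value, e.g. "representational".
            return cls(key)
        except ValueError:
            # Not a member value: try the aliases, then fall back to OTHER.
            return cls.alias_map().get(key, cls.OTHER)
```

With this shape, `HarmCategory.parse("Sexism")` resolves to `HarmCategory.REPRESENTATIONAL`, and a string the class does not recognize resolves to `HarmCategory.OTHER` rather than raising.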

This PR is a breaking change: it affects all datasets, SeedPrompt initialization, and tests.

The following items must be completed before this PR leaves DRAFT status:

  • Refactor each dataset to map its harm categories to the new class HarmCategory
  • Update DB queries

@eugeniavkim changed the title from "[DRAFT] [BREAKING] FEAT Refactor Harm Category as an enum" to "[DRAFT] [BREAKING] FEAT Refactor Harm Category as StrEnum" on Jul 3, 2025
"bias": cls.REPRESENTATIONAL,
"sexism": cls.REPRESENTATIONAL,
"racism": cls.REPRESENTATIONAL,
"homophobia": cls.REPRESENTATIONAL,

If multiple aliases map to the same category, a list of input categories can map to a list that contains duplicates. We should eliminate the duplicates and add a test case for that. Otherwise this could result in double counting, depending on how downstream code consumes these.
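One possible way to do this, sketched under assumptions: the `map_categories` function, the module-level `ALIASES` dict, and the enum values below are hypothetical stand-ins, not code from this PR. The sketch maps each input string, then drops repeats while keeping first-seen order by using `dict.fromkeys`.

```python
from enum import Enum
from typing import Iterable, List


class HarmCategory(str, Enum):
    # Hypothetical member values standing in for the PR's class.
    REPRESENTATIONAL = "representational"
    OTHER = "other"


# Hypothetical alias table mirroring the entries in the diff above.
ALIASES = {
    "bias": HarmCategory.REPRESENTATIONAL,
    "sexism": HarmCategory.REPRESENTATIONAL,
    "racism": HarmCategory.REPRESENTATIONAL,
    "homophobia": HarmCategory.REPRESENTATIONAL,
}


def map_categories(raw: Iterable[str]) -> List[HarmCategory]:
    """Map legacy strings to categories, removing duplicates in order."""
    mapped = (
        ALIASES.get(name.strip().lower(), HarmCategory.OTHER) for name in raw
    )
    # dict keys are unique and keep insertion order, so this deduplicates
    # while preserving the order in which categories first appear.
    return list(dict.fromkeys(mapped))
```

A test case along the lines requested could assert that `map_categories(["bias", "sexism", "racism"])` yields a single `HarmCategory.REPRESENTATIONAL` entry rather than three.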
