Incorrect Labels #2

@michaelcalvinwood

First, thank you for the effort put into RAGTruth. There is a tremendous need for such a dataset.

Unfortunately, some of the labels are sorely inaccurate. Consider Response ID 11898 as one example. This response is annotated with three supposed hallucinations, each with implicit_true set to false.
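For anyone who wants to check these annotations themselves, here is a minimal sketch of pulling the labels for one response out of a JSON Lines file. The field names (id, labels, text, meta, implicit_true) and the inline sample record are my assumptions for illustration, not a confirmed description of the RAGTruth files; only the quoted span and explanation strings come from the annotations discussed in this issue.

```python
import json

# Hypothetical sample in JSON Lines form. The field names are assumptions;
# adjust them to whatever the actual response file uses.
SAMPLE_JSONL = "\n".join([
    json.dumps({"id": "11898", "labels": [
        {"text": "Cons include potentially earning less than those with graduate degrees.",
         "meta": "Passages have no mention of this earning less than those with graduate degrees.",
         "implicit_true": False},
        {"text": "earning a higher income upon graduation",
         "meta": "Passages have no mention of this detail.",
         "implicit_true": False},
        {"text": "gaining practical experience",
         "meta": "Passages have no mention of this tip.",
         "implicit_true": False},
    ]}),
    json.dumps({"id": "0", "labels": []}),  # unrelated filler record
])


def find_labels(jsonl_text, response_id):
    """Return the label list of the first record whose id matches."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Compare as strings so 11898 and "11898" both match.
        if str(record.get("id")) == str(response_id):
            return record.get("labels", [])
    return []


for label in find_labels(SAMPLE_JSONL, 11898):
    print(f"span: {label['text']!r}")
    print(f"  explanation:   {label['meta']}")
    print(f"  implicit_true: {label['implicit_true']}")
```

With the real file, the same function could be applied to the file contents read as text, assuming the layout matches.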

Consider the first:

  • Stated Hallucination: "Cons include potentially earning less than those with graduate degrees."
  • Annotator Explanation: "Passages have no mention of this earning less than those with graduate degrees."
  • Supporting Text in Passage: "graduates who are able to find work end up making a lot more than their undergraduate counterparts"

In other words, the provided passage does state that those with graduate degrees can earn more than their undergraduate counterparts, which means that undergraduates can earn less than those with graduate degrees. Hence, the annotation is incorrect.

Consider the second:

  • Stated Hallucination: "earning a higher income upon graduation"
  • Annotator Explanation: "Passages have no mention of this detail."
  • Supporting Text in Passage: "the graduates who are able to find work end up making a lot more than their undergraduate counterparts; the median annual salary plus bonus for a person fresh out of grad school with an MBA is $105,000"

Yet "fresh out of grad school" is equivalent to "upon graduation," and the surrounding context is "earning a higher income" ("making a lot more than their undergraduate counterparts"). Hence, the annotation is incorrect.

Finally, consider the third:

  • Stated Hallucination: "gaining practical experience"
  • Annotator Explanation: "Passages have no mention of this tip."
  • Supporting Text in Passage: None

Hence, this annotation is correct.

Naturally, the value of the dataset is directly proportional to the correctness of the annotations. While I recognize the immense effort that has gone into this dataset, there's still a need for additional annotators to fix errant labels (and there are a lot of errant labels).

Kindly consider fixing the errant labels to make RAGTruth the incredible resource that it can be.
