
Request for Detailed Model-Level Results for Hallucination Detection in RAGTruth #5

@Jeryi-Sun

Description


We are working on a project involving the evaluation of hallucination detection methods in retrieval-augmented generation models. Your work, "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models," has been instrumental in guiding our research. We deeply appreciate the comprehensive dataset and insightful analyses you have provided.

We are particularly interested in the detailed model-level results presented in Table 5 of your paper, which summarizes response-level hallucination detection performance for each baseline method across different tasks and models. The overall results are extremely helpful, but for our work, access to the per-model results (i.e., Llama-2-7B-chat, Llama-2-13B-chat, Llama-2-70B-chat†, Mistral-7B-Instruct) would significantly enhance our analysis and help us avoid unnecessary duplication of effort.

Request:

Could you kindly provide the detailed experimental results for each model included in the RAGTruth dataset? Specifically, we are looking for the hallucination detection metrics (precision, recall, F1 score) broken down by each model used in your experiments:
Llama-2-7B-chat
Llama-2-13B-chat
Llama-2-70B-chat†
Mistral-7B-Instruct

Having this detailed information will greatly aid in advancing our research and help us build upon your findings more effectively. We understand the effort that goes into compiling and sharing such data, and we are immensely grateful for any assistance you can provide.
