System Info
grouped_messages = [
    [message_lst[i]["messages"][j] for i in range(len(message_lst))]
    for j in range(num_repeat_rollouts)
]
recipe/collabllm/reward_function.py
IndexError: list index out of range occurs in the function async def _compute_rewards_async(self, data: DataProto, return_dict: bool = False) -> torch.Tensor | dict[str, Any]
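A minimal sketch of one way the comprehension can fail, assuming (this is not confirmed against the actual data) that at least one entry in message_lst holds fewer than num_repeat_rollouts items under "messages". The sample values and the helper names group_messages and find_short_entries are hypothetical and are not part of the recipe:

```python
# Hypothetical data: the names mirror the snippet in this report, the values are made up.
num_repeat_rollouts = 3
message_lst = [
    {"messages": ["a0", "a1", "a2"]},
    {"messages": ["b0", "b1"]},  # one rollout short of num_repeat_rollouts
]


def group_messages(message_lst, num_repeat_rollouts):
    """Same comprehension as in the report: one inner list per rollout index j."""
    return [
        [message_lst[i]["messages"][j] for i in range(len(message_lst))]
        for j in range(num_repeat_rollouts)
    ]


def find_short_entries(message_lst, num_repeat_rollouts):
    """Return (index, length) for entries with fewer messages than expected."""
    return [
        (i, len(item["messages"]))
        for i, item in enumerate(message_lst)
        if len(item["messages"]) < num_repeat_rollouts
    ]


try:
    group_messages(message_lst, num_repeat_rollouts)
    error = None
except IndexError as exc:
    error = str(exc)

short_entries = find_short_entries(message_lst, num_repeat_rollouts)
print(error)
print(short_entries)
```

If this assumption holds, running a check like find_short_entries on the real message_lst just before the comprehension would show which samples are missing rollouts.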
Information
Tasks
Reproduction
1
Expected behavior
1