Why does voice cloning require both the prompt audio and the prompt text? Is it possible to omit the prompt text and supply only the prompt audio?