Trying to use a text-to-speech API

Unless you find a TTS API service that outputs a URL, you’re going to be in a world of pain. I have tried Deepgram, Google TTS, Azure TTS, IBM Watson and ResponsiveVoice. They all seem to return the audio as binary WAV or MP3 data rather than a URL. I wasted countless hours trying to figure it out.
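For anyone hitting the same wall, here is a rough sketch of one way a small backend could turn those binary bytes into something URL-shaped: wrap them in a base64 data URL. The function name and the default MIME type are my own placeholders, not part of any of the services above, and I'm assuming whatever player you feed it to accepts data URLs, which I haven't verified.

```python
import base64


def audio_bytes_to_data_url(audio: bytes, mime_type: str = "audio/mpeg") -> str:
    """Wrap raw audio bytes (e.g. an MP3 response body) in a data: URL.

    Hypothetical helper for illustration; mime_type should match what the
    TTS service actually returned (for example "audio/wav" for WAV output).
    """
    # Base64-encode the bytes so they can be embedded in a text URL.
    encoded = base64.b64encode(audio).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Keep in mind that base64 makes the payload about a third larger, so this probably only makes sense for short clips. For longer audio you would likely need to save the file somewhere and serve a normal URL instead.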
Apparently there is a solution using backend integration and switching between StP and DnD, but I couldn’t get it to work: