Stuck on TTS (text-to-speech): getting the speech duration or using other TTS APIs

I’m running a news app with TTS.
The problem is that the time it takes to speak the same text varies from device to device.
That’s why it’s not working the way I intended (the screen should change when the speech is done).
(I set a different wait time for each OS, but even among Android devices the time is not the same.)

So if there’s a way to find out how long the speech takes, or to detect when it has finished, it would help a lot.

For now that doesn’t seem possible, so I was searching for other TTS APIs.
But as some of you guys noticed, I failed with every other approach I tried (even though the same calls succeeded in another no-code service, Bubble). I assume Thunkable can’t receive an mp3 file through an API.

If you guys are aware of this issue or know a way around it, please share your ideas.

The text-to-speech playback speed depends on various factors, such as the language, the punctuation, and the speech rate set in the device’s settings, which makes it hard to predict how long speaking a given text will actually take. The way I approached it in the past was to use the average number of milliseconds needed per spoken character. That more or less works, but it can never be accurate. Any other ideas are welcome.
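
To make the per-character idea concrete, here is a minimal sketch in Python (not Thunkable blocks, just to show the arithmetic). The function names, the 70 ms per character, and the 300 ms pause per punctuation mark are all assumptions for illustration, not measured values. You would need to calibrate them per device and language, for example by timing a few known sentences.

```python
# Rough TTS duration estimate from character counts.
# All default numbers are hypothetical and must be calibrated per device.

PAUSE_MARKS = ".,;:!?"


def estimate_tts_duration_ms(text, ms_per_char=70.0, pause_ms=300.0, rate=1.0):
    """Estimate how long speaking `text` takes, in milliseconds.

    ms_per_char: assumed average time per letter or digit.
    pause_ms:    assumed extra pause per punctuation mark.
    rate:        assumed speech-rate multiplier (2.0 = twice as fast).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    spoken = sum(1 for ch in text if ch.isalnum())
    pauses = sum(1 for ch in text if ch in PAUSE_MARKS)
    return (spoken * ms_per_char + pauses * pause_ms) / rate


def calibrate_ms_per_char(samples):
    """Derive ms_per_char from (text, measured_ms) pairs timed on one device."""
    total_chars = sum(sum(1 for ch in text if ch.isalnum()) for text, _ in samples)
    total_ms = sum(measured_ms for _, measured_ms in samples)
    if total_chars == 0:
        raise ValueError("samples contain no spoken characters")
    return total_ms / total_chars


if __name__ == "__main__":
    headline = "Markets rise, then fall."
    print(estimate_tts_duration_ms(headline))
```

Even after calibration this stays an estimate, so it is safer to add a small buffer before changing the screen than to cut the speech off early.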