TL;DR: is it possible to determine the length of an incoming base64 sound file as it returns from an API?
I’m successfully running text through Google’s Text-2-Speech API and it’s returning speech which I’m then playing in app just fine. My issue is that my app needs to do the following:
The problem is that I don’t know when to start recording the user to capture the verbal command, because I am dynamically creating verbal greetings from the app. My question then: is it possible to determine the length of the incoming base64-encoded sound file so I can dynamically program a recording delay?
No, it is not possible to determine the exact length of a base 64 encoded sound file just from the text string itself.
When you encode a sound file using base64, the resulting text string is about a third longer than the original binary data (4 characters for every 3 bytes). However, the length of the encoded string alone does not tell you the playback duration of the sound file; that also depends on the audio format and bitrate.
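For what it's worth, the byte size of the audio can be recovered from the string length alone. Here is a minimal Python sketch, assuming a standard padded base64 payload; the function name `decoded_size` is made up for illustration:

```python
import base64

def decoded_size(b64_string: str) -> int:
    """Return how many bytes a padded base64 string decodes to, without decoding it."""
    s = b64_string.strip()
    # Assumption: if a data-URI prefix ("data:audio/mp3;base64,...") is present, drop it.
    if s.startswith("data:") and "," in s:
        s = s.split(",", 1)[1]
    # Each trailing "=" stands for one byte that is not really there.
    padding = len(s) - len(s.rstrip("="))
    return len(s) * 3 // 4 - padding

sample = base64.b64encode(b"hello world").decode("ascii")
print(len(sample), decoded_size(sample))  # prints: 16 11
```

That gives the file size, not the duration; turning size into seconds still needs the bitrate or the audio header.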
The only thing I can think of is to upload it somewhere like Cloudinary that would probably give you a duration value but the upload process would likely slow things down too much for your uses.
That being said, it would be super helpful to be able to determine a sound file’s length in Thunkable. Granted, it might not work for base64 but for other files I can see that working. That would be a good feature request on GitHub.
Yeh funny I posted this and then asked Bing the same thing hehe.
It suggested sending it to another API which would do just that. My issue is that it adds yet another stop en route, and that just adds more lag (it’s bad enough as is). It’ll be voice > Cloudinary > Whisper STT > app > ChatGPT > app > Google.
With that said, the Google stt API is pretty damned fast. I might test it.
I was also wondering if there was a way to do it by proxy and estimate it based on the length of the base64 string or something. Bing suggested there might be some rough maths that might give me an idea.
I also wondered if there was another tts API which might give me time stamps like the whisper one does and I could estimate it based on those.
Yeh it’s good. I’m not sure it’s as good as I’d like but for 1m characters a month free I will happily thank my Google overlords and be thankful. It’s not the ElevenLabs model, that’s very good!
Anyway, with a bit more prodding Bing thinks that it is possible to back calc the timing given that you know the encoding / compression model used for the original sound file. This differs based on the voice you choose in Google but LINEAR16 by the looks of it for the most part. Take it away Bing:
I think it’s basically sound quality, but that’s a proxy for the amount of data per second. Thus, if you can work out the overall size of the file and the bitrate (the number of bits it’s churning through per second), you can just divide one by the other and get back to a (rough) number of seconds. In Thunkable though, I’d need to save the incoming base64 and get the size of it too, and even that looks difficult.
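For the LINEAR16 case, the division doesn't have to be done by hand. As far as I know, Google returns LINEAR16 audio with a WAV header, and if that holds, the header already carries the sample rate and frame count. A rough Python sketch using only the standard library; `make_silent_wav_b64` is a hypothetical helper that fakes an API response so the example runs on its own, and the 24000 Hz sample rate is just an assumption:

```python
import base64
import io
import wave

def wav_duration_seconds(b64_audio: str) -> float:
    """Decode base64 WAV audio and read its duration from the header."""
    raw = base64.b64decode(b64_audio)
    with wave.open(io.BytesIO(raw), "rb") as wav:
        # frames / frames-per-second = seconds
        return wav.getnframes() / wav.getframerate()

def make_silent_wav_b64(seconds: float, sample_rate: int = 24000) -> str:
    """Hypothetical stand-in for an API response: silent mono 16-bit WAV as base64."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples, as in LINEAR16
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return base64.b64encode(buf.getvalue()).decode("ascii")

print(wav_duration_seconds(make_silent_wav_b64(2.5)))  # prints: 2.5
```

This only works if the header's frame count is accurate, and it won't work for MP3, which needs the bitrate approach instead.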
OK - so I set up a PowerAutomate flow to stand as a sort of proxy API in the interim. I’m not 100% sure this is working but it seems like it is. It might just be close enough for my purposes. I was using Bing to help me pick through this.
Here’s what it’s doing:
Accepts an HTTP request to the URL, which contains a base64 string (an MP3 at 32 kbps)
Converts the base64 string to a string using the decodebase64 function
Calculates the length of the resulting string > integer of file length
Multiplies this by 8 bits / byte > number of bits in the original file
Divides this by 32000 bps > File duration in seconds
Returns this by HTTP response
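The same steps can be sketched in a few lines of Python, minus the HTTP part. This is only an illustration of the arithmetic, not the actual PowerAutomate flow; the function name is made up and the constant 32 kbps bitrate is the assumption stated above:

```python
import base64

MP3_BITRATE_BPS = 32_000  # Assumption: constant-bitrate MP3 at 32 kbps

def estimate_mp3_duration_seconds(b64_audio: str, bitrate_bps: int = MP3_BITRATE_BPS) -> float:
    """Estimate playback time of base64-encoded constant-bitrate audio."""
    audio_bytes = base64.b64decode(b64_audio)  # base64 text -> raw file bytes
    size_bits = len(audio_bytes) * 8           # bytes -> bits
    return size_bits / bitrate_bps             # bits / (bits per second) -> seconds

# Fake 8000-byte payload standing in for a real MP3 response.
fake_payload = base64.b64encode(bytes(8000)).decode("ascii")
print(estimate_mp3_duration_seconds(fake_payload))  # prints: 2.0
```

Any headers or tags in the file count toward the size, so the estimate will tend to come out slightly long rather than short.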
I tested it by giving it two text strings: one pretty short, which was estimated at 2 seconds (sounded right to me), and the other much longer, which came out as 8 seconds (again, as best I could time it, it was thereabouts; it might have been a touch shorter than that).
Sorry I’m going to be dense. I just had a poke around and quickly looked through the forum but figured I’d just ask as I’ve not used the Webviewer before. Does this mean I need to set up a site somewhere and host that html or can it just be copied and pasted into the app somewhere e.g. as you would the body of an API post request?
So does that mean you can just take the string length of the base64 string and divide it by 4000 (L*8/32000 = L/4000)? That would be really easy to do internally in Thunkable if that’s the case.
Hmm… maybe not. The string I used was 137548 characters. Dividing that by 4000 gives 34.4. Not even close to 6.4 but… could it be a multiple? Anyway, it’s a stretch.
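A quick check of that arithmetic, as a sketch. The divide-by-4000 shortcut applies to the decoded byte count, not the base64 character count, which is 4/3 larger, so for 32 kbps the divisor on the raw string length would be closer to 5333. The snippet below (function name made up) runs the 137548-character string through a few assumed bitrates:

```python
def duration_from_b64_length(b64_chars: int, bitrate_bps: int) -> float:
    """Estimate seconds of constant-bitrate audio from the base64 string length."""
    size_bytes = b64_chars * 3 / 4  # undo base64's 4/3 overhead (padding ignored)
    return size_bytes * 8 / bitrate_bps

# The bitrates here are guesses to compare against the measured 6.4 s.
for kbps in (32, 64, 128):
    seconds = duration_from_b64_length(137548, kbps * 1000)
    print(f"{kbps} kbps -> {seconds:.2f} s")
```

At 32 kbps this still gives roughly 25.8 s, but at 128 kbps it comes out at about 6.4 s, so one possible explanation is that this particular file was not encoded at 32 kbps. That is only a guess from the numbers.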