Measuring incoming base64 sound files

Hi All,

TL;DR is it possible to determine the length of an incoming base64 sound file as it returns from an API?

I’m successfully running text through Google’s Text-to-Speech API and it’s returning speech which I’m then playing in the app just fine. My app needs to do the following:

{user} : Press button
{app} : issues verbal greeting / asks for command
{user} : issues verbal command
{app} : issues verbal acknowledgment

My issue is that I don’t know when to start recording the user to capture the verbal command, because I am dynamically creating verbal greetings from the app. My question then: is it possible to determine the length of the incoming base64 encoded sound file so I can dynamically program a recording delay?

ChatGPT says no:

No, it is not possible to determine the exact length of a base 64 encoded sound file just from the text string itself.

When you encode a sound file using base 64, the resulting text string will be longer than the original binary data. However, the length of the encoded string is not directly related to the length of the original sound file.

The only thing I can think of is to upload it somewhere like Cloudinary that would probably give you a duration value but the upload process would likely slow things down too much for your uses.

That being said, it would be super helpful to be able to determine a sound file’s length in Thunkable. Granted, it might not work for base64 but for other files I can see that working. That would be a good feature request on GitHub.

Yeh funny I posted this and then asked Bing the same thing hehe.
It suggested sending it to another API which would do just that. My issue is that it adds yet another stop en route and that just adds more lag (it’s bad enough as is). It’ll be voice > Cloudinary > Whisper STT > app > ChatGPT > app > Google.

With that said, the Google stt API is pretty damned fast. I might test it.

I was also wondering if there was a way to do it by proxy and estimate it based on the length of the base64 string or something. Bing suggested there might be some rough maths that might give me an idea.

I also wondered if there was another tts API which might give me time stamps like the whisper one does and I could estimate it based on those.

Are you using Google Cloud Voice? It’s awesome. Love it.

I was curious about the string length, too, but GPT-3 said it wasn’t related to the duration of the file. Shouldn’t it be, though?

Yeh it’s good. I’m not sure it’s as good as I’d like but for 1m characters a month free I will happily thank my Google overlords and be thankful. It’s not the ElevenLabs model, that’s very good!

Anyway, with a bit more prodding Bing thinks that it is possible to back calc the timing given that you know the encoding / compression model used for the original sound file. This differs based on the voice you choose in Google but LINEAR16 by the looks of it for the most part. Take it away Bing:

According to Google Cloud Text-to-Speech documentation, the default values for these properties depend on the voice and language you choose. For example, for a US English voice (en-US), the default bit depth is 16 bits, the default sample rate is 24000 Hz, and the default number of channels is 1 (mono). Therefore, using these values, we can calculate the bitrate as follows:

bitrate = bit depth * sample rate * number of channels
bitrate = 16 * 24000 * 1
bitrate = 384000 bps

To get the length of a sound file from this bitrate, we can use this formula:

audio file length (in seconds) = audio file size (in bits) / bitrate

For example, if you have a 10 MB sound file in LINEAR16 format with these settings, you can plug in these values into the formula:

audio file length = 10 * 8 * 1024 * 1024 / 384000
audio file length = 218.45 seconds
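
For anyone wanting to try Bing's arithmetic, here is a minimal JavaScript sketch of the same calculation. The function name `pcmDurationSeconds` is made up for illustration, the defaults are just the en-US figures quoted above, and it assumes uncompressed LINEAR16 audio with any header bytes ignored.

```javascript
// Duration of uncompressed PCM (LINEAR16) audio from its size in bytes.
// Assumed defaults (from the figures quoted above): 24000 Hz, 16-bit, mono.
function pcmDurationSeconds(sizeBytes, sampleRate = 24000, bitDepth = 16, channels = 1) {
  const bitrate = bitDepth * sampleRate * channels; // bits per second
  return (sizeBytes * 8) / bitrate;                 // file size in bits / bitrate
}

const tenMiB = 10 * 1024 * 1024;
console.log(pcmDurationSeconds(tenMiB).toFixed(2)); // "218.45"
```

This only holds when the bitrate is constant for the whole file, which is the case for raw PCM.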

So now we need the audio file size. I’m not even sure we can get that either {sigh}.

@tatiang ever built an API? Wondering how hard it might be to build the functionality for that.

I tried saving the audio stream to a stored variable but couldn’t then automatically retrieve the size of the file :frowning:

Isn’t bitrate just a measure of sound quality? I don’t see how you could get duration from that. But I’m willing to be wrong.

And no, I’ve never built an API but I’m interested in doing so. I think @jared and some other folks here have experience with that.

This does feel like a problem that should have a solution!

I think it’s basically sound quality, but that’s a proxy for the amount of data per second. Thus, if you can work out the overall size of the file and the bitrate (the number of bits it’s churning through per second), you can just divide one by the other and get back to a (rough) number of seconds. In Thunkable though, I’d need to save the incoming base64 and get the size of it too, and even that looks difficult.

Bing for the win! I believe this JavaScript works, embedded in HTML:

<!DOCTYPE html>
<html>
  <head>
    <title>Sound Duration Example</title>
  </head>
  <body>
    <audio id="audio" controls></audio>
    <script>
      const base64String = "base64 string here";
      const audio = document.getElementById("audio");
      audio.src = "data:audio/wav;base64," + base64String;
      audio.addEventListener("loadedmetadata", () => {
        console.log(`Duration: ${audio.duration} seconds`);
      });
    </script>
  </body>
</html>

I tested it with a 6 second sound and it returned “Duration: 6.426122 seconds” in the browser console.

Thunkable can send and receive JavaScript using the Web Viewer component.

OK - so I set up a PowerAutomate flow to stand as a sort of proxy API in the interim. I’m not 100% sure this is working but it seems like it is. It might just be close enough for my purposes. I was using Bing to help me pick through this.

Here’s what it’s doing:

  1. Accepts an HTTP request to the URL, containing a base64 string (an MP3 at 32 kbps)
  2. Converts the base64 string to a decoded string using the decodebase64 function
  3. Calculates the length of the resulting string > file length in bytes
  4. Multiplies this by 8 bits / byte > number of bits in the original file
  5. Divides this by 32000 bps > file duration in seconds
  6. Returns this in the HTTP response
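
The same steps can be sketched in plain JavaScript, in case it helps anyone check the maths outside PowerAutomate. This is not the flow itself, just an illustration: `estimateDurationSeconds` is a made-up name, and the 32000 bps default assumes a constant-bitrate 32 kbps MP3 like the one described above.

```javascript
// Assumption: the audio is constant-bitrate MP3 at 32 kbps.
const ASSUMED_BITRATE_BPS = 32000;

function estimateDurationSeconds(base64String, bitrateBps = ASSUMED_BITRATE_BPS) {
  const binary = atob(base64String); // decoded binary string, one character per byte
  const sizeBytes = binary.length;   // file length in bytes
  const sizeBits = sizeBytes * 8;    // 8 bits per byte
  return sizeBits / bitrateBps;      // estimated duration in seconds
}

// 4000 zero bytes stand in for one second of 32 kbps audio.
const placeholder = btoa("\0".repeat(4000));
console.log(estimateDurationSeconds(placeholder)); // 1
```

Any header or tag bytes in a real MP3 are counted as audio here, so expect the result to run slightly long.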

I tested it by giving it two text strings. The short one was estimated at 2 seconds (sounded right to me); the much longer one came out as 8 seconds (as best I could time it, it was thereabouts, though it might have been a touch shorter than that).

Lol - you beat me to it! I’ll definitely give this a whirl as I reckon that might be faster.

Sorry, I’m going to be dense. I just had a poke around and quickly looked through the forum but figured I’d just ask as I’ve not used the Web Viewer before. Does this mean I need to set up a site somewhere and host that HTML, or can it just be copied and pasted into the app somewhere, e.g. as you would the body of an API POST request?

So does that mean you can just take the string length of the base64 string and divide it by 4000 (L*8/32000 = L/4000)? That would be really easy to do internally in Thunkable if that’s the case.

Hmm… maybe not. The string I used was 137548 characters. Dividing that by 4000 gives 34.4. Not even close to 6.4 but… could it be a multiple? Anyway, it’s a stretch.

I think you need to decode it first. I’m doing the decodebase64 step in PowerAutomate. Not 100% sure what that function does to the numbers.

Also it depends on the format you’re using. I’m using 32kbps MP3. MP3 is compressed, but at a constant bit rate it still uses a fixed number of bits per second, so you just need the bit rate and the file size. Bing win
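
On the question of skipping the decode step: base64 packs 3 bytes into every 4 characters, so the decoded size can be worked out from the string length alone. Here is a rough JavaScript sketch, with made-up function names, assuming standard padded base64 with no line breaks and no `data:` prefix, and a constant bit rate.

```javascript
// Decoded size in bytes of a padded base64 string, without decoding it.
function base64DecodedBytes(base64String) {
  const padding = base64String.endsWith("==") ? 2 : base64String.endsWith("=") ? 1 : 0;
  return (base64String.length * 3) / 4 - padding;
}

// Estimated duration in seconds at an assumed constant bit rate.
function durationFromBase64(base64String, bitrateBps) {
  return (base64DecodedBytes(base64String) * 8) / bitrateBps;
}

// Length-only variants, ignoring padding (at most 2 bytes of error).
function durationFromLength(charCount, bitrateBps) {
  return (charCount * 6) / bitrateBps; // chars * 3/4 bytes * 8 bits
}
function impliedBitrate(charCount, seconds) {
  return (charCount * 6) / seconds;
}

console.log(durationFromLength(137548, 32000).toFixed(1)); // "25.8"
console.log(Math.round(impliedBitrate(137548, 6.426122)));  // roughly 128000
```

So for a 32 kbps file the divisor is about 5333 characters per second rather than 4000. The 137548-character string above would come out near 25.8 seconds at 32 kbps, which suggests that sample was encoded at a much higher bit rate (somewhere around 128 kbps if it played for 6.4 seconds).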