I am looking to make something that ought to be fairly straightforward, but I still cannot crack the code. I hope some of you can help me.
I work at a small museum in a multi-language area, and to avoid having too much text in our exhibition, we want to make an app that can translate our signs. We do, however, have bad experiences with machine translations and want to be able to control the output. To this end, we want to make an app that can take a picture, match this picture to an image database, and provide the guest with a translation of our own design.
I do believe that Thunkable should be able to help, but I can't seem to “crack the nut”. Normally in these situations I have found ChatGPT to be somewhat helpful, but when it comes to Thunkable my AI assistant is talking gibberish.
Can you explain a specific example of how this might work from a museum guest’s perspective? I’m not understanding how the image matching is supposed to function.
You can do this with Thunkable but it’s going to involve a lot of API work for image recognition and translation. ChatGPT might be able to do all of that for you but I don’t know how useful the image recognition data is going to be, especially if you’re trying to match that to an existing image database. I guess it depends on how similar/different each image is and what tags you provide in the database that can be matched by whatever ChatGPT comes up with. You might need to provide those same tags to ChatGPT in the form of custom instructions so that it can choose the most likely one instead of coming up with its own.
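To make the tag-matching idea concrete, here is a rough sketch in Python of what that database-matching step could look like. The sign IDs and tags are invented for illustration, and the labels would come from whatever image-recognition API you end up using:

```python
# Hypothetical tags you might store for each sign in your database.
SIGN_TAGS = {
    "sign_01_flint_axe": {"axe", "flint", "stone", "tool"},
    "sign_02_bronze_brooch": {"brooch", "bronze", "jewelry", "ornament"},
    "sign_03_rune_stone": {"stone", "rune", "carving", "inscription"},
}

def best_matching_sign(api_labels):
    """Pick the database sign whose tags overlap most with the
    labels returned by the image-recognition API."""
    labels = {label.lower() for label in api_labels}
    scores = {sign: len(tags & labels) for sign, tags in SIGN_TAGS.items()}
    best = max(scores, key=scores.get)
    # If nothing overlaps at all, report no match rather than guessing.
    return best if scores[best] > 0 else None

# Labels a vision API might plausibly return for a photo of the axe sign:
print(best_matching_sign(["Stone", "tool", "axe"]))  # sign_01_flint_axe
```

The key point is that the quality of the result depends entirely on how distinctive the tags are between signs, which is why providing those same tags to the model matters.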
This is going to take A LOT of time to get right. I’ve done some of this work before, for example with this project. I’d expect to plan for at least 40 hours to code, test, and fully build such an app.
Sure - and thank you for your response. If the model you propose is the only one, we will seek professional help.
The app from the guest perspective:
They open the app and select a language (either German or English), then the app proceeds to the camera. With the press of a button they take a picture. If one of our signs (we have about 50) is in the photo, the app will recognize the sign and provide the guest with a German or English translation of the sign, depending on their choice of language. This translation is made beforehand and is stored in the app, so basically we need the app to recognize the sign and then provide the specific text for that sign.
The signs are quite simple - black letters on a white background.
Can you provide a few visual examples of the signs? Are they words, symbols, things commonly seen on roads or buildings, etc.?
My sense is that you might want to hire someone on Fiverr to make a prototype of the app for you. Probably a true coder but there are people who work in Thunkable (myself included but I’m not available at the moment) in case you want something you can then modify relatively easily.
Then again, if the signs are only text-based, there are good OCR options that can retrieve the text, and then it would be pretty easy to translate it into another language.
It seems possible. The implementation might be different than what you are describing and it largely depends on whatever API for image recognition you are planning on using.
If you want a very low tech version of this, what I would recommend would be:
For your signs, if you had a number or a symbol that would be great. After they press English or German, they can walk around. If they want the translation, they just look for the number or symbol on the sign. Then they press the button that shows that number or symbol that then shows them the translation. Since you said that the translation is stored in app already, you don’t need any API in this way.
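In Thunkable this would just be a button press driving a table or data-source lookup, but the logic is simple enough to show in a few lines of Python (the sign numbers and texts are placeholders):

```python
# Pre-authored translations stored in the app, keyed by sign number.
TRANSLATIONS = {
    1: {"en": "Flint axe, late Stone Age.", "de": "Flintaxt, späte Steinzeit."},
    2: {"en": "Bronze brooch from a grave.", "de": "Bronzefibel aus einem Grab."},
}

def translation_for(sign_number, language):
    """Return the stored translation; no camera, network, or API needed."""
    sign = TRANSLATIONS.get(sign_number)
    if sign is None:
        return "Sign not found."
    return sign.get(language, "Language not available.")
```

The appeal of this version is exactly that everything is offline and deterministic: no recognition step can fail, and the museum keeps full control of every word shown to the guest.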
The signs are quite simple - plain text on a white-ish background, printed on high-quality thick paper with a nice, visible texture. We did consider QR codes or a number-reference system but decided that it would compromise the exhibition design too much (since we already have text for the individual artifacts, curation texts, and quotes, we would end up with quite a lot of references). That being said, that option - though maybe not preferable - may end up becoming our reality.