Both Google Home and Alexa rely on Text to Speech (TTS), which some call Text to Audio. TTS is an amazing technology. It improves lives: for people with language barriers, vision or physical impairments, or limited time to absorb content, and for anyone who simply wants to complete everyday tasks with a single command.
Building this technology is no easy feat. Human speech has many nuances: the way we lower or raise our voices based on the intent of what we say, how we pronounce names, or the speech styles shaped by our upbringing or location. Today artificial intelligence can detect these nuances, understand them, and respond. That’s impressive. According to Google, it has closed the gap with human speech by 70%.
It’s naive to think that coding and AI alone got Google (or other companies) this far. There are many scenarios that code can’t solve on its own. Humans are needed to understand what’s taking place in the spoken content and what its true meaning is. An easy way to think about this: a person from Ireland saying “chocolate” versus a person from Mexico saying “chocolate”. Their accents and speaking speeds differ significantly. Humans can recognize this, and so they can code for it and improve audio technology to handle it.
So, are Google, Amazon, Apple, Microsoft, and others listening to us speak? From what I said above, you can deduce that the answer is clearly “yes”. But I expect, and trust, those companies to: a) keep the data only for a limited time, and b) strip it of any PII (Personally Identifiable Information). I would say to them: “Use my voice to improve your tech, help people improve their lives with it, but respect my privacy.”
With our app AudiBrow, we use TTS to convert articles into audio. AudiBrow is an app that reads written web content out loud, so instead of sitting in front of your computer or phone to read, you can simply listen to the content while you commute or work out at the gym. We use services from the aforementioned companies.
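To give a feel for what “converting articles into audio” involves under the hood, here is a minimal sketch of one common preprocessing step: splitting article text into chunks small enough for a single TTS request. The 5,000-character limit, the function name, and the sentence-splitting heuristic are all illustrative assumptions, not AudiBrow’s actual implementation or any vendor’s official API.

```python
# Hypothetical per-request size limit; real limits vary by TTS provider,
# so check your provider's documentation.
MAX_CHARS = 5000

def chunk_for_tts(text, max_chars=MAX_CHARS):
    """Split text on sentence boundaries so each chunk fits one TTS request.

    This is a naive sentence splitter for illustration; production code
    would use a proper sentence tokenizer.
    """
    sentences = text.replace("\n", " ").split(". ")
    chunks, current = [], ""
    for sentence in sentences:
        # Restore the period that split() consumed, if needed.
        piece = sentence if sentence.endswith(".") else sentence + "."
        # Start a new chunk when adding this sentence would overflow.
        if current and len(current) + len(piece) + 1 > max_chars:
            chunks.append(current.strip())
            current = ""
        current += piece + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each returned chunk would then be sent to the TTS service as a separate request and the resulting audio segments stitched together for playback.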
My users expect content to be read as close to a human voice as possible. If it sounded robotic, I’d lose them within minutes. To sound human, companies like Google have learned how to simulate breathing, voice changes, word relationships, and so on.
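In practice, effects like pauses and voice changes are often expressed through SSML (Speech Synthesis Markup Language), a W3C markup that services such as Google Cloud Text-to-Speech and Amazon Polly accept. The real SSML tags `<break>` and `<prosody>` are shown below, but the helper function itself is just an illustrative sketch, not any vendor’s official API:

```python
# Sketch: build an SSML string that inserts a breathing-like pause between
# sentences and controls the overall speaking rate.
from xml.sax.saxutils import escape

def to_ssml(sentences, pause="400ms", rate="medium"):
    """Wrap sentences in SSML with a pause between them.

    `pause` and `rate` defaults are arbitrary illustrative values.
    """
    break_tag = f'<break time="{pause}"/>'  # SSML pause element
    # Escape text so characters like & or < don't break the markup.
    body = break_tag.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'
```

The resulting string, rather than plain text, is what gets submitted to the TTS service, letting the app tune how the synthesized voice sounds.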
I, other developers, and consumers look forward to the growth of this technology. Yes, it comes with concerns, but we should hold those companies accountable for using our data responsibly.