Audio Publishing MythBusters: 5 misconceptions about AI voices you probably believe

Artificial intelligence (AI) can seem to have ‘nothing to do with ordinary people.’ But what if we told you that you can make an audiobook in 15 minutes with a single user-friendly service? That service is Speechki, the AI-generated audiobook production platform for publishers.

We’ve faced a lot of misconceptions since Speechki’s foundation in 2019. In this article, we are going to break down five common myths about artificial intelligence in the publishing industry, synthetic voices, and the future of audiobook production:

AI is something abstract, and it does not apply specifically to the publishing industry.
AI voices sound creepy.
Using AI is difficult and only possible for programmers.
The audience hates robot voices.
It’s very difficult and expensive to start.

Myths

AI is something abstract, and it does not apply specifically to the publishing industry.

We hear about AI everywhere: AI-powered self-driving cars, AI in healthcare, AI-targeted advertising, AI disaster prediction, AI financial advisors, and the list goes on. In book publishing, though, AI is commonly believed to be something distant. That is not true at all!

AI is already here and has its place in publishing. For example, it is used to create short summaries of audiobooks and to translate content, and it has driven significant progress in professional translation automation tools. AI also powers recommendation systems that keep readers supplied with interesting books, so they keep renewing their subscriptions.

The next step is to automate the recording of audiobooks using AI. This is not as far in the future as it may seem. Being at the forefront of voice technology, we can declare, ‘The future is now!’

A survey this year showed that 67% of audiobook consumers agree that one of the reasons they enjoy listening to audiobooks is to reduce screen time. Clearly, the demand for audiobook content exists, but there needs to be enough supply to meet it. Today, only the most popular books get audio versions narrated by professionals, because recording is a time-consuming process that can’t practically scale to every published work. As a result, 95% of books never get audio versions. That’s a huge amount of content waiting for listeners, and AI voices could help bring it to them quickly. This is why many publishing companies are now working on auto-generated narrators that can turn more books into audiobooks.

AI-generated voices sound like a robot — they are creepy.

It’s understandable why you might think that. In fact, only a few years ago, it was the case. I’m sure everyone has heard what Stephen Hawking’s artificial voice sounded like. But there have been enormous developments in the last couple of years that you might not know about. The key factor is machine learning, which allows software to create machine voices that sound natural and realistic, in many cases indistinguishable from a human narrator. And not only can these voices imitate what a human sounds like, they can do so over a long period, creating full-length narrated texts.

👉 Check out these ten audio files. Just sort them into the correct category — human or robotic.

AI-powered audiobook recording is complex and confusing.

Some people think that using AI is only possible for programmers and computer scientists who understand AI technologies and machine learning. Prepare to be surprised…

Modern AI synthetic voice editors have simple interfaces that work like Google Docs or Microsoft Word. You make the changes you want in a user-friendly system, and the system handles all the complex processing itself. The proofer never has to see the source code.

Also, no servers, no specialist equipment required — just a browser, headphones, keyboard, and mouse.

People will feel the deception — they will understand that this is a robot.

The audience hates robot voices? Well, that is half right. The audience WILL understand that they are listening to a robot voice. In practice, though, the audience does not hate robot voices.

There is no need to deceive anyone! You can be absolutely upfront about the fact that you are using a synthetic voice. For example, more than 1,200 audiobooks have been recorded for Storytel, Eksmo, and other publishers around the globe using Speechki’s platform. All of them were marked ‘narrated by AI-narrator.’ When listeners were asked what they thought, they said, ‘We love it because now these books are available in the audio format. The quality is enough for comfortable listening. And sometimes, human-narrated audiobooks sound worse.’ After all, not every actor has a voice that is pleasant to every listener. But mainly, it’s about availability.

It’s very difficult and expensive to jump into AI technologies.

Yes, AI is something new for publishers. But it’s actually neither difficult nor expensive. As I have mentioned, there are easy tools for doing it, and the cost is lower than with a traditional narrator. So even if your budget for audiobook production is very small, you can use that small amount to produce six or eight audiobooks, test the waters, and see how they do on the real market.

Think about a technology that is very common now, say Uber. Fifteen years ago, in 2006, nobody had ever heard of it. Now almost everyone uses it. It is convenient, it is fast, and it is often cheaper than a traditional taxi. Just because something is new doesn’t mean it will be expensive or that we need to be afraid of it.

As for the price, creating an audiobook with Speechki costs $1,000, while traditional production costs at least $5,000.
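To see what that difference means for a small catalogue, here is a rough back-of-the-envelope sketch in Python. It uses only the two per-title prices quoted above. The $8,000 budget and the function names are our own illustrative assumptions, not part of Speechki’s platform or pricing tools, and since $5,000 is a stated minimum for traditional production, the real gap may be larger.

```python
# Per-title prices quoted in the article (USD).
# The traditional figure is a stated minimum, so actual costs may be higher.
AI_COST_PER_TITLE = 1_000
TRADITIONAL_COST_PER_TITLE = 5_000


def titles_within_budget(budget: int, cost_per_title: int) -> int:
    """Return how many complete audiobooks a budget covers."""
    if cost_per_title <= 0:
        raise ValueError("cost_per_title must be positive")
    return budget // cost_per_title


def savings_per_title(
    traditional: int = TRADITIONAL_COST_PER_TITLE,
    ai: int = AI_COST_PER_TITLE,
) -> int:
    """Return the minimum saving per title when switching to an AI narrator."""
    return traditional - ai


if __name__ == "__main__":
    budget = 8_000  # hypothetical test budget, chosen only for illustration

    ai_titles = titles_within_budget(budget, AI_COST_PER_TITLE)
    human_titles = titles_within_budget(budget, TRADITIONAL_COST_PER_TITLE)

    print(f"Budget: ${budget:,}")
    print(f"AI-narrated titles: {ai_titles}")
    print(f"Traditionally narrated titles: {human_titles}")
    print(f"Minimum saving per title: ${savings_per_title():,}")
```

Under these assumptions, the same budget that covers a single traditionally narrated title covers eight AI-narrated ones, which is what makes a small test batch realistic.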

FAQ

Can AI voices replace real narrators?

One worry that people often mention is that AI will replace voice actors and professional narrators. No, it won’t!

AI voices sound good and aren’t creepy, but they do not have the real, natural qualities of a human narrator. It’s not the same thing. Companies already hire human narrators and voice actors on a regular basis, and they absolutely will continue to do so. Nothing can replace the experience of listening to a talented actor or narrator reading a good book. But Stephen Fry only has so many hours in his day, and he expects to be paid for his work. That’s true of all audiobook narrators, even the ones who aren’t so famous. The point of AI automated voices is to create good-quality recordings of books that otherwise would never be recorded at all, because the process is too costly and time-consuming for the publishers.

Is it true that robots don’t make mistakes?

No. The robot makes a lot of mistakes. But human beings make mistakes too.

Try recording an audiobook narration yourself, and you will notice how often you trip up! Unlike human narrators, however, synthetic voices are easy to correct. A human can go back and rerecord a sentence during the session, and a robot can be corrected immediately. But mistakes are often found after the recording session. If you are using a human narrator, you have to find time in that person’s schedule, get him or her back into the recording studio, and then pay for that work. When you are using a computer-generated narrator, you can correct the mistakes as soon as you find them, using your mouse and keyboard.

Are audiobooks recorded with one voice or not?

One thing a lot of people will be inclined to ask is this: Will all audiobooks be recorded in only one or two voices?

No. There is a diverse selection of voices (male, female, or child).

Publishers can already create a copy of a voice, such as your own or that of a famous announcer (with that person’s permission), and then use it for audiobook recording.