AI and Speech Recognition Trends: Media and Entertainment

There are few things we appreciate more than a good listener. Feeling heard and understood is the linchpin of every relationship, and it’s precisely this that leads to many of the most valuable things in life: trust, loyalty, and other strong bonds.

What if your media could create a similar relationship with your audience? Brands have long understood the importance of listening to their customers, but now, with artificial intelligence (AI) and automatic speech recognition (ASR) technologies, lending an ear is more literally possible than ever before. Not only are these customer experiences more personalized, but this intuitive mode of interacting is more engaging.

You know that consumers have no shortage of options for where to find content in 2021. How do you set yourself apart and build the kind of brand loyalty that will keep audiences coming back?

It’s simple. Start listening.

To get you started, we’ve picked the top 3 ways to leverage speech recognition in media and entertainment. One of them will make your life a little easier, and the others will help you create better experiences for your audience.

They come down to what we call The Three I’s:

  • Immersion
  • Interactivity
  • Insights

Voice recognition creates a rich and immersive experience because it invites the audience to interact in ways that may be novel in the digital realm yet feel completely natural and intuitive. At the same time, your business can use ASR to generate powerful insights about your content to help you be more productive and effective.

1. Media Asset Management

Whether you’re a big player in the entertainment industry or just a company that has multimedia assets for advertising or training purposes, you know how hard it can be to keep accurate tabs on your content, to find the right clips when you need them, and to manage and organize years and years of products.

The problem with audio and video files is that they’re inherently unstructured data types. You can’t search for keywords or phrases in them in the same way that you could in a document. Without any comprehensive way to catalog files, we’re left with just titles and metadata, such as the file’s creation date, size, and type.

So what do you do when you need to find a specific quote from an interview that happened over a year ago? How do you analyze your assets and their performance without an easy way to find them?

Too often, the answer is to just give up. It doesn’t have to be that way.

ASR for media asset management changes this paradigm by automatically generating searchable, indexable transcripts for these previously unruly file types. In addition to giving you greater control over your assets, it also makes life easier for you and your users by providing captions.
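To make the idea of a searchable transcript concrete, here is a minimal Python sketch of a keyword index built over timestamped transcript segments. The segment shape (file name, start time in seconds, text), the sample data, and the function names `build_index` and `search` are all assumptions for illustration; they are not the output format or API of any particular ASR service.

```python
import re
from collections import defaultdict

# Hypothetical transcript segments: (file name, start time in seconds, text).
# A real ASR service returns its own format; this shape is an assumption.
SEGMENTS = [
    ("interview_2020.mp4", 12.0, "We launched the product in early spring."),
    ("interview_2020.mp4", 47.5, "Listening to customers changed our roadmap."),
    ("training_video.mp4", 3.2, "Welcome to the customer support training."),
]


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each word to the set of segment positions that contain it."""
    index = defaultdict(set)
    for position, (_, _, text) in enumerate(segments):
        for word in tokenize(text):
            index[word].add(position)
    return index


def search(index, segments, query):
    """Return (file, start time) for segments containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return [(segments[i][0], segments[i][1]) for i in sorted(matches)]


index = build_index(SEGMENTS)
print(search(index, SEGMENTS, "customers roadmap"))
```

Because each hit carries a file name and a start time, a search for a half-remembered quote can point straight to the moment in the recording where it was said. A production system would likely add stemming, phrase matching, and a real search engine, but the principle is the same.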

2. Interactive Media

One way that we’re already seeing artificial intelligence affect entertainment is the use of algorithms to serve consumers more personalized content through hyper-targeting. In an interview at Towards Data Science, Christopher Whitely, Senior Director of Applied Analytics at Comcast, explains that “There is a trend towards using machine learning models to deliver the most relevant content to consumers to keep them engaged, whether that is programming that they might watch or advertisements that are of interest to them.”

ASR is taking this trend even further by enabling truly interactive products that let us use our voices to either interact with digital characters or control digital environments. While we expect this to play out across numerous industries, such as TV with choose-your-own-adventure-style media, the biggest gains are happening in video games.

It’s come a long way already. In 2014, a Canadian game developer released Bot Colony, a game “where you converse in English with robots who understand what you say” to tell them to take various actions, such as picking up a briefcase. However, if you look at how it actually played out, you see how incredibly frustrating it is when it doesn’t work. It may be funny for a little while, but saying the same sentence over and over again to no effect gets old quickly.

The lesson? If you’re going to use ASR in a game, make sure to do it right.

Luckily, technological advances like deep learning have enabled us to do just that. Looking at a more recent example of a streamer using voice commands to play Sword and Sorcery, the voice recognition is almost flawless, so much so that the streamer calls it an “unreal experience.”

Speech Tech Magazine sums it up: “Voice recognition makes games more immersive, feel more real, and therefore enhances the experience of the players.”
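As a rough illustration of what “doing it right” can involve on the game side, the Python sketch below maps a recognized utterance to a game action while tolerating small recognition errors, and declines to act when nothing is close enough. The command table, the action names, the `match_command` function, and the 0.75 cutoff are hypothetical choices for this example; it uses the standard library’s `difflib` as a simple stand-in for the intent matching a real game would use.

```python
import difflib

# Hypothetical command table for illustration: spoken phrase -> game action.
COMMANDS = {
    "pick up the briefcase": "PICK_UP_BRIEFCASE",
    "open the door": "OPEN_DOOR",
    "cast fireball": "CAST_FIREBALL",
}


def match_command(transcript, cutoff=0.75):
    """Return the action for the closest known phrase, or None if unsure."""
    # Normalize case and whitespace before comparing against known phrases.
    normalized = " ".join(transcript.lower().split())
    candidates = difflib.get_close_matches(
        normalized, list(COMMANDS), n=1, cutoff=cutoff
    )
    return COMMANDS[candidates[0]] if candidates else None


print(match_command("pick up a briefcase"))
print(match_command("xyzzy"))
```

Returning `None` for an utterance that matches nothing lets the game ask the player to repeat or rephrase instead of silently ignoring them, which is the frustration described above.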

3. Virtual Reality

Another factor that leads to this “unreal experience” is that it happened in virtual reality (VR). Voice is a key part of this. VR relies on embodiment for us to feel present in the virtual world that VR asks us to join. Being able to move around and pick things up is great, but without the ability to speak and feel heard, we would never feel truly embodied in virtual space.

Video games are the obvious first application of this, but they’re far from the only one. One of our favorites is using ASR and VR for first responder training. An article for the National Institute of Standards and Technology (NIST) explains that “In Health Scholars’ virtual scenario, users wear a VR headset and speak into the built-in microphone to direct a team of virtual avatars through cardiac arrest. Leveraging voice recognition and motion capture technologies, users can command their virtual team members to shock, perform cardiopulmonary resuscitation (CPR), and administer medications.”

An even more ambitious application of ASR and VR is using this technology to talk to the dead. Yup, you read that right. Companies like HereAfter and Eternime are developing tech that lets you build a lifelike avatar of the dearly departed in order to have a “conversation” with them in VR. You might not be able to ask Grandma the question you always wanted to ask if the training data set doesn’t contain the answer, but getting one more chance to speak to your loved ones is still quite the opportunity.


One of the main reasons we love sharing our ASR technology with the world is the incredible applications that people dream up. No matter how you decide to incorporate voice recognition into your product, we guarantee that if you focus on interactivity, immersion, and insights, you can’t go wrong.

Ready to get started? Learn more about our speech-to-text API, which gives developers instant access to a fully built speech recognition engine.