Big Search: The Amazing Opportunity in Video

Today, many structured and unstructured documents are searched for business-critical information, or at the very least, people say that they should be searchable. This is important, but what makes search really cool is how far it can reach beyond structured and unstructured documents.

There are amazing opportunities to programmatically find useful information in both photos and videos. They carry a huge amount of metadata, which can be used to classify them and allow for any kind of search based on a particular set of rules. However, the complex information technology (IT) infrastructure this requires is not something you install on a personal computer. The beauty of modern technology is that photos and videos can be searched and analyzed through an online service. Let’s review some specific possibilities we can achieve right now with search.

Video Audio/Audio Files to Text

Automatic conversion from an audio file to a text file is possible for languages spoken by the majority of the world’s population, including English, French, German, Chinese, Spanish, Italian, Portuguese, Arabic, and Japanese. You might be wondering whether the text deliverable is 100% correct. No, it is not, unless a human element is inserted along the way. However, this does not prevent a reader from understanding what has been said. The text files can then be searched by any “ordinary” document management solution. The possible uses are endless: from university classes to any sort of seminar or workshop, as well as news media.

Video Search for Audio: Metadata Extraction

If we have the technology to convert an audio file from a video into a text file, then we can search for specific words spoken, and the search engine will provide the correct timecode links within the video. Metadata for indexing the video can be extracted and used automatically.
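As a minimal sketch of how a word search can return timecodes, assume the speech-to-text step delivers the transcript as a list of segments, each with a start time in seconds. The Segment type, the function names, and the sample transcript below are hypothetical illustrations, not the output format of any particular service.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    start_seconds: float  # where the segment begins in the video (assumed field)
    text: str             # machine-generated transcript for that segment


def format_timecode(seconds: float) -> str:
    """Turn a number of seconds into an HH:MM:SS timecode."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_spoken_word(segments: List[Segment], word: str) -> List[str]:
    """Return the timecode of every segment in which the word is spoken."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [format_timecode(s.start_seconds) for s in segments if pattern.search(s.text)]


# Hypothetical transcript of a recorded seminar.
transcript = [
    Segment(0.0, "Welcome to the seminar on document management."),
    Segment(75.5, "Search is moving beyond documents."),
    Segment(3725.0, "Video search relies on good metadata."),
]

print(find_spoken_word(transcript, "search"))  # ['00:01:15', '01:02:05']
```

Each returned timecode could then be turned into a link that opens the video at that position.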

Video Search for Audio: Custom Vocabulary Adaptation

Not only is it possible to extract metadata from spoken words, but it is also possible to include words that were not spoken but are related to those which were. This correlation is made in real time, based on search engine optimization (SEO) methodologies.

Video Search for Audio: Hierarchy of Keywords

Knowing how many times each word is spoken may lead to a hierarchy of keywords that can be useful in all sorts of analysis, such as tagging, or as an input to a recommendation engine.
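One simple way to build such a ranking, sketched here under the assumption that the transcript is available as plain text, is to count word frequencies after dropping very common words. The stop-word list and the sample sentence are illustrative only; a real service would likely use a much richer approach.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny, illustrative stop-word list; real lists are far longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "we", "it"}


def rank_keywords(transcript_text: str, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the top_n most frequently spoken words with their counts."""
    words = re.findall(r"[a-z']+", transcript_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


sample = (
    "Search is changing. Video search and audio search rely on metadata, "
    "and metadata is extracted from the video."
)

for keyword, count in rank_keywords(sample, top_n=3):
    print(keyword, count)
```

The highest-ranked words could be stored as tags for the video or passed to a recommendation engine.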

Video Search for Audio: Different Languages

We have already seen that it is possible to convert a video’s audio into a text file for a number of major languages, but, in fact, it is possible to go one step further and translate the input language into 50+ output languages. This allows for:

Searching in any of the 50+ languages
Creating subtitles not only in the input language but in any of the 50+ output languages

Again, it is not 100% accurate, but this automatic process is good enough to allow for a clear understanding of the spoken word. Remember, no human element is being used so far.

Video OCR Search

Video search goes beyond audio. We can use optical character recognition (OCR) with the video itself and identify all the words that appear in the image. Today’s top solutions allow for identifying up to 25 languages.

Video Search for Movement

Many times, we are interested in watching for movement on a certain camera, but it is very boring to carefully watch every second until we find whatever movement took place. In the end, the footage we are looking for may last only a couple of seconds.

It is now possible to have a clear indication in the video of where movement was detected and to jump to that snippet with a single click. In parallel, we also have a timecode list of when the movement took place, along with a timestamp.
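To illustrate how a list of movement timecodes might be produced, here is a minimal sketch based on frame differencing: consecutive frames are compared, and a large change is treated as movement. This is an assumed technique chosen for illustration, not necessarily what commercial services do. The frames are tiny hypothetical grayscale images written as flat lists of pixel values, and the threshold is arbitrary.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, each 0-255 (assumed format)


def mean_abs_difference(a: Frame, b: Frame) -> float:
    """Average per-pixel change between two frames of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_movement(frames: List[Frame], fps: float, threshold: float = 10.0) -> List[float]:
    """Return the times (seconds from the start) at which a frame differs
    from the previous one by more than the threshold."""
    timecodes = []
    for index in range(1, len(frames)):
        if mean_abs_difference(frames[index - 1], frames[index]) > threshold:
            timecodes.append(index / fps)
    return timecodes


# Hypothetical 2x2 frames: an empty scene, then something enters and leaves.
still = [10, 10, 10, 10]
moved = [10, 200, 200, 10]
frames = [still, still, moved, moved, still]

print(detect_movement(frames, fps=2.0))  # [1.0, 2.0]
```

Adding each returned offset to the recording's start time would give the timestamp of the movement.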

Video Search for Emotions

Today, it is possible to identify and search for up to eight types of emotions shown by people captured in one video. The uses for this are mind-boggling. From situations like the TV series Bull to marketing studies, to political analysis, anything is possible.

Image Classification and Contextualization

Images are counted by the zillions almost everywhere, especially if we are talking about a media company. Today, it is possible to identify the gender and the probable age of each face, along with a number of other metadata fields extracted from the image by automatic processes.

There is a universe of search possibilities now unfolding right before our very eyes. What can we do with it? We can do most anything.

Joao Penha-Lopes has specialized in document management since 1998. He holds two postgraduate degrees in document management from the University Lusofona (Lisbon) and a PhD, awarded in 2013 by the Universidad de Alcala de Henares (Madrid), with a thesis studying the economic benefits of electronic document management (EDM). He is an ARMA collaborator for publications and professionally acts as an advisor on critical information flows, mostly for private corporations. Follow him on Twitter @JoaoPL1000.