Last March, Google introduced the “Live Captions” feature in the Chrome browser, a tool that uses machine learning to generate instant captions for any video or audio clip, giving deaf and hard-of-hearing people greater access to Internet content.
In the past, captions were either prepared in advance for video formats or typed live by a captioner and broadcast on television. Live Captions changes this: with just a few taps on the screen, any user can get instant, accurate captions for video and audio.
Google Live Captions is an application of natural language processing, or NLP, a branch of artificial intelligence that uses algorithms to enable interaction of some kind between people and machines. NLP helps translate human language into a form machines can process, and vice versa.
The history of smart computing
To understand the history of NLP, we have to go back to one of the most innovative scientists of modern times: Alan Turing. In 1950, Turing published “Computing Machinery and Intelligence,” which discussed the idea of conscious, thinking computers. He argued that there were no convincing objections to the idea that machines could think like humans, and proposed the “imitation game,” now known as the Turing test, as a way to measure whether an AI could think for itself: if a machine could fool a human interrogator into believing it was human often enough, it could be considered intelligent.
Between 1964 and 1966, the German-born scientist Joseph Weizenbaum wrote an early natural language processing program, known as ELIZA, that used pattern matching techniques to hold a conversation. For example, in a script where the computer played a “doctor,” if a patient said to the computer, “My head hurts,” the computer would respond with a similar phrase such as “Why does your head hurt?” ELIZA is now considered one of the earliest chatbots.
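The pattern-matching idea behind ELIZA can be sketched in a few lines. The rules below are invented for illustration; they are not Weizenbaum's original “doctor” script, only a minimal demonstration of how a captured phrase is slotted into a canned response.

```python
import re

# Illustrative pattern-response rules in the spirit of ELIZA's "doctor" script.
# These rules are invented for demonstration; they are not Weizenbaum's originals.
RULES = [
    (r"my (.+) hurts", "Why does your {0} hurt?"),
    (r"i feel (.+)", "How long have you felt {0}?"),
    (r"i am (.+)", "Why do you say you are {0}?"),
]

def respond(utterance: str) -> str:
    """Match the utterance against each rule and fill the captured
    text into the canned response template."""
    text = utterance.lower().strip(".!?")
    for pattern, template in RULES:
        match = re.search(pattern, text)
        if match:
            return template.format(*match.groups())
    return "Please tell me more."  # generic fallback when nothing matches

print(respond("My head hurts."))  # -> Why does your head hurt?
```

The program has no understanding of the words; it only recognizes the surface pattern and echoes part of the input back, which is exactly why systems of this era could not “think” for themselves.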
The 1980s were a major turning point for NLP. Earlier systems such as ELIZA held conversations based on a complex set of rules; the AI could not “think” for itself and was instead something of a chatbot, using “canned” responses to fit the context.
In the late 1980s, NLP shifted toward statistical models, which form responses based on probability rather than hand-written rules.
How does live captioning work?
Modern NLP systems for speech share some common building blocks, such as speech recognition, language identification, and speaker diarization, which distinguishes between individual speakers.
Live Captions uses three deep learning models to produce the captions: two recurrent neural networks (RNNs), one for speech recognition and one for punctuation, and a convolutional neural network (CNN) to classify audio events. Together, these three models determine the form and flow of the caption, even when music is playing.
When speech is detected in an audio or video stream, an automatic speech recognition system is activated and the device begins converting words to text. When the speech stops, for example while music is playing, the system pauses to conserve the phone’s battery and the label “music” appears in the on-screen captions.
As the speech is transcribed, punctuation is predicted for the most recent complete sentence and continually revised, so that the system’s output does not distort the meaning of the sentence as a whole.
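The three-model pipeline described above can be sketched roughly as follows. The functions here are stand-in stubs invented for illustration; Google's actual models are on-device neural networks, but the control flow, classify the audio, pause recognition during music, and punctuate the transcript, follows the description above.

```python
# Hypothetical sketch of the Live Captions pipeline described in the text.
# classify_audio, recognize_speech, and punctuate are invented stubs,
# standing in for the CNN and the two RNNs respectively.

def classify_audio(chunk: str) -> str:
    """Stand-in for the CNN audio-event classifier: labels each chunk."""
    return "music" if chunk.startswith("~") else "speech"

def recognize_speech(chunk: str) -> str:
    """Stand-in for the speech-recognition RNN: returns raw lowercase words."""
    return chunk

def punctuate(text: str) -> str:
    """Stand-in for the punctuation RNN: capitalizes and closes the sentence."""
    return text.capitalize() + "."

def caption(chunks):
    """Yield one caption line per audio chunk."""
    for chunk in chunks:
        if classify_audio(chunk) == "music":
            # Recognition pauses during music to save battery;
            # a label is shown instead of a transcript.
            yield "[Music]"
        else:
            yield punctuate(recognize_speech(chunk))

print(list(caption(["hello there", "~guitar riff~", "captions are back"])))
# -> ['Hello there.', '[Music]', 'Captions are back.']
```

The key design point mirrored here is that the classifier gates the recognizer: transcription only runs on chunks labeled as speech, which is how the real system saves battery during music.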
Currently, Live Captions can only produce captions for English speech, but the feature is constantly being improved and will one day expand to other languages. Early versions of Spanish, German, and Portuguese captions are already available in Google Meet.
Language remains one of the biggest gaps between people, and technology has remarkable potential to bring people together. Through natural language processing, those gaps can be bridged to build a brighter future.