What is real-time closed-captioning?
Real-time closed-captioning is the process of creating text from the audio/speech of video programming broadcast on television and other sources such as streaming video. The text data is encoded into the video signal at broadcast stations and displayed on devices equipped with decoder chips, with the text appearing in various portions of the screen. The captioned text is created by individuals using stenographic-based systems, or by individuals using voice-captioning systems based on speech-to-text technologies and voice-writing and steno theories.

Closed-captioning was developed to provide Deaf and hard-of-hearing individuals with access to video programming. In 1990, Congress passed the Television Decoder Circuitry Act, requiring all televisions sold in the U.S. with a screen 13 inches or larger to be equipped with a chip to decode captioning text encoded in a broadcaster's TV signal. In 1997, the FCC issued a Report and Order mandating that closed-captioning be phased in on all video programming (broadcast and cable) networks starting in 2000, with the goal of achieving 20 hours a day of captioning on every broadcast and cable network in the U.S.

Real-time captioning has also proven valuable as a tool for promoting the use and comprehension of English by individuals learning English as a second language. Its use has become ubiquitous among the "mainstream," i.e., individuals not using it as an assistive access tool, providing access to program content in entertainment, sports, and meetings for a wide range of individuals for whom access to real-time information is essential.
AI vs Real-time Captioning – What’s In Your Head-end?
Advances in artificial intelligence, spurred by developments in neural network research, computer processing power and, in some respects, the Internet, have given rise to widespread speculation about applying this technology to speech-to-text. Although the advances have been impressive, current commercially available speech-to-text systems do not produce better than 90% accuracy on even the most common transcription applications. Cognitive resolution of words to produce an accurate rendering of speech as text remains elusive and awaits further advances in neural network/cognitive recognition by these computing systems. Stations contemplating such systems should assess whether they can deliver the level of accuracy, let alone the other elements of quality, that will serve individuals who depend on captioning as an accessibility tool, particularly when emergencies or emergency coverage arise. Stations can ill afford to transmit incomplete or inaccurate information during emergency announcements or live coverage of special reports. Real-time captioning by highly skilled, trained captioners is the best and most reliable technology available today to ensure that timely, accurate, potentially life-saving information reaches individuals who depend on captioning for vital information. In this respect, stations located in areas prone to emergency weather, and those in large DMAs, should view real-time captioning on regularly scheduled programming as another form of business insurance. Failure to provide accurate, accessible information through captioning in an emergency situation, particularly if injury or death follows, could expose a broadcaster to liability and penalties.
Stations should also consider the cost of maintaining an AI backbone capable of highly accurate captioning, i.e., the cost of on-site management, including engineering and IT network support for such a system. "What's in your head-end?" is the question engineering personnel need to ask in order to weigh all the costs and benefits before implementing speech-to-text systems that are not ready for prime time.