KLRU Control Room

What’s the Impact of AI on Subtitling?

Captioning and subtitling are similar but not identical. Rather than dive into the differences, we’ll use the word subtitling generically in this article to refer to both subtitles and captions.

The beginnings of a wave of change with subtitling are happening in television. You may not have felt the tremors yet, but you will.

Subtitles on television began in the early 1970s. Live subtitling, however, did not begin until 1982, when it was developed by the National Captioning Institute using court reporters trained to write 225 words per minute on a stenotype machine. This provided viewers with on-screen subtitles within two to three seconds of the word being spoken, and it’s been that way ever since. But…

Stenography, as a profession, has been in decline since 2013. As senior professionals retire, they are not being replaced by younger candidates, who are passing on this field in favor of other careers. One reason is the demanding nature of the job: candidates must be able to type at least 225–250 words per minute with very few mistakes, which requires discipline and focus for many hours at a time. Schools that taught stenography typically had a graduation rate of around 4% due to the high standards, and a gradual decline in enrollment ultimately forced many of them to close. In the US alone, there are over 5,000 open stenography positions with limited candidates to fill them.

At the same time the stenography profession began trending downward, demand for live subtitling was rising. This increase was driven by new government mandates in multiple countries, the exponential growth of live and breaking news, 24-hour cable news cycles, and more live sports broadcasts. Additional competition for live captioners is also coming from corporate events, government briefings, meetings, and increased use by the legal system for depositions and trials, all of which are creating resource issues and rising prices for human captioning.

So how do we fill the gap between a declining supply of human subtitlers and rising market demand? Technology. Specifically, Artificial Intelligence (AI). The technology has been around for many years; in fact, voice recognition dates back to the late 1800s and early 1900s. It began to show significant improvement in 1971 and continued to evolve until becoming commercially available in 2014. Early efforts suffered from accuracy problems and limitations.

With the acquisition of Screen Subtitling Systems, BroadStream began working to incorporate AI speech engines to create an automated solution we call VoCaption. VoCaption delivers live subtitling for your live broadcast that is more accurate than previous AI implementations and in many cases equal to or better than humans.

VoCaption can be used in multiple applications including:

  1. Emergency Subtitling – for those occasions when subtitles are expected to be in a program but for some technical reason they are not. VoCaption can help reduce those angry viewer calls because it can be activated in just a matter of seconds.
  2. Supplemental Subtitling – your news may provide subtitles using a script and teleprompter, but for weather, sports, traffic and unscripted field reports, rather than use a human who must be available and “on the clock” for the entire time, VoCaption can be turned on and off as needed.

The two biggest benefits you can expect are:

  1. Improvements in accuracy – A frequent comment we hear is “we tried AI several years ago and it wasn’t very accurate.” That was true then, but the technology has made excellent progress, and accuracy is up significantly depending on the program genre and the audio quality. In addition, it’s easy to regionalize or localize the technology by using custom dictionaries to import regional or local names, geography, schools, sports teams and more. Using these dictionaries will substantially increase accuracy and improve pronunciations, and we can help you with that process during commissioning.
  2. Substantial savings – Human subtitlers are expensive, and as the pool of qualified subtitlers continues to shrink you will likely see an increase in rates. Current estimates range from a low of $60 per hour to a high of $900 per hour depending on your needs. VoCaption is available when you need it and it sleeps when you don’t. You only pay for the times you need it, so you will see significant cost savings versus human captioners.
VoCaption 1RU Hardware


VoCaption is available in both hardware and software versions for third-party caption inserters, or as part of OASYS Integrated Playout and Polistream, our leading subtitle inserter. For more information you can contact us here and a representative will be happy to answer any questions, arrange a demonstration or provide a quotation.



New Polistream Software Release v3.7.0.74

Polistream U8100 1RU Server

BroadStream is pleased to announce a new software release for Polistream. Since the acquisition of Screen in late 2018, one of our goals has been to double down on product improvements across all Screen products, and you will begin to see new releases on a regular basis. Our two software teams in the UK and Croatia have successfully merged and now work together as one team.

This new Polistream release introduces initial support for DVB-TTML, DVB’s new specification for text-based subtitles. The new TTML delivery specification provides a transitional path to a common TTML subtitle format for both broadcast and Internet-delivered services. This means subtitles used for broadcast can be re-used later for video-on-demand services over DASH/CMAF.

DVB-TTML is especially suited for high-resolution content like 4K and the higher-resolution programming that will be produced in the future. It is also closely aligned with other TTML-based systems, including the EBU-TT-D specification, which was adopted for both DVB-DASH and HbbTV. You can learn more about DVB-TTML here.
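To make the idea of a text-based TTML subtitle concrete, here is a minimal sketch that builds a tiny TTML document in Python using the standard TTML namespaces. The timings, text, and styling are illustrative only and are not taken from any Polistream or DVB-TTML delivery profile.

```python
# Minimal sketch of a TTML subtitle document; content is illustrative only.
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"            # core TTML vocabulary
TTS_NS = "http://www.w3.org/ns/ttml#styling"   # TTML styling attributes
ET.register_namespace("", TT_NS)
ET.register_namespace("tts", TTS_NS)

tt = ET.Element(f"{{{TT_NS}}}tt", {"xml:lang": "en"})
body = ET.SubElement(tt, f"{{{TT_NS}}}body")
div = ET.SubElement(body, f"{{{TT_NS}}}div")

# One subtitle cue: white text shown from 10s to 12.5s of the program.
p = ET.SubElement(div, f"{{{TT_NS}}}p", {
    "begin": "00:00:10.000",
    "end": "00:00:12.500",
    f"{{{TTS_NS}}}color": "white",
})
p.text = "Good evening, and welcome to the news."

xml_doc = ET.tostring(tt, encoding="unicode")
print(xml_doc)
```

Because the subtitle is plain text plus attributes rather than a bitmap, the same cue can be re-rendered at any resolution, which is why the format suits 4K and beyond.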

This release also includes initial support for EBU-TT Live input and output over WebSockets. EBU-TT can be used in a broadcasting environment to carry subtitles created in real time (“live”) or from a prepared file, from an authoring station to an encoder prior to distribution, via intermediate processing units. EBU-TT Live Part 3 is carried over WebSocket, allowing input and output to act as either client or server. You can learn more about EBU-TT here.
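The live case differs from file delivery in that a stream of small documents is pushed from the authoring station, each identified by a sequence identifier and an increasing sequence number. The sketch below builds such documents in Python; the `ebuttp:` attribute names reflect our reading of the EBU-TT Part 3 specification, the function name is ours, and real carriage would send each payload as one WebSocket message, which is omitted here.

```python
# Hedged sketch of framing an EBU-TT Part 3 "live" sequence of documents.
# Attribute names follow our reading of EBU-TT Part 3; treat this as
# illustrative, not a conformant implementation.
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
EBUTTP_NS = "urn:ebu:tt:parameters"
ET.register_namespace("", TT_NS)
ET.register_namespace("ebuttp", EBUTTP_NS)

def make_live_document(sequence_id: str, sequence_number: int, text: str) -> bytes:
    """Build one document in a live sequence; later numbers supersede earlier ones."""
    tt = ET.Element(f"{{{TT_NS}}}tt", {
        f"{{{EBUTTP_NS}}}sequenceIdentifier": sequence_id,
        f"{{{EBUTTP_NS}}}sequenceNumber": str(sequence_number),
    })
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    p = ET.SubElement(div, f"{{{TT_NS}}}p")
    p.text = text
    return ET.tostring(tt, encoding="utf-8")

# In real use each payload would travel as one WebSocket message to the
# encoder (with either end acting as client or server); here we just build two.
messages = [make_live_document("news-live-1", n, line)
            for n, line in enumerate(["Good evening.", "Our top story tonight..."],
                                     start=1)]
```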

This new Polistream release includes:

  • Support for text backgrounds (TTML tts:backgroundColor attribute)
  • A new EBU-TT-D text background selector on the UI Options page. It can be set to OFF, ON, or AUTO, which forces backgrounds off or on, or lets them be controlled by the box attributes in the subtitle.
  • When backgrounds are ON, the background color defaults to black at the start of each row and may be set to other colors by the “new background” control codes embedded in the subtitle text.
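The OFF/ON/AUTO behavior described above can be sketched as a small decision function. The names `BackgroundMode` and `resolve_background` are ours for illustration and do not reflect the actual Polistream code.

```python
# Hypothetical sketch of the OFF/ON/AUTO background selector described above;
# names are ours, not Polistream's API.
from enum import Enum
from typing import Optional

class BackgroundMode(Enum):
    OFF = "off"    # backgrounds forced off
    ON = "on"      # backgrounds forced on (black by default)
    AUTO = "auto"  # follow the box attributes carried in the subtitle

def resolve_background(mode: BackgroundMode,
                       subtitle_background: Optional[str]) -> Optional[str]:
    """Return the background color to render, or None for no background."""
    if mode is BackgroundMode.OFF:
        return None
    if mode is BackgroundMode.ON:
        # Black at the start of each row unless the subtitle's embedded
        # "new background" control codes specify another color.
        return subtitle_background or "black"
    # AUTO: honor whatever the subtitle itself carries, if anything.
    return subtitle_background
```

For example, with the selector set to ON, a row with no embedded color renders on black, while a row whose control codes request blue renders on blue; with OFF, no background is drawn regardless of the subtitle.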

Customers with an active support contract can contact Support(at) for access to this software upgrade.