♪♪
Welcome everybody to today's
SMPTE Monthly Webcast!
This month's topic is Internet Captioning -
Implications of the Multi-platform,
Multi-Display Ecosystem.
I am your host, Joel Welch, SMPTE's Director of
Professional Development.
I'd like to take a moment to thank our sponsors:
AJA Video Systems, Blackmagic Design, and
Ensemble Designs.
It's through their generous support that we're able to
bring SMPTE Monthly Webcast to our members
free of charge.
Now, we've invited back Jason Livingston.
Jason is a Developer and Product Manager for
CPC Closed Caption.
And he originally did the January closed caption
session, and because some time has passed and we
have another FCC deadline coming up, we thought
we'd invite him back.
And just to sort of lay the groundwork for Jason's
presentation, some of you who are participating today
were not on the January webcast, so we're going to
do a little bit of review and then Jason is going to
fill in some gaps and let us know what the upcoming
deadline is and take us into a little new territory
as well.
So Jason, without further delay, the floor is yours,
and if you click on the slide, you should be able
to advance it.
Okay.
Thank you Joel!
I hope everybody can hear me okay, right?
You sound great.
Great!
Okay, without too much further ado, I'll get
started, but thank you Joel for that introduction.
As Joel mentioned, I'm going to cover some background
that we covered in the January session, but I think
it's pretty important to go over it again for anybody
who is new especially and just to reiterate some
things that have changed or been updated
since last time.
So I'd like to talk about what is this whole closed
captioning thing and why is it so important, why is it a
problem, and then we'll go into some specific problems
and solutions and challenges in the marketplace.
So why does anybody care about this whole
closed captioning thing?
Well, there are these new FCC regulations, and I say new, but they
started about two years ago, and they're still phasing
in, they're still taking place, that require closed
captions from television broadcast to be available
when these videos are delivered over the Internet,
such as on the web and mobile devices.
And the whole reason SMPTE is involved in this process
at all is that SMPTE created a new specification called
SMPTE Timed Text or SMPTE 2052 standard to address a
lot of the challenges we're seeing for bringing the
closed captions from broadcast TV to
Internet video.
And why is closed captioning required?
Why do we care about this to begin with?
The number one reason that most people probably care
about is that it's the law.
I hate to say that you only do something because it's
the law, but let's face it, that's probably the case;
a lot of us do it because it's the law.
Even if for some reason you're not covered by the
legal requirements, and that affects both broadcasters
and also government institutions, academic
facilities, let's say your programming is currently
exempt and you're wondering why should you do closed
captioning because it's not required of you?
Well, a couple of facts that a lot of people
don't know about closed captioning:
about 20% of U.S. households use closed captions.
They're not just used by people who are deaf or hard
of hearing, they're also used by people who are
learning English as a second language, because it helps
a lot with comprehension.
They're also used in cases where the audio
simply can't be accessed at that time.
For example, watching a video on a mobile device on
the train or on the bus without headphones,
or viewing a video in a restaurant or a bar
or an airport kiosk or something like that,
where it simply is too difficult to
get access to the audio content, you need closed captions
to be able to understand the content.
And a lot of people I think underestimate the number of
people who use captions due
to being deaf or hard of hearing.
The latest number I've seen is that more than 48 million
Americans have hearing loss, and I believe that's just
the U.S., that's not including Canada, Mexico,
of course all the other countries in the world where
closed captioning may be required as well.
And as I mentioned, for use in noisy environments where
you just can't access the audio of the video.
So even if for some reason you say, oh,
I'm exempt from this law,
I shouldn't have to provide closed captioning,
it's too much of a pain, you're really losing
out on a good portion of your viewership or your
potential market if you ignore that requirement.
So I recommend that you do it, and we'll talk about
how you can do it.
Just to talk about what the FCC is actually requiring,
because from my day-to-day work I pick up on
a lot of confusion on this topic.
One important aspect that a lot of people overlook,
besides just the basic requirement, you're required
or you're not required to have it, is that the closed
captions on Internet video and other playback
mechanisms are required to substantially or totally
replicate the look and feel of television broadcast
captions, including the formatting and positioning.
This is something that's really important that I've
noticed some people are not quite aware of.
There are a lot of interesting workarounds for
getting captions on a video, on the web, that might not
replicate the look and feel of TV broadcast captions,
and even though that's better than nothing, it does
not satisfy the FCC requirements.
So you may have some legal issues there if you're not
fully replicating the look and feel
of the TV broadcast captions.
The FCC rules also require that portable devices,
mobile devices, pretty much any kind of hardware or
software that is capable of playing back a video will be
required to implement what's called user controls.
These user controls are a user preference or setting,
such that the user can change the font size,
the color, the opacity, etcetera,
of the closed captions.
That's something that's been required for digital
televisions sold in the U.S.
since, I think, 2006, but this is a new
requirement for mobile devices and browsers as well.
So if you are the manufacturer of a mobile
device and you support closed captions but you
don't support these user controls, you're going to be
running into compliance issues.
Same thing if you make a browser plug-in that
displays video, a video player or a set-top box, or
anything in that regard that plays video: you have to
implement the user controls.
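As a rough illustration of what those user controls amount to in practice, here is a minimal, hypothetical TypeScript sketch of a caption preference object and how a player might apply it to a caption overlay; the property names are illustrative, not taken from any particular device API.

```typescript
// Hypothetical caption user-preference object (names are illustrative only).
interface CaptionPrefs {
  fontSizePercent: number;   // e.g. 50-200 (% of the player's default size)
  textColor: string;         // CSS hex color, e.g. "#FFFF00"
  backgroundColor: string;   // CSS hex color for the caption background
  backgroundOpacity: number; // 0.0 (transparent) through 1.0 (opaque)
  edgeStyle: "none" | "drop-shadow" | "outline";
}

// Apply the preferences to whatever element the player draws captions into.
function applyCaptionPrefs(overlay: HTMLElement, prefs: CaptionPrefs): void {
  overlay.style.fontSize = `${prefs.fontSizePercent}%`;
  overlay.style.color = prefs.textColor;
  overlay.style.backgroundColor = withOpacity(prefs.backgroundColor, prefs.backgroundOpacity);
  overlay.style.textShadow =
    prefs.edgeStyle === "drop-shadow" ? "2px 2px 2px black" :
    prefs.edgeStyle === "outline"     ? "0 0 2px black, 0 0 2px black" : "none";
}

// Turn "#RRGGBB" plus an opacity into an rgba() value.
function withOpacity(hexColor: string, opacity: number): string {
  const r = parseInt(hexColor.slice(1, 3), 16);
  const g = parseInt(hexColor.slice(3, 5), 16);
  const b = parseInt(hexColor.slice(5, 7), 16);
  return `rgba(${r}, ${g}, ${b}, ${opacity})`;
}
```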
And one thing that's kind of interesting is that the
SMPTE Timed Text format, when this was going through
the whole FCC legal process, the FCC wanted to be able to
point to a specific format that they recommend people
to use, and SMPTE Timed Text fell into that role.
So SMPTE Timed Text is specifically singled out by
the FCC as a safe harbor format.
And I'm not a lawyer so I can't tell you all the
implications of exactly what safe harbor means, but my
interpretation that various people agree with is that if
you accept SMPTE Timed Text on the input side and you
deliver it on the output side in your chain of
events, and also you fully adhere to the specification,
you'll be in compliance even if some problems occur.
Whereas, if you don't use SMPTE Timed Text as your
format and some problems occur, then potentially you
could have a greater liability than someone who
has the same problems but is using SMPTE Timed Text.
I won't really say if I agree with this, but that is
what I understand to be the case.
Are there any questions
before I proceed?
Yes, actually.
Actually I was going to just jump in, we have one
question from William, and he asks, aren't the look and
feel requirements not yet implemented?
Yes, I believe that's correct.
The last thing that I've seen, I believe it is in the
spring of 2014 that the look and feel requirement and
also the player user control requirement will start
to take effect.
And I believe that's an interpretation of the law,
because the original law said it just took effect
immediately and the FCC pushed that back a little bit.
I should have added that to my next slide; now that
you've mentioned it, I'm going to look that up and tell you
the exact dates, but I believe it's
March or April 2014, if I remember correctly.
Thank you very much Jason!
That's the only question for now.
Great!
So that's a very poignant question, what are the
FCC deadlines for Internet captioning?
And just to clarify, these deadlines apply to videos
that are related to the world of
broadcast television.
So if I put a video of my dog barking on YouTube and
this video has never been shown on broadcast
television, it does not require closed captions.
The FCC has no purview over the Internet to be able to
require that; these deadlines apply to video
that required closed captions when shown on
broadcast television, because the FCC does have
purview over that.
One of the deadlines that already passed last year was
that pre-recorded programming that was not
edited for Internet distribution had to be
captioned when shown on the Internet.
And this deadline went pretty smoothly I think.
I think a lot of -- most of the organizations that at
least I deal with had solutions in place
by the deadline.
Video on demand that's not live is relatively easy.
There's lots of solutions in the marketplace for dealing
with the captions for that.
So that went pretty well.
One of the ones that just passed a couple of months
ago was the requirement for live and near-live
programming to be captioned when shown on the Internet.
And this one caused quite a bit of a stir, because live
closed captioning is substantially different from
postproduction closed captioning or video on
demand, and that had some additional technical
requirements that threw a monkey wrench
into the process.
So when I did the SMPTE webinar back in January,
it was very hard for me to recommend specific solutions
because there was not a lot of stuff in the marketplace
to actually solve these problems.
But the good news is that in the time since then
some new solutions have come online.
So this problem is starting to work itself out.
And the interesting deadline that is still coming up in
about two months is that pre-recorded programming
that is edited for Internet distribution has to
be captioned if it's shown after September 30th this year.
So I think a lot of the video from the previous
September deadline got away without captions because it
was edited for the Internet, which can mean --
now, what does edited mean?
It can mean various things, and I think the FCC has some
guidelines as to what counts as editing
or does not count as editing.
But suffice to say, if you were editing your video,
which a lot of distributors do, you didn't have to
caption it until this September.
And so I expect there will be a small increase in the
[inaudible 00:12:45] when that deadline hits, because
a lot of people will need to have solutions
online by then.
Jason?
Yes.
We have a couple of questions.
Our friend Mike from Stonehill College asks
a question you may be covering later, but yes,
what's the impact on Line 21?
Well, Line 21 means different things depending
on who you ask.
As a technical standard, Line 21 is the line where the
captions were put in an analog standard definition video signal.
So to a lot of people the closed captioning standard,
which is called CEA-608, and I'm going to go into that in
a little bit, was synonymous with Line 21,
because that's what the engineers called it.
So Line 21 can mean a technical standard for how
the captions are broadcast, but also in general just
kind of as the industry term people use Line 21 to mean
closed captions for the U.S. market,
even if it's not technically the Line 21
in an engineering spec.
So most of what I'm talking about is related to Line 21,
and I'll go into a little more detail on that later.
Okay, we have another question if you don't mind.
Yeah.
This is from David, he asks, does the closed captioning
requirement apply to any commercials which may be
embedded or associated with the programming material?
I believe that right now television commercials
are exempt from the FCC requirements.
That could change someday in the future.
I don't think they're requiring it right now.
You need to check with your station to be sure.
It may depend on your TV market as well,
the size of the market.
As I alluded to before, even if the commercials are not
required to be closed captioned, that's something you do want to
think about, especially if it's your commercial;
if it's somebody else's commercial,
maybe you don't care, but for your commercial you do want
to caption it, because you're missing out on a huge part
of the market if you don't.
We have one more question that I'll hold
for the Q&A period.
Okay.
I'm happy to do any questions about any slide
that I'm doing right now, but some of them hopefully
I will address as we go on.
So the question I get a lot is, why is this closed
captioning stuff so complicated, it's just some
text and some time codes, right,
that doesn't seem very hard?
The way closed captioning was originally implemented,
and this started in the late 1970s, so imagine the
technology we were dealing with back then,
the engineers back then were really clever to come up
with a way to get text and timing information onto an
analog video signal in the first place, in a way that an
inexpensive consumer decoder could actually
display over the signal.
And again, think about the late 70s, early 80s and what kind of
computer technology existed back then; trying to draw
characters on an analog video signal
was actually quite a feat.
It's a very low bandwidth signal.
It has to be stateless, meaning that if somebody
is flipping through channels, you don't have to have your
TV tuned to the show at the start of the show to get the
captions for the whole show; rather, they're streamed
progressively as the show goes on, so you can always
get the captions when you change channels.
Obviously the memory and processor power were
very limited back then.
But still, despite those limitations, they came up
with a pretty good spec; it can handle most of the Roman
alphabet-based languages, so English and Spanish, French,
Portuguese, Dutch, German,
and I think a couple of others.
So it's a very interesting spec.
But we are stuck with a lot of the design decisions
made in that spec.
And why are we stuck with those, it's because there's
a huge amount of content -- libraries of archives of
content already in use that we need to preserve going
forward, and because of backwards compatibility.
So even though technology has moved forward a lot
since those days, we're still living with a lot of
those design decisions, and that complicates things.
But fortunately, the most important thing they did
a good job with is the backwards compatibility.
So we're not dealing with trying to convert and
translate huge archives of things into a completely
new way of doing things.
Just in case you're unaware, if you don't use closed
captions on a regular basis, here are some of the things closed
captioning can do -- well, when I say closed
captioning, I'm talking about the CEA-608 standard,
which is the standard we use
for analog broadcasts in North America,
and I'll talk about digital in the next slide,
but it can do a lot of interesting things.
You can reposition the captions on the screen.
You have different justification settings.
You can have split windows of text, as you can see in the
lower left corner here, which are used when there are
multiple speakers speaking on top of each other and
in various other situations.
There is roll-up captioning, a smooth-scrolling
type of captioning that's mostly
used for live captioning.
You can see an example of some of the positioning
and special characters you can make use of.
So the way that these were done in CEA-608 was rather
complicated, and that complication has passed down
through the generations, including the switch
to digital, and now the switch to Internet delivery.
We're living with all of these complications.
And so what about CEA-708?
708 is the digital closed captioning standard that
we switched to starting in 2006, when the
analog to digital switch over happened.
And 708 is a new spec, but it kind of follows the same
philosophy, the same way of doing things is preserved
from CEA-608, because they wanted to really stress
backwards compatibility.
So it does add a lot of new features, but in many ways
we're still stuck with the limitations of the
608 standard, which goes back many, many years.
And some of those reasons are that, first off, most
caption authoring tools still target the 608 spec
only, because you have to ensure backwards
compatibility, and because there's no rule that says
that you have to make use of the advanced new features,
most people target the lowest common denominator,
and in some ways that's a good thing,
in some ways that's a bad thing.
Most of the common caption interchange files that
captioners and TV stations exchange with each other are
608 only, including SCC, which is a very commonly
used file format that we see in use
a lot in the industry.
And that's a terrible, terrible format that is
loaded with problems, and I could probably talk to you
for two hours about all the problems
the SCC format causes.
But unfortunately, people started, or various
companies started standardizing around SCC and
now we're stuck with a lot of those problems.
As I mentioned, we've got vast archives of content
that was captioned back in the
608 analog days, and we need to move forward with those
without redoing the captions.
The primary language captions have to be
backwards compatible with 608; 708 actually carries
608 data as well for backwards compatibility with
older receivers, older television sets,
DVD players and whatnot.
So a lot of the new features in 708 you have to stay away from,
because it's going to impact
the backwards compatibility.
This is a big one that I -- that personally frustrates
me a lot, and I'm not going to name any names here,
but a lot of the quality control hardware, things like
waveform monitors, scopes, professional equipment used
in television stations have a lot of problems dealing
with CEA-708 closed captions properly.
I mean, CPC software can encode a lot of these
advanced 708 features, and when you turn those on, they
work perfectly fine on a consumer television, right?
All the consumer TVs have no problem with them, but you
take this to the broadcast television station and they
run it through their professional quality control
equipment and the captions don't work,
they have all kinds of problems.
And so as a result we have to turn off
a lot of those advanced features,
we can't make use of them because the
rest of the industry has not caught up with that yet.
Hopefully they will someday, but I mean, we've been
dealing with this for five, six years now and it hasn't
improved too much yet.
So as a result pretty much captions are still a CEA-608
world, and what happens is that data is translated to
708 as the last step of the broadcast chain.
So at the TV station there is a piece of hardware that
takes the 608 captions and upconverts or translates
them to 708, and that's why all the TVs can receive 708,
even though everything is still authored as 608.
Jason?
Yes.
We do have a question, I think I know what the answer
is, but the question is,
can 708 carry a different 608 inside?
Well, when we talk about 708 and 608, for an analog
transmission you only have 608,
that's your Line 21 equivalent.
In the digital world you have a 708 container and
in that container you have native 708 caption data and
you also have 608 backwards compatibility data.
And the intention is that the 708 captions and the 608
captions on the same channel would have the same content.
In other words, CC1 in 608 is English; Service 1 in 708
is English, those should be pretty much the same thing.
That's the intention.
That said, 708 can also carry more languages than 608 can.
So for example, 708 might carry a track that's in
Chinese or Japanese or Korean, things that you
cannot do in 608, because it doesn't support those,
and obviously those wouldn't exist in the
backwards compatibility bytes.
But for your primary language, English, it will be
carried as both 608 and 708 in the same stream.
Great!
Thank you!
Sure!
So we've been talking a lot about the broadcast
standards, now we want to move on to this web delivery
that we have to do.
The streaming video, the web video needs to replicate the
broadcast captions as closely as possible.
One, because the FCC mandates that; and two,
maybe more importantly or less depending on who you
ask, so that the people that rely on those closed
captions are not getting a second-class experience if
they watch the video online
as opposed to on their television.
So captioning on the web is not
a totally new phenomenon,
people have been doing this for a long time,
ever since video was on the web, back to the days of
RealPlayer and things like that, there were
specifications for closed captioning videos
on the web.
But those did not necessarily meet
the new FCC requirements.
One example, looking back: there are a lot of simple
text formats that can be used for captioning on the
web; SAMI, SMIL, SRT, and something called onTextData,
which is used with Adobe Flash video.
They can carry text and timing information,
but they don't support all of the 608 features.
So if you're relying on those to carry closed
captions in your current workflow, you need to be
aware that this is not going to meet the FCC mandate,
it's not going to cut it.
You do have text, but you're not replicating the look and feel
of the original broadcast closed captioning;
things like positioning and roll-up and other things
that I showed in the previous slide.
Another format that's pretty popular is called Timed Text
Markup Language or TTML.
It was originally called DFXP, and a number of people
still call it that.
That's a very rich standard and that's actually what
SMPTE Timed Text is based on; it's a superset of it.
But when you have a very rich language to
express things, one problem is that the player or the
decoder is not necessarily required to implement the
whole specification.
So in TTML you pretty much can replicate all the
CEA-608 features, but that doesn't mean that if you
have a TTML file and a TTML player that it's guaranteed
to support all of those features.
In fact, most -- prior to SMPTE Timed Text, most of
the TTML players that I've seen out there ignore the
positioning information; they ignore a lot of the
special formatting settings.
So even if your source file contains that information,
once it gets to the end user, they're not getting
the same look and feel as the TV broadcast captions.
So that would not be considered compliant with
the FCC regulations.
And another thing that we're asked about a lot is, why
don't you just put the broadcast 608 and 708 data
into the web video as well?
And actually that does happen.
Some devices do use that mechanism to receive closed
captions, and in that case, as long as the decoder works
properly, it will exactly replicate the look and feel
of the TV broadcast captions, because it's the
same as the decoder in the consumer television.
But this is not an easy thing to do,
writing a 608/708 decoder.
If you're a TV manufacturer, you probably have a library
available that does this and it's all set up
and ready to go.
But if you're a website developer, are you going to
go through these 608/708 standards, which are not
open standards, by the way?
You have to pay to get access to these standards
and develop a decoder for your web player, and then when you move on
to some other website and develop a new web player,
you're going to develop all this 608/708 stuff
again; it's pretty challenging.
So that's not a trivial thing to do.
And also, there are some video container formats that
simply don't have a place to carry 608 or 708 captions.
So depending on what you're using to carry the video
portion of the content, this might not even be
technically feasible.
Question!
Yes.
Bill asks, is it the broadcaster's responsibility
to make sure that every decoder works properly?
No, not the broadcaster, that responsibility will
fall on the decoders.
So you can imagine that somebody files an FCC
complaint because they're not getting the broadcast
experience with their captions.
Obviously that complaint is going to go somewhere and
the broadcaster is going to say, well, we're using some
SMPTE Timed Text, we're covered by safe harbors, you
can't point the finger at us.
Then I presume, depending on how the investigation goes,
at some point the finger will be pointed at
the playback device.
And if that playback device is not doing its job, then
that manufacturer or that software developer is
who is going to get into trouble
when things are not working right.
So there's a whole process of how these complaints
get evaluated by the FCC.
You don't want to be the one with the finger pointing at you
at the end of that process, for sure.
But that brings up another thought just to add to that,
if this website is run by the broadcaster, let's say
you're X, Y, Z station, and you run a website xyz.com,
and on this website you have, let's say for example,
a Flash-based video player and the captions are not
working in that Flash-based video player, then yeah,
probably you are responsible for getting that player to
work, or replacing it with a player that does.
I don't think you can just blame the player for that,
because it's your website.
But if there is content that's going to a venue that
is not under your control in terms of the playback
mechanisms, then I don't think the finger will point
at you for that.
Hopefully that makes sense.
And I'm not a lawyer, so if you want to know more you
really need to speak with a lawyer about that.
I have one more question if you don't mind.
Yeah.
If an IP video vendor or a broadcaster supports WebVTT
only for captioning in the video and the device side
supports SMPTE Timed Text and not WebVTT, who is not
FCC compliant here?
That is a good question.
Technically, I believe only SMPTE Timed Text
is the FCC's safe harbor format, and if you're not
delivering SMPTE Timed Text, or you're receiving
SMPTE Timed Text but you cannot play it, then I think that
falls onto the party that is not delivering or playing
SMPTE Timed Text or native 608/708.
WebVTT, as far as I know, does not have that same
safe harbor exemption.
But on the other hand, if you're using WebVTT and the
captions work, then that's not a problem, that safe
harbor only comes into play if there's a problem with
the captions and you're using that as a defense to
justify why the captions don't work.
Thank you!
Sure!
I might have to speed up a little bit, because we're
halfway through our time and less than halfway
through the slides.
So SMPTE came together with a lot of industry groups to
come up with a new standard for captioning on the web,
to try to unite everybody.
They wanted to make a new standard that can be used as
a Mezzanine format; meaning a single file that works for
broadcast and for web delivery.
So that means it has to have the 608/708 data in it to
work with broadcast TV, and also something a little
easier for web players to work with, because working
with 608/708 is difficult.
The goal of SMPTE Timed Text was to work together with
existing captioning authoring tools and standard
practices so you're not making a big change.
It needed to support live workflows as well as
post-production and video-on-demand workflows.
It needed to be format agnostic; it's not tied to a
particular codec or a particular container
or streaming system.
And of course it had to address all the FCC and
legal requirements we've been speaking about,
and I think it does do that.
Just to explain the different methodology here a little bit:
what does 608 data look like compared to 708?
608 and 708 are streams of bytes.
Every frame of video your television receives has
some binary data tagged onto it,
which carries the caption data.
And in the TV there's a decoder that turns this data
into instructions.
So it's almost like a programming language and
it's pretty complicated.
The TV has to implement this same model, which is kind of
envisioned around the hardware decoder, to be able
to understand these instructions and do
the right thing as they come in.
So it's a stream of commands and information.
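Before the contrast with Timed Text below, here is a minimal, illustrative TypeScript sketch, under my own assumptions, of how a 608 byte pair is usually interpreted: each byte carries odd parity in its top bit, and after stripping it, pairs in the control-code range carry commands while other pairs carry two characters of text. This is not a real decoder, just the shape of the stream.

```typescript
// Two CEA-608 bytes arrive per video field, each with odd parity in bit 7.
function stripParity(b: number): number {
  return b & 0x7f; // drop the odd-parity bit
}

function describe608Pair(b1: number, b2: number): string {
  const c1 = stripParity(b1);
  const c2 = stripParity(b2);
  if (c1 === 0x00 && c2 === 0x00) return "null pad (no caption data this frame)";
  if (c1 >= 0x10 && c1 <= 0x1f) {
    // Control-code pairs carry commands such as preamble address codes
    // (row/column/style), roll-up mode, erase display, end of caption, etc.
    return `control code 0x${c1.toString(16)} 0x${c2.toString(16)}`;
  }
  // Otherwise the pair is (up to) two printable characters of caption text.
  return `text: "${String.fromCharCode(c1)}${String.fromCharCode(c2)}"`;
}

// Example: the pair 0xC8 0xE9 (valid odd parity) decodes to the characters "Hi".
console.log(describe608Pair(0xc8, 0xe9));
```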
Whereas something like Timed Text or SMPTE Timed Text is
a text-based format, it's more of a human readable
format, and you can look at this and pretty much anybody
can figure out what this means.
It means you've got this music text with the music
symbols around it, and it appears at one time and
disappears at another time.
So that's a little easier for a web developer to
understand how to process that.
But when you get into a lot of the complicated things
that 608 and 708 can do, that's what makes the
Timed Text markup look a little more complicated.
It's trying to emulate all of those special features
that were based on analog video and hardware decoding
and NTSC spec.
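For comparison, here is a hand-written example of roughly what such a cue might look like as TTML / SMPTE Timed Text markup, embedded in a small TypeScript snippet; the element and attribute names follow the base TTML spec, but the timings, region name, and text are made up for illustration.

```typescript
// A rough, hand-written TTML fragment: the cue appears at one time,
// disappears at another, and is positioned in a named region,
// much like a 608 caption placed on a particular row.
const sampleTtml = `
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <body>
    <div>
      <p region="bottomCenter" begin="00:00:12.000" end="00:00:15.500"
         tts:textAlign="center">♪♪ [upbeat music] ♪♪</p>
    </div>
  </body>
</tt>`;

console.log(sampleTtml.trim());
```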
A review of the current industry use of SMPTE Timed
Text as a format, and this is just kind of my opinion
from what I see.
Other companies, especially other worldwide companies,
may have a slightly different view on this, but
my opinion is that right now you don't see too many
software tools or companies authoring content directly
as a SMPTE Timed Text or TTML file; rather, they are
authoring in some other format in their authoring
software, and they export or convert to SMPTE Timed Text
as the last step when they deliver a file.
Hopefully we'll see some tools that author native
TTML, and then you can make use of more of the
features that don't exactly overlap
with 608 or 708.
Where we definitely do see a lot of use of SMPTE Timed Text
is as a mezzanine format; that is, an
in-between format when party A is delivering video files,
video content, to party B.
How do they deliver the captions?
They deliver it as a SMPTE Timed Text file.
So the SMPTE Timed Text is kind of like a template that
you can use to branch off to create all these other
formats that you need to create.
And in distribution, I'm not seeing a lot of SMPTE Timed
Text delivered directly to end-user devices; there is
some, and I'm going to talk about that,
not a whole lot though.
Usually the mezzanine SMPTE Timed Text file gets
converted into other formats for the consumer devices.
There are some provider-specific applications on
mobile devices, and what I mean is, when you
download the app for station X, Y, Z's television feeds,
that custom app might be using SMPTE Timed
Text internally, but it's not a format natively
supported by the device.
And again, I apologize, I'm going to try to speed up a
little bit more to get through everything
before it's too late.
It went a lot faster in my practice run through.
So why is this so difficult?
I mean, what's the big deal, right?
The reason why this is so difficult is in the world of
broadcast video there is a single specification called
the ATSC Broadcast Spec, which is used all across
North America and a few other countries like
South Korea, most of the NTSC territories.
They use the ATSC spec, whether it's an over-the-air
antenna broadcast, or satellite, or cable TV, or Internet
IPTV; it's the same ATSC specification used to carry
the video and the captions.
So every consumer TV receives the same spec of
video and that makes it a lot easier on everybody.
The web on the other hand is kind of like the Wild West.
We've got a lot of different competing standards,
competing formats, not just for delivering the video,
but also for delivering the captions.
If you looked six, seven years ago maybe, we were
living mostly in a Flash-based world,
everything used Flash plug-ins to deliver video,
but now we're seeing a lot of what's called
HTTP streaming or HTML5 streaming.
And there's various different technologies
different groups have come up with to carry
video over HTTP.
In other words, through your web browser
or through a web like service.
And these all support closed captions in one way or another,
but they all do it differently.
It would be nice if everybody standardized on
SMPTE Timed Text, but that has not happened yet.
So we've got all these different methods that you
can deliver a video and different devices support
different methods; some support one or the other.
And then in terms of delivering the captions, you
can either have embedded 608 captions, with the caveat
that that's kind of difficult for the receiving
device to implement.
Or you can have what's called
a Sidecar Caption file.
This is a separate file that carries representation of
the closed captions.
And these come in various different formats, like
SMPTE Timed Text; also WebVTT, that was mentioned;
the TTML, which is kind of a subset or subclass of SMPTE
Timed Text, and some other formats other people have
come up with as well.
So this webinar is about the multi-format, multi-display,
multi-device workflows, and ideally we'd be living in a
world where you could pick one video and one caption
standard and that would work on every playback device,
but that's not yet the case.
For various political reasons, economic reasons,
and whatever, different devices support different
ways, and there's no one format that works
on every device.
That would be nice, but there's not.
So that means for now, realistically, to target as
many devices and playback mechanisms as possible,
you're going to have to have this video delivered
in multiple formats.
Ideally these formats would be some combination of
industry standards like HTML5 and SMPTE Timed Text
and possibly some other standards as well for
fallback compatibility.
It sure would be great if these industry standards
were supported by every device, right?
That would be fantastic.
That would make all our jobs easy.
But that's a lot of ideallys, right?
We don't live in a world where all of our
ideallys come true.
To give you an example of some of the problems, I'm
going through the different kinds of caption formats
that you can use.
In terms of web browsers and web devices, and mobile devices
like Android tablets and
Apple iOS devices, the ones that support embedded
608/708 captions in the video right now are mostly
the Apple devices, from what I've seen: iOS devices,
and also Safari on the desktop.
So if the video signal has embedded 608/708 data, these
devices pick up on it, just like a broadcast television does,
so that works pretty well.
But that's not the only format you can use, you're
not required to support that format; you're just required
to support a format.
Looking at SMPTE Timed Text and native support
in browsers, we're not there yet.
I've had a lot of people tell me
that this browser or that browser natively
supports SMPTE Timed Text; as far as I can tell,
it does not.
And what I'm talking about is an HTML5 video,
where you have a track, a subtitles track.
In theory that could be a Timed Text file, but we
don't have that yet.
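For reference, this is the HTML5 track mechanism being described, sketched in TypeScript against the browser DOM; the file names are hypothetical, and browsers that implement this today generally expect WebVTT rather than SMPTE Timed Text in the track.

```typescript
// Minimal sketch of an HTML5 video with a captions track, built via the DOM.
const video = document.createElement("video");
video.src = "program.mp4";      // hypothetical video file
video.controls = true;

const track = document.createElement("track");
track.kind = "captions";
track.label = "English";
track.srclang = "en";
track.src = "program-captions.vtt"; // in theory this could be a Timed Text file,
                                    // but native browser support isn't there yet
track.default = true;

video.appendChild(track);
document.body.appendChild(video);
```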
Yes, sorry?
I'm sorry to interrupt.
On the previous slide you talked about embedded video,
a question came in, do you mean embedded in the
H.264 or MPEG-4 video?
Very commonly that is the case, yes.
In fact, most of the video you see streamed these days
is now H.264.
That doesn't have to be the case, but yes, H.264 can
have embedded 608/708 closed captions, that's part of the
specification, and so that's how that works for these
devices here that do support that.
Thank you!
I thought I'd try to sneak it in.
No problem, no problem.
So SMPTE Timed Text, hopefully we'll see this
start to pick up.
We'll see new web browsers implement this.
WebVTT has a little bit better support across
various browsers, but there are some caveats here.
Some of the browsers on this list will display a WebVTT
captions file, but they don't support all of the 608
styling that you need to replicate the look and feel
of the captions.
In other words, when these browsers display a WebVTT
file it does not look like closed captions, it doesn't
support the positioning and the formatting that closed
captions can do, whereas some of the other browsers
can do that.
And again, this is the result of my testing; there
may be slightly newer versions that implement
these things, but not that I'm aware of.
So what a lot of groups want to do is to supplement
the browser support.
A lot of browsers don't natively support every
format that you want to support, so what you can do
is you can do something called a JavaScript polyfill.
This is some JavaScript code that runs on the site that
takes the place of any missing features
in the browser.
So you can have some JavaScript that parses that
particular kind of caption file and displays the
captions over the video.
And that solves a lot of the problems.
If you write your own JavaScript and you support
all the features you need to support, then you'll be
covered in all of the desktop browsers.
You do need to do all of those FCC-mandated things
like formatting and positioning of the captions
correctly, but different groups, like a W3C working group,
are currently working on that.
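As a very small sketch of the polyfill idea, assuming you have already fetched and parsed the sidecar caption file into cues yourself, the script (shown here as TypeScript) only has to watch the video clock and draw the active cue over the video; a real polyfill also has to honor positioning, styling, and the user controls.

```typescript
// Minimal "caption polyfill" shape: draw the active cue over the video.
interface Cue { start: number; end: number; text: string; }

function showCaptions(video: HTMLVideoElement, overlay: HTMLElement, cues: Cue[]): void {
  video.addEventListener("timeupdate", () => {
    const t = video.currentTime;
    const active = cues.find(c => t >= c.start && t <= c.end);
    overlay.textContent = active ? active.text : "";
  });
}

// Usage: cues would come from fetching and parsing the sidecar caption file
// (SMPTE Timed Text, WebVTT, etc.) with your own parser; IDs are hypothetical.
const overlayEl = document.getElementById("caption-overlay") as HTMLElement;
const videoEl = document.querySelector("video") as HTMLVideoElement;
showCaptions(videoEl, overlayEl, [
  { start: 12.0, end: 15.5, text: "♪♪ [upbeat music] ♪♪" },
]);
```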
However, there is a big however on this,
it sounds like an easy solution, right?
Oh, just use some JavaScript and that will solve
all your problems.
The problem is that many devices that play video
do not play the video in the web browser.
When you click on a video, you're surfing on a website,
you're in your web browser, you click on a video, and
the video opens in a separate application.
That application is not a web browser.
It doesn't have -- or it doesn't necessarily support
things like JavaScript and HTML5 and CSS.
So you could have all this JavaScript that works great
on your desktop browser, but then when you view the same
website in a mobile device and click on the video, the
video works, but none of the captions work.
That's because you've exited the web browser and now all
those web browser features are no longer
available to you.
So in that case the device has to have some other way
to get access to the captions.
Not only does the device have to have some way, but
the provider has to provide the captions in that way.
In other words, for example, if the device is using
WebVTT as a fallback to support captions outside of
the browser, you have to supply the captions as WebVTT.
If you supply the captions as SMPTE Timed Text, the
video provider is doing their due diligence, but the
captions are not going to work on that device.
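For illustration, a hand-typed WebVTT sidecar might look like the following; the "line" and "align" cue settings are standard WebVTT settings that carry some of the positioning, while the timings and text here are invented.

```typescript
// Illustrative WebVTT sidecar content, built as a string for clarity.
const sampleWebVtt = [
  "WEBVTT",
  "",
  "00:00:12.000 --> 00:00:15.500 line:85% align:center",
  "♪♪ [upbeat music] ♪♪",
  "",
  "00:00:16.000 --> 00:00:18.000 line:10% align:start",
  ">> Speaker positioned near the top of the frame.",
].join("\n");

console.log(sampleWebVtt);
```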
So a workaround that a lot of groups are going with
is to create a separate custom player application.
You can make a branded app that ties to your video
channel or your distribution channel and that app can
support whatever you want.
If you want to implement SMPTE Timed Text or WebVTT
or embedded 608/708, whatever you want to do,
you can implement that in that app
and that solves the problem.
Except, you have to make sure that your app
implements all those mandated FCC features,
because even if the device implements the FCC features,
if your app does not implement the FCC required
features, like the user controls that will be
required in early 2014, then your app is going to
stand out as a problem.
And also, the downside of this method, which sounds
so great, is that you're going to be writing and supporting
and delivering an app for every platform
you want to target.
And delivering an application that gives a
consistent look and feel across different devices
is very challenging.
For a big network, a big provider, that's not so bad.
But if you're a smaller network or a smaller
provider making an app to work on every device out
there, that's going to be quite a challenge.
Hopefully what we'll see soon is some third-party
frameworks, like a universal framework that you can
implement in your app to get some of these features, but
then you have to license that and there's other
issues with that.
And again, I do apologize, I'm rushing really fast, I'd
like to get through as much of this as possible, because
I think we're like halfway through and you guys have
had some really good questions.
So we do have Format Fragmentation going on.
If you want to deliver the same video to a variety of
different devices, you have to use,
not only different streaming methods,
but also different captioning methods.
If you have a server or a CDN, a Content Delivery
Network, that supports these different variants and can
convert between them,
then that's not so bad.
And I'll just name-drop two, and I'm not saying these are
the only two, but two that I've personally
worked with and know that they work are Akamai's video
streaming platform and Wowza Streaming Media Server,
the latest version.
They have added the capability to convert
between these different video and caption formats to
deliver to, if not all, at least a good substantial
portion of the devices you want to target.
And that capability did not exist looking back 6 months ago,
12 months ago, that's something new.
So we're really glad to see that happening,
and hopefully some of the other types of servers
and providers will also implement it.
And maybe they do.
Again, I'm not saying that those are the only two that do;
I'm just saying those are the only two that I
personally have experience with.
Another issue we're dealing with right now is TTML or
SMPTE Timed Text fragmentation.
Because TTML and SMPTE Timed Text are such huge specs,
they can do so many things beyond the 608/708 features
that we've been talking about.
There are also many different ways
to implement the same thing.
You can make the same look and feel of the captions
using very different constructs in Timed Text.
Now, what that means is that, even though you might
have a file that's completely compliant with
the SMPTE Timed Text specification, some vendors
might not like that file because you're doing things
differently than they are doing things.
And unfortunately, what that means is even though SMPTE
Timed Text is a -- and I'm doing air quotes here, if
you can imagine me -- "standard", you may have to
make different variants of it or convert between
different variants, different flavors of it.
Just to show you a little example, this is a
screenshot from the Export menu in the CPC software
that I work with, and 1, 2, 3, 4, 5, there are
at least five different variants of SMPTE Timed Text
and TTML-based files that are very similar; they're all
spec compliant, but they're all used for slightly
different purposes, different workflows, and
they're not necessarily mutually compatible.
In other words, if your distributor says we want a
SMPTE Timed Text file and you give them a SMPTE Timed
Text file, it's not yet guaranteed that that is
going to work; they might want a different variant.
Unfortunately, that's something we have to deal
with, and hopefully as time goes on that's going to get
a little bit better.
And Joel, if it's okay, can I like speed through some of
the rest of the slides, I know we're running out of
time, but I can stay a little longer to do Q&A,
if that's okay?
Sure, that's fine with me if it's okay with our guests,
and we will ask the questions and make them
available during the on-demand playback.
So if we're not able to get to every question for
everybody while they're available, it will be
available later.
And absolutely, if you have any questions you want to
ask me later, you can ask me later as well,
so I apologize about that.
One of the things we talked a lot about in January was
that there were not a lot of good solutions for live
closed captioning to the web.
Fortunately that has changed now.
There are now encoders that can put the 608/708 data
into the stream, just like they do for TV broadcast.
That gets sent to the CDN, and the CDN or the server is
going to do a conversion.
They're going to convert the 608/708 or other embedded
types into all the different deliverables that you need
to deliver to all these different devices.
But you do need to make sure that your encoder and your
Content Delivery Network or your server
can speak the same language.
You might have an encoder that supports closed
captions and a server or a CDN that
supports closed captions, but if they do it
differently, then they're not going to match up, and the
closed captions are going to be lost.
So you need to make sure that even though you've
checked all the checkboxes and all the different
devices and products in your workflow have the checkbox
saying closed captioning supported, you need to
understand what that actually means, because it
might not meet the FCC mandates, or it might not be
compatible with the device further down the chain.
Another issue, this is not related to captioning too
much so I'll be very brief, but how do you stream live
video to a device or a browser in the first place?
We have all these different protocols for streaming
to different devices.
A lot of them require a plug-in like Flash or
Silverlight or QuickTime, etcetera, and you can see
here there's not a huge amount of overlap, which
means that if you want to target multiple devices, you
do need to stream, not just the captions, but also the
video itself needs to be streamed using different
protocols or different streaming formats.
And different combinations of these support different
kinds of caption files, so that makes it
a little bit more complicated.
And again, I apologize for rushing through this, but
I'd like to get through as much as I can.
The way some of the encoders do it right now is they
actually embed the captions into the video stream
itself, and then it's up to the server or the CDN to get
those to the devices.
The other way of doing it, very similar, it's called
out-of-band transmission of CC data.
In this case the encoder delivers a video stream
without captions embedded and a separate caption
stream to the Internet and to the server, and then the
server needs to tie those together or convert them
and do other things.
That's a little bit easier to process maybe.
So we see some of the servers are going
with that route.
And we've been talking a lot about the delivery, but also
talking about the user's device,
where they view the video.
Ideally, someday in the future, all the devices will
support SMPTE Timed Text or a combination of 608/708
or some other flavor, but that's not the case yet.
As I mentioned before, you can create
a custom player app.
In that case you can support whatever you want.
You can implement any format you want in that app.
But that's a lot of work to create and deploy and
support an app across all these devices
that you want to support.
So it would be really great if we could just do this
through the web browser and just have one format
like SMPTE Timed Text.
I think this is the way things are going.
I think eventually we may reach this point, and that
will make it extremely easy, but we're not quite there yet.
In the real world, we're currently at the point where
the server has to be converting the captions into
different formats, different flavors.
So this is where we're stuck now.
And hopefully that will be improving.
Also, as I mentioned before, what we don't want to end up
with is a world where every device supports SMPTE Timed
Text, but they're all different flavors, different
kinds of SMPTE Timed Text.
And again, all of these could be perfectly compliant
with the spec, they're just doing things in a different
way that is not mutually compatible, and for that
reason I know W3C is working on delivery profiles to try
to simplify things and get everybody
on the same page here.
So again, we covered why this is so difficult;
it requires cooperation between all these different parts of
the industry that normally don't cooperate with each other.
It's possible to have a workflow where every step
of the process has closed captioning checked,
certified, yes, supported, but when you put it together
the system doesn't work.
You need to make sure that all of these pieces can
speak to each other.
As of right now, there are a lot of encoders on the
market that do support some kind of closed captioning
data, you just need to make sure that they support the
type that your server or your CDN supports.
That was not the case six months ago.
There was not a whole lot of them on the market six
months ago, so that's a definite improvement.
Lots of Content Delivery Networks, and I mentioned a
couple by name, and I'm sure there's others, support CC,
but they might not support it on every device
that you want to target.
And every browser, every mobile device, every
operating system is at different stages
of implementing playback support for different formats.
So really what that means is, although we'd love to be
able to rely on one single format to handle all our
captioning needs, we're not there yet, and it's going to
be some time before we get there.
I don't know how much time -- I probably should go real
fast through this, and I always end up going real
fast through the most important part, right?
As we come up on the new deadline in two months,
the deadline for captioning
edited content on the web, the challenge with that is
that when content is edited, a lot of times the editing
process strips out the closed captions, or the
closed captions no longer match the edited version.
And re-captioning those videos from scratch would be
very costly, very time consuming.
The good news is there are tools out there from
multiple vendors, caption software vendors that can
take an Edit Decision List from your editing software
or from your automated video segmenting, video editing
system, and conform the captions to match the edited version.
I suspect that this is going to be a really important
feature over the next two months and especially going
forward after the mandate kicks in, because this means
you can automate the conversions.
You don't have to have a human being sit there
manually fixing up the captions, editing the
captions; you'll be able to run these through an
automated system that fixes up the captions for you.
And when I say fix up the captions, they could be
converted to 608/708 or SMPTE Timed Text
or whatever other format you need to work with.
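Conceptually, conforming captions to an edit is a retiming problem: given the segments that survived the edit and where they land in the new cut, keep only the cues that overlap a kept segment and shift them accordingly. The TypeScript below is a simplified sketch of that idea under my own assumptions, not the actual algorithm any particular vendor uses.

```typescript
// Re-time captions against an edit: each kept segment maps a source range
// (srcIn..srcOut, in seconds) to a new position in the edited cut (recIn).
interface Cue { start: number; end: number; text: string; }
interface EditSegment { srcIn: number; srcOut: number; recIn: number; }

function conformCaptions(cues: Cue[], edl: EditSegment[]): Cue[] {
  const out: Cue[] = [];
  for (const seg of edl) {
    for (const cue of cues) {
      // Keep any cue that overlaps this kept segment, clipped to the segment.
      const start = Math.max(cue.start, seg.srcIn);
      const end = Math.min(cue.end, seg.srcOut);
      if (start < end) {
        const offset = seg.recIn - seg.srcIn;
        out.push({ start: start + offset, end: end + offset, text: cue.text });
      }
    }
  }
  return out.sort((a, b) => a.start - b.start);
}
```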
That was the most important thing that I wanted to cover.
A lot of these other slides are kind of a repeat of what
I did last time so I'm just going to skip through these
really quickly.
Legal rights to the captions; you might not have
the legal rights to use captions on the web even if
you have the legal rights to use them on TV, so be
careful of that.
And you can review these slides in the download
later, as Joel mentioned, so I probably should
skip through these.
Just some recommendations to captioners, how to author
the captions to best meet this new world that
we're living in.
Recommended practices for content creators on how to
deal with captions, because that's something that you
have to start doing, even if you didn't do it before.
And content providers and distributors, especially the
new web-based distribution outlets --
these are not traditional broadcast outlets,
but new outlets on the web --
you're jumping in, both feet first, to a new world that
broadcasters have had to deal with for a long time,
but it's kind of a new requirement to a lot of web
developers, so hopefully the information
I presented is helpful.
I'm happy to answer other questions that come in later
of course.
So it's just really important, if you're one of these developers,
to make sure you are meeting all of your requirements.
Take a look at those FCC regulations.
Make sure that the products that you develop
are meeting those mandates.
And in conclusion, we didn't do too bad; I'm only a
couple of minutes over, we're stuck in this world
because of all of these design workarounds and
constraints of the original 608 standard.
And even though we have new standards like 708, and
that's been around for years now, since 2005, 2006, there
is always new video formats on the horizon, there is
always new web technologies that somebody has come up
with, there's always new standards coming out, so
closed captioning will continue to be a big
challenge in all aspects of the video production, video
distribution pipeline.
And I hope the information I provided can help you at
least think about these things, even if I didn't
give you concrete solutions.
But hopefully as time goes on we're going to see
expanded support for SMPTE Timed Text, SMPTE 2052,
or increased support for 608/708 native closed captions,
and that's going to solve a lot of the
problems that we talked about today.
And that is it for my presentation.
I'm happy to stick around a little longer
and take questions.
And I do apologize we had to rush through.
Thank you very much Jason!
We do have some questions, and I hope everybody that is
here can stick around at least for a few of them.
One of the questions that came in that was fairly
interesting I think is, how does adaptive bitrate
play into captions, or does it?
Yes, absolutely. Adaptive bitrate means delivering the same
video in multiple different bitrates, and there are different
technologies for doing that.
Apple came up with one called HTTP Live Streaming
or HLS, and that one supports embedded 608/708
and also they've added WebVTT support
as a Sidecar file.
So that supports closed captions.
That's both for live and for VOD, so that's pretty nice.
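For reference, a WebVTT subtitle rendition in HLS is typically declared in the master playlist alongside the video variants, roughly like the hand-typed example below; the URIs, bandwidths, and group names are made up.

```typescript
// Illustrative HLS master playlist with a WebVTT subtitles rendition.
const hlsMaster = [
  "#EXTM3U",
  '#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",' +
    'DEFAULT=YES,AUTOSELECT=YES,URI="captions/eng/index.m3u8"',
  '#EXT-X-STREAM-INF:BANDWIDTH=1500000,SUBTITLES="subs"',
  "video/1500k/index.m3u8",
  '#EXT-X-STREAM-INF:BANDWIDTH=4000000,SUBTITLES="subs"',
  "video/4000k/index.m3u8",
].join("\n");

console.log(hlsMaster);
```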
Another type is called Adobe HTTP Dynamic Streaming or
HDS, and that I believe uses SMPTE Timed Text or TTML to
carry the caption data, so that's also supported as
long as the playback device also can read that.
Microsoft has Microsoft Smooth Streaming Technology.
These are all pretty similar technologies for delivering
adaptive bitrate live streaming, and they all have
a way to support closed captions.
But the problem is the web browser or the playback
device, even if it supports this mechanism,
might not support that type of captions.
So that's what we need to see happen, is not just
in the streaming encoder or the server, but also in the
playback devices we need to see that support ramping up.
Okay.
Earlier you had mentioned some service providers that
could handle live or near real-time captions, and you
also mentioned decoders and players that didn't support them.
Is there some place that people can get a list of
those service providers or perhaps a list of decoders
and players that do support SMPTE Timed Text that you
know of anyway?
That's a very good question.
The problem is that any list is likely to become
out of date very quickly,
because things are changing all the time.
If I can recommend this, on the CPC's website we do have
a bunch of sample videos that run through the
different compliance checks, and you're free to view that
page in different browsers, different devices,
and see what works and what doesn't.
There are a couple of other places where I have seen
people do blogs and articles with checklists saying this
works, this does not work.
I'm hesitant to recommend a specific one because I don't
know how up-to-date it is.
But I suspect if you go to the manufacturer's website,
anybody that does support closed captioning, it should
be pretty prominent, because that is a big deal.
So if you go to a particular encoder or manufacturer, a
device or server website and they don't talk about closed
captions in a prominent way, then maybe they don't
support it or they don't want to talk about it.
Sorry, I wish I could be more specific on that, but
I'm hesitant to get into too much namedropping here.
Yeah.
No, appreciate it.
Another interesting question came in, what about
EBU Timed Text?
EBU Timed Text is very similar in concept to
SMPTE Timed Text.
They're adding a few extra things related to forced
subtitles and multilingual things that apply to a lot
of the laws in Europe and abroad,
not in the U.S. market.
There also is a lot of talk about trying to make sure
EBU Timed Text and SMPTE Timed Text are very close,
and there's going to be a lot of compatibility there.
So it's not a huge problem.
In other words, there's going to be a huge overlap,
and the compatibility is going to be
pretty good between them.
Me, personally, I just happen to be
focused on the U.S. market a lot.
I know the U.S. regulations a lot better
than I know some of the
other regulations for territories that are looking
at EBU Timed Text, but I do believe they are so close in
concept that we're not going to have
too much difficulty there.
Great!
I just wanted to mention, I don't think I mentioned it,
the SMPTE Timed Text standard is free of charge
on the SMPTE website.
Go to www.smpte.org and go to the Digital Library,
and you'll be able to search for the
SMPTE Timed Text standard.
Jason, I do want to take a moment to thank you for
taking the time to put this together.
I appreciate you joining us again!
And I also want to thank our guests
for sticking with us as well.
But our sponsors are the ones also that we need to
recognize, because they are the ones that bring SMPTE
Monthly Webcast to SMPTE members free of charge.
And Jason, guests, people, friends of SMPTE, thank you
all for your time and attention!
Hope all is well!
Take care!
And we'll see you next month.
Thanks everyone!