♪♪
Welcome everybody to today's
SMPTE Monthly Webcast!
This month's topic is Internet Captioning -
Implications of the Multi-platform,
Multi-Display Ecosystem.
I am your host, Joel Welch, SMPTE's Director of
Professional Development.
I'd like to take a moment to thank our sponsors:
AJA Video Systems, Blackmagic Design, and
Ensemble Designs.
It's through their generous support that we're able to
bring SMPTE Monthly Webcast to our members
free of charge.
Now, we've invited back Jason Livingston.
Jason is a Developer and Product Manager for
CPC Closed Caption.
And he originally did the January closed caption
session, and because some time has passed and we
have another FCC deadline coming up, we thought
we'd invite him back.
And just to sort of lay the groundwork for Jason's
presentation, some of you who are participating today
were not on the January webcast, so we're going to
do a little bit of review and then Jason is going to
fill in some gaps and let us know what the upcoming
deadline is and take us into a little new territory
as well.
So Jason, without further delay, the floor is yours,
and if you click on the slide, you should be able
to advance it.
Okay.
Thank you Joel!
I hope everybody can hear me okay, right?
You sound great.
Great!
Okay, without too much further ado, I'll get
started, but thank you Joel for that introduction.
As Joel mentioned, I'm going to cover some background
that we covered in the January session, but I think
it's pretty important to go over it again for anybody
who is new especially and just to reiterate some
things that have changed or been updated
since last time.
So I'd like to talk about what is this whole closed
captioning thing and why is it so important, why is it a
problem, and then we'll go into some specific problems
and solutions and challenges in the marketplace.
So why does anybody care about this whole
closed captioning thing?
Well, there are these new FCC regulations, and I say new, but they
started about two years ago, and they're still phasing
in, they're still taking place, that require closed
captions from television broadcast to be available
when these videos are delivered over the Internet,
such as on the web and mobile devices.
And the whole reason SMPTE is involved in this process
at all is that SMPTE created a new specification called
SMPTE Timed Text or SMPTE 2052 standard to address a
lot of the challenges we're seeing for bringing the
closed captions from broadcast TV to
Internet video.
And why is closed captioning required?
Why do we care about this to begin with?
The number one reason that most people probably care
about is that it's the law.
I hate to say that you only do something because it's
the law, but let's face it, that's probably the case;
a lot of us do it because it's the law.
Even if for some reason you're not covered by the
legal requirements, and that affects both broadcasters
and also government institutions, academic
facilities, let's say your programming is currently
exempt and you're wondering why should you do closed
captioning because it's not required of you?
Well, a couple of facts that a lot of people
don't know about closed captioning:
about 20% of U.S. households use closed captions.
They're not just used by people who are deaf or hard
of hearing, they're also used by people who are
learning English as a second language, because it helps
a lot with comprehension.
They're also used in cases where the audio
simply can't be accessed at that time.
For example, watching a video on a mobile device on
the train or on the bus without headphones,
or viewing a video in a restaurant or a bar
or an airport kiosk or something like that,
where it simply is too difficult to
get access to the audio content, you need closed captions
to be able to understand the content.
And a lot of people I think underestimate the number of
people who use captions due
to being deaf or hard of hearing.
The latest number I've seen is that more than 48 million
Americans have hearing loss, and I believe that's just
the U.S., that's not including Canada, Mexico,
of course all the other countries in the world where
closed captioning may be required as well.
And as I mentioned, for use in noisy environments where
you just can't access the audio of the video.
So even if for some reason you say, oh,
I'm exempt from this law,
I shouldn't have to provide closed captioning,
it's too much of a pain, you're really losing
out on a good portion of your viewership or your
potential market if you ignore that requirement.
So I recommend that you do it, and we'll talk about
how you can do it.
Just to talk about what the FCC is actually requiring,
because from my day-to-day work I pick up on
a lot of confusion on this topic.
One important aspect that a lot of people overlook,
besides just the basic requirement, you're required
or you're not required to have it, is that the closed
captions on Internet video and other playback
mechanisms are required to substantially or totally
replicate the look and feel of television broadcast
captions, including the formatting and positioning.
This is something that's really important that I've
noticed some people are not quite aware of.
There are a lot of interesting workarounds for
getting captions on a video, on the web, that might not
replicate the look and feel of TV broadcast captions,
and even though that's better than nothing, it does
not satisfy the FCC requirements.
So you may have some legal issues there if you're not
fully replicating the look and feel
of the TV broadcast captions.
The FCC rules also require that portable devices,
mobile devices, pretty much any kind of hardware or
software that is capable of playing back a video will be
required to implement what's called user controls.
These user controls are a user preference or setting,
such that the user can change the font size,
the color, the opacity, etcetera,
of the closed captions.
That's something that's been required for digital
televisions sold in the U.S.
since, I think, 2006, but this is a new
requirement for mobile devices and browsers as well.
So if you are the manufacturer of a mobile
device and you support closed captions but you
don't support these user controls, you're going to be
running into compliance issues.
Same thing if you make a browser plug-in that
displays video, a video player or a set-top box, or
anything in that regard that plays video: you have to
implement the user controls.
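As a rough illustration of what those user controls amount to in practice, here is a minimal, hypothetical TypeScript sketch of a caption preference object and how a player might apply it to a caption overlay; the property names are illustrative, not taken from any particular device API.

```typescript
// Hypothetical caption user-preference object (names are illustrative only).
interface CaptionPrefs {
  fontSizePercent: number;   // e.g. 50-200 (% of the player's default size)
  textColor: string;         // CSS hex color, e.g. "#FFFF00"
  backgroundColor: string;   // CSS hex color for the caption background
  backgroundOpacity: number; // 0.0 (transparent) through 1.0 (opaque)
  edgeStyle: "none" | "drop-shadow" | "outline";
}

// Apply the preferences to whatever element the player draws captions into.
function applyCaptionPrefs(overlay: HTMLElement, prefs: CaptionPrefs): void {
  overlay.style.fontSize = `${prefs.fontSizePercent}%`;
  overlay.style.color = prefs.textColor;
  overlay.style.backgroundColor = withOpacity(prefs.backgroundColor, prefs.backgroundOpacity);
  overlay.style.textShadow =
    prefs.edgeStyle === "drop-shadow" ? "2px 2px 2px black" :
    prefs.edgeStyle === "outline"     ? "0 0 2px black, 0 0 2px black" : "none";
}

// Turn "#RRGGBB" plus an opacity into an rgba() value.
function withOpacity(hexColor: string, opacity: number): string {
  const r = parseInt(hexColor.slice(1, 3), 16);
  const g = parseInt(hexColor.slice(3, 5), 16);
  const b = parseInt(hexColor.slice(5, 7), 16);
  return `rgba(${r}, ${g}, ${b}, ${opacity})`;
}
```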
And one thing that's kind of interesting is that the
SMPTE Timed Text format, when this was going through
the whole FCC legal process, the FCC wanted to be able to
point to a specific format that they recommend people
to use, and SMPTE Timed Text fell into that role.
So SMPTE Timed Text is specifically singled out by
the FCC as a safe harbor format.
And I'm not a lawyer so I can't tell you all the
implications of exactly what safe harbor means, but my
interpretation that various people agree with is that if
you accept SMPTE Timed Text on the input side and you
deliver it on the output side in your chain of
events, and also you fully adhere to the specification,
you'll be in compliance even if some problems occur.
Whereas, if you don't use SMPTE Timed Text as your
format and some problems occur, then potentially you
could have a greater liability than someone who
has the same problems but is using SMPTE Timed Text.
I won't really say if I agree with this, but that is
what I understand to be the case.
Are there any questions
before I proceed?
Yes, actually.
Actually I was going to just jump in, we have one
question from William, and he asks, aren't the look and
feel requirements not yet implemented?
Yes, I believe that's correct.
The last thing that I've seen, I believe it is in the
spring of 2014 that the look and feel requirement and
also the player user control requirement will start
to take effect.
And I believe that's an interpretation of the law,
because the original law said it just took effect
immediately and the FCC pushed that back a little bit.
I should have added that to my next slide; now that
you've mentioned it, I'm going to look that up and tell you
the exact dates, but I believe it's
March or April 2014, if I remember correctly.
Thank you very much Jason!
That's the only question for now.
Great!
So that's a very poignant question, what are the
FCC deadlines for Internet captioning?
And just to clarify, these deadlines apply to videos
that are related to the world of
broadcast television.
So if I put a video of my dog barking on YouTube and
this video has never been shown on broadcast
television, it does not require closed captions.
The FCC has no purview over the Internet to be able to
require that; these deadlines apply to video
that required closed captions when shown on
broadcast television, because the FCC does have
purview over that.
One of the deadlines that already passed last year was
that pre-recorded programming that was not
edited for Internet distribution had to be
captioned when shown on the Internet.
And this deadline went pretty smoothly I think.
I think a lot of -- most of the organizations that at
least I deal with had solutions in place
by the deadline.
Video on demand that's not live is relatively easy.
There's lots of solutions in the marketplace for dealing
with the captions for that.
So that went pretty well.
One of the ones that just passed a couple of months
ago was the requirement for live and near-live
programming to be captioned when shown on the Internet.
And this one caused quite a bit of a stir, because live
closed captioning is substantially different from
postproduction closed captioning or video on
demand, and that had some additional technical
requirements that threw a monkey wrench
into the process.
So when I did the SMPTE webinar back in January,
it was very hard for me to recommend specific solutions
because there was not a lot of stuff in the marketplace
to actually solve these problems.
But the good news is that in the time since then
some new solutions have come online.
So this problem is starting to work itself out.
And the interesting deadline that is still coming up in
about two months is that pre-recorded programming
that is edited for Internet distribution has to
be captioned if it's shown after September 30th this year.
So I think a lot of the video from the previous
September deadline got away without captions because it
was edited for the Internet, which can mean --
now, what does edited mean?
It can mean various things, and I think the FCC has some
guidelines as to what counts as editing
or does not count as editing.
But suffice to say, if you were editing your video,
which a lot of distributors do, you didn't have to
caption it until this September.
And so I expect there will be a small increase in the
[inaudible 00:12:45] when that deadline hits, because
a lot of people will need to have solutions
online by then.
Jason?
Yes.
We have a couple of questions.
Our friend Mike from Stonehill College asks
a question you may be covering later, but yes,
what's the impact on Line 21?
Well, Line 21 means different things depending
on who you ask.
As a technical standard, Line 21 is the line where the
captions were put in an analog standard definition video signal.
So to a lot of people the closed captioning standard,
which is called CEA-608, and I'm going to go into that in
a little bit, was synonymous with Line 21,
because that's what the engineers called it.
So Line 21 can mean a technical standard for how
the captions are broadcast, but also in general just
kind of as the industry term people use Line 21 to mean
closed captions for the U.S. market,
even if it's not technically the Line 21
in an engineering spec.
So most of what I'm talking about is related to Line 21,
and I'll go into a little more detail on that later.
Okay, we have another question if you don't mind.
Yeah.
This is from David, he asks, does the closed captioning
requirement apply to any commercials which may be
embedded or associated with the programming material?
I believe that right now television commercials
are exempt from the FCC requirements.
That could change someday in the future.
I don't think they're requiring it right now.
You need to check with your station to be sure.
It may depend on your TV market as well,
the size of the market.
As I alluded to before, even if the commercials are not
required to be closed captioned, that's something you do want to
think about, especially if it's your commercial;
if it's somebody else's commercial,
maybe you don't care, but for your commercial you do want
to caption it, because you're missing out on a huge part
of the market if you don't.
We have one more question that I'll hold
for the Q&A period.
Okay.
I'm happy to do any questions about any slide
that I'm doing right now, but some of them hopefully
I will address as we go on.
So the question I get a lot is, why is this closed
captioning stuff so complicated, it's just some
text and some time codes, right,
that doesn't seem very hard?
The way closed captioning was originally implemented,
and this started in the late 1970s, so imagine the
technology we were dealing with back then,
the engineers back then were really clever to come up
with a way to get text and timing information onto an
analog video signal in the first place, in a way that an
inexpensive consumer decoder could actually
display over the signal.
And again, think about the late 70s, early 80s and what kind of
computer technology existed back then; trying to draw
characters on an analog video signal
was actually quite a feat.
It's a very low bandwidth signal.
It has to be stateless, meaning that if somebody
is flipping through channels, you don't have to have your
TV tuned to the show at the start of the show to get the
captions for the whole show; rather, they're streamed
progressively as the show goes on, so you can always
get the captions when you change channels.
Obviously the memory and processor power were
very limited back then.
But still, despite those limitations, they came up
with a pretty good spec; it can handle most of the Roman
alphabet-based languages, so English and Spanish, French,
Portuguese, Dutch, German,
and I think a couple of others.
So it's a very interesting spec.
But we are stuck with a lot of the design decisions
made in that spec.
And why are we stuck with those, it's because there's
a huge amount of content -- libraries of archives of
content already in use that we need to preserve going
forward, and because of backwards compatibility.
So even though technology has moved forward a lot
since those days, we're still living with a lot of
those design decisions, and that complicates things.
But fortunately, the most important thing they did
a good job with is the backwards compatibility.
So we're not dealing with trying to convert and
translate huge archives of things into a completely
new way of doing things.
Just in case you're unaware, if you don't use closed
captions on a regular basis, here are some of the things closed
captioning can do -- well, when I say closed
captioning, I'm talking about the CEA-608 standard,
which is the standard we use
for analog broadcasts in North America,
and I'll talk about digital in the next slide,
but it can do a lot of interesting things.
You can reposition the captions on the screen.
You have different justification settings.
You can have split windows of text, as you can see in the
lower left corner here, which are used when there are
multiple speakers speaking on top of each other and
in various other situations.
There is roll-up captioning, a smooth-scrolling
type of captioning that's mostly
used for live captioning.
You can see an example of some of the positioning
and special characters you can make use of.
So the way that these were done in CEA-608 was rather
complicated, and that complication has passed down
through the generations, including the switch
to digital, and now the switch to Internet delivery.
We're living with all of these complications.
And so what about CEA-708?
708 is the digital closed captioning standard that
we switched to starting in 2006, when the
analog to digital switch over happened.
And 708 is a new spec, but it kind of follows the same
philosophy, the same way of doing things is preserved
from CEA-608, because they wanted to really stress
backwards compatibility.
So it does add a lot of new features, but in many ways
we're still stuck with the limitations of the
608 standard, which goes back many, many years.
And some of those reasons are that, first off, most
caption authoring tools still target the 608 spec
only, because you have to ensure backwards
compatibility, and because there's no rule that says
that you have to make use of the advanced new features,
most people target the lowest common denominator,
and in some ways that's a good thing,
in some ways that's a bad thing.
Most of the common caption interchange files that
captioners and TV stations exchange with each other are
608 only, including SCC, which is a very commonly
used file format that we see in use
a lot in the industry.
And that's a terrible, terrible format that is
loaded with problems, and I could probably talk to you
for two hours about all the problems
the SCC format causes.
But unfortunately, people started, or various
companies started standardizing around SCC and
now we're stuck with a lot of those problems.
As I mentioned, we've got vast archives of content
that was captioned back in the
608 analog days, and we need to move forward with those
without redoing the captions.
The primary language captions have to be
backwards compatible with 608; 708 actually carries
608 data as well for backwards compatibility with
older receivers, older television sets,
DVD players and whatnot.
So a lot of the new features in 708 you have to stay away from,
because it's going to impact
the backwards compatibility.
This is a big one that I -- that personally frustrates
me a lot, and I'm not going to name any names here,
but a lot of the quality control hardware, things like
waveform monitors, scopes, professional equipment used
in television stations have a lot of problems dealing
with CEA-708 closed captions properly.
I mean, CPC software can encode a lot of these
advanced 708 features, and when you turn those on, they
work perfectly fine on a consumer television, right?
All the consumer TVs have no problem with them, but you
take this to the broadcast television station and they
run it through their professional quality control
equipment and the captions don't work,
they have all kinds of problems.
And so as a result we have to turn off
a lot of those advanced features,
we can't make use of them because the
rest of the industry has not caught up with that yet.
Hopefully they will someday, but I mean, we've been
dealing with this for five, six years now and it hasn't
improved too much yet.
So as a result pretty much captions are still a CEA-608
world, and what happens is that data is translated to
708 as the last step of the broadcast chain.
So at the TV station there is a piece of hardware that
takes the 608 captions and upconverts or translates
them to 708, and that's why all the TVs can receive 708,
even though everything is still authored as 608.
Jason?
Yes.
We do have a question, I think I know what the answer
is, but the question is,
can 708 carry a different 608 inside?
Well, when we talk about 708 and 608, for an analog
transmission you only have 608,
that's your Line 21 equivalent.
In the digital world you have a 708 container and
in that container you have native 708 caption data and
you also have 608 backwards compatibility data.
And the intention is that the 708 captions and the 608
captions on the same channel would have the same content.
In other words, CC1 in 608 is English; Service 1 in 708
is English, those should be pretty much the same thing.
That's the intention.
That said, 708 can also carry more languages than 608 can.
So for example, 708 might carry a track that's in
Chinese or Japanese or Korean, things that you
cannot do in 608, because it doesn't support those,
and obviously those wouldn't exist in the
backwards compatibility bytes.
But for your primary language, English, it will be
carried as both 608 and 708 in the same stream.
Great!
Thank you!
Sure!
So we've been talking a lot about the broadcast
standards, now we want to move on to this web delivery
that we have to do.
The streaming video, the web video needs to replicate the
broadcast captions as closely as possible.
One, because the FCC mandates that; and two,
maybe more importantly or less depending on who you
ask, so that the people that rely on those closed
captions are not getting a second-class experience if
they watch the video online
as opposed to on their television.
So captioning on the web is not
a totally new phenomenon,
people have been doing this for a long time,
ever since video was on the web, back to the days of
RealPlayer and things like that, there were
specifications for closed captioning videos
on the web.
But those did not necessarily meet
the new FCC requirements.
One example, looking back: there are a lot of simple
text formats that can be used for captioning on the
web; SAMI, SMIL, SRT, and something called onTextData,
which is used with Adobe Flash video.
They can carry text and timing information,
but they don't support all of the 608 features.
So if you're relying on those to carry closed
captions in your current workflow, you need to be
aware that this is not going to meet the FCC mandate,
it's not going to cut it.
You do have text, but you're not replicating the look and feel
of the original broadcast closed captioning;
things like positioning and roll-up and other things
that I showed in the previous slide.
Another format that's pretty popular is called Timed Text
Markup Language or TTML.
It was originally called DFXP, and a number of people
still call it that.
That's a very rich standard and that's actually what
SMPTE Timed Text is based on; it's a superset of it.
But when you have a very rich language to
express things, one problem is that the player or the
decoder is not necessarily required to implement the
whole specification.
So in TTML you pretty much can replicate all the
CEA-608 features, but that doesn't mean that if you
have a TTML file and a TTML player that it's guaranteed
to support all of those features.
In fact, most -- prior to SMPTE Timed Text, most of
the TTML players that I've seen out there ignore the
positioning information; they ignore a lot of the
special formatting settings.
So even if your source file contains that information,
once it gets to the end user, they're not getting
the same look and feel as the TV broadcast captions.
So that would not be considered compliant with
the FCC regulations.
And another thing that we're asked about a lot is, why
don't you just put the broadcast 608 and 708 data
into the web video as well?
And actually that does happen.
Some devices do use that mechanism to receive closed
captions, and in that case, as long as the decoder works
properly, it will exactly replicate the look and feel
of the TV broadcast captions, because it's the
same as the decoder in the consumer television.
But this is not an easy thing to do,
writing a 608/708 decoder.
If you're a TV manufacturer, you probably have a library
available that does this and it's all set up
and ready to go.
But if you're a website developer, are you going to
go through these 608/708 standards, which are not
open standards, by the way?
You have to pay to get access to these standards
and develop a decoder for your web player, and then when you move on
to some other website and develop a new web player,
you're going to develop all this 608/708 stuff
again; it's pretty challenging.
So that's not a trivial thing to do.
And also, there are some video container formats that
simply don't have a place to carry 608 or 708 captions.
So depending on what you're using to carry the video
portion of the content, this might not even be
technically feasible.
Question!
Yes.
Bill asks, is it the broadcaster's responsibility
to make sure that every decoder works properly?
No, not the broadcaster, that responsibility will
fall on the decoders.
So you can imagine that somebody files an FCC
complaint because they're not getting the broadcast
experience with their captions.
Obviously that complaint is going to go somewhere and
the broadcaster is going to say, well, we're using some
SMPTE Timed Text, we're covered by safe harbors, you
can't point the finger at us.
Then I presume, depending on how the investigation goes,
at some point the finger will be pointed at
the playback device.
And if that playback device is not doing its job, then
that manufacturer or that software developer is
who is going to get into trouble
when things are not working right.
So there's a whole process of how these complaints
get evaluated by the FCC.
You don't want to be the one with the finger pointing at you
at the end of that process, for sure.
But that brings up another thought just to add to that,
if this website is run by the broadcaster, let's say
you're X, Y, Z station, and you run a website xyz.com,
and on this website you have, let's say for example,
a Flash-based video player and the captions are not
working in that Flash-based video player, then yeah,
probably you are responsible for getting that player to
work, or replacing it with a player that does.
I don't think you can just blame the player for that,
because it's your website.
But if there is content that's going to a venue that
is not under your control in terms of the playback
mechanisms, then I don't think the finger will point
at you for that.
Hopefully that makes sense.
And I'm not a lawyer, so if you want to know more you
really need to speak with a lawyer about that.
I have one more question if you don't mind.
Yeah.
If an IP video vendor or a broadcaster supports WebVTT
only for captioning in the video and the device side
supports SMPTE Timed Text and not WebVTT, who is not
FCC compliant here?
That is a good question.
Technically, I believe only SMPTE Timed Text
is the FCC's safe harbor format, and if you're not
delivering SMPTE Timed Text, or you're receiving
SMPTE Timed Text but you cannot play it, then I think that
falls onto the party that is not delivering or playing
SMPTE Timed Text or native 608/708.
WebVTT, as far as I know, does not have that same
safe harbor exemption.
But on the other hand, if you're using WebVTT and the
captions work, then that's not a problem, that safe
harbor only comes into play if there's a problem with
the captions and you're using that as a defense to
justify why the captions don't work.
Thank you!
Sure!
I might have to speed up a little bit, because we're
halfway through our time and less than halfway
through the slides.
So SMPTE came together with a lot of industry groups to
come up with a new standard for captioning on the web,
to try to unite everybody.
They wanted to make a new standard that can be used as
a Mezzanine format; meaning a single file that works for
broadcast and for web delivery.
So that means it has to have the 608/708 data in it to
work with broadcast TV, and also something a little
easier for web players to work with, because working
with 608/708 is difficult.
The goal of SMPTE Timed Text was to work together with
existing captioning authoring tools and standard
practices so you're not making a big change.
It needed to support live workflows as well as
post-production and video-on-demand workflows.
It needed to be format agnostic; it's not tied to a
particular codec or a particular container
or streaming system.
And of course it had to address all the FCC and
legal requirements we've been speaking about,
and I think it does do that.
Just to explain the different methodology here a little bit:
what does 608 data look like compared to 708?
608 and 708 are streams of bytes.
Every frame of video your television receives has
some binary data tagged onto it,
which carries the caption data.
And in the TV there's a decoder that turns this data
into instructions.
So it's almost like a programming language and
it's pretty complicated.
The TV has to implement this same model, which is kind of
envisioned around the hardware decoder, to be able
to understand these instructions and do
the right thing as they come in.
So it's a stream of commands and information.
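Before the contrast with Timed Text below, here is a minimal, illustrative TypeScript sketch, under my own assumptions, of how a 608 byte pair is usually interpreted: each byte carries odd parity in its top bit, and after stripping it, pairs in the control-code range carry commands while other pairs carry two characters of text. This is not a real decoder, just the shape of the stream.

```typescript
// Two CEA-608 bytes arrive per video field, each with odd parity in bit 7.
function stripParity(b: number): number {
  return b & 0x7f; // drop the odd-parity bit
}

function describe608Pair(b1: number, b2: number): string {
  const c1 = stripParity(b1);
  const c2 = stripParity(b2);
  if (c1 === 0x00 && c2 === 0x00) return "null pad (no caption data this frame)";
  if (c1 >= 0x10 && c1 <= 0x1f) {
    // Control-code pairs carry commands such as preamble address codes
    // (row/column/style), roll-up mode, erase display, end of caption, etc.
    return `control code 0x${c1.toString(16)} 0x${c2.toString(16)}`;
  }
  // Otherwise the pair is (up to) two printable characters of caption text.
  return `text: "${String.fromCharCode(c1)}${String.fromCharCode(c2)}"`;
}

// Example: the pair 0xC8 0xE9 (valid odd parity) decodes to the characters "Hi".
console.log(describe608Pair(0xc8, 0xe9));
```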
Whereas something like Timed Text or SMPTE Timed Text is
a text-based format, it's more of a human readable
format, and you can look at this and pretty much anybody
can figure out what this means.
It means you've got this music text with the music
symbols around it, and it appears at one time and
disappears at another time.
So that's a little easier for a web developer to
understand how to process that.
But when you get into a lot of the complicated things
that 608 and 708 can do, that's what makes the
Timed Text markup look a little more complicated.
It's trying to emulate all of those special features
that were based on analog video and hardware decoding
and NTSC spec.
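For comparison, here is a hand-written example of roughly what such a cue might look like as TTML / SMPTE Timed Text markup, embedded in a small TypeScript snippet; the element and attribute names follow the base TTML spec, but the timings, region name, and text are made up for illustration.

```typescript
// A rough, hand-written TTML fragment: the cue appears at one time,
// disappears at another, and is positioned in a named region,
// much like a 608 caption placed on a particular row.
const sampleTtml = `
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
  <body>
    <div>
      <p region="bottomCenter" begin="00:00:12.000" end="00:00:15.500"
         tts:textAlign="center">♪♪ [upbeat music] ♪♪</p>
    </div>
  </body>
</tt>`;

console.log(sampleTtml.trim());
```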
A review of the current industry use of SMPTE Timed
Text as a format, and this is just kind of my opinion
from what I see.
Other companies, especially other worldwide companies,
may have a slightly different view on this, but
my opinion is that right now you don't see too many
software tools or companies authoring content directly
as a SMPTE Timed Text or TTML file; rather, they are
authoring in some other format in their authoring
software, and they export or convert to SMPTE Timed Text
as the last step when they deliver a file.
Hopefully we'll see some tools that author native
TTML, and then you can make use of more of the
features that don't exactly overlap
with 608 or 708.
Where we definitely do see a lot of use of SMPTE Timed Text
is as a mezzanine format; that is, an
in-between format when party A is delivering video files,
video content, to party B.
How do they deliver the captions?
They deliver it as a SMPTE Timed Text file.
So the SMPTE Timed Text is kind of like a template that
you can use to branch off to create all these other
formats that you need to create.
And in distribution, I'm not seeing a lot of SMPTE Timed
Text delivered directly to end-user devices; there is
some, and I'm going to talk about that,
not a whole lot though.
Usually the mezzanine SMPTE Timed Text file gets
converted into other formats for the consumer devices.
There are some provider-specific applications on
mobile devices, and what I mean is, when you
download the app for station X, Y, Z's television feeds,
that custom app might be using SMPTE Timed
Text internally, but it's not a format natively
supported by the device.
And again, I apologize, I'm going to try to speed up a
little bit more to get through everything
before it's too late.
It went a lot faster in my practice run through.
So why is this so difficult?
I mean, what's the big deal, right?
The reason why this is so difficult is in the world of
broadcast video there is a single specification called
the ATSC Broadcast Spec, which is used all across
North America and a few other countries like
South Korea, most of the NTSC territories.
They use the ATSC spec, whether it's an over-the-air
antenna broadcast, or satellite, or cable TV, or Internet
IPTV; it's the same ATSC specification used to carry
the video and the captions.
So every consumer TV receives the same spec of
video and that makes it a lot easier on everybody.
The web on the other hand is kind of like the Wild West.
We've got a lot of different competing standards,
competing formats, not just for delivering the video,
but also for delivering the captions.
If you looked six, seven years ago maybe, we were
living mostly in a Flash-based world,
everything used Flash plug-ins to deliver video,
but now we're seeing a lot of what's called
HTTP streaming or HTML5 streaming.
And there's various different technologies
different groups have come up with to carry
video over HTTP.
In other words, through your web browser
or through a web like service.
And these all support closed captions in one way or another,
but they all do it differently.
It would be nice if everybody standardized on
SMPTE Timed Text, but that has not happened yet.
So we've got all these different methods that you
can deliver a video and different devices support
different methods; some support one or the other.
And then in terms of delivering the captions, you
can either have embedded 608 captions, with the caveat
that that's kind of difficult for the receiving
device to implement.
Or you can have what's called
a Sidecar Caption file.
This is a separate file that carries representation of
the closed captions.
And these come in various different formats, like
SMPTE Timed Text; also WebVTT, that was mentioned;
the TTML, which is kind of a subset or subclass of SMPTE
Timed Text, and some other formats other people have
come up with as well.
So this webinar is about the multi-format, multi-display,
multi-device workflows, and ideally we'd be living in a
world where you could pick one video and one caption
standard and that would work on every playback device,
but that's not yet the case.
For various political reasons, economic reasons,
and whatever, different devices support different
ways, and there's no one format that works
on every device.
That would be nice, but there's not.
So that means for now, realistically, to target as
many devices and playback mechanisms as possible,
you're going to have to have this video delivered
in multiple formats.
Ideally these formats would be some combination of
industry standards like HTML5 and SMPTE Timed Text
and possibly some other standards as well for
fallback compatibility.
It sure would be great if these industry standards
were supported by every device, right?
That would be fantastic.
That would make all our jobs easy.
But that's a lot of ideallys, right?
We don't live in a world where all of our
ideallys come true.
To give you an example of some of the problems, I'm
going through the different kinds of caption formats
that you can use.
In terms of web browsers and web devices, and mobile devices
like Android tablets and
Apple iOS devices, the ones that support embedded
608/708 captions in the video right now are mostly
the Apple devices, from what I've seen: iOS devices,
and also Safari on the desktop.
So if the video signal has embedded 608/708 data, these
devices pick up on it, just like a broadcast television does,
so that works pretty well.
But that's not the only format you can use, you're
not required to support that format; you're just required
to support a format.
Looking at SMPTE Timed Text and native support
in browsers, we're not there yet.
I've had a lot of people tell me
that this browser or that browser natively
supports SMPTE Timed Text; as far as I can tell,
it does not.
And what I'm talking about is an HTML5 video,
where you have a track, a subtitles track.
In theory that could be a Timed Text file, but we
don't have that yet.
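For reference, this is the HTML5 track mechanism being described, sketched in TypeScript against the browser DOM; the file names are hypothetical, and browsers that implement this today generally expect WebVTT rather than SMPTE Timed Text in the track.

```typescript
// Minimal sketch of an HTML5 video with a captions track, built via the DOM.
const video = document.createElement("video");
video.src = "program.mp4";      // hypothetical video file
video.controls = true;

const track = document.createElement("track");
track.kind = "captions";
track.label = "English";
track.srclang = "en";
track.src = "program-captions.vtt"; // in theory this could be a Timed Text file,
                                    // but native browser support isn't there yet
track.default = true;

video.appendChild(track);
document.body.appendChild(video);
```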
Yes, sorry?
I'm sorry to interrupt.
On the previous slide you talked about embedded video,
a question came in, do you mean embedded in the
H.264 or MPEG-4 video?
Very commonly that is the case, yes.
In fact, most of the video you see streamed these days
is now H.264.
That doesn't have to be the case, but yes, H.264 can
have embedded 608/708 closed captions, that's part of the
specification, and so that's how that works for these
devices here that do support that.
Thank you!
I thought I'd try to sneak it in.
No problem, no problem.
So SMPTE Timed Text, hopefully we'll see this
start to pick up.
We'll see new web browsers implement this.
WebVTT has a little bit better support across
various browsers, but there are some caveats here.
Some of the browsers on this list will display a WebVTT
captions file, but they don't support all of the 608
styling that you need to replicate the look and feel
of the captions.
In other words, when these browsers display a WebVTT
file it does not look like closed captions, it doesn't
support the positioning and the formatting that closed
captions can do, whereas some of the other browsers
can do that.
And again, this is the result of my testing; there
may be slightly newer versions that implement
these things, but not that I'm aware of.
So what a lot of groups want to do is to supplement
the browser support.
A lot of browsers don't natively support every
format that you want to support, so what you can do
is you can do something called a JavaScript polyfill.
This is some JavaScript code that runs on the site that
takes the place of any missing features
in the browser.
So you can have some JavaScript that parses that
particular kind of caption file and displays the
captions over the video.
And that solves a lot of the problems.
If you write your own JavaScript and you support
all the features you need to support, then you'll be
covered in all of the desktop browsers.
You do need to do all of those FCC-mandated things
like formatting and positioning of the captions
correctly, but different groups, like a W3C working group,
are currently working on that.
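As a very small sketch of the polyfill idea, assuming you have already fetched and parsed the sidecar caption file into cues yourself, the script (shown here as TypeScript) only has to watch the video clock and draw the active cue over the video; a real polyfill also has to honor positioning, styling, and the user controls.

```typescript
// Minimal "caption polyfill" shape: draw the active cue over the video.
interface Cue { start: number; end: number; text: string; }

function showCaptions(video: HTMLVideoElement, overlay: HTMLElement, cues: Cue[]): void {
  video.addEventListener("timeupdate", () => {
    const t = video.currentTime;
    const active = cues.find(c => t >= c.start && t <= c.end);
    overlay.textContent = active ? active.text : "";
  });
}

// Usage: cues would come from fetching and parsing the sidecar caption file
// (SMPTE Timed Text, WebVTT, etc.) with your own parser; IDs are hypothetical.
const overlayEl = document.getElementById("caption-overlay") as HTMLElement;
const videoEl = document.querySelector("video") as HTMLVideoElement;
showCaptions(videoEl, overlayEl, [
  { start: 12.0, end: 15.5, text: "♪♪ [upbeat music] ♪♪" },
]);
```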
However, there is a big however on this,
it sounds like an easy solution, right?
Oh, just use some JavaScript and that will solve
all your problems.
The problem is that many devices that play video
do not play the video in the web browser.
When you click on a video, you're surfing on a website,
you're in your web browser, you click on a video, and
the video opens in a separate application.
That application is not a web browser.
It doesn't have -- or it doesn't necessarily support
things like JavaScript and HTML5 and CSS.
So you could have all this JavaScript that works great
on your desktop browser, but then when you view the same
website in a mobile device and click on the video, the
video works, but none of the captions work.
That's because you've exited the web browser and now all
those web browser features are no longer
available to you.
So in that case the device has to have some other way
to get access to the captions.
Not only does the device have to have some way, but
the provider has to provide the captions in that way.
In other words, for example, if the device is using
WebVTT as a fallback to support captions outside of
the browser, you have to supply the captions as WebVTT.
If you supply the captions as SMPTE Timed Text, the
video provider is doing their due diligence, but the
captions are not going to work on that device.
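For illustration, a hand-typed WebVTT sidecar might look like the following; the "line" and "align" cue settings are standard WebVTT settings that carry some of the positioning, while the timings and text here are invented.

```typescript
// Illustrative WebVTT sidecar content, built as a string for clarity.
const sampleWebVtt = [
  "WEBVTT",
  "",
  "00:00:12.000 --> 00:00:15.500 line:85% align:center",
  "♪♪ [upbeat music] ♪♪",
  "",
  "00:00:16.000 --> 00:00:18.000 line:10% align:start",
  ">> Speaker positioned near the top of the frame.",
].join("\n");

console.log(sampleWebVtt);
```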
So a workaround that a lot of groups are going with
is to create a separate custom player application.
You can make a branded app that ties to your video
channel or your distribution channel and that app can
support whatever you want.
If you want to implement SMPTE Timed Text or WebVTT
or embedded 608/708, whatever you want to do,
you can implement that in that app
and that solves the problem.
Except, you have to make sure that your app
implements all those mandated FCC features,
because even if the device implements the FCC features,
if your app does not implement the FCC required
features, like the user controls that will be
required in early 2014, then your app is going to
stand out as a problem.
And also, the downside of this method, which sounds
so great, is that you're going to be writing and supporting
and delivering an app for every platform
you want to target.
And delivering an application that gives a
consistent look and feel across different devices
is very challenging.
For a big network, a big provider, that's not so bad.
But if you're a smaller network or a smaller
provider making an app to work on every device out
there, that's going to be quite a challenge.
Hopefully what we'll see soon is some third-party
frameworks, like a universal framework that you can
implement in your app to get some of these features, but
then you have to license that and there's other
issues with that.
And again, I do apologize, I'm rushing really fast, I'd
like to get through as much of this as possible, because
I think we're like halfway through and you guys have
had some really good questions.
So we do have Format Fragmentation going on.
If you want to deliver the same video to a variety of
different devices, you have to use,
not only different streaming methods,
but also different captioning methods.
If you have a server or a CDN, a Content Delivery
Network, that supports these different variants and can
convert between them,
then that's not so bad.
And I'll just name-drop two, and I'm not saying these are
the only two, but two that I've personally
worked with and know that they work are Akamai's video
streaming platform and Wowza Streaming Media Server,
the latest version.
They have added the capability to convert
between these different video and caption formats to
deliver to, if not all, at least a good substantial
portion of the devices you want to target.
And that capability did not exist looking back 6 months ago,
12 months ago, that's something new.
So we're really glad to see that happening,
and hopefully some of the other types of servers
and providers will also implement it.
And maybe they do.
Again, I'm not saying that those are the only two that do;
I'm just saying those are the only two that I
personally have experience with.
Another issue we're dealing with right now is TTML or
SMPTE Timed Text fragmentation.
Because TTML and SMPTE Timed Text are such huge specs,
they can do so many things beyond the 608/708 features
that we've been talking about.
There are also many different ways
to implement the same thing.
You can make the same look and feel of the captions
using very different constructs in Timed Text.
Now, what that means is that, even though you might
have a file that's completely compliant with
the SMPTE Timed Text specification, some vendors
might not like that file because you're doing things
differently than they are doing things.
And unfortunately, what that means is even though SMPTE
Timed Text is a -- and I'm doing air quotes here, if
you can imagine me -- "standard", you may have to
make different variants of it or convert between
different variants, different flavors of it.
Just to show you a little example, this is a
screenshot from the Export menu in the CPC software
that I work with, and 1, 2, 3, 4, 5, there are
at least five different variants of SMPTE Timed Text
and TTML-based files that are very similar; they're all
spec compliant, but they're all used for slightly
different purposes, different workflows, and
they're not necessarily mutually compatible.
In other words, if your distributor says we want a
SMPTE Timed Text file and you give them a SMPTE Timed
Text file, it's not yet guaranteed that that is
going to work; they might want a different variant.
Unfortunately, that's something we have to deal
with, and hopefully as time goes on that's going to get
a little bit better.
And Joel, if it's okay, can I like speed through some of
the rest of the slides, I know we're running out of
time, but I can stay a little longer to do Q&A,
if that's okay?
Sure, that's fine with me if it's okay with our guests,
and we will ask the questions and make them
available during the on-demand playback.
So if we're not able to get to every question for
everybody while they're available, it will be
available later.
And absolutely, if you have any questions you want to
ask me later, you can ask me later as well,
so I apologize about that.
One of the things we talked a lot about in January was
that there were not a lot of good solutions for live
closed captioning to the web.
Fortunately that has changed now.
There are now encoders that can put the 608/708 data
into the stream, just like they do for TV broadcast.
That gets sent to the CDN, and the CDN or the server is
going to do a conversion.
They're going to convert the 608/708 or other embedded
types into all the different deliverables that you need
to deliver to all these different devices.
But you do need to make sure that your encoder and your
Content Delivery Network or your server
can speak the same language.
You might have an encoder that supports closed
captions and a server or a CDN that
supports closed captions, but if they do it
differently, then they're not going to match up, and the
closed captions are going to be lost.
So you need to make sure that even though you've
checked all the checkboxes and all the different
devices and products in your workflow have the checkbox
saying closed captioning supported, you need to
understand what that actually means, because it
might not meet the FCC mandates, or it might not be
compatible with the device further down the chain.
Another issue, this is not related to captioning too
much so I'll be very brief, but how do you stream live
video to a device or a browser in the first place?
We have all these different protocols for streaming
to different devices.
A lot of them require a plug-in like Flash or
Silverlight or QuickTime, etcetera, and you can see
here there's not a huge amount of overlap, which
means that if you want to target multiple devices, you
do need to stream, not just the captions, but also the
video itself needs to be streamed using different
protocols or different streaming formats.
And different combinations of these support different
kinds of caption files, so that makes it
a little bit more complicated.
And again, I apologize for rushing through this, but
I'd like to get through as much as I can.
The way some of the encoders do it right now is they
actually embed the captions into the video stream
itself, and then it's up to the server or the CDN to get
those to the devices.
The other way of doing it, very similar, it's called
out-of-band transmission of CC data.
In this case the encoder delivers a video stream
without captions embedded and a separate caption
stream to the Internet and to the server, and then the
server needs to tie those together or convert them
and do other things.
That's a little bit easier to process maybe.
So we see some of the servers are going
with that route.
And we've been talking a lot about the delivery, but also
talking about the user's device,
where they view the video.
Ideally, someday in the future, all the devices will
support SMPTE Timed Text or a combination of 608/708
or some other flavor, but that's not the case yet.
As I mentioned before, you can create
a custom player app.
In that case you can support whatever you want.
You can implement any format you want in that app.
But that's a lot of work to create and deploy and
support an app across all these devices
that you want to support.
So it would be really great if we could just do this
through the web browser and just have one format
like SMPTE Timed Text.
I think this is the way things are going.
I think eventually we may reach this point, and that
will make it extremely easy, but we're not quite there yet.
In the real world, we're currently at the point where
the server has to be converting the captions into
different formats, different flavors.
So this is where we're stuck now.
And hopefully that will be improving.
Also, as I mentioned before, what we don't want to end up
with is a world where every device supports SMPTE Timed
Text, but they're all different flavors, different
kinds of SMPTE Timed Text.
And again, all of these could be perfectly compliant
with the spec, they're just doing things in a different
way that is not mutually compatible, and for that
reason I know W3C is working on delivery profiles to try
to simplify things and get everybody
on the same page here.
So again, we covered why this is so difficult;
it requires cooperation between all these different parts of
the industry that normally don't cooperate with each other.
It's possible to have a workflow where every step
of the process has closed captioning checked,
certified, yes, supported, but when you put it together
the system doesn't work.
You need to make sure that all of these pieces can
speak to each other.
As of right now, there are a lot of encoders on the
market that do support some kind of closed captioning
data, you just need to make sure that they support the
type that your server or your CDN supports.
That was not the case six months ago.
There was not a whole lot of them on the market six
months ago, so that's a definite improvement.
Lots of Content Delivery Networks, and I mentioned a
couple by name, and I'm sure there's others, support CC,
but they might not support it on every device
that you want to target.
And every browser, every mobile device, every
operating system is at different stages
of implementing playback support for different formats.
So really what that means is, although we'd love to be
able to rely on one single format to handle all our
captioning needs, we're not there yet, and it's going to
be some time before we get there.
I don't know how much time -- I probably should go real
fast through this, and I always end up going real
fast through the most important part, right?
As we come up on the new deadline in two months,
the deadline for captioning
edited content on the web, the challenge with that is
that when content is edited, a lot of times the editing
process strips out the closed captions, or the
closed captions no longer match the edited version.
And re-captioning those videos from scratch would be
very costly, very time consuming.
The good news is there are tools out there from
multiple vendors, caption software vendors that can
take an Edit Decision List from your editing software
or from your automated video segmenting, video editing
system, and conform the captions to match the edited version.
I suspect that this is going to be a really important
feature over the next two months and especially going
forward after the mandate kicks in, because this means
you can automate the conversions.
You don't have to have a human being sit there
manually fixing up the captions, editing the
captions; you'll be able to run these through an
automated system that fixes up the captions for you.
And when I say fix up the captions, they could be
converted to 608/708 or SMPTE Timed Text
or whatever other format you need to work with.
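Conceptually, conforming captions to an edit is a retiming problem: given the segments that survived the edit and where they land in the new cut, keep only the cues that overlap a kept segment and shift them accordingly. The TypeScript below is a simplified sketch of that idea under my own assumptions, not the actual algorithm any particular vendor uses.

```typescript
// Re-time captions against an edit: each kept segment maps a source range
// (srcIn..srcOut, in seconds) to a new position in the edited cut (recIn).
interface Cue { start: number; end: number; text: string; }
interface EditSegment { srcIn: number; srcOut: number; recIn: number; }

function conformCaptions(cues: Cue[], edl: EditSegment[]): Cue[] {
  const out: Cue[] = [];
  for (const seg of edl) {
    for (const cue of cues) {
      // Keep any cue that overlaps this kept segment, clipped to the segment.
      const start = Math.max(cue.start, seg.srcIn);
      const end = Math.min(cue.end, seg.srcOut);
      if (start < end) {
        const offset = seg.recIn - seg.srcIn;
        out.push({ start: start + offset, end: end + offset, text: cue.text });
      }
    }
  }
  return out.sort((a, b) => a.start - b.start);
}
```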
That was the most important thing that I wanted to cover.
A lot of these other slides are kind of a repeat of what
I did last time so I'm just going to skip through these
really quickly.
Legal rights to the captions; you might not have
the legal rights to use captions on the web even if
you have the legal rights to use them on TV, so be
careful of that.
And you can review these slides in the download
later, as Joel mentioned, so I probably should
skip through these.
Just some recommendations to captioners, how to author
the captions to best meet this new world that
we're living in.
Recommended practices for content creators on how to
deal with captions, because that's something that you
have to start doing, even if you didn't do it before.
And content providers and distributors, especially the
new web-based distribution outlets --
these are not traditional broadcast outlets,
but new outlets on the web --
you're jumping in, both feet first, to a new world that
broadcasters have had to deal with for a long time,
but it's kind of a new requirement to a lot of web
developers, so hopefully the information
I presented is helpful.
I'm happy to answer other questions that come in later
of course.
So it's just really important, if you're one of these developers,
to make sure you are meeting all of your requirements.
Take a look at those FCC regulations.
Make sure that the products that you develop
are meeting those mandates.
And in conclusion, we didn't do too bad; I'm only a
couple of minutes over, we're stuck in this world
because of all of these design workarounds and
constraints of the original 608 standard.
And even though we have new standards like 708, and
that's been around for years now, since 2005, 2006, there
is always new video formats on the horizon, there is
always new web technologies that somebody has come up
with, there's always new standards coming out, so
closed captioning will continue to be a big
challenge in all aspects of the video production, video
distribution pipeline.
And I hope the information I provided can help you at
least think about these things, even if I didn't
give you concrete solutions.
But hopefully as time goes on we're going to see
expanded support for SMPTE Timed Text, SMPTE 2052,
or increased support for 608/708 native closed captions,
and that's going to solve a lot of the
problems that we talked about today.
And that is it for my presentation.
I'm happy to stick around a little longer
and take questions.
And I do apologize we had to rush through.
Thank you very much Jason!
We do have some questions, and I hope everybody that is
here can stick around at least for a few of them.
One of the questions that came in that was fairly
interesting I think is, how does adaptive bitrate
play into captions, or does it?
Yes, absolutely. Adaptive bitrate means delivering the same
video in multiple different bitrates, and there are different
technologies for doing that.
Apple came up with one called HTTP Live Streaming
or HLS, and that one supports embedded 608/708
and also they've added WebVTT support
as a Sidecar file.
So that supports closed captions.
That's both for live and for VOD, so that's pretty nice.
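For reference, a WebVTT subtitle rendition in HLS is typically declared in the master playlist alongside the video variants, roughly like the hand-typed example below; the URIs, bandwidths, and group names are made up.

```typescript
// Illustrative HLS master playlist with a WebVTT subtitles rendition.
const hlsMaster = [
  "#EXTM3U",
  '#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",' +
    'DEFAULT=YES,AUTOSELECT=YES,URI="captions/eng/index.m3u8"',
  '#EXT-X-STREAM-INF:BANDWIDTH=1500000,SUBTITLES="subs"',
  "video/1500k/index.m3u8",
  '#EXT-X-STREAM-INF:BANDWIDTH=4000000,SUBTITLES="subs"',
  "video/4000k/index.m3u8",
].join("\n");

console.log(hlsMaster);
```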
Another type is called Adobe HTTP Dynamic Streaming or
HDS, and that I believe uses SMPTE Timed Text or TTML to
carry the caption data, so that's also supported as
long as the playback device also can read that.
Microsoft has Microsoft Smooth Streaming Technology.
These are all pretty similar technologies for delivering
adaptive bitrate live streaming, and they all have
a way to support closed captions.
But the problem is the web browser or the playback
device, even if it supports this mechanism,
might not support that type of captions.
So that's what we need to see happen, is not just
in the streaming encoder or the server, but also in the
playback devices we need to see that support ramping up.
Okay.
Earlier you had mentioned some service providers that
could handle live or near real-time captions, and you
also mentioned decoders and players that didn't support them.
Is there some place that people can get a list of
those service providers or perhaps a list of decoders
and players that do support SMPTE Timed Text that you
know of anyway?
That's a very good question.
The problem is that any list is likely to become
out of date very quickly,
because things are changing all the time.
If I can recommend this, on the CPC's website we do have
a bunch of sample videos that run through the
different compliance checks, and you're free to view that
page in different browsers, different devices,
and see what works and what doesn't.
There are a couple of other places where I have seen
people do blogs and articles with checklists saying this
works, this does not work.
I'm hesitant to recommend a specific one because I don't
know how up-to-date it is.
But I suspect if you go to the manufacturer's website,
anybody that does support closed captioning, it should
be pretty prominent, because that is a big deal.
So if you go to a particular encoder or manufacturer, a
device or server website and they don't talk about closed
captions in a prominent way, then maybe they don't
support it or they don't want to talk about it.
Sorry, I wish I could be more specific on that, but
I'm hesitant to get into too much namedropping here.
Yeah.
No, appreciate it.
Another interesting question came in, what about
EBU Timed Text?
EBU Timed Text is very similar in concept to
SMPTE Timed Text.
They're adding a few extra things related to forced
subtitles and multilingual things that apply to a lot
of the laws in Europe and abroad,
not in the U.S. market.
There also is a lot of talk about trying to make sure
EBU Timed Text and SMPTE Timed Text are very close,
and there's going to be a lot of compatibility there.
So it's not a huge problem.
In other words, there's going to be a huge overlap,
and the compatibility is going to be
pretty good between them.
Me, personally, I just happen to be
focused on the U.S. market a lot.
I know the U.S. regulations a lot better
than I know some of the
other regulations for territories that are looking
at EBU Timed Text, but I do believe they are so close in
concept that we're not going to have
too much difficulty there.
Great!
I just wanted to mention, I don't think I mentioned it,
the SMPTE Timed Text standard is free of charge
on the SMPTE website.
Go to www.smpte.org and go to the Digital Library,
and you'll be able to search for the
SMPTE Timed Text standard.
Jason, I do want to take a moment to thank you for
taking the time to put this together.
I appreciate you joining us again!
And I also want to thank our guests
for sticking with us as well.
But our sponsors are the ones also that we need to
recognize, because they are the ones that bring SMPTE
Monthly Webcast to SMPTE members free of charge.
And Jason, guests, people, friends of SMPTE, thank you
all for your time and attention!
Hope all is well!
Take care!
And we'll see you next month.
Thanks everyone!