DAVID TSENG: Hi everyone, my name is David Tseng.
I am a software engineer on the accessibility
team here at Google.
And I have here with me my colleague, Volker Sorge, who
is a visiting professor from the University of Birmingham.
We're here today to talk to you about ChromeVox, and some
of the exciting work we've been doing to make reading of
mathematics on the web possible.
To tell you more about this, I'm going to hand it off to
Volker for a brief introduction to the work.
And I'll then come back for a brief introduction to
ChromeVox, in general.
Volker.
VOLKER SORGE: Thank you, David.
So my name is Volker Sorge.
I'm a senior lecturer at the University of
Birmingham in the UK.
And I've been spending the last year, together with the
ChromeVox team here, working to make
scientific material that's out on the web accessible for
people with visual impairments.
And first, we have concentrated on making
mathematics accessible, using ChromeVox.
Now why is that important?
Well, one of the problems with scientific material is that it
is a hurdle in education: many people who have visual
impairments are excluded from having scientific material
readily available as their normal peers would have.
Because it is much harder to make scientific material fully
accessible.
And particularly, if you think about something like
mathematics, it is not as straightforward as reading
just regular texts.
Therefore, it is often very difficult to make it
accessible using your average screen reader.
Now there have been a number of efforts to make
mathematics on the web, in particular,
accessible.
But these are generally efforts that require either
particular, special tools or particular plug-ins for
special browsers.
So what we have been trying to do with ChromeVox is make
mathematics accessible in just the standard screen reader--
so making, effectively, mathematics a first class
citizen in the world of accessibility.
And we have done, also, in addition, some special
elements, like, in particular, exploration of mathematics,
that we believe are fairly novel, and that we're going to
demonstrate to you in the upcoming hour.
So first, we'll start with a general introduction to what
ChromeVox actually is, before we then go into more detail on
how one can use it to speak mathematics, and also, how one
can, in particular, use it to customize how mathematics is
being spoken.
And for the general introduction, that's going to
be done by David.
David, please.
DAVID TSENG: Thank you, Volker.
So just to get everybody up to speed and on the same page,
I'm going to give a demo of ChromeVox, and its general
features and functionality as a general screen reader.
So for the past, I'd say, three or so years, we've been
developing ChromeVox.
And ChromeVox is just a Chrome extension.
So it's all written in JavaScript.
And we have access to all the information that any other web
page would.
And this gives us a lot of things that ordinary screen
readers don't get, and everything that they do get.
So we've been able to create something that, I think, is
fully featured, and allows a blind user to
navigate all of the web.
So to start off with, I will show everyone a standard page
from Wikipedia.
This is a square root.
The topic for this page is on square roots.
And I will start off by simply tabbing around, and letting
everyone hear the type of feedback we provide.
CHROMEVOX: Navigation--
internal link.
This page has one alert.
Do you want Google Chrome to save your password?
DAVID TSENG: So there, you've heard quite a few things.
So it says a lot.
It says, first of all-- the first thing that you've heard
was "navigation" and "link--" "internal link." So those bits
of information are contextual.
And it tells you what the currently focused item is.
So a user who's using a screen reader can then know how to
interact with this.
So since it's a link, one can press it, and therefore land
in another place on this page, since it's an internal link.
And the second bit of feedback you heard was an alert given
by the Chrome browser.
So we actually speak that, as well.
So besides landing on focusable items, we can
actually move about the page and move to
things that aren't focusable.
So you do that by pressing the ChromeVox modifiers, and the
arrow keys.
So let's hear what that sounds like.
CHROMEVOX: Comma.
Search--
internal link.
Quote square roots quote redirects here.
For the music festival, see square roots--
link.
DAVID TSENG: And you hear quite a few
other things there.
There were some static pieces of text.
There were some more links.
And one other item that is kind of interesting to point
out is-- you heard a sound effect, kind of
like a "ding" sound.
And that actually represents a link.
So whenever a user hears that, they automatically know that
they can click on it.
So some other interesting things are that a user doesn't
necessarily have to go item by item.
They can actually jump from section to section.
And that's done via another keyboard command.
And let's hear what that sounds like.
CHROMEVOX: Contents--
heading two.
DAVID TSENG: So there, I just skipped a lot of material on
the page, and was able to jump right to the contents listing.
So let's try that again, and see if we find anything else
interesting.
CHROMEVOX: Properties--
heading two.
DAVID TSENG: All right, and keep on going.
CHROMEVOX: Left bracket, edit source-- link--
pipe, edit, beta, right bracket--
link.
Enlarge the graph of the function f left parenthesis x
right parenthesis equals square root of x-- math.
DAVID TSENG: And there, we have something that alludes to
the rest of the talk.
And it's how we actually make it possible to read that
expression, and make it possible for ChromeVox, and
the user, to then communicate, using something that's a
little bit richer and a little bit more unique.
So with that, one last thing to show is we can actually
read content, continuously, just by
pressing another command.
And it'll actually read it kind of like a book, from this
point on to whenever you want to stop it.
CHROMEVOX: Made up of half a parabola--
link-- with a vertical directrix--
link-- dot.
The principal square root function f left parenthesis x
right parenthesis equals square root of x-- math--
usually just referred to as the quote square root function
quote, is a function.
DAVID TSENG: And I'll just stop it there.
And that, overall, gives you a flavor of what ChromeVox is,
and what it does, and how the user would use it, and a
little bit of a taste of how math works, and spoken math
sounds like.
So with that, I will throw it back to Volker.
And he will continue on with some of the more interesting
things that we're doing with math, and
how this is all possible.
VOLKER SORGE: Thank you, David.
So as you could already hear when David was demonstrating
the continuous reader on the previous paragraph, there was
math contained in the paragraph.
And the math was just being spoken out in a
fairly verbose way.
We'll talk about that a bit later, why this is so verbose.
And in order to explain how that is being spoken out, it
is probably worthwhile going a bit into detail on how math is
actually represented on the web.
And in particular, for those who are not fully aware how
math on webpages works, or should work, I'll give a quick
introduction on that now.
So ideally, mathematics on the web is represented by a
special markup language called MathML.
MathML has been around for some time.
And it has evolved and matured to a degree that it's now been
made a standard for HTML5, and as a consequence,
also for EPUB 3.
So in other words, every EPUB 3 reader, in the future,
should also be able to deal with MathML.
Now the question, of course, arises, why is there a
specialist markup language just for mathematics.
Well, the reason is that mathematics is fairly
involved, in terms of the way expressions are being set up
and how they're being structured, which is normally
not doable--
or not easily doable--
with just regular HTML.
In particular, consider that regular HTML is very good
at dealing with text and these things.
It can also do sub and superscript.
However as soon as things become more involved, as in a
mathematics expression--
which is, by nature, effectively two dimensional
and very, very nested--
HTML alone can normally not deal with it.
Therefore, MathML provides all the basic functionality--
or the basic tags--
in order to create complex mathematics expressions, and
display them on the web.
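
To make the nesting concrete, here is a minimal sketch of how an expression such as the quadratic formula might be written in Presentation MathML, held in a JavaScript string. The element names (mfrac, msqrt, msup, mrow, mi, mn, mo) are standard MathML; the helper maxDepth is a hypothetical name, introduced only to count how deeply the tags nest.

```javascript
// Presentation MathML for x = (-b ± sqrt(b^2 - 4ac)) / 2a, held in a string.
const quadratic = `
<math>
  <mi>x</mi><mo>=</mo>
  <mfrac>
    <mrow>
      <mo>-</mo><mi>b</mi><mo>±</mo>
      <msqrt>
        <msup><mi>b</mi><mn>2</mn></msup>
        <mo>-</mo><mn>4</mn><mi>a</mi><mi>c</mi>
      </msqrt>
    </mrow>
    <mrow><mn>2</mn><mi>a</mi></mrow>
  </mfrac>
</math>`;

// Deepest level of tag nesting, found with a simple tag scanner.
// This is enough for the well-formed snippet above; it is not an XML parser.
function maxDepth(markup) {
  let depth = 0;
  let deepest = 0;
  for (const [, closing] of markup.matchAll(/<(\/?)[a-z]+>/g)) {
    if (closing) {
      depth -= 1;
    } else {
      depth += 1;
      deepest = Math.max(deepest, depth);
    }
  }
  return deepest;
}

console.log(maxDepth(quadratic));
```

The letter b inside the square root sits six elements deep (math, mfrac, mrow, msqrt, msup, mi), which is the kind of structure that plain sub- and superscript tags cannot express.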
Now while it is, in general, a good thing that MathML is
fairly mature, and that it's a part of standards, what is a
drawback about MathML is that it is not implemented in all
browsers, or all EPUB 3 readers.
As a consequence, there are several ways of getting
mathematics on the web to work nevertheless.
So one of the ways it is done by many sites, such as
Wikipedia, by default, as well as other pages, is that one
actually displays an image of the mathematics instead of
having a MathML expression in there.
And therefore, it can be easily read by people who can
actually see the image.
But it cannot necessarily be picked up by a screen reader.
Now the advantage of this is, sometimes, that the real
mathematics is still provided somewhere in the background,
not visible.
And therefore, it can indeed be used by a screen reader.
And I'll demonstrate it to you later, how that is being done.
The other way of displaying mathematics on all browsers on
all platforms is by using a little library called MathJax,
which is done by the MathJax Consortium.
And if you want to check this out, this is at mathjax.org.
And the way this works is that MathJax can be injected in any
page that contains mathematics.
And it's JavaScript.
And it just renders the mathematics, which is either
given as MathML or a more traditional markup language
like LaTeX or ASCIIMath.
And it then renders the mathematics.
Now the challenge with this process, for a screen reader
who actually wants to use the mathematics in order to speak
it, is that we actually have to deal with all these types
of mathematics as they can occur.
So we have to be able to deal with just standard MathML
expressions as well as expressions rendered by
MathJax, or expressions that are just hidden somewhere, in
an alternative text, behind the image.
And I'll show you, now, how that
actually works, in practice.
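
The three cases just listed can be pictured as one dispatch step. The sketch below is only an illustration: it works on plain objects rather than real DOM nodes, and every field name in it (tag, markup, renderedBy, original, alt) is an assumption, not ChromeVox's actual data model.

```javascript
// Decide where the speakable source of a math expression comes from.
// All field names are hypothetical stand-ins for real page structure.
function findMathSource(node) {
  if (node.tag === 'math') {
    // Case 1: native MathML in the page.
    return { kind: 'mathml', source: node.markup };
  }
  if (node.renderedBy === 'mathjax' && node.original) {
    // Case 2: output rendered by MathJax, which keeps the original source.
    return { kind: 'mathjax', source: node.original };
  }
  if (node.tag === 'img' && node.alt) {
    // Case 3: an image whose alternative text carries LaTeX or ASCIIMath.
    return { kind: 'alt-text', source: node.alt };
  }
  return null; // Nothing usable: the node is not recognizable as math.
}

console.log(findMathSource({ tag: 'img', alt: '\\sqrt{x}' }));
```

A real implementation would also have to decide whether a given alternative text is mathematics at all; that check is left out here.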
Right, so let me just quickly demonstrate how this is then
spoken by ChromeVox.
So let's go to the first math expression,
again, in this paragraph.
CHROMEVOX: f left parenthesis x right parenthesis equals
square root of x--
math.
VOLKER SORGE: As you can see, the first maths expression
here, in the DOM, is actually the math expression which is
in the caption of this particular image.
But it still spoke it.
And it went to it.
And you can see it's been indicated by this highlighter,
which is, for instance, particularly helpful if you
need to see where you are in the text-- so for instance, if
you are dyslexic.
And for dyslexic students, this is useful.
And we'll just go to the next maths expression, which should
now be this one in the paragraph.
CHROMEVOX: --left parenthesis x right parenthesis equals
square root of x-- math.
VOLKER SORGE: Right.
Now as I said, this works, now, nicely.
Because I'm actually logged into Wikipedia.
And if I were to now log out of the Wikipedia page, I then
would get it just as an image.
And it would still work.
Because we can use the alternative text, which is in
the background, which is actually the LaTeX expression
which you saw at the beginning, before
the MathJax was rendering.
Now I'm not going to log out of this Wikipedia page,
because we will need it later.
However, let's go to a different page, where we have
a similar effect.
CHROMEVOX: Sum from Wolfram Math.
VOLKER SORGE: And this is MathWorld, which is a
particular encyclopedia, just for mathematics.
And what happens there is that you have the mathematics given
as an image only, always.
However, you also have alternative text.
And in that case, it's ASCIIMath.
And we can leverage this ASCIIMath in order to
pronounce it.
And just to show you that the math expressions here are
indeed images, I'll just grab it quickly, and
I'll move them around.
See, this is an image.
And I can take this sum, and just move it around on the
screen here.
And all these are images.
CHROMEVOX: A sum is the result of an addition.
For example, adding 1, 2, 3, and 4 gives the sum 10,
written 1 plus 2 plus 3 plus 4 equals 10.
Full stop.
Math.
VOLKER SORGE: So you could hear, now, that this is,
indeed, a math expression.
Although it was an image, it's being fully pronounced because
we can leverage the alternative text.
And you could hear, also, why it was a math expression,
because it was specially announced using an earcon as
well as, in the end, an announcement that this is a
math expression.
Similarly, we can go to the next math expression, for
instance, and--
CHROMEVOX: k equals 1 under and 4 over n-ary summation.
k equals 10-- full stop-- math.
VOLKER SORGE: So again, this is the maths expression how
it's being spoken at the moment.
And this now gives us the possibility to have math
spoken where it's on Wikipedia, even without
logging in, as well as on MathWorld, as well as on many
other sites where there's a similar effect.
So for instance, there's many math blogs out there.
So one of the blogs I'm going to demonstrate here is a
famous blog from a famous mathematician, Terry Tao, who
is a Fields medalist.
And he regularly blogs on his latest mathematical research.
And he does it in a similar vein to what we've just seen
on the MathWorld site, that the mathematics is actually
given in images.
And I'll grab one of those images here, and move it
around, so you can see that this is an image.
So this is an image.
Here's another one.
But the alternative text in the
background is indeed LaTeX.
And therefore, we can actually speak it, using ChromeVox.
So let's just speak this first definition, for instance.
CHROMEVOX: Definition one-- multiple dense divisibility--
let y greater than or equal to 1-- math.
For each natural number k greater than or equal to 0--
math--
we define a notion of k--
math--
tuply y--
math--
dense divisibility recursively, as follows--
list with two items--
every natural number-- list item-- n-- math--
is 0--
math--
tuply y--
math--
densely divisible.
If-- list item--
k greater than or equal to 1-- math-- and--
VOLKER SORGE: Right, I'll stop it here.
So what you could hear was that all the math in the
paragraph has been spoken.
Although it's as images, we used the alternative texts in
order to translate it with the MathJax library.
It's the same library we saw, previously, rendering the math
on the Wikipedia page.
When it comes back from the MathJax rendering, we can then
use this representation in order to speak it.
However, we do this in the background, in the sense that
none of the content in the page is
actually, physically changed.
So it still has the same visual appearance as before.
What you could also hear, when we were speaking the maths, is
that it was always giving a little earcon, and the math
announcement, which gives you the possibility to actually
find out where the math is.
And this is, in particular, important if you want to have
more interaction with the math.
And with this, I'll pass back to David, who's now going to
explain to you one of the specialities of ChromeVox
which allows you to actually interactively explore a
mathematics expression, which we believe is a very important
step in order for people to fully engage with mathematical
content that's on the web.
DAVID TSENG: I think everybody can identify with me,
personally, in that hearing all of this is kind of a lot
to take in.
I mean, if you consider the often quoted figure of 7 plus
or minus 2 bits of information that we, as humans, can store
and keep in our short term memory, what we just heard was
way more than seven words.
So what do we do about that?
Well, speaking personally, I, in the days when I was in high
school and in university, used several methods to actually
access mathematics.
One primary way, and one that's pretty popular still,
is to use audio books.
And at the time, we actually used these four-track audio
tapes-- so a tape which has two sides, using both stereo
channels, splitting those into mono, and having each channel
record audio tracks.
And someone--
a really, really nice volunteer--
would sit down in a room and read through
an entire math textbook.
So this included anything from algebra all the way up to
calculus and even beyond.
I remember listening to a real analysis book in college,
which was a once-in-a-lifetime experience.
So it was really, really difficult.
And there was a lot of rewinding, and pausing, and
taking notes.
So we feel like ChromeVox can do a lot better, since we have
a computer, and we have the ability to write code to make
things easier.
So what have we done, exactly?
Well, think about any expression--
and I'll demo this, briefly, later on.
But just think about something that you've seen in the past,
say, the quadratic formula.
How would you actually read that?
It's not exactly obvious.
And even harder is then to tell a person who's never seen
it before where all the pieces lie.
So if you think of it like geography--
where are all the countries?
And where are all the large pieces?
Where are all the continents?
And if you want to dive in to a specific region, well,
what's inside there?
And how do you actually give a quick summary of
all of these things?
And how do you actually let a person then ask you to
describe something further?
So this is the challenge that ChromeVox has faced.
And we feel like we've come up with a pretty good solution.
So let me go ahead and just demo that for you now.
We can hear how it does it now.
And I can show you how a person would actually explore
the various pieces of it.
CHROMEVOX: We have x equals minus b plus/minus square root
of b square minus 4ac divided by 2a--
math.
DAVID TSENG: Now you'll immediately
notice quite a few things.
There were a lot of words, for one thing.
We also slowed down the speech rate quite a bit so that we
can do some more intelligent things like speeding up
certain parts, slowing down others, inserting pauses,
changing pitch.
So one thing to add there is, you'll notice a little bit of
pitch drift.
So all the stuff in the numerator, you'll hear go up a
little bit.
All the stuff in the denominator, you'll hear go
down a little bit.
In that power, the b squared, the squared part is actually
up a little bit, even more, in pitch.
So it gives you another dimension to work with.
So pitch gives me, as someone who is the listener, an
additional hint as to how high something, or low something,
is vertically represented.
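
The pitch idea can be sketched in a few lines. This is an illustration under stated assumptions, not ChromeVox's implementation: the tree shape, the annotate function, and the numeric pitch offsets are all hypothetical, since only the directions (numerator up, denominator down, exponent up further) are described here.

```javascript
// Hypothetical pitch offsets: only the directions are described, not values.
const PITCH_STEP = { numerator: 0.1, denominator: -0.1, superscript: 0.1 };

// Round so that accumulated offsets stay readable.
const round = (value) => Number(value.toFixed(2));

// Flatten a small expression tree into a list of { text, pitch } utterances.
function annotate(node, pitch = 0, out = []) {
  if (typeof node === 'string') {
    out.push({ text: node, pitch: round(pitch) });
  } else if (node.type === 'fraction') {
    annotate(node.numerator, pitch + PITCH_STEP.numerator, out);
    out.push({ text: 'divided by', pitch: round(pitch) });
    annotate(node.denominator, pitch + PITCH_STEP.denominator, out);
  } else if (node.type === 'power') {
    annotate(node.base, pitch, out);
    annotate(node.exponent, pitch + PITCH_STEP.superscript, out);
  } else if (node.type === 'row') {
    node.children.forEach((child) => annotate(child, pitch, out));
  } else {
    throw new Error(`Unknown node type: ${node.type}`);
  }
  return out;
}

// Part of the quadratic formula: (b squared minus 4ac) over 2a.
const fraction = {
  type: 'fraction',
  numerator: {
    type: 'row',
    children: [{ type: 'power', base: 'b', exponent: 'squared' }, 'minus', '4ac'],
  },
  denominator: '2a',
};

const utterances = annotate(fraction);
console.log(utterances);
```

In this sketch the word "squared" ends up two steps above the baseline, because it is an exponent inside a numerator, while "2a" sits one step below.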
But this is still a pretty beefy example.
If you remember your first algebra class, it's a lot to
take in for someone who is seeing this
for the first time.
So how do we actually let someone explore it?
Well, we have one of these other ChromeVox commands that
lets you go and dive deeper into the structure.
So I'll go ahead and hit that.
CHROMEVOX: Entered math-- x equals minus b plus--
DAVID TSENG: And you heard that again.
There was a sound icon that
represents exploring something.
And you heard the expression starting to read again.
And I just stopped it, for the sake of brevity.
So how do we actually go and figure out
what pieces are here?
What's the general geography?
So there is this concept of granularities, in ChromeVox,
that lets you zoom in and zoom out, essentially, of something
on the page.
So I'll go ahead and press that.
And let's hear how that sounds.
CHROMEVOX: Down to level one-- x--
DAVID TSENG: So I just moved down a level.
And I'm in a big chunk, though not as big as the whole
expression.
So what is this?
This is x.
And it's obviously very important, so we let the user
hear that in isolation.
And we have some keys to move you around.
And those are basically to move you forward and backward
on the current level of things.
So let's hear what that sounds like.
And I will move forward.
CHROMEVOX: --equals--
DAVID TSENG: So "equals" is obviously also very important
in this equation.
So what's the next thing?
And I will go ahead and press Next again.
CHROMEVOX: --minus b plus/minus square root of b
square minus 4ac divided by 2a.
DAVID TSENG: And you heard, again,
that whole, big fraction.
And you heard the pitch changes, and everything else.
And I will press the Next key again.
CHROMEVOX: Minus b
DAVID TSENG: Oup, and you heard a little "ch-ch" sound.
And that represents that we've bumped against an edge.
So there's only three big chunks in this thing.
And as a user that, perhaps, has never seen this formula
before, well, I now have a really good sense of all the
big pieces, and all the big players, here.
But say I, still, am a little bit hazy on this fraction, and
I want to look a little bit more.
I can actually dive in even deeper.
So let's do that, and hear how that sounds.
CHROMEVOX: Down to level 2-- minus b plus/minus square root
of b square minus 4ac
DAVID TSENG: And you heard the numerator.
So that sounds great.
And let's move forward.
CHROMEVOX: 2a
DAVID TSENG: And that's the denominator.
So that all makes a lot of sense.
And let's try moving forward again.
CHROMEVOX: 2a
DAVID TSENG: Oup, bounced against an edge.
Move back.
CHROMEVOX: Minus b plus
DAVID TSENG: OK.
We heard that before.
Move back again.
CHROMEVOX: Minus b
DAVID TSENG: Oup, another edge.
All right.
So we've gotten a really clear sense of where
everything is, again.
So I can even dive even further.
And let's do that, real quick, into this numerator.
CHROMEVOX: Down to level 3--
minus
DAVID TSENG: OK.
Minus
CHROMEVOX: Minus
DAVID TSENG: Oup, that's the beginning.
So let's move forward, instead of backward.
CHROMEVOX: B plus/minus
DAVID TSENG: Forward
CHROMEVOX: Square root of b square minus 4ac.
Square root of--
DAVID TSENG: --oup, and that's the end
CHROMEVOX: --b square minus 4ac.
DAVID TSENG: And if we really wanted to, we can even dive
into the square root.
So let's do that.
CHROMEVOX: Down to level 4-- b minus 4ac.
DAVID TSENG: And that's it.
So with that, it is our hope that students will be able to
then really easily explore an equation like this.
That's now becoming possible with what
we're doing with ChromeVox.
And in the past, you would have had to rewind and fast
forward through a tape, and do it that way, or perhaps, if
you're really lucky, get math through braille.
But with the possibility of using online math, now you can
get it in real time, and look up a wealth of
information on the web.
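
The exploration shown in this demo (moving down a level, stepping forward and backward, and bumping against an edge) can be modeled as a cursor over a tree. The following is a minimal sketch with hypothetical names (MathCursor, label, children); it does not reflect how ChromeVox stores or walks the expression internally.

```javascript
// A cursor that remembers the path from the root to the current node.
class MathCursor {
  constructor(root) {
    this.path = [root];
  }

  get node() {
    return this.path[this.path.length - 1];
  }

  // Level 0 is the whole expression, level 1 its largest pieces, and so on.
  get level() {
    return this.path.length - 1;
  }

  // Dive into the first child; returns false if there is nothing below.
  down() {
    const children = this.node.children || [];
    if (children.length === 0) return false;
    this.path.push(children[0]);
    return true;
  }

  // Step to a sibling; returns false at an edge and stays where it is.
  move(step) {
    if (this.path.length < 2) return false;
    const siblings = this.path[this.path.length - 2].children;
    const target = siblings.indexOf(this.node) + step;
    if (target < 0 || target >= siblings.length) return false;
    this.path[this.path.length - 1] = siblings[target];
    return true;
  }

  next() {
    return this.move(1);
  }

  previous() {
    return this.move(-1);
  }
}

const formula = {
  label: 'whole expression',
  children: [
    { label: 'x' },
    { label: 'equals' },
    { label: 'fraction', children: [{ label: 'numerator' }, { label: 'denominator' }] },
  ],
};

const cursor = new MathCursor(formula);
cursor.down(); // now on "x" at level 1
```

A false return value from next or previous is the point at which a screen reader could play the edge sound and repeat the current piece.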
So with that, I'll pass back to Volker, who will discuss
some of the intricacies of actually applying some of
these text to speech changes, and the way we speak, and some
cool stuff that you, as a developer, can actually do to
improve any math that you might come across.
VOLKER SORGE: As we mentioned
earlier, what we've demonstrated to you so far, the
way things were spoken, is very, very verbose in the
sense that every single element is indeed
being spoken out.
Now that is not necessarily how mathematics is being
spoken in real life.
All right.
Often, you omit things, although they are written.
And this is something we can--
I'm going to demonstrate a few examples where things are
being very awkward when everything is being spoken,
and where we then apply changes to our underlying
structure and rule base in order to have them spoken more
intelligently.
And I will then, afterwards, tell you how this is actually
being done, and how that can be customized by users,
themselves, or by creators of web pages.
Right.
So this is, here, a Wikipedia page with a matrix example.
So matrixes are particularly difficult to speak, because
they are fully two dimensional objects.
And I'll show you, at first, how we could speak it in a
full, verbose fashion.
CHROMEVOX: Left square bracket matrix--
element one, one--
1.
Element two, one--
9.
Element three, one--
minus 13.
Element one, two--
20.
Element two, two--
5.
Element three, two--
minus 6.
Right square bracket--
full stop--
math.
VOLKER SORGE: Right, so as you could hear, everything was
being spoken.
Every bracket, every punctuation is being spoken.
Now we have the possibility, in ChromeVox, to actually
change the underlying representation in a way that
we can get a bit more intelligence, a bit more
semantics in there, if you like.
And actually, I'll do that now, so you can see
how this is being spoken then.
CHROMEVOX: Matrix--
row one, column one--
1.
Column two--
9.
Column three--
negative 13.
Row two, column one--
20.
Column two--
5.
Column three--
negative 6.
Math.
VOLKER SORGE: Right, so as you can see here, ChromeVox is
slightly more intelligent.
It applies a slightly better way of
pronouncing rows and columns.
In particular, it omits some of the punctuation which is
not strictly necessary for a reader to know, exactly.
Whether this is a round bracket or a square bracket,
in this particular instance, might not make any big
difference.
Right.
These things can get even worse when the
underlying representation is very similar but the
meaning is quite different.
So let's go back to our square root example.
And I'll read you the first example here, in the normal,
verbose style.
Let's go to this case statement down
here, on the page.
CHROMEVOX: Square root of x square equals--
vertical line--
x--
vertical line--
equals--
left curly bracket, matrix element one, one--
x comma--
element two, one--
if x greater than or equal to 0--
element one, two--
minus x comma--
element two, two--
if x less than 0.
Full stop-- math.
VOLKER SORGE: So what you could hear, now, was that it
not only spoke everything in a very verbose way-- all the
vertical bars, as well as the opening brace--
but ChromeVox also fell into the trap that it saw something
it thought might look like it's a matrix.
Because it's exactly the same MathML structure, underneath,
as the matrix on the previous page, which I have
demonstrated.
Now if we apply some slightly more intelligent way of
reading this here, we actually get something
slightly more sensible.
CHROMEVOX: Equation sequence--
square root of x square equal absolute value of x equal--
case statement, case one-- x, if x greater than or equal to
0-- case two--
negative x, if x less than 0.
Full stop-- math.
VOLKER SORGE: So you could see here that what happened was
that now, first of all, ChromeVox has a more semantic
interpretation of the expression.
It gives you a quick summary at the beginning-- it's a
sequence of equations.
So there's more than two parts of the equation.
And then, it reads it in a more intelligent way, by
realizing something is an absolute value, as well as
actually finding the case statement correctly, rather
than pronouncing it as a matrix, as it has done before.
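
One way to picture the difference is a small classifier that looks at the fences around a table before choosing how to speak it. The rule used below (an opening curly brace with no closing fence means a case statement) is an assumption made for illustration, as are the names classifyTable and speakTable; the actual ChromeVox heuristics are not spelled out here.

```javascript
// Guess the meaning of a table from its fences (assumed heuristic).
function classifyTable(open, close) {
  return open === '{' && !close ? 'cases' : 'matrix';
}

// Speak the same row-and-cell structure in two different ways.
function speakTable(table) {
  if (classifyTable(table.open, table.close) === 'cases') {
    const cases = table.rows.map((row, i) => `case ${i + 1}, ${row.join(', ')}`);
    return `case statement, ${cases.join(', ')}`;
  }
  const rows = table.rows.map((row, i) => {
    const cells = row.map((cell, j) => `column ${j + 1}, ${cell}`);
    return `row ${i + 1}, ${cells.join(', ')}`;
  });
  return `matrix, ${rows.join(', ')}`;
}

const absoluteValueCases = {
  open: '{',
  close: null,
  rows: [
    ['x', 'if x greater than or equal to 0'],
    ['negative x', 'if x less than 0'],
  ],
};

const matrix = {
  open: '[',
  close: ']',
  rows: [
    ['1', '9', 'negative 13'],
    ['20', '5', 'negative 6'],
  ],
};

console.log(speakTable(absoluteValueCases));
console.log(speakTable(matrix));
```

Both inputs have the same shape, rows of cells, which is why a reading based on layout alone cannot tell them apart.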
And a similar semantic interpretation can then be
applied to all the other statements.
So if you go back, for instance, to this statement
here, and pronounce it, first, in the old way--
CHROMEVOX: f left parenthesis x right parenthesis equals
square root of x-- math.
VOLKER SORGE: All right, let's apply some semantics.
CHROMEVOX: f of x equals square root of x-- math.
VOLKER SORGE: Right.
So as you could see here, instead of pronouncing all the
parentheses, we now say "f of x." Because we realize this
is, indeed, a function application, and we pronounce
it accordingly.
Now the way this is being done is by using an underlying
speech rule engine, which allows for various ways of
customizing the rule base that is being used in order to
speak the mathematical expressions.
And I'll explain, very briefly, how that works.
Effectively, what happens is the math expression is, of
course, a tree representation.
In particular, it's a MathML tree representation.
And this representation is recursively being traversed.
And in every node, an applicable
rule is being computed.
And that rule then has some actions which fire, in the
sense that they produce some speech output, as well as
produce a way to further recursively traverse the rest
of the tree.
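
The traversal just described can be sketched as follows. This is a simplified illustration, not the ChromeVox speech rule engine: the rule objects, the speak function, and the "first applicable rule wins" policy are assumptions, and the tree uses plain objects with MathML tag names instead of real DOM nodes.

```javascript
// Each rule has a precondition (applies) and an action that produces speech
// and decides how to recurse into the rest of the tree.
const rules = [
  {
    name: 'fraction',
    applies: (node) => node.tag === 'mfrac',
    action: (node, speak) =>
      `${speak(node.children[0])} divided by ${speak(node.children[1])}`,
  },
  {
    name: 'square root',
    applies: (node) => node.tag === 'msqrt',
    action: (node, speak) => `square root of ${node.children.map(speak).join(' ')}`,
  },
  {
    name: 'leaf',
    applies: (node) => !node.children,
    action: (node) => node.text,
  },
  {
    name: 'default',
    applies: () => true,
    action: (node, speak) => node.children.map(speak).join(' '),
  },
];

// Visit a node: pick the first applicable rule and let its action fire.
function speak(node) {
  const rule = rules.find((candidate) => candidate.applies(node));
  return rule.action(node, speak);
}

// sqrt(x) over 2a, as a tree of plain objects.
const tree = {
  tag: 'mfrac',
  children: [
    { tag: 'msqrt', children: [{ tag: 'mi', text: 'x' }] },
    { tag: 'mrow', children: [{ tag: 'mn', text: '2' }, { tag: 'mi', text: 'a' }] },
  ],
};

console.log(speak(tree));
```

Changing how fractions are spoken then only means replacing or adding one rule; the traversal itself stays the same.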
And these rules can be customized in various ways.
One way of customizing them is customizing them with respect
to mathematical domain.
So what might be useful in one mathematical domain might not
be the same in another mathematical domain.
And when I say "mathematical domain," what I actually mean
is something like algebra, geometry, analysis, calculus,
things like this.
So you might want to pronounce one expression, in algebra,
differently than in geometry.
In addition to this, we have the possibilities of
customizing rules by their style.
And when I say "style" here, I mean something like, do I want
it to be spoken short.
Do I want it to be spoken in a verbose mode, for instance.
Do I want it to be spoken in some particular way that I
like, and nobody else likes, et cetera.
And finally, we also have the possibility of swapping the
underlying representation while the
speech engine is active.
So that's what I've just showed you.
The underlying MathML representation, here, was
swapped out with a more semantically enriched
representation.
Now the problem with this semantic enrichment is, of
course, that this is a fairly ad hoc procedure.
MathML is a language which primarily aims for display.
So there's two types of MathML.
One is called Presentation MathML, and one is called
Content MathML.
The idea of Presentation MathML is that it takes care
of how the math expression is being laid out on the page.
And this is what we're currently working with.
The idea of Content MathML is that an author of a math
expression actually puts in some semantics.
Unfortunately, that is hardly available on the web.
Therefore, what we have to do is we have to take the
Presentation MathML expression, and interpret it
in a way that seems reasonable to us.
Obviously, this can break down in many cases.
And therefore, the person who actually knows best how to
interpret a MathML expression is generally the author of the
expression, themselves.
As a consequence, we believe that it is very useful for
people to actually be able to customize the speech rules so
that they will pronounce the MathML
expressions they write in the correct way.
We therefore, in ChromeVox, offer an API to users out
there, to authors of web pages, to developers of EPUB 3
readers that allows them to define their own math rules--
either override or add to our rule base--
in order to have math expressions being spoken in
the way they prefer.
And the rest of the time I will now spend introducing
this particular way of--
these particular rules, as well as demonstrate how they
can be written, and what the different components
of these rules are.
Let's, for this purpose, all go back to our favorite
example, the quadratic formula.
We'll start manipulating the division in the formula in
order to demonstrate how we can change rules.
Right.
Here, I'll bring up the console quickly.
Let me get back to the expression.
CHROMEVOX: Given we have x equals minus b plus/minus
square root of b square minus 4ac divided by 2a--
math.
VOLKER SORGE: Right, so this is the default way, currently,
ChromeVox speaks the expression.
I will now write a rule here, in the console.
I've already written it previously, so I'll just bring
it up here, and explain to you the components that will alter
the way the expression is being spoken.
Right.
So what you can see here, in detail, is that we have--
these function names are the way one can call ControlVox--
sorry, not ControlVox, ChromeVox--
through the API.
So here's the ChromeVox API.
It's, in particular, the math API.
And it offers a defineRule function.
And the function here, at the moment, has four components.
All of these are string components.
The first component is a name.
It's practically irrelevant what that name is.
The whole idea of a name is that one can more easily find
the rule later, somewhere in the speech rule engine, say,
for debugging purposes.
But for now, we can just ignore this.
So we'll just call this "fraction rule."
Then, comes some admin information which has to do
with the domain, and the style, I've been referring to
previously.
So the domain was, just to recall this, the idea that we
can specialize our rules with respect to different
mathematical domains such as algebra, geometry, et cetera.
In addition, we always have a default domain, which is the
one we're working on right now, which means that if, for
a particular domain, no rule is there, we can always fall
back to the default rule.
And currently, the way domains are being selected is
interactively, by the user.
Right.
So this is what the default, here, says.
Then, we have a little dot, which separates the domain
name from the style name.
And styles--
again, we have styles like short, verbose, or I think we
also have super brief.
But there is no limit to what you want to call your style.
If you want to call your style after your personal name, then
we'll have an additional style in ChromeVox, in the future.
And again, there is a default style.
Here, we're working, now, on the default domain, and the
style is short.
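The domain-and-style lookup described here can be sketched in a few lines of Python. This is only an illustration of the fallback idea, not ChromeVox's code: the `RULES` table, the spoken strings in it, and the exact fallback order are assumptions made for the example.

```python
# Hypothetical rule table keyed by (domain, style); the values are what gets spoken.
RULES = {
    ("default", "default"): "divided by",
    ("default", "short"): "over",
    ("algebra", "default"): "fraction",
}


def split_dynamic(dynamic):
    """Split a 'domain.style' string such as 'default.short' at the dot."""
    domain, style = dynamic.split(".", 1)
    return domain, style


def lookup(dynamic):
    """Return the most specific entry, falling back towards default.default."""
    domain, style = split_dynamic(dynamic)
    # Assumed order: exact match, same domain, default domain, full default.
    candidates = [
        (domain, style),
        (domain, "default"),
        ("default", style),
        ("default", "default"),
    ]
    for key in candidates:
        if key in RULES:
            return RULES[key]
    raise KeyError(dynamic)


print(lookup("default.short"))     # over
print(lookup("geometry.verbose"))  # divided by
```

Because a `default.default` entry always exists, a lookup for a domain or style that has no rule of its own still produces speech.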
All right.
And then, the rest of the rule is effectively a way how to
test whether a rule is applicable, plus the action
the rule is supposed to perform.
In ChromeVox, this is given the other way around.
We first start with the action and then with the
precondition, or the part of the rule that tests whether
the actual rule is applicable.
I'll tell you, in a second, why this is the case.
Let's first have a look at this latter part, which talks
about the precondition if the rule is applicable.
What we want is a node, which is our fraction up here.
This division node is given by a MathML tag,
which is called mfrac.
And we select this using an XPath expression here.
So this is a regular XPath expression, which just says,
well, the current node-- the node we are on--
is a node with an mfrac tag in the MathML namespace.
So this is just a regular XPath expression.
Then, the rule will fire.
And it will perform its action.
So what are the actions?
Well, actions are a sequence of components of different
things the rule does.
In this case, we only have one component.
And that component is currently
composed of two things--
the type of the component, as well as
what is being performed.
So the type of the component here is
given in square brackets.
In this case, the type is t, which means it's just a text
that is being spoken.
And the text that is being spoken is then given as a
string here, in string syntax, with double quotes at the
beginning and the end.
So when we find the division node, all we want, at this
point, is a string being spoken where it says, "some
division." Let's put that rule in.
And let's see what ChromeVox says now.
CHROMEVOX: Given we have x equals some division-- math.
VOLKER SORGE: Right.
That's all it says-- some division.
x equals some division.
We haven't specified anything else.
In particular, what we have not specified here is how it
should deal with any of the other content that's still in
the expression.
So all we say at this point is, say one
string and then stop.
Do not recurse any further.
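As a rough model of the rule just entered, the four string components can be written down as a small Python record. The class `SpeechRule`, its field names, and the helper `parse_component` are hypothetical, and the XPath spelling `self::mathml:mfrac` is reconstructed from the spoken description rather than read off the screen.

```python
from dataclasses import dataclass


@dataclass
class SpeechRule:
    name: str          # label used only to find the rule again, e.g. when debugging
    dynamic: str       # "domain.style"
    action: str        # what the rule speaks
    precondition: str  # XPath test deciding whether the rule applies


# The demo rule as four strings (assumed spelling, see the note above).
rule = SpeechRule(
    name="fraction rule",
    dynamic="default.short",
    action='[t] "some division"',
    precondition="self::mathml:mfrac",
)


def parse_component(component):
    """Split '[t] "some division"' into its type letter and its content."""
    kind, _, content = component.strip().partition("]")
    return kind.lstrip("["), content.strip()


print(parse_component(rule.action))  # ('t', '"some division"')
```

The action comes before the precondition in this record for the same reason as in the talk: further constraints can be appended at the end.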
I'll show you, in a second, how we can do that, otherwise.
Let's first talk a bit more about the precondition.
So the precondition here is an XPath expression, as I said,
that selects this particular node--
this particular node with the mfrac tag.
So it's this division node.
Now in addition to this, one can then give additional
constraints.
And the number of additional constraints is unlimited.
Therefore, this is the reason why we put the precondition at
the end of this particular rule definition.
Because you can add more, and more, and more, and more
constraints to it.
And I'll add one constraint, just for you to be able to see
what this will look like.
So when we say, OK, so we want this rule just to be
applicable to something that says that
it's the node, itself.
And it has a descendant, which is an msqrt.
All right?
So that means it has a descendant, which is a square
root, up there.
And again, this is in the MathML namespace.
Right, sorry.
Unexpected token means I have forgotten to close my
expression, my string expression,
with a square root.
Let's see whether it's applicable.
Well, the thing is that we haven't really changed any of
the actions.
So let's change the actions, so we can actually hear
whether the different rule is being applied.
"--some division with square root."
CHROMEVOX: Given we have x equals some division with
square root-- math.
VOLKER SORGE: Right.
So this has, indeed, worked.
And it now applies the other rule.
So we have specialized the rule even further.
So in addition, we can now add more and more constraints.
Anything XPath 1.0 allows you to express, you can express in
these constraints.
And those are binary constraints.
So they only have to hold, or not.
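The idea that a rule fires only when the precondition and every extra constraint hold can be sketched with Python's standard XML library. This is a model under assumptions: `applies` is a hypothetical helper, the MathML fragment is a shortened stand-in for the quadratic formula, and since ElementTree supports only a subset of XPath, the tag test is done by comparing tag names and the descendant constraint is written as `.//mathml:msqrt`.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
NS = {"mathml": MATHML}

# Shortened fraction: the numerator contains a square root, the denominator is 2a.
FRACTION = ET.fromstring(
    f'<mfrac xmlns="{MATHML}">'
    "<mrow><mo>-</mo><mi>b</mi><mo>&#xB1;</mo><msqrt><mi>b</mi></msqrt></mrow>"
    "<mrow><mn>2</mn><mi>a</mi></mrow>"
    "</mfrac>"
)


def applies(node, tag, constraints):
    """True if the node has the given MathML tag and every constraint selects a node."""
    if node.tag != f"{{{MATHML}}}{tag}":
        return False
    # Constraints are yes/no tests: each one either finds something or it does not.
    return all(node.find(path, NS) is not None for path in constraints)


print(applies(FRACTION, "mfrac", []))                   # True
print(applies(FRACTION, "mfrac", [".//mathml:msqrt"]))  # True
print(applies(FRACTION, "mfrac", [".//mathml:mroot"]))  # False
```

Adding a constraint never changes what a rule says; it only narrows the set of nodes the rule is allowed to fire on.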
Right.
So this is as far as I want to explain the preconditions.
Now let's examine, a bit further, what we can do in
terms of the actions of our rule.
In particular, just having one string pronounced, if you
actually have a complex expression, is usually not
very fruitful.
So we might want to have a bit more pronounced.
Therefore, let's dive into the expression and pronounce some
of the child nodes, actually.
Right.
So we now start a new component.
Components are being separated by semicolons, so that's very
similar to what you have in regular JavaScript.
And we now want to specify a new component.
And this is a particular component which works on
single nodes.
So therefore, it gets the type n.
Again, the type is given in square brackets.
And now we can write a selector which is just yet
another XPath expression.
So we write another XPath selector that selects whatever
child we want to work on, or whatever node we
want to work on.
It doesn't necessarily have to be a child.
You can effectively also work on any other node that is
reachable from this particular node you're on.
But in our case, we want to work on the child.
But let's not work on both.
Let's just work on the second one, for instance.
So this XPath expression, now, just selects, from the given
node, the second child node.
Right.
And what the action now does is it will pronounce the
string "some division with square root." And afterwards,
it will ask the speech rule engine to recurse on this node
that we have selected here.
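How a text component followed by a node component plays out can be modeled with a toy interpreter. Everything here is a sketch under assumptions: `run_action`, `select_child`, and `speak` are hypothetical names, the action string is reconstructed from the description, the selector helper understands only the `./*[k]` form, and splitting on semicolons would break for spoken text that itself contains a semicolon.

```python
import re
import xml.etree.ElementTree as ET

# Simplified fraction without namespaces; the word "minus" stands in for the operator.
FRACTION = ET.fromstring(
    "<mfrac>"
    "<mrow><mo>minus</mo><mi>b</mi></mrow>"
    "<mrow><mn>2</mn><mi>a</mi></mrow>"
    "</mfrac>"
)

ACTION = '[t] "some division with square root"; [n] ./*[2]'


def select_child(node, path):
    """Stand-in for an XPath engine: only understands './*[k]' with k counted from 1."""
    match = re.fullmatch(r"\./\*\[(\d+)\]", path)
    if match is None:
        raise ValueError(f"unsupported selector: {path}")
    return list(node)[int(match.group(1)) - 1]


def speak(node):
    """Fallback for the recursion: read the leaf texts in document order."""
    return [text.strip() for text in node.itertext() if text.strip()]


def run_action(action, node):
    """Evaluate semicolon-separated [t] and [n] components from left to right."""
    words = []
    for component in action.split(";"):
        kind, _, content = component.strip().partition("] ")
        if kind == "[t":
            words.append(content.strip('"'))  # plain text, spoken as written
        elif kind == "[n":
            words.extend(speak(select_child(node, content)))  # recurse on one node
    return " ".join(words)


print(run_action(ACTION, FRACTION))  # some division with square root 2 a
```

Only the selected child is spoken; the first child is skipped because no component asks for it.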
Let's do that again.
CHROMEVOX: Given we have x equals some division with
square root 2a--
math.
VOLKER SORGE: Right.
So what you can hear is that we heard the expression up to
the division.
Then, it would pronounce the string.
And then, it would pronounce the second child node.
Right.
Similarly, we could now put in more content.
So say we put in another string of type t.
Say this is now a string called "and numerator."
And we now, for instance, want to recurse on the
first node, as well--
on the first child as well, not the first node.
Let's do that.
Let's listen to what's happening.
CHROMEVOX: Given we have x equals some division with
square root 2a and numerator minus b
plus/minus square root--
VOLKER SORGE: I'll stop it here.
We all know what the numerator is.
But you can see how this now works.
Now one of the things that David pointed out earlier was
that you can actually do changes to the way the text to
speech engine speaks things from within a rule.
And let's do that.
The way this works is by adding, in particular,
annotations to each component.
So for instance, say we want to change the pitch on the
first component that's being spoken.
Or in our case here, it's the second component, the
denominator.
And let's say we just want to have a significant pitch
change, say, to 1.5.
Let's see how that sounds.
CHROMEVOX: Given we have x equals some division with
square root 2a and numerator minus b plus--
VOLKER SORGE: So you could hear that the pitch was
changed quite drastically when the denominator of the
division was being pronounced.
So the way we do these annotations is by adding in
round brackets, the particular property we want to change, as
well as the value that changes it.
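The annotation syntax can be illustrated with a small parser. This is a sketch, not the speech rule engine's parser: `parse_annotated` is a hypothetical helper, separating several annotations with commas is an assumption, and only numeric values such as pitch and rate are handled here.

```python
import re


def parse_annotated(component):
    """Split '[n] ./*[2] (pitch:1.5)' into type, content and an annotation dict."""
    pattern = r"\[(\w)\]\s*(.*?)\s*(?:\(([^()]*)\))?"
    match = re.fullmatch(pattern, component.strip())
    if match is None:
        raise ValueError(f"cannot parse component: {component}")
    kind, content, raw = match.groups()
    annotations = {}
    for pair in (raw or "").split(","):
        if pair.strip():
            key, _, value = pair.partition(":")
            annotations[key.strip()] = float(value)  # numeric properties only
    return kind, content, annotations


print(parse_annotated("[n] ./*[2] (pitch:1.5)"))
# ('n', './*[2]', {'pitch': 1.5})
print(parse_annotated("[n] ./*[1]"))
# ('n', './*[1]', {})
```

A component without round brackets simply yields an empty annotation dictionary, so annotated and plain components can share one code path.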
All right.
So in addition, we can, for instance, also increase the
rate of the speech.
Say we increase it to 2.5.
And now let's see what's being spoken now.
CHROMEVOX: Given we have x equals some division with
square root 2a and numerator minus--
VOLKER SORGE: That was not really--
That was--
CHROMEVOX: Simple school algebra, Volker Sorge, the
third April, 2013
VOLKER SORGE: Sorry, that was not much of a change.
I'll just refresh the console so we can see a bit more.
Let's try to change it to something different.
Maybe 0.5 will make a difference.
CHROMEVOX: Given we have x equals some division with
square root 2a and numerator minus--
VOLKER SORGE: Now you could hear that it was actually
quite a bit faster.
So what else can we do with this?
Well, another thing that we've heard earlier was that we can
add pauses to our rule.
Let's do that as well.
Let's add a pause annotation, say, to the second--
uh, no, to the first of these--
children that we speak.
And let's add quite a drastic pause.
Pauses are given in milliseconds.
So let's add one second, so we can actually
hear it more clearly--
so 1,000.
CHROMEVOX: Given we have x equals some division with
square root 2a and numerator minus b--
VOLKER SORGE: Right.
So the effect you could hear here was something I wanted to
demonstrate.
It is that the pause, as well as all the other annotations--
i.e. the pitch and the rate--
that we give a particular node here, or particular component,
are recursively being applied by the speech rule engine.
That means if we change the pitch somewhere, this pitch
change will be propagated further down in the recursive
traversal of the tree.
That means if we have nested expressions, and we change the
pitch several times, these changes will add up, which
gives a very nice effect when it comes to
pitch and rate changes.
It doesn't really give a very nice effect when it comes to
pause changes.
Because what that means is that the pause is applied
after every single element that occurs.
So you might want that.
But you don't necessarily want that.
So in this particular case, we don't.
So how do we actually do it in order to apply the pause
between two chunks, rather than having it applied
recursively, during traversing the tree?
The way this works-- and I'll show it to you here--
is by adding the pause explicitly as what we call a
personality annotation, which is another
component of the actions.
And this component has the type p.
So we put that in here.
Let's get the bracketing right.
And let's see what happens now.
CHROMEVOX: Given we have x equals some division with
square root 2a and numerator minus b--
VOLKER SORGE: Right.
What you could hear now was that the pause was actually
only applied straight after the denominator had been
pronounced.
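The difference between an inherited pause and a pause given as its own component can be modeled on a toy tree. This is an illustration under assumptions: nodes are plain Python lists and strings, `flatten` is a hypothetical helper, and pitch changes are modeled as simple addition per nesting level.

```python
# A node is either a spoken word (str) or a list of child nodes.
def flatten(node, pitch=0.0, pause_ms=0, pitch_step=0.0):
    """Return (word, pitch, pause_ms) triples; settings are inherited down the tree."""
    if isinstance(node, str):
        return [(node, round(pitch, 2), pause_ms)]
    spoken = []
    for child in node:
        # Each nesting level adds pitch_step again, so nested changes accumulate.
        spoken.extend(flatten(child, pitch + pitch_step, pause_ms, pitch_step))
    return spoken


denominator = ["2", "a"]

# Pause as an annotation: inherited, so it follows every single element.
print(flatten(denominator, pause_ms=1000))
# [('2', 0.0, 1000), ('a', 0.0, 1000)]

# Pause as its own component: one silent entry after the whole chunk.
print(flatten(denominator) + [("", 0.0, 1000)])
# [('2', 0.0, 0), ('a', 0.0, 0), ('', 0.0, 1000)]

# Pitch changes on nested expressions add up level by level.
print(flatten(["x", ["y", ["z"]]], pitch_step=0.5))
# [('x', 0.5, 0), ('y', 1.0, 0), ('z', 1.5, 0)]
```

The first call shows why an inherited pause sounds choppy, while the second places a single pause between two chunks.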
Right.
And in addition to this, we can also, for instance, change
the volume.
But I'm not going to demonstrate
that now, in detail.
Instead, I'm going to demonstrate one final thing we
can do with these speech rules, and also, one final
component--
one type of component--
for the action.
And that is useful when you have a node where you do not
necessarily know how many children the node has, and you
want to recurse over all these nodes.
And we'll do that--
given our division here.
Right.
And this component is of type m, which
just stands for multinode.
And you then give it a set of nodes.
In our case, we select the set of nodes, again, with an XPath
expression, which is just all subnodes--
or all child nodes-- of our mfrac expression.
Let's do that.
Let's see what happens.
CHROMEVOX: Given we have x equals some division with
square root minus b plus/minus square root of b
square minus 4ac 2a--
math.
VOLKER SORGE: Right.
So what you could hear now was that all the elements are
actually being pronounced properly this time, in order,
because this is just the regular order within the
MathML expression.
Now obviously, for an expression like division,
where you really know that you only have two children, this
might not make a lot of sense.
But if you do not know exactly how many children there are--
say, in the matrix expression we've demonstrated earlier--
you have to use this way of accessing the children.
And then you have, in addition, the opportunity, or
the possibility, to put in something that is being spoken
in between all of these expressions.
And the last thing I'll demonstrate to you now is
giving what we call a separator string as a further
annotation to our component.
And the separator is just a regular string, in this case.
And let's just say, "divided by," and spell it right, and
see what happens now.
CHROMEVOX: Given we have x equals some division with
square root minus b plus/minus square root of b square minus
4ac divided by 2a--
math.
VOLKER SORGE: Right.
Back, we are, at the original expression.
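The multinode component with a separator behaves much like joining a list of spoken children. The sketch below assumes a rule string along the lines of `[m] ./* (separator:"divided by")`, which is reconstructed from the description; `multi_node` and `speak` are hypothetical helpers, not ChromeVox functions.

```python
import xml.etree.ElementTree as ET

# Simplified fraction without namespaces; the word "minus" stands in for the operator.
FRACTION = ET.fromstring(
    "<mfrac>"
    "<mrow><mo>minus</mo><mi>b</mi></mrow>"
    "<mrow><mn>2</mn><mi>a</mi></mrow>"
    "</mfrac>"
)


def speak(node):
    """Fallback for the recursion: read the leaf texts in document order."""
    return " ".join(text.strip() for text in node.itertext() if text.strip())


def multi_node(node, separator=None):
    """Recurse on every child in order, optionally speaking a separator in between."""
    parts = [speak(child) for child in node]
    joiner = f" {separator} " if separator else " "
    return joiner.join(parts)


print(multi_node(FRACTION))                # minus b 2 a
print(multi_node(FRACTION, "divided by"))  # minus b divided by 2 a
```

The same function works unchanged for a node with any number of children, which is the case the multinode type is meant for, such as the rows of a matrix.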
Right.
So this, so far, is the explanation of this API.
And why did we go into that much detail in order to
demonstrate to you how this API works?
Well, the basic idea of ChromeVox is, since it is an
open project, that we're hoping that people will
exploit this API in the future in order to write explicit
speech rules-- say, on their web pages--
to make sure that the math they write, or they put on the
web, is pronounced in the right way.
And for instance, I personally, as a university
lecturer, university professor, could imagine that
I'll put some of my lecture notes on the web.
And as I know exactly how things have to be pronounced,
I also put speech rules there so my students with visual
impairments can actually go there, and listen to the math
in the correct way, in exactly the way I would
speak it in my lecture.
Right.
And with this, I'll end my part of the presentation and
pass back to David, who's going to conclude with some
more details on ChromeVox and the project.
DAVID TSENG: All right, thank you so much, Volker.
I wanted to conclude by thanking everyone for
watching, and sticking with us through all of our demos.
You can find a little bit more information about ChromeVox on
chromevox.com.
And we are an open source project.
So you can actually even take a peek at all of our code,
including all of the things that make math possible.
We'd be happy to take any feedback that you have.
You can also visit us, in general, for accessibility, at
Google, at the address google.com/accessibility.
Thanks again for watching.
And we hope to receive any feedback that you have.
Thanks.