I am Romain Deltour, I work for the DAISY Consortium, and I am going to spend
the first half hour talking about the DAISY Pipeline 2 project, which is an open framework
for automatic document processing. I am going to give an introduction to the tool, then
briefly describe some possible production workflows, and then if we have some time
left for questions I'd be happy to answer them. So let's first talk about the background
- there is an ever-existing demand for accessible content. This demand comes from a wide range
of user groups and content must be published through a wide variety of distribution channels - online
content, CDs, SD cards - and to an increasing variety of devices. Visually impaired users
use different kinds of devices, like Braille displays, hardware DAISY players, laptops,
iPads, tablets, whatever. Because there is a wide range of user groups and a variety
of devices, there are a lot of different output formats.
This project is the follow-up to the Pipeline 1 project, which was started in 2002 by the
DAISY Consortium and is now in maintenance mode. It has been quite successful and widely used
in the DAISY community to produce accessible content. But at the time we created
the project, some technologies and standards were not ready. Now we've decided to come
up with the Pipeline 2 project and totally redesign the software to better rely on open
standards and new technologies.
Our high-level objective is to be efficient - enable the tool to produce many documents
in a short time. When a publisher wants to produce a newspaper for the next day, it has
to be quite efficient. The tool has to be low-cost, which means it is easy to develop,
easy to adapt to a new publishing workflow, and easy to maintain over time. And it needs
to be versatile, which means it adapts well, interoperates with different
systems, and can produce different output formats.
Our approach is to come up with a modular system - why a modular system? Because if
your system is modular, based on several components, it's easier to extend. If I want to augment
an EPUB 3 production with MathML processing, I just add a MathML-aware module to the system.
It must be easy to customize - if I have some special needs in my organization to produce
the content, I must be able to tweak the production workflow to meet these needs. Modularity
also makes the tool easier to integrate and opens it to both commercial and non-profit use.
The module system is a plugin system itself, so commercial companies can come up with their
own plugins for the platform.
The other big item in our approach is to promote single-source publishing - that's not a requirement,
but it is recommended. What is single-source publishing? It means that we use an XML master
document in order to produce different output formats. Markus said earlier that when
your document has a satisfying level of structure and good semantic
inflections, then you get the accessibility features almost for free,
and that's what we are talking about now. If we have a sufficiently rich XML master,
then with automatic production we can transform it into a variety of accessible output formats
such as DAISY digital talking books, EPUB 3 books, Braille content, or large print. Of course,
this is neither a requirement nor a limitation of the tool. It is what we're suggesting, but
the production workflow can be adapted to different use cases and workflows. We can
transform input formats into these XML masters, then into other output formats, or we can
go a totally different route. And as for the XML master, currently we are focusing on DAISY
A.I., also known as the DAISY 4 Authoring and Interchange standard, which is the successor
to the DTBook format. It's an authoring format with an XML schema that can be used to describe
almost every document.
The third big item of our approach is that we focus on accessibility and quality. A valid
EPUB book is not necessarily accessible. We strive to produce some content that is inherently
accessible and inherently well-structured, which makes it a quality publication.
Now as for the architecture, I won't dive into the tiniest technical details here, but
just to give you an overview of what kind of technologies we are using. We are relying
on W3C standards, notably XProc, XSLT, and XPath, which are all open recommendations
and native XML processing technologies. We do that because it's easier to produce
XML from XML with technologies that were made for the job. XProc is
the XML pipeline language: a language developed to orchestrate XML processing
steps in a workflow. It has been a W3C Recommendation since May 2010, and there are already many
open source and commercial engines available. Then, on top of this core XProc engine, we're
adding a module system. Again, each step in the production workflow is an independent
cohesive software component, which we call a module. It can be implemented in several
technologies like XSLT, XPath, and Java code, and it is all orchestrated by XProc. Then
we have the runtime framework, which is like the glue code that ties all these components
together. It makes the XProc engine aware of the modules and makes it possible to run
these modules with the XProc engine. It's based on Java technology and the OSGi
module system, which helps us come up with a service-oriented approach where we can plug
in different pieces of functionality like job management, logging, web services, you
name it.
So that's it for the architecture: we have this core XProc technology, the core processing
technologies are implemented with open standard recommendations, and everything runs
in a Java-based open source runtime which implements a module system.
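To give a rough idea of what orchestrating steps with XProc looks like, here is a minimal pipeline sketch that chains an XSLT transformation and a RELAX NG validation. The stylesheet and schema file names are illustrative, not actual Pipeline 2 modules:

```xml
<p:declare-step version="1.0"
    xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source"/>   <!-- the input XML document -->
  <p:output port="result"/>  <!-- the transformed, validated document -->

  <!-- Step 1: transform the source with an XSLT stylesheet
       ("dtbook-to-html.xsl" is a hypothetical name) -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="dtbook-to-html.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Step 2: validate the result against a RELAX NG schema -->
  <p:validate-with-relax-ng>
    <p:input port="schema">
      <p:document href="html5.rng"/>
    </p:input>
  </p:validate-with-relax-ng>
</p:declare-step>
```

In a real Pipeline 2 workflow, each of these steps would be provided by an independent module that the runtime framework makes available to the XProc engine.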
So what are the deployment options for our tool? We have the possibility to use the tool
as a command-line tool; this is already available, and it will be revamped early next year.
The tool can also be called through a RESTful web service API, which is already available
and will be gradually enriched and improved based on feedback. We also want to develop
a web application for the tool, a web UI that you can access with your browser. The target
release for this web application is June 2012. And ultimately, we'll also come up with a
lightweight standard desktop UI. It's going to be a sequence of dialogs to guide you through
the conversion process. The goal is to be able to embed that into third party applications.
For instance, if you have ever used the Word Save as DAISY plugin, it calls the Pipeline
under the hood and it pops up a sequence of dialogs to invoke the Pipeline process. So
that's the kind of user interface we're looking at. This desktop UI is planned for the first
half of 2013.
Now, rather than demoing an automated tool (it's not very interesting: I just start the
process, it runs, and it gives me a file, so there's no real point in showing that), I'm
going to describe briefly some sample workflows that are available
or in the works for the tool. First, I'm going to briefly talk about EPUB production, how
we do that, and what it takes. I won't dive into every step of this workflow, but basically
we have a generic process when we talk about EPUB production. We look at the input file
set, we determine the reading order of the file set, and then based on this reading order
we process the content to convert it into HTML5, possibly add some media overlays. When
we've done that, we extract the metadata from the documents, we automatically create a navigation
document, then we package the file set (zip it). What's interesting here is that we try
to have each of these steps as independent as possible from the previous ones which means
that they are interchangeable (if possible) and reusable for different production workflows,
depending on the input and output. We try to automate, of course, as much as we can.
For instance, the navigation creation is fully automated based on the structure of the content
document. We look at the HTML markup, and if it is well structured, if it has the proper
markup and top-level sections, semantic inflections, and things like that, we can automatically
generate the navigation files.
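As an illustration of that step, here is roughly what the automated navigation generation produces from a well-structured content document. The file names, ids, and headings are illustrative:

```xml
<!-- Well-structured content document (chapter1.xhtml, hypothetical) -->
<section id="c1">
  <h1>Chapter 1</h1>
  <section id="c1s1">
    <h2>Section 1.1</h2>
    <!-- content -->
  </section>
</section>

<!-- EPUB 3 navigation document generated from the heading structure -->
<nav epub:type="toc" xmlns:epub="http://www.idpf.org/2007/ops">
  <ol>
    <li><a href="chapter1.xhtml#c1">Chapter 1</a>
      <ol>
        <li><a href="chapter1.xhtml#c1s1">Section 1.1</a></li>
      </ol>
    </li>
  </ol>
</nav>
```

The nesting of the generated table of contents simply mirrors the nesting of the sections and headings, which is why good structure in the source matters so much.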
I'm now displaying another workflow diagram that shows an instance of the workflow applied
to a DAISY 3 to EPUB 3 conversion. It basically shows how we use some of the components when
we have a complete conversion requirement. For instance, when we have a DAISY 3 file
set, to determine the reading order of the final EPUB 3 publication, we are looking at
the DAISY navigation file for that. When we want to generate media overlays in an EPUB
3 publication produced from a DAISY 3 book, we take the existing .smil
files and audio from the DAISY 3 file set.
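To give an idea of the result, here is a minimal EPUB 3 media overlay fragment (EPUB 3's flavor of SMIL), with the carried-over audio clips re-pointed at the new HTML content. File names and clip times are illustrative:

```xml
<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
  <body>
    <seq id="s1" epub:textref="chapter1.xhtml">
      <!-- one par per synchronized phrase -->
      <par id="p1">
        <text src="chapter1.xhtml#phrase1"/>
        <!-- audio clip carried over from the DAISY 3 file set -->
        <audio src="audio/chapter1.mp3"
               clipBegin="0:00:00.000" clipEnd="0:00:07.500"/>
      </par>
    </seq>
  </body>
</smil>
```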
This next workflow is very high-level. I'm going to briefly describe how we can use this
tool to add some advanced TTS annotations to an EPUB publication. The workflow goes like
this: start with the XML master, for instance in the DAISY Authoring and Interchange format;
then, depending on the original markup of the document, it may need to be improved.
For instance, I give here an example of a sentence, which is: "Have you seen the movie
'La Vita e Bella?'" It's just a paragraph, it's tagged as a paragraph, and "La Vita e
Bella" is not marked up. So we can have some preprocessing tools to enrich this markup
and make it better by tagging "La Vita e Bella" within a name element. This kind of markup
enrichment can either be fully automated or require human interaction. Sometimes there are
things that an automated tool cannot do, and in that case we need some human interaction.
Here, for instance, to identify movie titles or proper names, e-mail addresses, whatever,
we can query some databases to do that. So once we have this properly tagged XML master,
we then transform that into an EPUB. During this transformation, we can plug in a module
that will talk to a remote lexicon, and the lexicon will know
how to pronounce this foreign proper name. So the module will be able to add pronunciation
annotations to the produced EPUB file. The annotation says: I am using this phonetic alphabet,
and the phonetic description of the title is this. The interesting part here is that this is really a good
use case for automated production because we can automatically query some organization-wide
lexicons and data to improve the publication - either improve the document we intend to
archive, or improve the live publication process. The Pipeline 2 project has built-in support
for remote services - we can basically make some HTTP requests and call some web services.
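For the sentence above, such an annotation could be expressed with the SSML attributes that EPUB 3 content documents allow; the IPA transcription here is only illustrative:

```xml
<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">
  Have you seen the movie
  <!-- ssml:ph carries the phonetic transcription (illustrative IPA),
       in the alphabet declared by ssml:alphabet -->
  <span ssml:alphabet="ipa"
        ssml:ph="la ˈvita ɛ ˈbɛlla">La Vita e Bella</span>?
</p>
```

A reading system or TTS engine that understands these attributes can then pronounce the title correctly instead of guessing.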
So if an organization makes its lexicons available to the public or to partners as an
online service, this service can be called from the automated production tool to enrich
the outcome of the EPUB 3 production with these text-to-speech annotations or other accessibility
features. It's particularly interesting for text-to-speech annotations because building
a comprehensive and rich lexicon is very time-consuming and costly, so organizations usually
maintain large lexicons that they gradually enrich and enhance over time.
So in this example I showed how to add some inline pronunciation hints, but
you can also extract a small subset of the big organization lexicon and
integrate it natively within your EPUB.
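Such an embedded subset could take the form of a small PLS (Pronunciation Lexicon Specification) document referenced from the EPUB. The entry below is illustrative:

```xml
<lexicon version="1.0" xml:lang="it" alphabet="ipa"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>La Vita e Bella</grapheme>
    <!-- illustrative IPA transcription -->
    <phoneme>la ˈvita ɛ ˈbɛlla</phoneme>
  </lexeme>
</lexicon>
```

The advantage over inline hints is that the publication carries only the handful of entries it actually needs, while the full lexicon stays maintained in one place in the organization.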
To summarize, the Pipeline 2 project is an open platform. It's all based on open source
software, either from third-party developers or from the DAISY Consortium. It's currently
licensed under the LGPL, which is a commercial-friendly license, but we are discussing
other license possibilities such as the Apache License.
It's a collaborative project: it's maintained and led by the DAISY Consortium, but it also
involves DAISY Consortium members such as the National Library for the Blind of Norway, the Swiss
Library for the Blind, RNIB, and other organizations. It also
has built-in accessibility and customizability, which means that we intend to make the tool
extensible for special purposes and customizable to specific organization workflows.
What is available today in terms of concrete conversions? We can go from DTBook (DAISY XML)
to DAISY A.I., the new version of DAISY XML. We can go from DAISY A.I. to EPUB 3. And we
can go from DAISY 2.02 digital talking books to EPUB 3, with optional media overlays. But
that's just for the first iteration of the software. We have many things in the works;
at the top of the list is improved EPUB 3 support. We're going to come up with additional
input formats that will be transformable into EPUB books. I'm thinking of other DAISY formats
like DAISY 3, HTML, RTF, things like that. By improved EPUB 3 support, I'm also talking
about these TTS annotations. This is not yet available, but deployment is planned
for next year. Also in the works is Braille production. Several prototype
solutions will be developed by an independent working group
focused on this Braille topic. We'll also work on TTS-based production. Users usually
prefer text narrated by a human, but sometimes there is no time for human
narration, for instance when you want to deliver newspapers. More and
more reading devices and reading systems have their own TTS engines built in, so they
can speak text themselves. At the same time, if you do the TTS processing
upfront, you can rely on more server resources, more processing power, and better lexicons,
which all makes for a better result. So we are still targeting TTS-based production.
Okay, that's it. Thank you.