Great East Japan Earthquake Big Data Workshop: Project 311
Providing Multilingual Information During Disasters
Masao Uchiyama & Eiichiro Sumida, NICT
I'm Masao Uchiyama of the NICT.
I'd like to propose a few things that could be done
to better provide information in multiple languages during disasters.
First, I think we can all agree
that fast and accurate multilingual information must be provided.
In order to provide that information more promptly,
we believe it's necessary to prepare for proper machine translation.
Second...
Considering the tremendous number of volunteer translators
that appeared during the disaster, we thought that the efforts of
such volunteers could be used to help provide more accurate info.
To accomplish this, information sources would need to grant permission
to freely distribute and translate any emergency information provided.
The fact of the matter is, copyright has no real relevance in emergencies.
We think providers of information would do well to facilitate
the spread of their information by declaring that people are free
to distribute and translate it.
Next, we need to improve the capabilities of machine translation.
It should be possible to improve and refine those capabilities
by using sample translation data.
At the NICT, we've released
speech translation software for tourist needs in 21 languages.
In principle it should thus be possible
to use machine translation to translate emergency information.
It's vital, however, that people know in advance
where they can go to find that information.
So we believe it's necessary to first establish
an official platform for this emergency information.
Volunteer translators could then gather there to do their work.
Allow me to explain how the translation data corpus works.
For example, a Japanese sentence asking the way to Kyoto Station
is paired with the English translation
"Could you direct me to Kyoto Station?"
With enough such translation pairs, the system can automatically learn a
probabilistic dictionary of phrases and use it for translation.
Expanding this translation corpus would be crucial.
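
As a rough illustration of how a probability-based dictionary can be learned from translation pairs, here is a minimal sketch. It is not the NICT system: it uses the classic word-level IBM Model 1 estimation as a simplified stand-in for phrase learning, and the tiny romanized corpus, the function names `train_ibm1` and `best_translation`, and the iteration count are all assumptions made for the example.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(target word e | source word f) with EM (IBM Model 1)."""
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)   # expected counts per source word
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # Re-estimate probabilities from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def best_translation(t, source_word):
    """Return the most probable target word for a source word, or None."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get) if candidates else None

# Hypothetical toy corpus: romanized Japanese tokens paired with English tokens.
corpus = [
    (["kyoto", "eki"], ["kyoto", "station"]),
    (["tokyo", "eki"], ["tokyo", "station"]),
    (["kyoto", "tawa"], ["kyoto", "tower"]),
]

table = train_ibm1(corpus)
print(best_translation(table, "eki"))
print(best_translation(table, "tawa"))
```

Even with three sentence pairs, the word that co-occurs most consistently with "eki" wins out, which is why a larger corpus matters: each added pair sharpens the estimates.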
I'd like to show you a demo we've created that allows
Japanese-English searches, translations, and corrections.
If you search for "tsunami," it will display all sentences
featuring that term, page by page. This one is from NHK.
On the right here we have two machine translation results.
They're far from ideal, but if there are two or three,
some can turn out to be surprisingly accurate,
so it outputs anything that might fit.
If you click on a title like this, you can view the entire article,
which is displayed in the same kind of format.
Since the accuracy of this info is so important,
we've created an interface to submit corrections.
If you enter a translation result here and then save,
it will be saved to the database promptly.
This is just a demo, of course, so we'll need to figure out a way
to ensure a high level of quality once the time comes to actually use this.
To reiterate my proposal, we should ensure that everyone
can find emergency information in their native languages.
We can do this by allowing people to freely distribute and translate
this information, preparing a machine translation engine ahead of time,
and building and publicizing a place for this information.
Thank you.
About the tourist translation software you demonstrated for us...
How do you figure out the traits of emergency information--
that is, traits by genre and so forth?
The language of emergency information and such.
The way we're doing it right now is very basic.
We're collecting as many samples as we can
of text that people consider emergency information.
The system can then be adapted to them automatically.
I have a comment.
While listening to Prof. Murai just now I suddenly remembered,
NTT once had a translation engine with 100% accuracy.
It focused on phrases about market conditions,
the kind of stuff you see at a stock exchange, you know?
In short, when it comes to highly standardized phrases,
machine translation can be perfectly accurate.
So I was thinking, if categories are fixed to some degree,
it should be possible to anticipate some types of necessary information.
And if there's an established format and established phrasing,
it should be possible to translate it with virtually 100% accuracy.
I'm sure you're aware of something like that.
Do you have any guidelines or the like for writing emergency information
to make it more compatible?
There are people doing things like that.
The name escapes me now,
but someone at Nagoya University is working on that, I think,
saying that it's possible to produce accurate translations
by creating templates and then inserting words into them.
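
The template idea mentioned here can be sketched roughly as follows. This is only an illustration, not the Nagoya University work: the two templates, the glossary entries, and the function name `translate` are hypothetical. A sentence is translated only when it matches a fixed pattern and every inserted word is in the glossary; anything else is left untranslated rather than guessed.

```python
import re

# Hypothetical fixed-phrase templates: a Japanese pattern and its English counterpart.
TEMPLATES = [
    (re.compile(r"^(?P<place>.+)に避難してください。?$"),
     "Please evacuate to the {place}."),
    (re.compile(r"^(?P<place>.+)で水を配っています。?$"),
     "Water is being distributed at the {place}."),
]

# Hypothetical glossary for the words that may be inserted into a template.
GLOSSARY = {
    "小学校": "elementary school",
    "体育館": "gymnasium",
    "市役所": "city hall",
}

def translate(sentence):
    """Translate a sentence that fits a known template; return None otherwise."""
    for pattern, target in TEMPLATES:
        match = pattern.match(sentence)
        if match is None:
            continue
        slots = {}
        for name, value in match.groupdict().items():
            if value not in GLOSSARY:
                return None  # unknown word: refuse rather than guess
            slots[name] = GLOSSARY[value]
        return target.format(**slots)
    return None

print(translate("小学校に避難してください。"))
print(translate("体育館で水を配っています"))
```

Because the output is assembled from pre-translated pieces, accuracy depends entirely on the templates and glossary being checked in advance, which is the point made above about standardized phrasing.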
There is a standard terminology used for evacuation shelters--
terms established by the national government
and distributed to local governments for evacuees.
But you're probably talking about info meant for earlier consumption
viewed on a computer or the like...
[inaudible]
Thank you very much.