Speaker 1: Before that, here on BBC Radio 4, Lucy Hawking traces the development of
speech synthesis in Klatt's Last Tapes.
Speaker 2: You are listening to the voice of a machine.
Speaker 3: Mama, mama.
Speaker 4: A, B, C, D, E, F, G...
Speaker 5: Once upon a time, there lived a king and queen who had no children.
Speaker 6: Do I sound like a boy or a girl?
Speaker 7: How are you? I love you.
S2: I do not understand what the words mean when I read them.
Speaker 8: Ha-ha-ha.
Speaker 9: I can serve as an authority figure.
Speaker 10: What did you say before that?
Speaker 11: Can you understand me even though I am whispering?
Speaker 12: To be or not to be, that is the question.
Lucy Hawking: My name is Lucy Hawking and I have been regularly chatting to a user of
speech technology, my father Stephen, for the past 28 years. I write adventure stories
for primary aged children about astronomy, astrophysics and cosmology. When I go to schools,
I always talk about my father's use of speech technology and I tell the kids that even though
my father may sound robotic, when I play them a clip of him talking, I ask them to remember
that actually it's a real man talking to them. And it's a man who's using a computer to give
himself back the voice that his illness has taken away from him.
Speaker 14: Development of speech synthesizers. One, The Voder of Homer Dudley, 1939.
Speaker 15: Will you please make the Voder say for our Eastern listeners, "Good evening
radio audience."?
Speaker 16: Good evening radio audience.
LH: To find out where speech technology started, I went to Saarland University in Germany,
where two researchers had built a model of the first ever voice machine. It was originally
created in the 18th century by inventor, scientist, and impresario Wolfgang von Kempelen.
[background noises]
LH: Hello.
Speaker 17: Hello.
LH: Good morning.
S1: Please come in.
LH: Thank you so much.
S1: I'm very pleased to meet you.
S1: Hello.
[background conversation]
Jürgen Trouvain: My name is Jürgen Trouvain. I'm a lecturer and researcher here at the
Department of Computational Linguistics and Phonetics at Saarland University and I'm also
interested in the history of speech communication devices, like the one of von Kempelen, for
example. Kempelen was both a good showman and a very good scientist, but he was really
like, sort of a genius, a real engineer, because he was interested in building things which
can function and can help also people.
Fabian Brackhane: My name is Fabian Brackhane.
LH: What do you think the relationship was between von Kempelen's original inspiration
and the organ?
FB: It's a very curious thing, because there is a stop in the pipe organ called "vox humana."
[music]
FB: When this stop was invented in the 17th century, it should be a representation of
the human voice playing the organ.
LH: So, they wanted to take the vox humana from a musical note, something you'd find
in compositions at the time, to actually be able to produce human speech.
FB: Exactly. Yes. But Kempelen knew very well that this stuff couldn't be the solution to
get a speech synthesis.
[background music]
S1: Three, PAT the Parametric Artificial Talker of Walter Lawrence, 1953.
S1: What did you say before that?
LH: And so, we're looking at von Kempelen's speech machine. [chuckle] The door of which
has just fallen off. It looks like a small bird house. Yeah. So, we're taking the lid
off the box, which houses the speech machine. And so, Fabian is putting one hand through
one hole with his elbow on the bellows, which represent the lungs and his other hand is
coming underneath the rubber cone. Which, what does the rubber cone represent?
FB: The mouth.
LH: The mouth. So, it's hand under the mouth piece.
S3: Mama Mama.
S1: Ooh, it's creepy. Sorry.
[laughter]
S3: Papa Papa.
FB: So, it's... These are the both best words he/she could say it.
S3: Mama.
FB: So, you have the nose to be opened.
S3: Papa.
LH: So, Fabian is moving his hand rapidly over the mouthpiece and using two fingers
over the nostrils effectively, while pressing down with his elbow on the lungs. Fabian is
actually mouthing the words "mama" and "papa" while the machine is saying them.
[music]
S1: Four, The "OVE" cascade formant synthesizer of Gunnar Fant, 1953.
S7: How are you? I love you.
Bernd Möbius: I might be able to find out whether Lucy is able to...
LH: Should we see... Should we see, perhaps like in...
FB: So, there's your instructor.
LH: Right.
FB: If you want to say "em," you have to close the mouth and the nostrils have to be opened.
LH: The nostrils are open, front.
FB: And if you want to say "ah," you have to move the hand backwards. So, just mah,
mah, while I'm pressing them...
LH: While pressing...
S3: Mm... Mama... Mam...
[chuckle]
LH: I did that with three syllables. [chuckle] I'll try with two this time.
S3: Mama...
LH: Right and what about papa? How would I do papa?
FB: The same way but you have to close the nostrils. Well...
LH: Okay. So...
S3: Pa-pa-paaaaa.
[laughter]
LH: Let's see if I can just do it with two syllables this time.
S3: Pa-paa...
LH: Can I get her to say anything else or will I be... Would I be able to make it say
any other words?
FB: If you don't cover the mouth, it's an A.
S3: Ah...
S1: And the more you cover the mouth, the vowel quality changes.
S3: Ahh... A... B... Mm...
[music]
FB: He knew that the missing of the tongue was very important thing and in his book,
he wrote to his readers, to invent this machine forward, but nobody could invent it with the
tongue, with teeth, so that, it could speak more than this few, very few things.
[music]
LH: It seems to me that his aim was actually to give a voice to people who couldn't speak.
And so, he must have hoped for further development of his machine 'cause he can't have imagined
that, it would just be mama and papa or those short sentences. He must have had in mind,
this idea that people would be able to speak freely, mechanically.
JT: And there was a plea in that book Fabian mentioned: please, readers, that means researchers
and the later generations, please go on with the development of that machine. So, we're
still trying to do that here.
[music]
S1: 16, Output from the first computer-based phonemic-synthesis-by-rule program, created
by John Kelly and Louis Gerstman, 1961.
S1: To be or not to be, that is the question.
LH: It would be really nice to get a sense of the progression from a mechanical to electrical
to computer solutions to providing a voice for people who can't speak.
BM: I'm not sure whether that was actually a smooth transition from mechanical systems
like to the first electrical ones. I only know that, all of a sudden, that's how it
looks. My name is Bernd Möbius. I am the Professor of Phonetics and Phonology at Saarland
University. In the 1930s, there was an electrical system around, the so-called Voder, done by
Homer Dudley, that was demonstrated at the World Fair in New York, I believe in 1937.
S1: For example, Helen, will you have the Voder say, "She saw me"?
Speaker 21: She saw me.
S1: That sounded awfully flat, how about a little expression? Say the sentence in answer
to these questions. "Who saw you?"
S2: She saw me.
S1: Whom did she see?
S2: She saw me.
S1: What did she, see you or hear you?
S2: She saw me.
BM: During the demonstration at the World Fair, there was a female operator of the system
who played the device a little bit like a church organ.
S1: About how long did it take you to become an expert in operating the Voder?
Speaker 22: It took me about a year of constant practice. This is about the average time required
in most cases.
[music]
S2: She saw me. Who saw me? She saw me. She saw me. Who saw me? She saw me.
JT: We have to go back to the or is the floor next to the top, the top floor.
LH: I'm now just getting into an elevator, which probably I can talk to. So, does it
speak English?
JT: Hopefully, yes.
S2: Okay. Hello, elevator. It doesn't say hello back.
JT: You must be patient with that. It's a machine. Maybe with German.
S?: Hello [German]
Speaker 23: Hi there, where can I take you?
LH: The third floor. Third floor.
S2: Okay, I'm bringing you to the third floor. Bye, bye.
LH: Bye now.
S1: 19. Rules to control a low-dimensionality articulatory model, by Cecil Coker, 1968.
S2: You are listening to the voice of a machine.
Speaker 24: I'm Eva Lizotte, and I'm a PhD student and working in articulatory synthesis.
The actual situation right now is that it's very hard to simulate women's voices 'cause
they have slightly different characteristics, and if you just tune up the F0, the fundamental
frequency or the pitch of the voice, it starts sounding really artificial. What you actually
have to do is also alter the articulation. So when I, or when we, speak an "ah,"
it's different from a male long-vocal-tract "ah." So, you have... You cannot easily interpolate
the articulation.
LH: Because of course it'd be awful for women not only to be using a speech synthesizer,
but then, to be coming out with a man's voice.
S2: Yeah.
[laughter]
LH: I mean, that would constitute... That would be a real loss of identity.
S2: Yeah. Exactly.
Speaker 25: This is result of trying to imitate a female voice by increasing the pitch.
[music]
S1: 24, the first full text-to-speech system, done in Japan by Noriko Umeda et al., 1968.
S5: Once upon a time, there lived a king and queen who had no children.
S1: But I think it's also important to think of children for example, growing up and of
course at the beginning to speak with an adult's voice, even the sex would be the same, would
be awful I think...
LH: Definitely very important just for making friends. It's gonna be very hard for a child
speaking with an adult's voice to actually communicate with kids of their own age.
S2: Yeah.
JT: But at the moment we don't know very much about the speaking voice of children becoming
adults, for example: what's really happening during the maturation of the vocal folds.
LH: So, the aim is to create speech machines which can grow up with somebody.
JT: That would be really nice. Then you would have shown real knowledge about what's going
on in your voice during the life span, at least of the first, say, 20 years or so.
[music]
S1: 21, sentence-level phonology incorporated in rules by Dennis Klatt, 1976.
Speaker 26: It was the night before Christmas, when all through the house, not a creature
was stirring. Not even a mouse.
LH: Seeing that people maybe don't know who Dennis Klatt is, could you put him
in context?
JT: Yeah, he's definitely one of the pioneers of speech synthesis, in the technological sense,
but also in providing an interface for non-experts who could basically type in text and get synthetic
speech out of the system, which wasn't possible before I think.
S2: Before Klatt, you would actually have to be a specialist in order to be able to
input what you wanted to say.
JT: Exactly.
LH: Okay. Laura can you hear me?
S2: I can hear you. Can you hear me?
LH: Yes. I've got you. That was fantastic. This is Dr. Laura Fine, the daughter of Dennis
Klatt. Dennis Klatt is really the father of the modern speech machine. He created DECtalk,
the system which takes text, inputted by the user and turns it into speech. Dennis Klatt
also produced the definitive history of speech devices which includes a collection of recordings
of each device throughout the 20th century.
S2: He really was interested in making a natural and intelligible system. So, the most important
qualities of a speech synthesis system are really the naturalness and the intelligibility.
And he was very much interested in making those of high quality. One of the unique contributions
was that, he used not only his understanding from an engineering standpoint and a speech
production standpoint, but he also asked for analysis with perception data. How do people
interpret speech and what is it in the listener that helps them determine, is this a child,
is this a female, is this a male? What cues are important? And that really helped him
to make an intelligible system that incorporated different age speakers and different genders.
[music]
S6: Do I sound like a boy or a girl?
S2: My mother came across this drawing that my father made of the different speakers.
In the center, we have Perfect Paul. This is a picture of my father.
Speaker 27: I am Perfect Paul, the standard male voice.
S2: And then, this is Beautiful Betty, which is the standard female voice. And that is
a picture that he drew of my mother.
Speaker 28: I am Beautiful Betty, the standard female voice. Some people think I sound a
bit like a man.
[laughter]
S2: This is Kit the Kid, who's a 10-year-old child. So, this is a picture of me.
Speaker 29: My name is Kit the Kid and I am about 10 years old.
S2: With my nice short hair cut, as a child.
LH: Oh, is that you?
S2: I was a lab rat. As a child, I spent a lot of time at MIT. My father had a candy
drawer. I spent hours with him at MIT, in his laboratory and he took snippets of my
voice and that helped to develop the child's voice.
LH: I love that they're called the DECtalk gang.
S2: The DECtalk gang.
LH: That is a great... That is a great title.
S2: So, there was my father in later years and underneath the caption says, Huge Harry.
Kind of older gentleman's voice.
S9: I am Huge Harry, a very large person with a deep voice. I can serve as an authority
figure.
LH: Laura, I have to tell you something, Perfect Paul, sounds just like my dad.
S2: I mean, I think that's amazing.
LH: Is Perfect Paul based on your father's voice?
S2: Yes.
LH: Which therefore means that, my father is actually speaking with your father's voice.
S2: It's amazing, he would be so, so thrilled.
LH: I think, one of the things that strikes me about your father is his humanity and that
he was obviously an amazing scientist, who managed to do something that has had a very
profound impact on people's day-to-day lives. And but also that he had quite a sense of
humour.
S2: He did.
[chuckle]
LH: Is it true that he gave his synthesizer the ability to sing, "Happy birthday to you"?
S2: He did.
S2: Happy birthday to you. Happy birthday to you. Happy birthday dear...
S2: One of the ironies is, as a 40-year-old man, he began to be somewhat hoarse, because
he had thyroid cancer. And, he had had a thyroidectomy, but his vocal cords were affected by the
disease. And so, he spoke in later years with a raspy voice. And I think he understood all
too well your father's challenges in terms of communication.
LH: So, he had a real sense himself of what it would actually be like to find that you
had no voice.
S2: Yes, my father unfortunately passed away at age 50, way too young. And he knew that
he had a terminal illness really, when I was quite young. He knew that he would not be
around perhaps to see me graduate from college. But he was always so optimistic. I think it's
been such an amazing experience for me to talk to you about how your father's life has
been transformed by my father's research. And I had never really thought before that
my father's voice lives on.
[music]
S1: 33, The Klattalk system by Dennis Klatt of MIT which formed the basis for Digital
Equipment Corporation's DECtalk system, 1983.
S2: According to the American Speech and Hearing Association, there are over one million people
in the United States who are unable to speak for one reason or another.
Speaker 30: I will show you the way that you can write using my eyes.
Speaker 31: At first, when people meet me as someone who is unable to speak, they
seem to assume that you have some form of mental deficiency.
S3: I will show you the way that you can write using my eyes.
LH: This is Michael Cubis. And Michael lost his voice from a stroke some years ago.
Speaker 32: Some people will talk to me as if I have a learning disability. I find this
quite funny as some of them do it in the most ridiculous way. Some of them catch on fairly fast and
realize that I'm perfectly sane. Others continue to act this way though, which is funny and
completely bizarre.
[music]
S3: People are quite anxious about how to approach someone with a disability. And that's
what Michael does, he puts people at their ease. So, it is easy to communicate with him.
LH: Mick Donegan's speciality is eye gaze technology, and that means using the movements
of the eye in order to generate text, which can then be turned into speech. Could you
explain a bit more to us about gaze control, about the kind of technology with which we have
just had a conversation with Michael?
S3: It's a system, it's based on a very powerful camera system combined with low level infra-red
lights. The actual technology has been around probably two or three decades, but the significant
change that's happened this century, is that systems began to cope with significant involuntary
movement. That means that the significant numbers of people with cerebral palsy, for
example, who have involuntary movement, suddenly that group of people were able to use the
system. People with MS who have involuntary movement.
[music]
S1: 11, The DAVO articulatory synthesizer developed by George Rosen at MIT, 1958.
S4: A, B, C, D, E, F, G, H, I, J, K...
S3: When I first tried Michael with eye gaze technology, we used just a lower case system
and Michael was very unhappy about that. He was insistent that I put capital letters,
full stops, commas, semicolons, because it's really important for him to show everyone
that he's a fully literate guy who is able to speak independently and in the highest
literacy level.
S4: When we know our A, B, C...
LH: Mick, I wonder if you could tell us a bit about how you see the future of this technology
developing?
S3: I've just finished being an advisor for a European project on brain-computer interface
and disability. And for me, that's a technology that excites me because for those people who
are completely locked in, who can't even move their eyes, then there is no other way to
go, other than to use a brain computer interface. At the moment, you know it's kind of inconvenient,
because for the best signal... Well, in fact, for the best signal, you need an implant.
But the second best signal [chuckle] is to actually wear a cap and put that gel on it,
etcetera. But there are various dry caps being developed that have a reasonable signal as
I understand it.
LH: I'm always asked how to talk to my father, and it would be great to know what advice
you would give to people who are not familiar with speech machines, but who would like to
have a conversation with you?
Speaker 33: I would ask them not to ask long questions and be patient because it can take
a long time to answer. Also, please bear in mind that it can be very tiring for those
using speech output devices.
[music]
Speaker 34: The question of whether I would change my voice given the opportunity is a
difficult one. And I suddenly have an opportunity.
LH: This is acclaimed film-maker, Simon Fitzmaurice, who has lost his voice through MND.
S3: This voice, my voice is a generic one that came with the computer, turning an Irish
man into an American overnight. But it has become my voice.
S?: Yeah. This is actually something that we have in mind as a real application for
people who know that there's a chance that they will lose their voice to record themselves.
Such that the experts will be able to build a speech synthesiser that has that person's
voice.
S3: There are two key issues in the question of changing my voice: what I think about my
voice, and what those closest to me think and feel about my voice. And I can tell you
what my children feel straightaway. They find the idea of me changing my voice completely
abhorrent. Just recently, I was testing out another computer, when I glimpsed out of the
corner of my eye, my two little boys standing outside the door, their heads close together
whispering... They are four and six years of age. They are whispering and looking in
my direction. It turns out they are discussing the strange voice coming out of this different
computer. Later, back on my own computer, it's bedtime and when my six-year-old comes
to give me a kiss, I type up "Goodnight" on my screen. "No. Say it." I say it, "Goodnight."
He turns to his brother at the door, "You see, I told you. It's the same." Someone's
voice is part of their identity, integral to their perceived makeup. It's funny though,
I feel less protective of my computer voice than others, probably because my voice inside
my head is what is familiar to me, my thoughts, not the voice that expresses them.
S3: Recently, I came across a video on YouTube of a doctor in Sweden with motor
neurone disease and there it was, my voice out of someone else's computer, identical.
It was a little unnerving. So, I decided to see if I could get some semblance of my old
spoken voice back, uniquely mine. I've been working with a company in Edinburgh, CereProc,
the world leaders in synthetic speech who have built a synthetic voice out of old recordings
of my spoken voice. I was lucky enough to have a recording of me reading some of my
poetry and other recordings. However, because of the lack of data in comparison to someone
who would deliberately bank their voice, my synthetic voice is limited by the amount of
original material. As a solution, CereProc are now in the process of using my father's
voice as a similar source from which to fill in the missing DNA and to build a harmonious
rounded voice.
Speaker 35: Harmonious rounded voice. I await the results.
S3: I await the results.
S3: So, the question remain...
S3: The question remains...
S3: Will I change my voice?
S3: Will I change my voice? And more importantly...
S3: Will my children allow it?
S3: Will my children allow it?
[music]
S1: 30, The MIT MITalk system by Jonathan Allen, Sheri Hunnicutt, and Dennis Klatt, 1979.
Speaker 36: Speech is so familiar a feature of daily life that we rarely pause to define
it.
S1: End of the demonstration. These recordings were made by Dennis Klatt, on November 22nd
1986.
LH: Amazingly, we've progressed from von Kempelen's 18th century machine which had a limited vocabulary
to being able to recreate the exact voice that was lost and give it expression, meaning
and modulation in a way that mimics the naturally produced voice. Soon, speech technology users
will be able to make their voices smile.
S1: Klatt's Last Tapes was presented by Lucy Hawking.
S6: Do I sound like a boy or a girl?
S?: The recordings were made available by the Acoustical Society of America.
S4: A, B, C, D, E, F...
S?: The sound design was by Nick Romero.
S7: How are you? I love you.
S?: It was produced by Julian Mayers.
S8: Ha-ha-ha.
S?: It was a Sweet Talk production for BBC Radio 4.
S2: Thank you for listening and good luck on all your cosmic journeys.