Alright. So now that we know about the phonemes of all the languages in the world, the interesting
question is, can we see them? Can we visualize sound? And the answer is, a resounding yes!
You are actually looking at it. This is a program that lets me see the waveform:
on the X-axis is time, and on the Y-axis is the energy in decibels, or how loud
I am. So if I speak REALLY LOUD I can make these giant patterns. But actually that's
not the most interesting view - that would be the spectrogram.
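As a rough illustration of the "energy in decibels" reading on the waveform view, here is a minimal sketch that chops a signal into short frames and reports one loudness level per frame. The function name `frame_energy_db`, the frame size, and the synthetic quiet/loud tones are all assumptions made for this example; the program in the video may compute its display differently.

```python
import math

def frame_energy_db(samples, frame_size=256, floor=1e-12):
    """Return one RMS level in dB (relative to an amplitude of 1.0) per frame."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mean_square = sum(x * x for x in frame) / frame_size
        rms = math.sqrt(mean_square)
        # The floor avoids log10(0) on completely silent frames.
        levels.append(20 * math.log10(max(rms, floor)))
    return levels

# Synthetic stand-in for speech (an assumption): a quiet tone, then a loud one.
rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1024)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1024)]
levels = frame_energy_db(quiet + loud)
```

The frames from the loud half come out several tens of decibels above the quiet ones, which is the "giant patterns" effect from speaking loudly.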
On the X-axis is time, then on the Y-axis now we have the frequencies, and it's color coded, so that the yellow indicates
the highest energy band, and then it goes to red and to dark.
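To make the spectrogram axes concrete, here is a small sketch that builds one from scratch: the signal is cut into overlapping frames (time, the X-axis), and each frame is turned into a list of frequency-bin magnitudes (the Y-axis), which a display would then map to colors. It uses a slow, naive discrete Fourier transform so that it only needs the standard library; the names `dft_magnitudes` and `spectrogram`, the frame sizes, and the test tone are assumptions for illustration, not what the program in the video does internally.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

def spectrogram(samples, frame_size=64, hop=32):
    """Rows are time frames, columns are frequency bins (Hann-windowed)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size) for t in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_size)]
        rows.append(dft_magnitudes(frame))
    return rows

# Synthetic input (an assumption): a pure 1000 Hz tone sampled at 8000 Hz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
spec = spectrogram(tone)

# The brightest bin in the first frame, converted back to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * rate / 64
```

For a pure tone, a single bin dominates every row. Speech would instead show several bands whose positions change over time.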
So if I can make some different sounds, like the vowels:
You can see they're quite distinct; the highest frequency is different and the second highest frequency is also different.
The harmonics and the spacing between the formants are also different.
If we do some fricatives:
You can see, as I was mentioning, that it is turbulence, it is noise.
For the fricatives, pretty much the energy is spread out across all of the frequency bands.
Plosives, like:
You can see that there is this burst of energy. There is like nothing for a few milliseconds, and then there is this
explosion of energy. So this is actually how speech recognition works. Digital speech processing and
fast Fourier transforms are techniques that gather the audio, break it into
these frequency bands, analyze the pattern, and then match it against a known acoustic model
that tells you, oh, this sounds like an F or it sounds like a P, etc.
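The "break it into frequency bands, then match the pattern" idea can be sketched in a few lines. This is a toy under stated assumptions: the two entries in `TEMPLATES` are hand-written, hypothetical patterns standing in for an acoustic model, the labels are invented for this example, and the inputs are a synthetic tone and synthetic noise rather than recorded speech. A real recognizer would use a model learned from data and much richer features.

```python
import cmath
import math
import random

def band_energies(frame, n_bands=4):
    """Fraction of spectral energy in each of n_bands equal-width bands."""
    n = len(frame)
    power = []
    for k in range(1, n // 2 + 1):  # skip the DC bin
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(total) ** 2)
    width = len(power) // n_bands
    bands = [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]
    total_energy = sum(bands) or 1.0
    return [e / total_energy for e in bands]

# Hypothetical hand-written templates standing in for a trained acoustic model.
TEMPLATES = {
    "vowel-like (energy in low band)": [1.0, 0.0, 0.0, 0.0],
    "fricative-like (energy spread out)": [0.25, 0.25, 0.25, 0.25],
}

def classify(frame):
    """Return the template label closest to the frame's band-energy pattern."""
    features = band_energies(frame)

    def distance(label):
        return sum((f - t) ** 2 for f, t in zip(features, TEMPLATES[label]))

    return min(TEMPLATES, key=distance)

# Synthetic inputs (assumptions): a 500 Hz tone and uniform random noise.
rate = 8000
tone = [math.sin(2 * math.pi * 500 * n / rate) for n in range(128)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(128)]

tone_label = classify(tone)
noise_label = classify(noise)
```

The tone concentrates its energy in the lowest band, while the noise spreads it across all four, mirroring the vowel versus fricative contrast seen on the spectrogram.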
There are actually scientists who can read the spectrogram. They can look at a picture like this one and
kind of reconstruct everything that I've said so far.