Substitution Cipher

(male narrator) So let's use this substitution mapping to encrypt the message March 12th at 0300 hours. In this case, what we have is a sort of random substitution cipher: an alphanumeric one that includes the numerals 0 through 9. Basically, the characters have been randomly shuffled, which creates a much larger set of possible encryptions than the basic shift cipher did.

To encrypt here, we do the same thing we did before: we find each character in the original and figure out what it maps to. So M maps to 6. A maps to 2. R maps to S. C maps to Q. H maps to T. 1 maps to Z. 2 maps to N. 0 maps to Y. 3 maps to 1. And the last two 0s map to Y. And there is our encrypted message using this cipher. And of course, again, we could chunk it out if we wanted to add some spacing.

So how hard is it to break this cipher? It seems like it should be really, really hard, because there are a ton of possible substitution ciphers here. In fact, there are more than 10 to the 40th, which is a 1 with 40 zeroes after it. But it turns out that this particular cipher is actually pretty easy to crack using something called "frequency analysis." That's the idea that in the English language, certain letters show up more often than others. For example, the letter E shows up with a really high frequency in English. The chart here on the right shows the frequencies of different letters in common English. You'll notice that E is the most frequent letter, followed by A and T, which are pretty similar, and a few others. On the left here is some encrypted text.
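As a rough sketch of the encryption walk-through above, the snippet below uses only the character pairs read out in the narration. The full 36-character table is on the slide and is not reproduced here, so `PARTIAL_MAPPING` and the helper name `encrypt` are illustrative assumptions, as is the choice to pass spaces through unchanged. It also counts the possible mappings, assuming any rearrangement of the 36 symbols (26 letters plus 10 digits) is allowed.

```python
import math

# Only the pairs spoken in the walk-through (assumption: the rest of the
# table is on the slide and is deliberately left out here).
PARTIAL_MAPPING = {
    "M": "6", "A": "2", "R": "S", "C": "Q", "H": "T",
    "1": "Z", "2": "N", "0": "Y", "3": "1",
}


def encrypt(plaintext, mapping):
    """Substitute each character via `mapping`; spaces pass through."""
    out = []
    for ch in plaintext.upper():
        if ch == " ":
            out.append(ch)
        else:
            # Raises KeyError for any character whose pair was not given.
            out.append(mapping[ch])
    return "".join(out)


print(encrypt("MARCH 12 0300", PARTIAL_MAPPING))  # 62SQT ZN Y1YY

# Number of ways to rearrange 36 symbols: 36 factorial.
keyspace = math.factorial(36)
print(f"{keyspace:.2e}")  # 3.72e+41
```

The keyspace is huge, yet the mapping never changes within a message, and that is what the frequency counts exploit.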
And the frequencies of different letters were calculated, that is, how often each letter showed up in the text. So what can we deduce from this? Well, since the character S here is showing up with the most frequency, it's really likely that our original mapping mapped the letter E to the letter S. And so that S can be decrypted as the letter E. Now it's very likely that these two, W and L, came from A and T. So maybe T maps to W and A maps to L, though those two could possibly be swapped.

We could also look down here and see that certain other letters, like J, X, Z, and Q, don't show up very often in English. And so it's very likely that these characters here in the encrypted text (C, A, D, and J) correspond to those very infrequent letters. If we follow this process and decrypt each letter as we go along, then we can start looking for patterns. There are also certain pairs of letters, like TH, which show up very often in English. We could use those, again, to help us break the substitution mapping. And it turns out to be not too hard to do, as long as you have enough text to find these frequencies from.
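The counting step described above can be sketched in a few lines. This is a minimal illustration, not the tool used to make the chart: `ENGLISH_ORDER` is an assumed, approximate ranking of English letters (the exact order varies by corpus), and `rank_symbols`, `first_guess`, and `top_pairs` are hypothetical helper names. The first guess simply pairs the most common ciphertext symbol with the most common English letter, and so on down the list.

```python
from collections import Counter

# Approximate English letter ranking, most to least frequent
# (assumption: commonly cited order; real corpora differ slightly).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def rank_symbols(ciphertext):
    """Return ciphertext symbols ordered from most to least frequent."""
    counts = Counter(ch for ch in ciphertext.upper() if ch.isalnum())
    return [sym for sym, _ in counts.most_common()]


def first_guess(ciphertext):
    """Pair each ciphertext symbol with the English letter of equal rank."""
    return dict(zip(rank_symbols(ciphertext), ENGLISH_ORDER))


def top_pairs(ciphertext, n=3):
    """Count adjacent symbol pairs within words (candidates for TH, HE, ...)."""
    pairs = Counter()
    for word in ciphertext.upper().split():
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs.most_common(n)


# Toy input in which S is most common, then W, then L.
print(first_guess("SSSS WWW LL"))  # {'S': 'E', 'W': 'T', 'L': 'A'}
print(top_pairs("WTS WTSL WT", 1))  # [('WT', 3)]
```

A guess like this is only a starting point. Letters with similar frequencies, such as A and T, may need to be swapped by hand, and short texts give unreliable counts.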