Colossus - Cs387 unit 1 - Udacity

The first machine that was built to do this was called the Heath Robinson. It was named after a British cartoonist who drew cartoons of crazy machines doing everyday tasks; this one was for peeling potatoes. The timeline is quite interesting. The first operator mistake, the retransmission of a message with abbreviations, happened in August 1941, and it was intercepted. Over the next year, the analysts were able to learn the structure of the machine from those two intercepted messages, and then understand it well enough to develop the double-delta technique for finding the correct configuration and breaking messages. In December 1942 they decided to build a machine and commit the resources needed to design and build it. It was requested by June 1943, and it was actually delivered 3 months early. By April 1943 they had the first machine delivered and operational. It could process messages at about 2,000 characters per second, and it was able to do 7 XOR operations with the tape running at about 20 km per hour. This was not quite fast enough. Much of the value in decrypting messages, especially in wartime, comes from decrypting them before the events they describe actually happen. So they built a faster machine, called Colossus. It was operational by January 1944. The big change that made Colossus much faster and more useful than the Heath Robinson machines was that the configurations, which were previously on a tape, were replaced with an electronic keytext generator. The logic does all the XORs and counts the number of results that are zero. Then it prints out those tallies so you can go back and find the right configuration. The ciphertext is still on a paper tape, spinning through at about 50 km per hour. It was a pretty impressive sight when it was operating, processing 5,000 characters per second. What makes Colossus arguably the first computer is that, because the keytext was generated electronically, its logic was a little bit programmable.
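The counting step described above (XOR everything, count the zeros, print the tallies) can be sketched in Python. This is a heavily simplified toy model, not Colossus's actual circuitry: it uses a single bit channel, a hypothetical 41-position wheel pattern standing in for one piece of the machine configuration, and synthetic plaintext bits biased toward 0 (the 0.7 bias is an assumption chosen to make the effect easy to see). For each candidate start position, the sketch XORs the ciphertext with the wheel and counts the zeros. The correct setting strips the wheel off and exposes the biased plaintext, so it should produce the highest tally.

```python
import random


def make_stream(n, wheel, start, zero_bias, rng):
    """Toy ciphertext: plaintext bits biased toward 0, XORed with a repeating wheel."""
    plain = [0 if rng.random() < zero_bias else 1 for _ in range(n)]
    return [p ^ wheel[(i + start) % len(wheel)] for i, p in enumerate(plain)]


def tally_zeros(cipher, wheel, start):
    """XOR the ciphertext with the wheel at one candidate start and count zero results."""
    period = len(wheel)
    return sum(1 for i, c in enumerate(cipher) if c ^ wheel[(i + start) % period] == 0)


def best_setting(cipher, wheel):
    """Try every start position, print each tally, and return the highest-scoring one."""
    tallies = {s: tally_zeros(cipher, wheel, s) for s in range(len(wheel))}
    for s, count in tallies.items():
        print(f"start={s:2d} zeros={count}")
    return max(tallies, key=tallies.get)


if __name__ == "__main__":
    rng = random.Random(1944)
    wheel = [rng.randint(0, 1) for _ in range(41)]  # hypothetical wheel pattern
    true_start = 17                                 # hypothetical secret setting
    cipher = make_stream(20000, wheel, true_start, zero_bias=0.7, rng=rng)
    print("best start:", best_setting(cipher, wheel))
```

Picking the maximum tally automatically is a simplification made by this sketch; as the lecture describes, the machine printed the tallies and the analysts went back to find the right configuration.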
It didn't always do exactly the same thing. There were ways to program it to do slightly different operations depending on what the analysts thought was the most useful thing to do with a given intercepted ciphertext. This makes it arguably the first programmable digital electronic computer. It wasn't fully programmable, and it wasn't a general-purpose computer, but it could be programmed to change its behavior slightly. You can see what it looked like during World War II, although very few pictures of it exist. This was a very secretive operation, and the machines were all destroyed after the war. Much of the value in breaking a cipher comes from making sure that no one else knows you've broken it. This is different from academic cryptography, where people like to publicize when they break ciphers. In military cryptography, the whole value of breaking a cipher is that your enemy keeps using it, so you want to keep it very secret that you've done so. Much of the work on Colossus was not declassified until the mid-1990s. Today, if you visit Bletchley Park, you can see a replica, a rebuilt version of Colossus. This is my picture of it from 2004. It looks very similar to the black-and-white picture from World War II. Unfortunately, it isn't operated often, so you won't get to see the tape spinning through at 50 km per hour and all the crunching going on unless you're lucky enough to go on one of the few days a year when it might actually be running. These Colossus machines had a huge impact on World War II. By the end of the war there were 10 of them operating continuously at Bletchley Park, decrypting all the traffic that could be intercepted. They decoded over 63 million letters of Axis communications. Among the things learned from them were the locations of Axis troops ahead of D-Day, which was very helpful for planning the operation. Why did I tell you this story?
First of all, it is an important story in the history of both the world and computing. This isn't meant to be a history class, although I certainly like to take historical excursions when I can. It's also very relevant to modern cryptanalysis. Modern ciphers are much better than the Lorenz cipher, mainly because we can use computing power to do the encryption, but the basic ideas are quite similar. The goal of the cipher is to hide all the statistical properties that are inherent in the message and present in the key, at least the generated key. The actual key, we hope, is perfectly random. Finding perfect randomness and getting it into a computer is quite challenging, but let's assume we have a perfectly random key. What we learned from the perfect cipher analysis is that the key must be at least as long as the message. In practice the key is much shorter than the message, which means that for the cipher to work, we need to generate more key bits than we actually have. Even the original key is only the starting configuration. In the Lorenz case, you can think of it as the configuration of the machine, taken from a codebook that could be perfectly randomly generated and shared between the endpoints. A much longer key has to be generated from it to produce the ciphertext. So both the key and the message have statistical properties. The goal of the cipher is to hide all of those properties. The goal of the analyst is to find statistical properties in the ciphertext and then use them to break the key or the message. In the case of Bletchley Park breaking the Lorenz cipher, they found such properties: when you looked across channels at subsequent letters, there were statistical properties that the cipher did not hide. That was because of a mechanical weakness: the S wheels either all moved or all stayed still.
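To see why S wheels that all move or all stay still leak information across channels at subsequent letters, here is a toy Python sketch. It is a simplified model under stated assumptions, not the real Lorenz wiring: two hypothetical psi (S) wheels with arbitrary lengths 43 and 47 share one position counter, and a random stepping decision with probability 0.5 (an assumed stand-in for whatever controlled their movement) decides whether they both advance or both stand still. The wheel patterns act as the short key, and the loop stretches them into a key stream as long as the message. The "delta" of a stream here means each bit XORed with the next one, which is the subsequent-letters view.

```python
import random


def delta(bits):
    """XOR each bit with the following bit (compares subsequent letters)."""
    return [a ^ b for a, b in zip(bits, bits[1:])]


def psi_streams(n, wheel1, wheel2, move_prob, rng):
    """Toy model of two S (psi) wheels that either both step or both stand still.

    The random stepping decision is a hypothetical stand-in, not the real mechanism.
    """
    pos = 0
    out1, out2 = [], []
    for _ in range(n):
        out1.append(wheel1[pos % len(wheel1)])
        out2.append(wheel2[pos % len(wheel2)])
        if rng.random() < move_prob:  # all S wheels move together...
            pos += 1                  # ...otherwise none of them moves
    return out1, out2


def zero_fraction(bits):
    """Fraction of bits in the stream that are 0."""
    return bits.count(0) / len(bits)


if __name__ == "__main__":
    rng = random.Random(387)
    w1 = [rng.randint(0, 1) for _ in range(43)]  # hypothetical wheel patterns
    w2 = [rng.randint(0, 1) for _ in range(47)]
    p1, p2 = psi_streams(50000, w1, w2, move_prob=0.5, rng=rng)
    raw = [a ^ b for a, b in zip(p1, p2)]
    combined = [a ^ b for a, b in zip(delta(p1), delta(p2))]
    print(f"psi1 xor psi2 zeros:             {zero_fraction(raw):.3f}")
    print(f"delta psi1 xor delta psi2 zeros: {zero_fraction(combined):.3f}")
```

Whenever the wheels stand still, both deltas are 0, so Δψ1 ⊕ Δψ2 is expected to be 0 roughly three quarters of the time in this toy setting, while the raw XOR of the two streams stays close to one half for random wheel patterns. A skew like this is the kind of statistical property that survives encryption and that Colossus tallied.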
That meant that instead of the long, non-repeating period of 19 million letters that the users of the Lorenz machines thought it had, the Allies could break it down and find a pattern only 12,071 letters long, with far fewer configurations to try, so a good guess at the settings needed many fewer attempts. In a modern cipher, we think of this as a mathematical weakness: some problem in the mathematics of the cipher leaves a statistical property that a cryptanalyst could exploit. The other relevance to modern cryptanalysis is that breaking a cipher takes lots of hard work. I find it quite amazing what Bletchley Park was able to do with Colossus. Even so, it took 6 months of effort looking at those two messages to figure out the key and deduce the key structure from it. That requires an awful lot of trial and error and a lot of creativity, but also a lot of tedious work. In modern cryptanalysis we try to have computers do as much of the tedious work as possible, but there is still lots of hard work that goes into breaking a cipher. Motivation certainly helps a lot. In the case of Bletchley Park, the fear that your country was under attack was a pretty strong motivation.